fuj-management/docs/plans/2026-05-06-1236-go-m2-6-synch-generate-sync-id.md
Jan Novak 54a783ea00
feat(go/M2.6): port domain/synch.GenerateSyncID
SHA-256 dedup hash from sync_fio_to_sheets.py generate_sync_id.
Key subtlety: Python str(float) emits "500.0" for whole-valued floats
and switches to scientific notation at |f|>=1e16 or |f|<1e-4 —
replicated via formatAmount using 'f'/'e' format selection.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 12:43:41 +02:00


Context

Continuing the Go backend rewrite tracked in 2026-05-03-2349-go-backend-rewrite-progress.md. M2.1–M2.5 have landed. The next leaf-level pure function is generate_sync_id from scripts/sync_fio_to_sheets.py:62-77.

It computes a SHA-256 hash over a fixed seven-field projection of a Fio transaction (date|amount|currency|sender|vs|message|bank_id) and is the deduplication key written into column K (Sync ID) of the payments sheet. The Go port must produce a byte-identical digest for the same transaction; otherwise the Go-side sync (M4.7) would re-append rows already written by the Python sync, double-counting payments.

The non-trivial part is the amount field's string serialisation: upstream fio_utils.py always supplies amount as a Python float (API path: float(val(1) or 0); HTML path: parse_czech_amount(...) which returns float). Python's str(float) produces "500.0" for whole-valued floats; Go's strconv.FormatFloat(f, 'g', -1, 64) produces "500". This is the gotcha called out in the M2.6 line of the progress tracker.

Python behaviour (the spec)

def generate_sync_id(tx: dict) -> str:
    components = [
        str(tx.get("date", "")),
        str(tx.get("amount", "")),
        str(tx.get("currency", "CZK")),
        str(tx.get("sender", "")),
        str(tx.get("vs", "")),
        str(tx.get("message", "")),
        str(tx.get("bank_id", "")),
    ]
    raw_str = "|".join(components).lower()
    return hashlib.sha256(raw_str.encode("utf-8")).hexdigest()

Behavioural notes for the Go port:

  1. Field order is load-bearing. date|amount|currency|sender|vs|message|bank_id exactly.
  2. Separator is "|".
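  3. Whole string is .lower()-ed before hashing (so e.g. "ABC" sender vs "abc" hash identically). Python's str.lower() applies full Unicode case mapping, while Go's strings.ToLower maps rune by rune; the two agree on ASCII and Czech diacritics, which is all that real Fio data contains, but they can differ on characters with multi-character lowercase forms such as "İ" (U+0130).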
  4. currency defaults to "CZK" when missing from the dict (HTML scraper path never sets it). Other fields default to "".
  5. amount is a float. Always. Real Fio data is 500.0, 1234.56, etc. — no NaN/Inf, but parity test must pin the format.
  6. Output is hashlib.sha256(...).hexdigest() — 64-char lowercase hex.
  7. Encoding is UTF-8.
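
The notes above can be made concrete by building the exact string that gets hashed. This is a minimal sketch, not code from the plan: preimage is a hypothetical helper, and it takes the amount already stringified the way Python's str(float) would print it, so float formatting stays out of the picture here.

```go
package main

import (
	"fmt"
	"strings"
)

// preimage (hypothetical helper) joins the seven already-stringified
// components in hash order and lower-cases the result, mirroring
// "|".join(components).lower() in the Python source.
func preimage(date, amount, currency, sender, vs, message, bankID string) string {
	if currency == "" {
		currency = "CZK" // Python default when the key is missing
	}
	fields := []string{date, amount, currency, sender, vs, message, bankID}
	return strings.ToLower(strings.Join(fields, "|"))
}

func main() {
	fmt.Println(preimage("2026-01-15", "500.0", "CZK", "Jan Novak", "123", "clenske 1/2026", "abc123"))
	// prints: 2026-01-15|500.0|czk|jan novak|123|clenske 1/2026|abc123
}
```

The SHA-256 digest is taken over the UTF-8 bytes of this string, so any difference in field order, separator, case, or amount formatting changes the Sync ID.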

str(float) cases observed in real Fio amounts

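| float64 | Python str(f) | Go strconv.FormatFloat(f, 'g', -1, 64) | Fix needed |
| --- | --- | --- | --- |
| 500.0 | "500.0" | "500" | append .0 |
| 1234.56 | "1234.56" | "1234.56" | none (matches) |
| 0.0 | "0.0" | "0" | append .0 |
| -500.0 | "-500.0" | "-500" | append .0 |
| 0.1 | "0.1" | "0.1" | none (matches) |
| 99999.99 | "99999.99" | "99999.99" | none (matches) |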

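For amounts with |f| < 1e6 and at most two decimal places, the rule "'g' with prec -1, then append .0 if the result has no . and no e/E" is exact. It is not safe beyond that: at shortest precision Go's 'g' verb switches to exponent form once the decimal exponent reaches 6 (1500000.0 becomes "1.5e+06"), while Python's str(float) stays in fixed notation until |f| >= 1e16 and switches to exponent form only there and below 1e-4. A seven-digit CZK amount is plausible, so the formatter has to pick the notation explicitly: 'f' for zero and for 1e-4 <= |f| < 1e16, 'e' otherwise, appending .0 only to fixed output that has no decimal point.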
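
A quick way to see where each strconv verb agrees with Python is to print the same values with all three at shortest precision. This is a minimal sketch; shortest is a hypothetical helper used only for this comparison.

```go
package main

import (
	"fmt"
	"strconv"
)

// shortest (hypothetical helper) formats f with the given strconv verb
// at shortest round-trip precision.
func shortest(f float64, verb byte) string {
	return strconv.FormatFloat(f, verb, -1, 64)
}

func main() {
	for _, f := range []float64{500, 99999.99, 1500000, 1e16, 1e-5} {
		fmt.Printf("g=%-10s f=%-20s e=%s\n",
			shortest(f, 'g'), shortest(f, 'f'), shortest(f, 'e'))
	}
}
```

For 1500000 the 'g' verb yields "1.5e+06" while 'f' yields "1500000", which becomes Python's "1500000.0" once .0 is appended. For 1e16 and 1e-5 the 'e' verb yields "1e+16" and "1e-05", the same strings Python prints.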

Approach

Create new package internal/domain/synch mirroring the layout of internal/domain/money (single-file module + test file alongside).

Package + signature

// Package synch ports the bank-sync deduplication helper from
// scripts/sync_fio_to_sheets.py.
package synch

// Transaction is the projection of a Fio transaction that participates
// in the Sync ID hash. Other fields (ks, ss, sender_account, …) are
// intentionally excluded — they are not part of the Python hash.
//
// Currency: leave "" to inherit the Python default of "CZK" (matches
// the HTML scraper path which omits the key entirely).
type Transaction struct {
    Date     string
    Amount   float64
    Currency string
    Sender   string
    VS       string
    Message  string
    BankID   string
}

// GenerateSyncID returns the lowercase SHA-256 hex digest of
// "date|amount|currency|sender|vs|message|bank_id" (lower-cased), used
// as the dedup key in column K of the payments sheet.
//
// Byte-stable with scripts/sync_fio_to_sheets.py generate_sync_id.
func GenerateSyncID(tx Transaction) string

Currency default

In Go every struct field is always present, so we lose Python's "missing key vs empty string" distinction. Real-world data either sets currency = "CZK" (API path) or omits the key (HTML path → "CZK" default). Empty string never occurs in practice. The Go port collapses the two by treating Currency == "" as "use CZK":

currency := tx.Currency
if currency == "" {
    currency = "CZK"
}

This is byte-equal to Python for every input we will ever see in production, and avoids forcing callers to pass a *string.

Float formatter

Internal helper, unexported:

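// formatAmount mimics Python's str(float) for finite values. Python
// uses fixed notation for zero and for 1e-4 <= |f| < 1e16, and
// scientific notation otherwise. Go's 'g' verb is not usable here: at
// shortest precision it switches to an exponent at 1e6, so 1500000
// would become "1.5e+06". Fixed output without a decimal point gets
// ".0" appended. Requires the math, strconv and strings imports.
func formatAmount(f float64) string {
    abs := math.Abs(f)
    if abs != 0 && (abs < 1e-4 || abs >= 1e16) {
        return strconv.FormatFloat(f, 'e', -1, 64)
    }
    s := strconv.FormatFloat(f, 'f', -1, 64)
    if !strings.Contains(s, ".") {
        s += ".0"
    }
    return s
}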

Tested explicitly (see Tests below) so the edge cases (0, whole numbers, negatives, large/small with exponent) stay locked.

Hash composition

func GenerateSyncID(tx Transaction) string {
    currency := tx.Currency
    if currency == "" {
        currency = "CZK"
    }
    raw := strings.ToLower(strings.Join([]string{
        tx.Date,
        formatAmount(tx.Amount),
        currency,
        tx.Sender,
        tx.VS,
        tx.Message,
        tx.BankID,
    }, "|"))
    sum := sha256.Sum256([]byte(raw))
    return hex.EncodeToString(sum[:])
}

(crypto/sha256 + encoding/hex — both stdlib, no go.mod change.)

Tests

synch_test.go mirrors money_test.go's table-driven style with the verification snippet at the top of the function. Two test functions:

1. TestGenerateSyncID

Each row's expected digest is computed from the Python source:

PYTHONPATH=scripts:. python -c '
from sync_fio_to_sheets import generate_sync_id
cases = [
    {"date":"2026-01-15","amount":500.0,"currency":"CZK","sender":"Jan Novak","vs":"123","message":"clenske 1/2026","bank_id":"abc123"},
    {"date":"2026-01-15","amount":500.0,"sender":"Jan Novak","vs":"123","message":"clenske 1/2026","bank_id":"abc123"},  # currency missing → CZK
    {"date":"2026-02-10","amount":1234.56,"currency":"CZK","sender":"ABC SRO","vs":"","message":"FAKTURA 42","bank_id":"xyz"},  # mixed case → lowercased
    {"date":"2026-03-01","amount":-500.0,"currency":"CZK","sender":"refund","vs":"","message":"","bank_id":""},  # negative
    {"date":"2026-04-01","amount":0.0,"currency":"CZK","sender":"","vs":"","message":"","bank_id":""},  # zero amount
    {"amount":0.0},  # amount only: parity case for the Go zero-value Transaction{}
]
for c in cases:
    print(repr(c), "->", generate_sync_id(c))
'

Cases (one row per dict above), each asserting the exact 64-char hex digest the snippet prints. Cover:

  • Happy path with all fields set.
  • Currency: "" → "CZK" default (parity with missing key).
  • Mixed-case sender/message → lowercased before hashing.
  • Negative amount.
  • Zero amount.
  • Zero-value Transaction{}: every field at Go zero, currency defaults to "CZK". Its Python counterpart is generate_sync_id({"amount": 0.0}), not generate_sync_id({}): with the key missing, Python stringifies amount as "", while Go always formats the zero amount as "0.0", so the empty dict has no Go equivalent.
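
Some of the properties listed above can be checked without any Python-derived digest. The sketch below restates the plan's Transaction and GenerateSyncID so that it compiles on its own. The 'f'/'e' formatter inside it follows the commit message and is an assumption of this sketch.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"math"
	"strconv"
	"strings"
)

type Transaction struct {
	Date, Currency, Sender, VS, Message, BankID string
	Amount                                      float64
}

// formatAmount: assumed str(float)-compatible formatter for finite values.
func formatAmount(f float64) string {
	if abs := math.Abs(f); abs != 0 && (abs < 1e-4 || abs >= 1e16) {
		return strconv.FormatFloat(f, 'e', -1, 64)
	}
	s := strconv.FormatFloat(f, 'f', -1, 64)
	if !strings.Contains(s, ".") {
		s += ".0"
	}
	return s
}

func GenerateSyncID(tx Transaction) string {
	currency := tx.Currency
	if currency == "" {
		currency = "CZK"
	}
	raw := strings.ToLower(strings.Join([]string{
		tx.Date, formatAmount(tx.Amount), currency,
		tx.Sender, tx.VS, tx.Message, tx.BankID,
	}, "|"))
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

func main() {
	base := Transaction{Date: "2026-01-15", Amount: 500, Currency: "CZK",
		Sender: "Jan Novak", VS: "123", Message: "clenske 1/2026", BankID: "abc123"}
	noCurrency, upper := base, base
	noCurrency.Currency = ""   // should hash like explicit "CZK"
	upper.Sender = "JAN NOVAK" // should hash like the mixed-case sender

	fmt.Println(len(GenerateSyncID(base)))                         // digest length in hex chars
	fmt.Println(GenerateSyncID(base) == GenerateSyncID(noCurrency)) // currency default parity
	fmt.Println(GenerateSyncID(base) == GenerateSyncID(upper))      // case-insensitivity
}
```

These invariants complement the digest table but do not replace it. Only the Python-computed digests prove byte-level parity.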

2. TestFormatAmount

Pin the float formatter against Python's str(float):

PYTHONPATH=scripts:. python -c '
for v in [0.0, 500.0, -500.0, 0.1, 1234.56, 99999.99, 1500000.0, 1e16, 1e-5]:
    print(repr(v), "->", repr(str(v)))
'

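Table of (float64, expected string) pairs. Whole numbers must end in .0 (including 1500000.0, which must stay "1500000.0" rather than become "1.5e+06"); existing decimal representations pass through unchanged; exponent-form floats keep Python's format (1e16 → "1e+16", 1e-5 → "1e-05").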
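
The table could look roughly like the sketch below, written as a standalone program rather than a _test.go file so it runs on its own. The expected strings are what CPython prints for str(v). The formatter is an assumed 'f'/'e' implementation, and mismatches is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// formatAmount: assumed str(float)-compatible formatter for finite values.
func formatAmount(f float64) string {
	if abs := math.Abs(f); abs != 0 && (abs < 1e-4 || abs >= 1e16) {
		return strconv.FormatFloat(f, 'e', -1, 64)
	}
	s := strconv.FormatFloat(f, 'f', -1, 64)
	if !strings.Contains(s, ".") {
		s += ".0"
	}
	return s
}

// cases pairs each input with the string Python's str(float) produces.
var cases = []struct {
	in   float64
	want string
}{
	{0.0, "0.0"},
	{500.0, "500.0"},
	{-500.0, "-500.0"},
	{0.1, "0.1"},
	{1234.56, "1234.56"},
	{99999.99, "99999.99"},
	{1500000.0, "1500000.0"},
	{1e16, "1e+16"},
	{1e-5, "1e-05"},
}

// mismatches (hypothetical helper) returns one message per case where
// formatAmount disagrees with the expected Python string.
func mismatches() []string {
	var out []string
	for _, c := range cases {
		if got := formatAmount(c.in); got != c.want {
			out = append(out, fmt.Sprintf("formatAmount(%v) = %q, want %q", c.in, got, c.want))
		}
	}
	return out
}

func main() {
	for _, m := range mismatches() {
		fmt.Println(m)
	}
}
```

In synch_test.go the same slice would feed a loop that calls t.Errorf instead of collecting strings.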

Files to create

  • go/internal/domain/synch/synch.go: package, Transaction, GenerateSyncID, internal formatAmount.
  • go/internal/domain/synch/synch_test.go: TestGenerateSyncID + TestFormatAmount.

No existing Go files need editing.

Verification

cd go && go test ./internal/domain/synch/...
make go-lint
make go-build   # sanity: nothing else broke

Plus run the two Python snippets in the Tests section and diff their output against the test tables to confirm parity.

Out of scope (explicit non-goals)

  • Hooking into the Tier-1 parity runner. That comes with M3.5 (-tags=parity build constraint and tests/fixtures/pure/). M2.6 ships with hand-written, Python-verified test tables, the same approach used by M2.1–M2.5.
  • A richer Transaction struct covering ks/ss/note/sender_account. Those fields aren't part of the hash. M4.4 (Fio IO adapter) will decide whether to reuse synch.Transaction or define its own struct and convert at the boundary.
  • Polymorphic input (e.g. accepting a map[string]any). Python's duck-typing is a non-goal in Go.
  • Any Python callsite migration. sync_fio_to_sheets.py keeps using its own generate_sync_id until M4.7 ports the sync service.

Progress tracker + changelog

After the commit lands:

Branch: feat/m2-6-synch-generate-sync-id (per CLAUDE.md branch-per-feature workflow). Push, open MR via tea pr create, leave merge to the user.