SHA-256 dedup hash from sync_fio_to_sheets.py generate_sync_id. Key subtlety: Python str(float) emits "500.0" for whole-valued floats and switches to scientific notation at |f|>=1e16 or |f|<1e-4 — replicated via formatAmount using 'f'/'e' format selection. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Context
Continuing the Go backend rewrite tracked in
2026-05-03-2349-go-backend-rewrite-progress.md.
M2.1–M2.5 are landed. Next leaf-level pure function is generate_sync_id
from scripts/sync_fio_to_sheets.py:62-77.
It computes a SHA-256 hash over a fixed seven-field projection of a Fio
transaction (date|amount|currency|sender|vs|message|bank_id) and is
the deduplication key written into column K (Sync ID) of the payments
sheet. The Go port must produce a byte-identical digest for the same
transaction; otherwise the Go-side sync (M4.7) would re-append rows
already written by the Python sync, double-counting payments.
The non-trivial part is the amount field's string serialisation:
upstream fio_utils.py always supplies amount as a Python float
(API path: float(val(1) or 0); HTML path: parse_czech_amount(...)
which returns float). Python's str(float) produces "500.0" for
whole-valued floats; Go's strconv.FormatFloat(f, 'g', -1, 64) produces
"500". This is the gotcha called out in the M2.6 line of the progress
tracker.
Python behaviour (the spec)
def generate_sync_id(tx: dict) -> str:
    components = [
        str(tx.get("date", "")),
        str(tx.get("amount", "")),
        str(tx.get("currency", "CZK")),
        str(tx.get("sender", "")),
        str(tx.get("vs", "")),
        str(tx.get("message", "")),
        str(tx.get("bank_id", "")),
    ]
    raw_str = "|".join(components).lower()
    return hashlib.sha256(raw_str.encode("utf-8")).hexdigest()
Behavioural notes for the Go port:
- Field order is load-bearing: date|amount|currency|sender|vs|message|bank_id exactly.
- Separator is "|".
- Whole string is .lower()-ed before hashing (so e.g. "ABC" sender vs "abc" hash identically). Unicode lower; in practice Fio data is ASCII + Czech diacritics.
- currency defaults to "CZK" when missing from the dict (HTML scraper path never sets it). Other fields default to "".
- amount is a float. Always. Real Fio data is 500.0, 1234.56, etc.; no NaN/Inf, but the parity test must pin the format.
- Output is hashlib.sha256(...).hexdigest(), a 64-char lowercase hex string.
- Encoding is UTF-8.
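As a rough illustration of the notes above, the pre-hash string can be sketched in Go as follows. The helper name rawKey and the sample values are hypothetical, and the amount is passed in already formatted so the sketch stays independent of the float formatter.

```go
package main

import (
	"fmt"
	"strings"
)

// rawKey is a hypothetical helper: it joins the seven already-stringified
// fields with "|" and lower-cases the result, mirroring
// "|".join(components).lower() in the Python source.
func rawKey(fields [7]string) string {
	return strings.ToLower(strings.Join(fields[:], "|"))
}

func main() {
	// Sample values are made up for illustration.
	key := rawKey([7]string{
		"2026-01-15", "500.0", "CZK", "Jan Novák", "123", "ČLENSKÉ 1/2026", "abc123",
	})
	fmt.Println(key) // 2026-01-15|500.0|czk|jan novák|123|členské 1/2026|abc123
}
```

Go's strings.ToLower and Python's str.lower() agree on ASCII and Czech letters. A handful of special-cased characters elsewhere in Unicode may lower-case differently, which is worth keeping in mind if non-Czech text ever shows up in sender or message.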
str(float) cases observed in real Fio amounts
| float64 | Python str(f) | Go strconv.FormatFloat(f,'g',-1,64) | Need |
|---|---|---|---|
| 500.0 | "500.0" | "500" | append .0 |
| 1234.56 | "1234.56" | "1234.56" | matches |
| 0.0 | "0.0" | "0" | append .0 |
| -500.0 | "-500.0" | "-500" | append .0 |
| 0.1 | "0.1" | "0.1" | matches |
| 99999.99 | "99999.99" | "99999.99" | matches |
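For the values above, the rule "'g' with prec -1, then append .0 if the
result has no . and no e/E" reproduces Python. It is not exact for the
whole Fio amount domain (signed CZK, ≤ ~7 digits, ≤2 decimal places),
though: with shortest precision, Go's 'g' switches to exponent form once
the decimal exponent reaches 6, so 1500000.0 formats as "1.5e+06" where
Python gives "1500000.0". Python's str(float) stays in fixed notation
for 1e-4 <= |f| < 1e16 and only uses an exponent outside that range.
The formatter therefore has to choose the format itself: 'f' inside that
range (and for zero), 'e' outside it, appending .0 only when the
fixed-notation result has no decimal point. Real amounts never get near
1e16, but the same rule handles that crossover correctly.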
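The difference between the two format verbs is easy to check directly. The sketch below only calls strconv.FormatFloat; the helper names fmtG and fmtF are illustrative and not part of the planned package.

```go
package main

import (
	"fmt"
	"strconv"
)

// fmtG and fmtF are illustrative wrappers around strconv.FormatFloat
// with shortest round-trip precision (-1).
func fmtG(f float64) string { return strconv.FormatFloat(f, 'g', -1, 64) }
func fmtF(f float64) string { return strconv.FormatFloat(f, 'f', -1, 64) }

func main() {
	for _, v := range []float64{500.0, 99999.99, 1500000.0} {
		fmt.Printf("'g' -> %q, 'f' -> %q\n", fmtG(v), fmtF(v))
	}
}
```

For 500.0 and 99999.99 both verbs agree ("500" and "99999.99"). For 1500000.0, 'g' yields "1.5e+06" while 'f' yields "1500000", which is the form that becomes Python's "1500000.0" once .0 is appended.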
Approach
Create new package internal/domain/synch mirroring the layout of
internal/domain/money (single-file module + test file alongside).
Package + signature
// Package synch ports the bank-sync deduplication helper from
// scripts/sync_fio_to_sheets.py.
package synch

// Transaction is the projection of a Fio transaction that participates
// in the Sync ID hash. Other fields (ks, ss, sender_account, …) are
// intentionally excluded because they are not part of the Python hash.
//
// Currency: leave "" to inherit the Python default of "CZK" (matches
// the HTML scraper path which omits the key entirely).
type Transaction struct {
	Date     string
	Amount   float64
	Currency string
	Sender   string
	VS       string
	Message  string
	BankID   string
}

// GenerateSyncID returns the lowercase SHA-256 hex digest of
// "date|amount|currency|sender|vs|message|bank_id" (lower-cased), used
// as the dedup key in column K of the payments sheet.
//
// Byte-stable with scripts/sync_fio_to_sheets.py generate_sync_id.
func GenerateSyncID(tx Transaction) string
Currency default
In Go every struct field is always present, so we lose Python's
"missing key vs empty string" distinction. Real-world data either sets
currency = "CZK" (API path) or omits the key (HTML path → "CZK"
default). Empty string never occurs in practice. The Go port collapses
the two by treating Currency == "" as "use CZK":
currency := tx.Currency
if currency == "" {
	currency = "CZK"
}
This is byte-equal to Python for every input we will ever see in
production, and avoids forcing callers to pass a *string.
Float formatter
Internal helper, unexported:
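// formatAmount mimics Python's str(float) for finite values.
// Python keeps fixed notation for 1e-4 <= |f| < 1e16 and switches to
// exponent form outside that range. Go's 'g' with precision -1 already
// switches at 1e6 ("1.5e+06" for 1500000.0), so the format is chosen
// explicitly: 'f' inside the range (and for zero), 'e' outside it.
// Fixed-notation results without a decimal point get ".0" appended.
// NaN and ±Inf are not handled; they do not occur in Fio data.
// Needs imports: math, strconv, strings.
func formatAmount(f float64) string {
	abs := math.Abs(f)
	if abs != 0 && (abs < 1e-4 || abs >= 1e16) {
		return strconv.FormatFloat(f, 'e', -1, 64)
	}
	s := strconv.FormatFloat(f, 'f', -1, 64)
	if !strings.Contains(s, ".") {
		s += ".0"
	}
	return s
}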
Tested explicitly (see Tests below) so the edge cases (0, whole
numbers, negatives, large/small with exponent) stay locked.
Hash composition
func GenerateSyncID(tx Transaction) string {
	currency := tx.Currency
	if currency == "" {
		currency = "CZK"
	}
	raw := strings.ToLower(strings.Join([]string{
		tx.Date,
		formatAmount(tx.Amount),
		currency,
		tx.Sender,
		tx.VS,
		tx.Message,
		tx.BankID,
	}, "|"))
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}
(crypto/sha256 + encoding/hex — both stdlib, no go.mod change.)
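To see the pieces working together, here is a self-contained sketch that can be run outside the repository. It is an illustration under assumptions rather than the final package: it lives in package main, the sample transactions are made up, and formatAmount uses the 'f'/'e' selection mentioned in the commit summary. It only checks properties that need no Python-generated digest.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"math"
	"strconv"
	"strings"
)

type Transaction struct {
	Date, Currency, Sender, VS, Message, BankID string
	Amount                                      float64
}

// formatAmount approximates Python's str(float) for finite values:
// fixed notation for 1e-4 <= |f| < 1e16 (and zero), exponent otherwise.
func formatAmount(f float64) string {
	abs := math.Abs(f)
	if abs != 0 && (abs < 1e-4 || abs >= 1e16) {
		return strconv.FormatFloat(f, 'e', -1, 64)
	}
	s := strconv.FormatFloat(f, 'f', -1, 64)
	if !strings.Contains(s, ".") {
		s += ".0"
	}
	return s
}

func GenerateSyncID(tx Transaction) string {
	currency := tx.Currency
	if currency == "" {
		currency = "CZK" // same default as the Python tx.get("currency", "CZK")
	}
	raw := strings.ToLower(strings.Join([]string{
		tx.Date, formatAmount(tx.Amount), currency,
		tx.Sender, tx.VS, tx.Message, tx.BankID,
	}, "|"))
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := GenerateSyncID(Transaction{Date: "2026-01-15", Amount: 500, Currency: "CZK", Sender: "Jan Novak", BankID: "abc123"})
	// Empty currency and different letter case should hash identically.
	b := GenerateSyncID(Transaction{Date: "2026-01-15", Amount: 500, Sender: "JAN NOVAK", BankID: "ABC123"})
	fmt.Println(len(a), a == b) // 64 true
}
```

The exact digests still have to be pinned against the Python output; the sketch only shows that the currency default and the lower-casing collapse equivalent inputs onto one ID.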
Tests
synch_test.go mirrors money_test.go's table-driven style with the
verification snippet at the top of the function. Two test functions:
1. TestGenerateSyncID
Each row's expected digest is computed from the Python source:
PYTHONPATH=scripts:. python -c '
from sync_fio_to_sheets import generate_sync_id
cases = [
    {"date":"2026-01-15","amount":500.0,"currency":"CZK","sender":"Jan Novak","vs":"123","message":"clenske 1/2026","bank_id":"abc123"},
    {"date":"2026-01-15","amount":500.0,"sender":"Jan Novak","vs":"123","message":"clenske 1/2026","bank_id":"abc123"},  # currency missing → CZK
    {"date":"2026-02-10","amount":1234.56,"currency":"CZK","sender":"ABC SRO","vs":"","message":"FAKTURA 42","bank_id":"xyz"},  # mixed case → lowercased
    {"date":"2026-03-01","amount":-500.0,"currency":"CZK","sender":"refund","vs":"","message":"","bank_id":""},  # negative
    {"date":"2026-04-01","amount":0.0,"currency":"CZK","sender":"","vs":"","message":"","bank_id":""},  # zero amount
    {"amount":0.0},  # counterpart of Go Transaction{}: only amount set, other fields default
]
for c in cases:
    print(repr(c), "->", generate_sync_id(c))
'
Cases (one row per dict above), each asserting the exact 64-char hex digest the snippet prints. Cover:
- Happy path with all fields set.
- Currency: "" → "CZK" default (parity with missing key).
- Mixed-case sender/message → lowercased before hashing.
- Negative amount.
- Zero amount.
- Zero-value Transaction{}: every field at Go zero, currency defaults to "CZK". Because Amount is a float64, the zero value hashes "0.0", so the Python counterpart is generate_sync_id({"amount": 0.0}). It is not generate_sync_id({}), which hashes an empty amount field that the Go struct cannot express.
2. TestFormatAmount
Pin the float formatter against Python's str(float):
PYTHONPATH=scripts:. python -c '
for v in [0.0, 500.0, -500.0, 0.1, 1234.56, 99999.99, 1500000.0, 1e16, 1e-5]:
    print(repr(v), "->", repr(str(v)))
'
Table of (float64, expected string) pairs. Whole numbers must end in
.0; existing decimal representations pass through unchanged;
exponent-form floats (1e16, 1e-5) keep their format.
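A standalone sketch of that table is shown below. It is written as a plain program rather than a _test.go file so it can be run anywhere; the names pythonStr and mismatches are hypothetical, formatAmount is assumed to use the 'f'/'e' selection from the commit summary, and the expected strings are what CPython's str(float) prints for these values.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// formatAmount: fixed notation for 1e-4 <= |f| < 1e16 (and zero),
// exponent form otherwise; ".0" is appended to whole fixed values.
func formatAmount(f float64) string {
	abs := math.Abs(f)
	if abs != 0 && (abs < 1e-4 || abs >= 1e16) {
		return strconv.FormatFloat(f, 'e', -1, 64)
	}
	s := strconv.FormatFloat(f, 'f', -1, 64)
	if !strings.Contains(s, ".") {
		s += ".0"
	}
	return s
}

// pythonStr pairs each input with the string Python's str(float) yields.
var pythonStr = []struct {
	in   float64
	want string
}{
	{0.0, "0.0"},
	{500.0, "500.0"},
	{-500.0, "-500.0"},
	{0.1, "0.1"},
	{1234.56, "1234.56"},
	{99999.99, "99999.99"},
	{1500000.0, "1500000.0"},
	{1e16, "1e+16"},
	{1e-5, "1e-05"},
}

// mismatches returns one message per row where Go and Python disagree.
func mismatches() []string {
	var bad []string
	for _, c := range pythonStr {
		if got := formatAmount(c.in); got != c.want {
			bad = append(bad, fmt.Sprintf("got %q, want %q", got, c.want))
		}
	}
	return bad
}

func main() {
	fmt.Println(len(mismatches()), "mismatches")
}
```

The 1500000.0 row is the one that separates the 'f'/'e' selection from a plain 'g' formatter, so it is worth keeping in the real test table.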
Files to create
- go/internal/domain/synch/synch.go: package, Transaction, GenerateSyncID, internal formatAmount.
- go/internal/domain/synch/synch_test.go: TestGenerateSyncID + TestFormatAmount.
No existing Go files need editing.
Verification
cd go && go test ./internal/domain/synch/...
make go-lint
make go-build # sanity: nothing else broke
Plus run the two Python snippets in the Tests section and diff their output against the test tables to confirm parity.
Out of scope (explicit non-goals)
- Hooking into the Tier-1 parity runner. That comes with M3.5 (-tags=parity build constraint and tests/fixtures/pure/). M2.6 ships with hand-written, Python-verified test tables, the same approach used by M2.1–M2.5.
- A richer Transaction struct covering ks/ss/note/sender_account. Those fields aren't part of the hash. M4.4 (Fio IO adapter) will decide whether to reuse synch.Transaction or define its own struct and convert at the boundary.
- Polymorphic input (e.g. accepting a map[string]any). Python's duck-typing is a non-goal in Go.
- Any Python callsite migration. sync_fio_to_sheets.py keeps using its own generate_sync_id until M4.7 ports the sync service.
Progress tracker + changelog
After the commit lands:
- Tick M2.6 in docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md with the commit SHA, mirroring the M2.5 entry style.
- Add a CHANGELOG.md entry at top: ## YYYY-MM-DD HH:MM TZ — feat(go/M2.6): port domain/synch.GenerateSyncID.
Branch: feat/m2-6-synch-generate-sync-id (per CLAUDE.md
branch-per-feature workflow). Push, open MR via tea pr create, leave
merge to the user.