All checks were successful
Deploy to K8s / deploy (push) Successful in 6s
SHA-256 dedup hash from sync_fio_to_sheets.py generate_sync_id. Key subtlety: Python str(float) emits "500.0" for whole-valued floats and switches to scientific notation at |f|>=1e16 or |f|<1e-4 — replicated via formatAmount using 'f'/'e' format selection. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
266 lines
9.6 KiB
Markdown
266 lines
9.6 KiB
Markdown
|
||
## Context
|
||
|
||
Continuing the Go backend rewrite tracked in
|
||
[2026-05-03-2349-go-backend-rewrite-progress.md](../../srv/personal/fuj-management/docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md).
|
||
M2.1–M2.5 are landed. Next leaf-level pure function is `generate_sync_id`
|
||
from [scripts/sync_fio_to_sheets.py:62-77](../../srv/personal/fuj-management/scripts/sync_fio_to_sheets.py#L62-L77).
|
||
|
||
It computes a SHA-256 hash over a fixed seven-field projection of a Fio
|
||
transaction (`date|amount|currency|sender|vs|message|bank_id`) and is
|
||
the deduplication key written into column K (`Sync ID`) of the payments
|
||
sheet. The Go port must produce a **byte-identical** digest for the same
|
||
transaction; otherwise the Go-side sync (M4.7) would re-append rows
|
||
already written by the Python sync, double-counting payments.
|
||
|
||
The non-trivial part is the `amount` field's string serialisation:
|
||
upstream `fio_utils.py` always supplies `amount` as a Python `float`
|
||
(API path: `float(val(1) or 0)`; HTML path: `parse_czech_amount(...)`
|
||
which returns `float`). Python's `str(float)` produces `"500.0"` for
|
||
whole-valued floats; Go's `strconv.FormatFloat(f, 'g', -1, 64)` produces
|
||
`"500"`. This is the gotcha called out in the M2.6 line of the progress
|
||
tracker.
|
||
|
||
## Python behaviour (the spec)
|
||
|
||
```py
|
||
def generate_sync_id(tx: dict) -> str:
|
||
components = [
|
||
str(tx.get("date", "")),
|
||
str(tx.get("amount", "")),
|
||
str(tx.get("currency", "CZK")),
|
||
str(tx.get("sender", "")),
|
||
str(tx.get("vs", "")),
|
||
str(tx.get("message", "")),
|
||
str(tx.get("bank_id", "")),
|
||
]
|
||
raw_str = "|".join(components).lower()
|
||
return hashlib.sha256(raw_str.encode("utf-8")).hexdigest()
|
||
```
|
||
|
||
Behavioural notes for the Go port:
|
||
|
||
1. **Field order is load-bearing.** `date|amount|currency|sender|vs|message|bank_id` exactly.
|
||
2. **Separator is `"|"`.**
|
||
3. **Whole string is `.lower()`-ed before hashing** (so e.g. "ABC" sender vs "abc" hash identically). Unicode lower; in practice Fio data is ASCII + Czech diacritics.
|
||
4. **`currency` defaults to `"CZK"`** when missing from the dict (HTML scraper path never sets it). Other fields default to `""`.
|
||
5. **`amount` is a `float`.** Always. Real Fio data is `500.0`, `1234.56`, etc. — no NaN/Inf, but parity test must pin the format.
|
||
6. **Output is `hashlib.sha256(...).hexdigest()`** — 64-char lowercase hex.
|
||
7. **Encoding is UTF-8.**
|
||
|
||
### `str(float)` cases observed in real Fio amounts
|
||
|
||
| float64 | Python `str(f)` | Go `strconv.FormatFloat(f,'g',-1,64)` | Need |
|
||
|---|---|---|---|
|
||
| `500.0` | `"500.0"` | `"500"` | append `.0` |
|
||
| `1234.56` | `"1234.56"` | `"1234.56"` | matches |
|
||
| `0.0` | `"0.0"` | `"0"` | append `.0` |
|
||
| `-500.0` | `"-500.0"` | `"-500"` | append `.0` |
|
||
| `0.1` | `"0.1"` | `"0.1"` | matches |
|
||
| `99999.99` | `"99999.99"` | `"99999.99"` | matches |
|
||
|
||
For the Fio amount domain (signed CZK, ≤ ~7 digits, ≤2 decimal places),
|
||
the rule "`'g'` with prec -1, then append `.0` if result has no `.` and
|
||
no `e`/`E`" is exact. We do not need to handle Python's
|
||
scientific-notation crossover (`>= 1e16`) for real data, but the
|
||
implementation should still cope with it correctly via the same rule.
|
||
|
||
## Approach
|
||
|
||
Create new package `internal/domain/synch` mirroring the layout of
|
||
`internal/domain/money` (single-file module + test file alongside).
|
||
|
||
### Package + signature
|
||
|
||
```go
|
||
// Package synch ports the bank-sync deduplication helper from
|
||
// scripts/sync_fio_to_sheets.py.
|
||
package synch
|
||
|
||
// Transaction is the projection of a Fio transaction that participates
|
||
// in the Sync ID hash. Other fields (ks, ss, sender_account, …) are
|
||
// intentionally excluded — they are not part of the Python hash.
|
||
//
|
||
// Currency: leave "" to inherit the Python default of "CZK" (matches
|
||
// the HTML scraper path which omits the key entirely).
|
||
type Transaction struct {
|
||
Date string
|
||
Amount float64
|
||
Currency string
|
||
Sender string
|
||
VS string
|
||
Message string
|
||
BankID string
|
||
}
|
||
|
||
// GenerateSyncID returns the lowercase SHA-256 hex digest of
|
||
// "date|amount|currency|sender|vs|message|bank_id" (lower-cased), used
|
||
// as the dedup key in column K of the payments sheet.
|
||
//
|
||
// Byte-stable with scripts/sync_fio_to_sheets.py generate_sync_id.
|
||
func GenerateSyncID(tx Transaction) string
|
||
```
|
||
|
||
### `Currency` default
|
||
|
||
In Go every struct field is always present, so we lose Python's
|
||
"missing key vs empty string" distinction. Real-world data either sets
|
||
`currency = "CZK"` (API path) or omits the key (HTML path → `"CZK"`
|
||
default). Empty string never occurs in practice. The Go port collapses
|
||
the two by treating `Currency == ""` as "use `CZK`":
|
||
|
||
```go
|
||
currency := tx.Currency
|
||
if currency == "" {
|
||
currency = "CZK"
|
||
}
|
||
```
|
||
|
||
This is byte-equal to Python for every input we will ever see in
|
||
production, and avoids forcing callers to pass a `*string`.
|
||
|
||
### Float formatter
|
||
|
||
Internal helper, unexported:
|
||
|
||
```go
|
||
// formatAmount mimics Python's str(float) for the float values that
|
||
// appear in Fio transactions. For mundane decimal amounts the rule
|
||
// is: format with 'g' precision -1, then append ".0" if the result
|
||
// has no decimal point and no exponent.
|
||
func formatAmount(f float64) string {
|
||
s := strconv.FormatFloat(f, 'g', -1, 64)
|
||
if !strings.ContainsAny(s, ".eE") {
|
||
s += ".0"
|
||
}
|
||
return s
|
||
}
|
||
```
|
||
|
||
Tested explicitly (see Tests below) so the edge cases (`0`, whole
|
||
numbers, negatives, large/small with exponent) stay locked.
|
||
|
||
### Hash composition
|
||
|
||
```go
|
||
func GenerateSyncID(tx Transaction) string {
|
||
currency := tx.Currency
|
||
if currency == "" {
|
||
currency = "CZK"
|
||
}
|
||
raw := strings.ToLower(strings.Join([]string{
|
||
tx.Date,
|
||
formatAmount(tx.Amount),
|
||
currency,
|
||
tx.Sender,
|
||
tx.VS,
|
||
tx.Message,
|
||
tx.BankID,
|
||
}, "|"))
|
||
sum := sha256.Sum256([]byte(raw))
|
||
return hex.EncodeToString(sum[:])
|
||
}
|
||
```
|
||
|
||
(`crypto/sha256` + `encoding/hex` — both stdlib, no `go.mod` change.)
|
||
|
||
## Tests
|
||
|
||
`synch_test.go` mirrors `money_test.go`'s table-driven style with the
|
||
verification snippet at the top of the function. Two test functions:
|
||
|
||
### 1. `TestGenerateSyncID`
|
||
|
||
Each row's expected digest is computed from the Python source:
|
||
|
||
```sh
|
||
PYTHONPATH=scripts:. python -c '
|
||
from sync_fio_to_sheets import generate_sync_id
|
||
cases = [
|
||
{"date":"2026-01-15","amount":500.0,"currency":"CZK","sender":"Jan Novak","vs":"123","message":"clenske 1/2026","bank_id":"abc123"},
|
||
{"date":"2026-01-15","amount":500.0,"sender":"Jan Novak","vs":"123","message":"clenske 1/2026","bank_id":"abc123"}, # currency missing → CZK
|
||
{"date":"2026-02-10","amount":1234.56,"currency":"CZK","sender":"ABC SRO","vs":"","message":"FAKTURA 42","bank_id":"xyz"}, # mixed case → lowercased
|
||
{"date":"2026-03-01","amount":-500.0,"currency":"CZK","sender":"refund","vs":"","message":"","bank_id":""}, # negative
|
||
{"date":"2026-04-01","amount":0.0,"currency":"CZK","sender":"","vs":"","message":"","bank_id":""}, # zero amount
|
||
{}, # empty dict — every field falls back to default
|
||
]
|
||
for c in cases:
|
||
print(repr(c), "->", generate_sync_id(c))
|
||
'
|
||
```
|
||
|
||
Cases (one row per dict above), each asserting the exact 64-char hex
|
||
digest the snippet prints. Cover:
|
||
|
||
- Happy path with all fields set.
|
||
- `Currency: ""` → `"CZK"` default (parity with missing key).
|
||
- Mixed-case sender/message → lowercased before hashing.
|
||
- Negative amount.
|
||
- Zero amount.
|
||
- Zero-value `Transaction{}` — every field at Go zero, currency defaults
|
||
to `"CZK"`, hash matches Python `generate_sync_id({})`.
|
||
|
||
### 2. `TestFormatAmount`
|
||
|
||
Pin the float formatter against Python's `str(float)`:
|
||
|
||
```sh
|
||
PYTHONPATH=scripts:. python -c '
|
||
for v in [0.0, 500.0, -500.0, 0.1, 1234.56, 99999.99, 1500000.0, 1e16, 1e-5]:
|
||
print(repr(v), "->", repr(str(v)))
|
||
'
|
||
```
|
||
|
||
Table of `(float64, expected string)` pairs. Whole numbers must end in
|
||
`.0`; existing decimal representations pass through unchanged;
|
||
exponent-form floats (`1e16`, `1e-5`) keep their format.
|
||
|
||
## Files to create
|
||
|
||
- `go/internal/domain/synch/synch.go` — package, `Transaction`,
|
||
`GenerateSyncID`, internal `formatAmount`.
|
||
- `go/internal/domain/synch/synch_test.go` — `TestGenerateSyncID` +
|
||
`TestFormatAmount`.
|
||
|
||
No existing Go files need editing.
|
||
|
||
## Verification
|
||
|
||
```sh
|
||
cd go && go test ./internal/domain/synch/...
|
||
make go-lint
|
||
make go-build # sanity: nothing else broke
|
||
```
|
||
|
||
Plus run the two Python snippets in the Tests section and diff their
|
||
output against the test tables to confirm parity.
|
||
|
||
## Out of scope (explicit non-goals)
|
||
|
||
- **Hooking into the Tier-1 parity runner.** That comes with M3.5
|
||
(`-tags=parity` build constraint and `tests/fixtures/pure/`). M2.6
|
||
ships with hand-written, Python-verified test tables — same approach
|
||
used by M2.1–M2.5.
|
||
- **A richer `Transaction` struct** covering ks/ss/note/sender_account.
|
||
Those fields aren't part of the hash. M4.4 (Fio IO adapter) will
|
||
decide whether to reuse `synch.Transaction` or define its own struct
|
||
and convert at the boundary.
|
||
- **Polymorphic input** (e.g. accepting a `map[string]any`). Python's
|
||
duck-typing is a non-goal in Go.
|
||
- **Any Python callsite migration.** `sync_fio_to_sheets.py` keeps using
|
||
its own `generate_sync_id` until M4.7 ports the sync service.
|
||
|
||
## Progress tracker + changelog
|
||
|
||
After the commit lands:
|
||
|
||
- Tick `M2.6` in
|
||
[docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md](../../srv/personal/fuj-management/docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md)
|
||
with the commit SHA, mirroring the M2.5 entry style.
|
||
- Add a `CHANGELOG.md` entry at top:
|
||
`## YYYY-MM-DD HH:MM TZ — feat(go/M2.6): port domain/synch.GenerateSyncID`.
|
||
|
||
Branch: `feat/m2-6-synch-generate-sync-id` (per CLAUDE.md
|
||
branch-per-feature workflow). Push, open MR via `tea pr create`, leave
|
||
merge to the user.
|