fuj-management/docs/plans/2026-05-06-2111-go-m3-fixture-capture.md
Jan Novak 67d2f11d7c
feat(go): fixture capture + characterization framework (M3)
Closes M3.1–M3.6.  Parity safety net proving Go output matches Python
for every ported pure-domain function (M2.1–M2.9) and reconcile (M2.10).

Capture pipeline:
- scripts/capture_fixtures.py: calls each Python function with seeded
  inputs, emits JSON fixtures to stdout (never writes files directly).
- scripts/scrub_fixtures.py: deterministic PII scrubber — SHA-256
  pseudonyms for member names, digit-preserving hashes for VS/account/
  bank_id, name-sweep in message text.  Idempotent; no salt.
- scripts/_fixture_seeds.py: handcrafted seeds for all 11 functions;
  synthetic names throughout (no real roster members).
- scripts/capture_all_fixtures.sh: convenience wrapper for full corpus
  regeneration outside of make.

Fixture corpus (98 files, all PII-free):
- go/tests/fixtures/pure/<func>/<case>.json — 10 function directories.
- go/tests/fixtures/reconcile/<NN>_<case>.json — 10 branch-coverage
  cases: greedy, overpayment credit, proportional remainder, even-split,
  out-of-window, exception override, other: purpose, junior ?, multi-
  person+month fan-out, unmatched.

Go parity tests (//go:build parity):
- go/tests/parity/parityio.go: generic LoadDir/RunAll helpers + typed
  In/Out struct pairs for all 10 pure functions; Envelope decoder for
  int/float/none disambiguation.
- 10 pure-function test packages + bespoke reconcile test with per-cell
  float tolerance (math.Abs <= 0.01 for `paid` values).

Makefile: go-parity, go-test-all, capture-fixtures targets.
go/tests/fixtures/README.md: refresh workflow + PII audit guide.

Gate: make go-test green, make go-parity green (11/11 packages),
      make go-lint clean (parity tag), make go-build clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 23:26:24 +02:00


M3 — Fixture capture + characterization framework

On approval: copy this plan to docs/plans/2026-05-06-2111-go-m3-fixture-capture.md per CLAUDE.md plan-location convention.

Context

The Go rewrite (tracked in docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md) finished M2.1–M2.12: every pure-domain helper is ported and the fuj fees / fuj reconcile CLIs are wired. M3 closes the loop: it builds the parity safety net that proves Go output matches Python output for every ported function. Without it, M2 is "trust me", and the rewrite has no defensible cutover criterion.

M3 has three deliverables:

  1. A capture pipeline (scripts/capture_fixtures.py + scripts/scrub_fixtures.py) that produces deterministic, PII-free JSON fixtures from the live Python implementations.
  2. A fixture corpus at go/tests/fixtures/ covering the 10 pure functions of M2 (M2.1–M2.9) plus 10 reconcile cases spanning every code path of reconcile() (M2.10).
  3. A parity test runner in go/tests/parity/ under //go:build parity that replays each fixture and asserts byte/value equality against the Go port.

User-confirmed scope decisions:

  • Single MR for all six sub-tasks (M3.1–M3.6): they're tightly coupled; no half-state is committable.
  • Type envelope only where it matters — four fields (generate_sync_id.tx.amount, parse_czk_amount.val, format_date.val, infer_transaction_details.tx.date) use {"type":..., "value":...} to disambiguate int/float/none. Everything else uses raw JSON.
  • Real seeds for parse_month_references and match_members only — read curated message strings from tmp/payments_transactions_cache.json, scrub, ship. Other functions stay on handcrafted seeds.
  • Plan committed at docs/plans/2026-05-06-2111-go-m3-fixture-capture.md — same convention as every M-series predecessor.

Branch + landing

  • Branch: feat/m3-fixture-capture. Single MR via tea pr create. Tick M3.1–M3.6 on merge with the SHA.
  • No edits to existing Python or Go production code. M3 is purely additive: new scripts, new fixtures, new test files, new Makefile targets, README, CHANGELOG entry, plan archive, progress tracker tick.

File layout

Python (capture pipeline):

  • scripts/capture_fixtures.py: calls each Python function with seeded inputs and emits JSON fixtures to stdout.
  • scripts/scrub_fixtures.py: deterministic PII scrubber (stdin → stdout).
  • scripts/_fixture_seeds.py: seed registry for all 11 functions.
  • scripts/capture_all_fixtures.sh: convenience wrapper for full corpus regeneration.

Fixture corpus (committed, PII-free):

  • go/tests/fixtures/README.md — refresh workflow + scrubbing audit guide.
  • go/tests/fixtures/pure/<func>/<case>.json — one directory per function (10 functions: normalize, parse_month_references, calculate_fee, calculate_junior_fee, parse_czk_amount, generate_sync_id, build_name_variants, match_members, infer_transaction_details, format_date).
  • go/tests/fixtures/reconcile/<NN>_<case>.json — 10 numbered reconcile cases.

Go parity tests (all under //go:build parity):

  • go/tests/parity/parityio.go: shared LoadDir/RunAll helpers, typed In/Out struct pairs, envelope decoder.
  • go/tests/parity/pure/<func>/: one parity test package per pure function (10 packages).
  • go/tests/parity/reconcile/reconcile_parity_test.go: bespoke reconcile test with per-cell float tolerance.

Modified:

Capture invocation interface

Two-stage pipeline (capture | scrub) so each stage is independently debuggable:

python scripts/capture_fixtures.py --func <name> --case <id> --input-seed <id> \
  | python scripts/scrub_fixtures.py \
  > go/tests/fixtures/pure/<func>/<id>.json

Capture flags:

  • --func — target function (normalize, reconcile, etc.).
  • --case — human-authored case ID, becomes the file stem. Never auto-generated (auto-IDs cause git churn).
  • --input-seed <id> — pull from _fixture_seeds.py registry (the default mode for handcrafted cases).
  • --input-stdin — read a single JSON {"args":[...], "kwargs":{...}} doc from stdin (used by the real-message extractor for parse_month_references / match_members).
  • --all — iterate every seed for one function, emit newline-delimited JSON to stdout. Used by the make capture-fixtures recipe.

Capture never writes files. Output goes to stdout; the caller redirects. The scrubber is always stdin→stdout. Both are pure transforms.
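
A minimal sketch of how the flag surface above could be declared with argparse. The flag names come from this plan; treating the three input modes as mutually exclusive, and the helper names build_parser and parse_call_doc, are assumptions made for illustration, not the actual script.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI surface mirroring the capture flags listed above."""
    p = argparse.ArgumentParser(prog="capture_fixtures.py")
    p.add_argument("--func", required=True, help="target function, e.g. normalize")
    p.add_argument("--case", help="human-authored case ID (becomes the file stem)")
    # Assumption: exactly one input mode is chosen per invocation.
    src = p.add_mutually_exclusive_group(required=True)
    src.add_argument("--input-seed", metavar="ID", help="seed ID from the registry")
    src.add_argument("--input-stdin", action="store_true",
                     help='read one {"args": [...], "kwargs": {...}} doc from stdin')
    src.add_argument("--all", action="store_true",
                     help="emit every seed for --func as newline-delimited JSON")
    return p


def parse_call_doc(text: str):
    """Split an --input-stdin document into (args, kwargs)."""
    doc = json.loads(text)
    return list(doc.get("args", [])), dict(doc.get("kwargs", {}))
```

Keeping the parser free of any file-writing option matches the rule that capture only ever writes to stdout.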

The make capture-fixtures target codifies the full refresh workflow. Humans read the target before they read the README.

Fixture JSON shape (normative)

One JSON object per case:

{
  "case": "range_wrap_nov_to_jan",
  "func": "scripts.czech_utils.parse_month_references",
  "captured_at": "2026-05-06",
  "input": { ... },
  "output": { ... }
}

captured_at is date-only — same-day re-runs produce byte-identical files. No git SHA, no hostname, no time component.
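
One way the byte-identical property could be achieved is sketched below. render_fixture is a hypothetical helper; the fixed indent, the trailing newline and the choice to keep insertion order rather than sort keys are assumptions, not confirmed details of capture_fixtures.py.

```python
import json
from datetime import date
from typing import Optional


def render_fixture(case: str, func: str, inp: dict, out: dict,
                   today: Optional[date] = None) -> str:
    """Serialise one fixture deterministically (hypothetical helper)."""
    doc = {
        "case": case,
        "func": func,
        # Date only: no time component, hostname or git SHA.
        "captured_at": (today or date.today()).isoformat(),
        "input": inp,
        "output": out,
    }
    # Assumption: insertion order is kept (no sort_keys) and non-ASCII
    # characters are written raw so Czech text stays readable in diffs.
    return json.dumps(doc, ensure_ascii=False, indent=2) + "\n"
```

Two calls with the same arguments on the same day return the same string, which is what makes a same-day refresh a no-op in git.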

Per-function input/output schemas

The schema is the stable contract between Python capture and Go consumption. Where Python returns heterogeneous types, the capture step pre-translates to the typed shape Go expects.

| Function | Input | Output |
|---|---|---|
| normalize | {"text":"…"} | {"text":"…"} |
| parse_month_references | {"text":"…","default_year":2026} | {"months":["2026-01",…]} |
| calculate_fee | {"attendance_count":3,"month_key":"2026-02"} | {"fee":750} |
| calculate_junior_fee | {"attendance_count":1,"month_key":"2026-02"} | {"value":0,"unknown":true} (mirrors fees.Expected{Value, Unknown}) |
| parse_czk_amount | {"val":<envelope>} | {"amount":1500.0} |
| generate_sync_id | {"tx":{"date":"…","amount":<envelope>,"currency":"CZK","sender":"…","vs":"…","message":"…","bank_id":"…"}} | {"sync_id":"<sha256-hex>"} |
| _build_name_variants | {"name":"…"} | {"variants":["…"]} |
| match_members | {"text":"…","member_names":["…"]} | {"matches":[{"name":"…","confidence":"auto"}]} |
| infer_transaction_details | {"tx":{"sender":"…","message":"…","user_id":"…","date":<envelope>},"member_names":[…],"default_year":2026} | {"members":[…],"months":[…],"search_text":"…"} |
| format_date | {"val":<envelope>} | {"date":"…"} |

Type envelope (used in 4 fields above):

{"type":"int","value":750}    // distinguishes 750 from 750.0
{"type":"float","value":750.0}
{"type":"string","value":"…"}
{"type":"none"}

The envelope is the answer to the generate_sync_id parity risk: Python's str(750.0) == "750.0" vs str(750) == "750" produces different SHA-256 inputs. JSON natively conflates these; the envelope round-trips them. Go's loader switches on type and constructs the matching native value before calling the port.
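
The capture side of the envelope can be sketched as follows. The four type tags come from the list above; to_envelope and from_envelope are hypothetical helper names, and rejecting bool is an assumption (Python treats bool as a subclass of int, so it would otherwise be tagged "int").

```python
import json


def to_envelope(value):
    """Wrap a Python scalar so its type survives a JSON round-trip."""
    if value is None:
        return {"type": "none"}
    if isinstance(value, bool):
        # Assumption: booleans are not part of the envelope contract.
        raise TypeError("bool is not supported by the envelope")
    if isinstance(value, int):
        return {"type": "int", "value": value}
    if isinstance(value, float):
        return {"type": "float", "value": value}
    if isinstance(value, str):
        return {"type": "string", "value": value}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def from_envelope(env):
    """Rebuild the native value by switching on the type tag."""
    kind = env["type"]
    if kind == "none":
        return None
    if kind == "int":
        return int(env["value"])
    if kind == "float":
        return float(env["value"])
    if kind == "string":
        return str(env["value"])
    raise ValueError(f"unknown envelope type: {kind}")
```

Because the decoder coerces on the tag, a float survives even if an intermediate JSON writer prints 750.0 as 750: from_envelope({"type": "float", "value": 750}) still yields 750.0, and str() of it is "750.0".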

reconcile uses raw JSON for everything (its inputs are typed maps/slices already), with one nuance: the Member.fees[month] value can be an int or a (fee, count) tuple per match_payments.py:339-340. Capture normalises both to {"fee":int,"count":int} so Go side has one shape.

Scrubber strategy

scrub_fixtures.py: stdin → stdout, no state, no salt, no random. Deterministic plain SHA-256. Re-runs are idempotent. Trade-off acknowledged: because the hash is unsalted, anyone with the script and a list of candidate names can recompute the pseudonyms and recover the mapping by dictionary lookup. That's fine: the scrubber's job is to keep PII out of git diffs and Claude transcripts, not to defend against an adversary with the source tree.

Scramble whitelist (only these field keys are scrambled)

name, member_names[], person, sender, sender_account, account, vs, bank_id, user_id, note. Plus a per-document name-substring sweep over message strings — applied before the field-key walk, because real names show up embedded in message text.

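Everything else (dates, amounts, currency, month_key, attendance_count, purpose, confidence, expected, paid, total_balance, fee, all YYYY-MM keys, match/matches structure) is preserved verbatim. Whitelist-of-scramble (not blacklist-of-preserve): when a new field appears, it stays raw until someone explicitly adds it to the list. This fails safe for parity, since no load-bearing value is scrambled by accident; the cost is that a new PII-bearing field is not scrubbed until it is listed, which is what the pre-commit audit in the README is for.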
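
The two passes described above (name sweep over message text, then the field-key walk) could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, every whitelisted key is run through one pseudonym function for brevity (the real scrubber uses different scramblers per field), and the sample name in the usage note is synthetic.

```python
import hashlib
import re

SCRAMBLE_KEYS = {"name", "member_names", "person", "sender", "sender_account",
                 "account", "vs", "bank_id", "user_id", "note"}


def pseudonym(name: str) -> str:
    """Member_<8hex>, derived from the unsalted SHA-256 of the name."""
    return "Member_" + hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]


def sweep_text(text: str, roster) -> str:
    """Replace roster names embedded in free text, longest name first."""
    for name in sorted(roster, key=len, reverse=True):
        text = re.sub(re.escape(name), lambda _m, n=name: pseudonym(n),
                      text, flags=re.IGNORECASE)
    return text


def sweep_messages(node, roster):
    """Pass 1: apply the name sweep to every "message" string."""
    if isinstance(node, dict):
        return {k: sweep_text(v, roster) if k == "message" and isinstance(v, str)
                else sweep_messages(v, roster) for k, v in node.items()}
    if isinstance(node, list):
        return [sweep_messages(v, roster) for v in node]
    return node


def scrub(node, scramble=pseudonym, key=None):
    """Pass 2: scramble string values whose nearest key is whitelisted."""
    if isinstance(node, dict):
        return {k: scrub(v, scramble, k) for k, v in node.items()}
    if isinstance(node, list):
        return [scrub(v, scramble, key) for v in node]
    if key in SCRAMBLE_KEYS and isinstance(node, str):
        return scramble(node)
    return node
```

Calling scrub(sweep_messages(doc, roster)) on a document whose tx.sender is "Jana Dvorak" replaces both the sender field and any occurrence of that name inside tx.message with the same pseudonym, while tx.amount passes through untouched.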

Scrambling functions

  • Names: Member_<8hex> where <8hex> = sha256(name).hexdigest()[:8]. Same name → same pseudonym across the whole document and across all fixtures. Stable diffs.
  • Account numbers ([0-9]+/[0-9]{4}): scramble prefix and bank-suffix separately, preserving length and format.
  • VS / bank_id / user_id: digit-string-preserving hash to a same-length numeric token. Non-numeric input → id_<8hex>.
  • Note: replaced verbatim with "<scrubbed>". Notes are never load-bearing for any test.
  • Message (free text): name-sweep applied; rest preserved. Corpus author spot-checks before commit. README §5 documents the audit grep.
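
A sketch of the digit-preserving scramblers, assuming one particular way of turning a digest into digits. The plan only fixes the observable shape (same length, numeric, id_<8hex> for non-numeric input, prefix and bank suffix handled separately); digit_token, scramble_account and the digest-to-decimal scheme below are hypothetical.

```python
import hashlib
import re

ACCOUNT_RE = re.compile(r"([0-9]+)/([0-9]{4})")


def digit_token(value: str) -> str:
    """Map a digit string to a same-length digit string, else id_<8hex>."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    if not re.fullmatch(r"[0-9]+", value):
        return "id_" + digest[:8]
    # Assumed scheme: read the digest as an integer and keep its last
    # len(value) decimal digits, left-padding with zeros if needed.
    decimal = str(int(digest, 16))
    return decimal[-len(value):].zfill(len(value))


def scramble_account(value: str) -> str:
    """Scramble prefix and bank suffix separately, keeping the N/NNNN shape."""
    m = ACCOUNT_RE.fullmatch(value)
    if m is None:
        # Assumption: anything not shaped like an account falls back
        # to the generic token.
        return digit_token(value)
    return digit_token(m.group(1)) + "/" + digit_token(m.group(2))
```

The sketch only demonstrates determinism and shape preservation: the same input always yields the same token, and a nine-digit prefix with a four-digit bank code stays nine digits, a slash, and four digits.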

Reconcile fixtures (10 handcrafted cases)

All seeds live in _fixture_seeds.py as 5-tuples (members, sorted_months, transactions, exceptions, default_year). Capture runs the live Python reconcile() and emits canonical JSON; the scrubber is a no-op for handcrafted synthetic names but runs anyway for uniformity.

| File | Branch exercised |
|---|---|
| 01_greedy_exact.json | Greedy: amount == sum(expected); zero credit. |
| 02_greedy_overpayment_credit.json | Greedy with overflow → credit. |
| 03_proportional_remainder.json | Underpayment across 3 months with non-integer split (last month absorbs float remainder per match_payments.py:421+). |
| 04_even_split_prepayment.json | All expected == 0 → even-split fallback. |
| 05_out_of_window_credit.json | Month outside sorted_months → that share goes to credits, in-window proportional for the rest. |
| 06_exception_override.json | Exception entry overrides expected. |
| 07_other_purpose_split.json | purpose="other:tournament" with two members. |
| 08_junior_question_mark.json | Junior with attendance count 1 → Expected{Unknown:true}; reconcile reads it as 0 expected. |
| 09_multiperson_multimonth.json | person="Alice, Bob", purpose="2026-01, 2026-02" → 2x2 fan-out: even-split-by-people then proportional-by-month. |
| 10_unmatched.json | Empty person, garbage message → goes to unmatched. |

The seed registry is the single source of truth for these inputs. If Python behaviour drifts intentionally, fixtures regenerate cleanly via make capture-fixtures.
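
To make cases 03 and 04 concrete, here is an illustrative split that follows the wording in the table: shares proportional to expected, the last month absorbing the float remainder, and an even split when every expected value is zero. This is not a copy of reconcile() in match_payments.py; the function name and the absence of rounding are assumptions.

```python
def proportional_split(amount: float, expected: list) -> list:
    """Illustrative allocation of one payment across months (not reconcile())."""
    if not expected:
        raise ValueError("at least one month is required")
    total = sum(expected)
    if total == 0:
        # Case 04: nothing is expected anywhere, so split evenly.
        return [amount / len(expected)] * len(expected)
    # Case 03: the first n-1 months get their proportional share...
    shares = [amount * e / total for e in expected[:-1]]
    # ...and the last month absorbs whatever float remainder is left.
    shares.append(amount - sum(shares))
    return shares
```

With 1000 paid against three months of 750 each, no share is an integer, which is why the parity test compares paid values with a 0.01 tolerance instead of exact equality.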

Real-data seeds (for parse_month_references and match_members only)

_fixture_seeds.py reads tmp/payments_transactions_cache.json (already gitignored) and selects:

  • parse_month_references: ~15 distinct messages exercising the 45 Czech month declensions, range wraps ("prosinec-leden"), year inference, and the m >= 10 → previous year heuristic. Selection done once interactively, the chosen indices hardcoded into _fixture_seeds.py so re-runs are deterministic. Messages flow through capture (which calls parse_month_references(msg, default_year=2026)) then scrubber (name-sweep against the live member roster).
  • match_members: ~10 distinct (message, member_names) pairs exercising auto vs review confidence, common-surname filter, exact-short-circuit. Same pipeline.

Out of scope for real seeds: normalize, _build_name_variants, reconcile. These either don't benefit from real data (synthetic exhaustively covers normalize, _build_name_variants) or have surgical-input requirements that real data can't reliably hit (reconcile's 10 branches).

Go parity-test layout

One file per function, one Go package per function, mirroring the fixture tree. Each file is short (~30 lines):

//go:build parity

package normalize_parity_test

import (
    "fuj-management/go/internal/domain/czech"
    "fuj-management/go/tests/parity"
    "testing"
)

func TestNormalizeParity(t *testing.T) {
    t.Parallel()
    parity.RunAll(t, "../../../fixtures/pure/normalize",
        func(in parity.NormalizeIn) parity.NormalizeOut {
            return parity.NormalizeOut{Text: czech.Normalize(in.Text)}
        })
}

The shared go/tests/parity/parityio.go (also //go:build parity) provides:

  • Case[I, O any] generic loader: walks a fixture directory, decodes each .json, returns (name, input, want) triples.
  • RunAll[I, O any](t, dir, fn func(I) O): invokes fn, compares against want with reflect.DeepEqual (sorted-slice normalisation for the few sets-cast-to-lists Python returns); for floats uses math.Abs(got-want) <= 0.01.
  • One typed <Func>In / <Func>Out struct pair per function (10 pairs), mirroring §3's JSON shape exactly. Envelope decoder helpers (AmountEnvelope, ValueEnvelope) live here.

Reconcile is bespoke: reconcile/reconcile_parity_test.go doesn't use RunAll because it needs a cell-by-cell tolerant float comparison across nested maps. It walks the fixture dir directly.

Why one-file-per-function (instead of an umbrella runner): each function lives in a different domain package, so tests must import a different package; an umbrella would obscure which package is being checked. Split also enables go test -tags=parity ./tests/parity/pure/normalize/ to iterate on a single port.

Why a separate test tree (instead of co-located parity tests): the M2 unit tests are co-located by convention (e.g. go/internal/domain/czech/normalize_test.go). The progress tracker explicitly says fixtures live at go/tests/fixtures/ and the gate is go test -tags=parity ./tests/parity/pure/.... Co-location would scatter fixtures across packages — messy. Separate tree wins.

Build tag + Makefile

Every parity test file starts with //go:build parity. Default make go-test excludes them; make go-parity runs them:

go-parity:
	cd $(GO_SRC) && go test -tags=parity ./tests/parity/...

go-test-all: go-test go-parity

capture-fixtures:
	@bash scripts/capture_all_fixtures.sh    # invokes capture | scrub for every seed

Parity is not folded into default go-test: keeps the M2 unit-test loop fast, and a missing-fixture failure shouldn't block routine work. CI runs both targets independently so a parity break is a distinct red signal from a unit-test break.

README content (go/tests/fixtures/README.md)

Six sections, ~120 lines:

  1. What's in this tree — directory map; one line per fixture function explaining what it validates.
  2. Fixture format — link to schemas in §3; worked example for parse_month_references and one for reconcile.
  3. Refresh workflow: make capture-fixtures regenerates everything; single-file recipe for incremental updates. Always diff before committing.
  4. When to refresh — bullet list (schema change, new Czech declension, new fee tier, new reconcile branch). Do not refresh to "fix" a parity failure without first proving the Python behaviour is the intended one.
  5. Verifying scrubbing: git diff should show only Member_<hex>-shaped names, <scrubbed> notes, structurally-preserved account/VS digits. Audit grep: git ls-files go/tests/fixtures | xargs grep -l '<your real name>' should list no files before commit.
  6. Adding a new fixture — three steps (add to _fixture_seeds.py, run capture, add In/Out Go struct fields if needed).
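
The audit in section 5 can also be run over a whole roster at once. The sketch below is a hypothetical helper, not part of the planned scripts: it scans every .json file under a directory (tracked or not, unlike git ls-files) and reports which real names still appear.

```python
import json
from pathlib import Path


def find_leaks(fixture_root: str, real_names) -> list:
    """Return (path, name) pairs for every real name found in a fixture."""
    hits = []
    for path in sorted(Path(fixture_root).rglob("*.json")):
        raw = path.read_text(encoding="utf-8")
        try:
            # Re-dump without ASCII escaping so names stored as \uXXXX
            # sequences are still caught by a plain substring check.
            text = json.dumps(json.loads(raw), ensure_ascii=False)
        except json.JSONDecodeError:
            text = raw
        folded = text.casefold()
        for name in real_names:
            if name.casefold() in folded:
                hits.append((str(path), name))
    return hits
```

An empty result corresponds to the grep listing no files; any hit names the file to inspect before committing.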

Parity concerns

  • Float arithmetic in reconcile proportional phase: ordering-sensitive, may diverge between Python and Go due to FMA. Tolerance 0.01 already in go/internal/domain/reconcile/reconcile_test.go; parity uses the same tolerance.
  • Sync-ID float-vs-int stringification: handled by the envelope (§3). Capture two paired cases per amount value (amount_750_int.json, amount_750_float.json) so any Go-side conflation surfaces immediately.
  • NFKD edge cases: capture set must include rare characters from real names. The handcrafted normalize seeds enumerate every distinct character observed in the live member roster (extracted once from tmp/attendance_regular_cache.json, hardcoded into _fixture_seeds.py as a single-character-per-case sweep).
  • Czech month declensions: the real-message seeds for parse_month_references cover the wild; handcrafted seeds cover the corner cases (prosinec-leden wrap, m >= 10 heuristic).
  • Insertion-order determinism in reconcile: Python 3.7+ dict iteration is insertion-ordered; the seed registry preserves order. Go side iterates sortedMonths slice explicitly; the parity test verifies this.
  • infer_transaction_details default_year: Python signature defaults to 2026; capture passes default_year as an explicit input. Go side reads it from the fixture.
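
For the NFKD concern, the one-off character sweep could be done along these lines. Both helpers are hypothetical, and they only show what NFKD decomposition yields per character; what normalize does with the decomposed form afterwards is not reproduced here.

```python
import unicodedata


def distinct_non_ascii(names) -> list:
    """Every distinct non-ASCII character in a roster, sorted by code point."""
    return sorted({ch for name in names for ch in name if ord(ch) > 127})


def nfkd_code_points(ch: str) -> list:
    """Code points produced by NFKD-decomposing one character."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKD", ch)]
```

Each character returned by distinct_non_ascii becomes one single-character normalize seed; nfkd_code_points("ř"), for instance, gives the base letter r (U+0072) followed by the combining caron (U+030C), which is the pair a Go port has to handle identically.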

Out of scope (explicitly DO NOT touch)

  • Real Google Sheets / Drive / Fio loader implementations: M4.1–M4.6.
  • Web routes / handlers — M5.
  • fuj sync and fuj infer subcommands — M4.7/M4.8.
  • Tier-2 JSON-API parity (cmd/parity/main.go) — M5.4.
  • Any change to existing Python code (capture is read-only against the production scripts).
  • Any change to existing Go production code under go/internal/.

Verification

  1. make go-build — clean build (parity tests excluded by default tag).
  2. make go-test — all M2 unit tests still green; no parity test runs.
  3. make go-parity — every fixture in go/tests/fixtures/pure/ and go/tests/fixtures/reconcile/ deserialises and passes its parity assertion.
  4. make go-lint — clean (parity test files lint-clean under -tags=parity since golangci-lint honours build tags via .golangci.yml).
  5. Capture round-trip: pick one fixture (e.g. parse_month_references/range_wrap_nov_to_jan.json), regenerate via python scripts/capture_fixtures.py --func parse_month_references --case range_wrap_nov_to_jan --input-seed range_wrap_nov_to_jan | python scripts/scrub_fixtures.py, confirm byte-identical to the committed file.
  6. Scrubbing audit: run the README §5 grep against any name from the live roster — zero hits.
  7. Reconcile branch coverage: read each of the 10 reconcile fixture files, confirm the output field shows the expected branch (e.g. 02_greedy_overpayment_credit.json has a non-zero credits entry; 04_even_split_prepayment.json has equal paid across all months).
  8. Append CHANGELOG entry per CLAUDE.md (timestamp via date "+%Y-%m-%d %H:%M %Z").
  9. Tick M3.1–M3.6 in docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md with the merge SHA. Update the M3 milestone summary line if M3 is now fully closed.
  10. Push branch, open MR via tea pr create --title "feat(go): fixture capture + characterization framework (M3)" --base main --head feat/m3-fixture-capture, print URL, leave merge to user.

Critical files