feat(go/M2.7-2.9): port domain/matching package

New go/internal/domain/matching package porting three helpers from
scripts/match_payments.py:

- BuildNameVariants: normalized ASCII variants from a member name (nickname
  in parens, last/first split, len<3 filtered); variants[0] is always the
  full base name — MatchMembers relies on this invariant.
- MatchMembers: auto/review confidence matching with an exact-name
  short-circuit pass that prevents nickname substrings (tov) from firing
  inside longer surnames (ottova); common-surname filter for review tier.
- FormatDate: nil/empty/""/serial int/float64 (since 1899-12-30, fractional
  days supported)/YYYY-MM-DD passthrough/garbage → never errors.
- InferTransactionDetails: composes BuildNameVariants+MatchMembers+
  ParseMonthReferences; falls back to sender-only member match and
  date-derived month when text carries no signal.

21 table-driven tests; all expected values verified against live Python
on 2026-05-06. go-build, go-test, go-lint all clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-05-06 13:19:42 +02:00
parent c2bffed1b8
commit e596f0000e
12 changed files with 835 additions and 3 deletions

View File

@@ -50,9 +50,9 @@ Each task: port the function, write Go unit tests for fresh cases, hook into the
- [x] **M2.4** `domain/fees.CalculateJuniorFee` — port `calculate_junior_fee` with `Expected{Value int; Unknown bool}` for the `"?"` sentinel — `0fc3b6d`
- [x] **M2.5** `domain/money.ParseCZK` — port [infer_payments.py](scripts/infer_payments.py) `parse_czk_amount` (Czech locale: comma decimal, dot/space thousand separators) — `d24d205`
- [x] **M2.6** `domain/synch.GenerateSyncID` — port [sync_fio_to_sheets.py](scripts/sync_fio_to_sheets.py) `generate_sync_id` (SHA-256, byte-stable hash; verify float string format against real sheet rows)
- [ ] **M2.7** `domain/matching.BuildNameVariants` + `MatchMembers` — port `_build_name_variants` and `match_members` from [match_payments.py](scripts/match_payments.py) (auto vs review confidence, common-surname filter)
- [ ] **M2.8** `domain/matching.InferTransactionDetails` — port `infer_transaction_details` (composes name + month parsing)
- [ ] **M2.9** `domain/matching.FormatDate` — port `format_date` (handles Google Sheets serial-day numbers since 1899-12-30)
- [x] **M2.7** `domain/matching.BuildNameVariants` + `MatchMembers` — port `_build_name_variants` and `match_members` from [match_payments.py](scripts/match_payments.py) (auto vs review confidence, common-surname filter)
- [x] **M2.8** `domain/matching.InferTransactionDetails` — port `infer_transaction_details` (composes name + month parsing)
- [x] **M2.9** `domain/matching.FormatDate` — port `format_date` (handles Google Sheets serial-day numbers since 1899-12-30)
- [ ] **M2.10** `domain/reconcile.Reconcile` — port `reconcile` (three-phase allocation: greedy / proportional with float-remainder absorption / even-split fallback). The single most load-bearing function; budget extra time.
- [ ] **M2.11** `fuj fees` subcommand wired up via `domain/fees` + (M4-stub) attendance loader — fail gracefully on missing IO until M4 lands
- [ ] **M2.12** `fuj reconcile` subcommand similarly stubbed

View File

@@ -0,0 +1,126 @@
# M2.7 + M2.8 + M2.9 — Port `matching` package to Go
> On approval: copy this plan to `docs/plans/2026-05-06-1305-go-m2-7-2-9-matching.md` per [CLAUDE.md](../../srv/personal/fuj-management/CLAUDE.md) plan-location convention.
## Context
The Go rewrite (tracked in [docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md](../../srv/personal/fuj-management/docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md)) is in milestone M2 — porting pure-domain helpers leaf-first from Python to Go. M2.1 through M2.6 are complete (`czech.Normalize`, `czech.ParseMonthReferences`, `fees.CalculateFee`, `fees.CalculateJuniorFee`, `money.ParseCZK`, `synch.GenerateSyncID`).
M2.7, M2.8, and M2.9 cover three helpers from [scripts/match_payments.py](../../srv/personal/fuj-management/scripts/match_payments.py) that form a tight chain: `InferTransactionDetails` calls `MatchMembers` which calls `BuildNameVariants` and the same Sheets-serial date logic that `FormatDate` uses. The user requested they be done together because the dependency graph makes per-milestone commits awkward — `MatchMembers` would either reference an unexported helper not yet committed or commit dead code.
This unblocks M2.10 (`reconcile`, the load-bearing function) and M5 parity tests, since reconciliation consumes `InferTransactionDetails` output.
## Approach
**One commit, one branch, one MR.** Branch: `feat/m2-7-2-9-matching-package`. The three milestone checkboxes get ticked together on merge.
### Package layout
New package `go/internal/domain/matching/` mirroring the existing `go/internal/domain/{czech,fees,money,synch}` convention (one file per public symbol, tests alongside as `*_test.go`):
| File | Contents |
|---|---|
| `doc.go` | `// Package matching ports name/member matching from scripts/match_payments.py.` |
| `name_variants.go` | `BuildNameVariants` + unexported `wordIn` helper (mirrors Python's `_word_in` co-location at [match_payments.py:60-62](../../srv/personal/fuj-management/scripts/match_payments.py#L60)) |
| `match_members.go` | `Confidence` typed string + constants, `Match` struct, `MatchMembers` |
| `infer.go` | `Transaction`, `InferredDetails`, `InferTransactionDetails` |
| `format_date.go` | `FormatDate` |
| `name_variants_test.go`, `match_members_test.go`, `infer_test.go`, `format_date_test.go` | table-driven tests, each with a top-of-file comment quoting the live Python one-liner used to verify expected values (mirrors [synch_test.go:7-20](../../srv/personal/fuj-management/go/internal/domain/synch/synch_test.go#L7)) |
### Public API
```go
type Confidence string
const (
ConfidenceAuto Confidence = "auto"
ConfidenceReview Confidence = "review"
)
type Match struct {
Name string
Confidence Confidence
}
func BuildNameVariants(name string) []string
func MatchMembers(text string, memberNames []string) []Match
type Transaction struct {
Sender string
Message string
UserID string
Date any // string | int | float64 — see "Parity concerns"
}
type InferredDetails struct {
Members []Match
Months []string
SearchText string // matches Python's "search_text" key, not the misleading "matched_text" docstring
}
func InferTransactionDetails(tx Transaction, memberNames []string, defaultYear int) InferredDetails
func FormatDate(val any) string
```
### Algorithms (port verbatim — these are the load-bearing details)
**`BuildNameVariants`** ([match_payments.py:33-57](../../srv/personal/fuj-management/scripts/match_payments.py#L33)): extract `(nickname)` regex, strip parens for `base`, normalize via `czech.Normalize`, append last + first when ≥2 parts, **filter <3 chars**. `variants[0]` must always be the full normalized base — `MatchMembers` relies on this.
**`MatchMembers`** ([match_payments.py:65-137](../../srv/personal/fuj-management/scripts/match_payments.py#L65)):
1. **Exact short-circuit** ([:77-84](../../srv/personal/fuj-management/scripts/match_payments.py#L77)): if any member's `variants[0]` whole-word matches in `Normalize(text)`, return ONLY those `(name, auto)`. Prevents nickname `tov` matching inside `ottova`.
2. Otherwise per-member first-match-wins: full-name substring → `\b first \b` AND `\b last \b` (any order) → `\b nickname \b` — each yields `auto` and continues.
3. **Review tier** ([:113-129](../../srv/personal/fuj-management/scripts/match_payments.py#L113)): ≥2-part names → last name `len ≥ 4` AND not in `{"novak","novakova","prach"}` → review; else first name `len ≥ 3` → review. 1-part names → `len ≥ 4` → review.
4. **Final filter** ([:131-137](../../srv/personal/fuj-management/scripts/match_payments.py#L131)): if ANY auto exists, drop ALL review. Two-pass — don't try to fuse with the loop.
**`InferTransactionDetails`** ([match_payments.py:144-184](../../srv/personal/fuj-management/scripts/match_payments.py#L144)): `search_text = sender + " " + message + " " + user_id`; month parse uses `message + " " + user_id` (excludes sender); fallback 1 retries members on sender alone; fallback 2 derives months from `tx.Date` (Sheets serial or `YYYY-MM-DD`).
**`FormatDate`** ([match_payments.py:187-206](../../srv/personal/fuj-management/scripts/match_payments.py#L187)): nil/empty → `""`; int/float → Sheets serial since 1899-12-30 formatted `YYYY-MM-DD`; pre-formatted `YYYY-MM-DD` (length 10, dashes at idx 4/7) → as-is; else `strings.TrimSpace(fmt.Sprint(v))`. **No raise on bad input** — parity contract.
## Parity concerns
- **RE2 `\b`**: Equivalent to Python `\b` on ASCII-folded input (`Normalize` strips diacritics + lowercases). Use `regexp.QuoteMeta` for `re.escape`.
- **Sheets epoch**: 1899-12-30 (NOT 1900-01-01). `time.Date(1899, 12, 30, 0, 0, 0, 0, time.UTC)`.
- **Fractional serials**: Python `timedelta(days=44197.5)` adds 12 hours, then `.strftime("%Y-%m-%d")` discards time. To match exactly use `base.Add(time.Duration(val * 24 * float64(time.Hour)))` then `Format("2006-01-02")`. **Do NOT** use `base.AddDate(0, 0, int(val))` — that silently drops fractional days from real Sheets exports of timestamped cells.
- **`Transaction.Date any`**: Python `tx["date"]` accepts int/float/string transparently. Sheets API returns serial dates as `float64` from JSON; FIO scraper returns `string`. `any` is the faithful port; type-switch inside `FormatDate` and the date fallback in `InferTransactionDetails`.
- **`SearchText` vs `MatchedText`**: Python docstring says `matched_text`, code returns `"search_text"`. Port the code, not the docstring.
- **Default year plumbing**: Go's `czech.ParseMonthReferences(text, defaultYear)` requires explicit year. Python defaults to 2026. Plumb `defaultYear` as the third arg to `InferTransactionDetails`.
- **Empty slices not nil**: Python `match_members` returns `[]` when nothing matches; ensure Go returns `[]Match{}` not `nil` so consumers don't have to nil-check (matches `synch` package style).
## Tests
Port all 6 cases from [tests/test_match_members.py](../../srv/personal/fuj-management/tests/test_match_members.py) verbatim into `match_members_test.go` as one table-driven `TestMatchMembers`. Each row: `name`, `text`, `wantContains []string`, `wantExcludes []string`, `wantAllAuto bool`.
Add table cases for:
- `BuildNameVariants` — docstring example `František Vrbík (Štrúdl)` → 4 variants; nickname filtered (len<3); single-part name; whitespace inside parens
- `FormatDate``nil``""`, `""``""`, `int(44197)``"2020-12-31"`, `float64(44197.5)``"2020-12-31"`, `"2026-04-15"``"2026-04-15"`, `"garbage"``"garbage"`, `" 2026-04-15 "``"2026-04-15"`
- `InferTransactionDetails` — members from search_text, members from sender fallback, months from date-string fallback, months from serial-date fallback, both-paths-fail returns empty slices
Verify expectations against live Python and quote the one-liner in a top-of-file comment, e.g.:
```
PYTHONPATH=scripts:. python -c '
from match_payments import format_date
for v in [None, "", 44197, 44197.5, "2026-04-15", "garbage", " 2026-04-15 "]: print(repr(format_date(v)))
'
```
## Critical files
- **Read for parity** — [scripts/match_payments.py:33-206](../../srv/personal/fuj-management/scripts/match_payments.py#L33), [tests/test_match_members.py](../../srv/personal/fuj-management/tests/test_match_members.py)
- **Reuse** — `czech.Normalize` ([go/internal/domain/czech/normalize.go](../../srv/personal/fuj-management/go/internal/domain/czech/normalize.go#L15)), `czech.ParseMonthReferences` ([parse_month_references.go:61](../../srv/personal/fuj-management/go/internal/domain/czech/parse_month_references.go#L61))
- **Mirror conventions** — [go/internal/domain/synch/synch.go](../../srv/personal/fuj-management/go/internal/domain/synch/synch.go), [go/internal/domain/synch/synch_test.go](../../srv/personal/fuj-management/go/internal/domain/synch/synch_test.go)
- **New** — `go/internal/domain/matching/{doc,name_variants,match_members,infer,format_date}.go` + `*_test.go`
## Out of scope (M2.10 / M4 territory — DO NOT touch)
- `canonical_member_key` ([match_payments.py:20](../../srv/personal/fuj-management/scripts/match_payments.py#L20))
- `reconcile`, `fetch_sheet_data`, `fetch_exceptions` — M2.10 / M4
- Sheets/Drive/FIO I/O glue
- Fixture capture (`tests/fixtures/pure/`) — M3.3 separately
## Verification
1. `cd go && make go-build` — clean build.
2. `cd go && make go-test ./internal/domain/matching/...` — all table tests green.
3. `cd go && make go-lint` — clean (govet, staticcheck, errcheck, gofumpt, unused).
4. Spot-check: pick 23 random non-trivial cases (e.g. `MatchMembers` with mixed auto/review, `FormatDate(44197.5)`) and run the live Python one-liner from each test's comment block to confirm bytes match.
5. Append CHANGELOG entry per [CLAUDE.md](../../srv/personal/fuj-management/CLAUDE.md) (timestamp via `date "+%Y-%m-%d %H:%M %Z"`).
6. Tick M2.7, M2.8, M2.9 in [docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md](../../srv/personal/fuj-management/docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md) with the merge SHA.
7. Push branch, open MR via `tea pr create --title "feat(go): port matching helpers (M2.7-2.9)" --base main --head feat/m2-7-2-9-matching-package`, print URL, leave merge to user.