Files
fuj-management/docs/plans/2026-05-06-1305-go-m2-7-2-9-matching.md
Jan Novak e596f0000e feat(go/M2.7-2.9): port domain/matching package
New go/internal/domain/matching package porting three helpers from
scripts/match_payments.py:

- BuildNameVariants: normalized ASCII variants from a member name (nickname
  in parens, last/first split, len<3 filtered); variants[0] is always the
  full base name — MatchMembers relies on this invariant.
- MatchMembers: auto/review confidence matching with an exact-name
  short-circuit pass that prevents nickname substrings (tov) from firing
  inside longer surnames (ottova); common-surname filter for review tier.
- FormatDate: nil/empty/""/serial int/float64 (since 1899-12-30, fractional
  days supported)/YYYY-MM-DD passthrough/garbage → never errors.
- InferTransactionDetails: composes BuildNameVariants+MatchMembers+
  ParseMonthReferences; falls back to sender-only member match and
  date-derived month when text carries no signal.

21 table-driven tests; all expected values verified against live Python
on 2026-05-06. go-build, go-test, go-lint all clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 13:19:42 +02:00

127 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# M2.7 + M2.8 + M2.9 — Port `matching` package to Go
> On approval: copy this plan to `docs/plans/2026-05-06-1305-go-m2-7-2-9-matching.md` per [CLAUDE.md](../../srv/personal/fuj-management/CLAUDE.md) plan-location convention.
## Context
The Go rewrite (tracked in [docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md](../../srv/personal/fuj-management/docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md)) is in milestone M2 — porting pure-domain helpers leaf-first from Python to Go. M2.1 through M2.6 are complete (`czech.Normalize`, `czech.ParseMonthReferences`, `fees.CalculateFee`, `fees.CalculateJuniorFee`, `money.ParseCZK`, `synch.GenerateSyncID`).
M2.7, M2.8, and M2.9 cover three helpers from [scripts/match_payments.py](../../srv/personal/fuj-management/scripts/match_payments.py) that form a tight chain: `InferTransactionDetails` calls `MatchMembers` which calls `BuildNameVariants` and the same Sheets-serial date logic that `FormatDate` uses. The user requested they be done together because the dependency graph makes per-milestone commits awkward — `MatchMembers` would either reference an unexported helper not yet committed or commit dead code.
This unblocks M2.10 (`reconcile`, the load-bearing function) and M5 parity tests, since reconciliation consumes `InferTransactionDetails` output.
## Approach
**One commit, one branch, one MR.** Branch: `feat/m2-7-2-9-matching-package`. The three milestone checkboxes get ticked together on merge.
### Package layout
New package `go/internal/domain/matching/` mirroring the existing `go/internal/domain/{czech,fees,money,synch}` convention (one file per public symbol, tests alongside as `*_test.go`):
| File | Contents |
|---|---|
| `doc.go` | `// Package matching ports name/member matching from scripts/match_payments.py.` |
| `name_variants.go` | `BuildNameVariants` + unexported `wordIn` helper (mirrors Python's `_word_in` co-location at [match_payments.py:60-62](../../srv/personal/fuj-management/scripts/match_payments.py#L60)) |
| `match_members.go` | `Confidence` typed string + constants, `Match` struct, `MatchMembers` |
| `infer.go` | `Transaction`, `InferredDetails`, `InferTransactionDetails` |
| `format_date.go` | `FormatDate` |
| `name_variants_test.go`, `match_members_test.go`, `infer_test.go`, `format_date_test.go` | table-driven tests, each with a top-of-file comment quoting the live Python one-liner used to verify expected values (mirrors [synch_test.go:7-20](../../srv/personal/fuj-management/go/internal/domain/synch/synch_test.go#L7)) |
### Public API
```go
type Confidence string
const (
ConfidenceAuto Confidence = "auto"
ConfidenceReview Confidence = "review"
)
type Match struct {
Name string
Confidence Confidence
}
func BuildNameVariants(name string) []string
func MatchMembers(text string, memberNames []string) []Match
type Transaction struct {
Sender string
Message string
UserID string
Date any // string | int | float64 — see "Parity concerns"
}
type InferredDetails struct {
Members []Match
Months []string
SearchText string // matches Python's "search_text" key, not the misleading "matched_text" docstring
}
func InferTransactionDetails(tx Transaction, memberNames []string, defaultYear int) InferredDetails
func FormatDate(val any) string
```
### Algorithms (port verbatim — these are the load-bearing details)
**`BuildNameVariants`** ([match_payments.py:33-57](../../srv/personal/fuj-management/scripts/match_payments.py#L33)): extract `(nickname)` regex, strip parens for `base`, normalize via `czech.Normalize`, append last + first when ≥2 parts, **filter <3 chars**. `variants[0]` must always be the full normalized base — `MatchMembers` relies on this.
**`MatchMembers`** ([match_payments.py:65-137](../../srv/personal/fuj-management/scripts/match_payments.py#L65)):
1. **Exact short-circuit** ([:77-84](../../srv/personal/fuj-management/scripts/match_payments.py#L77)): if any member's `variants[0]` whole-word matches in `Normalize(text)`, return ONLY those `(name, auto)`. Prevents nickname `tov` matching inside `ottova`.
2. Otherwise per-member first-match-wins: full-name substring → `\b first \b` AND `\b last \b` (any order) → `\b nickname \b` — each yields `auto` and continues.
3. **Review tier** ([:113-129](../../srv/personal/fuj-management/scripts/match_payments.py#L113)): ≥2-part names → last name `len ≥ 4` AND not in `{"novak","novakova","prach"}` → review; else first name `len ≥ 3` → review. 1-part names → `len ≥ 4` → review.
4. **Final filter** ([:131-137](../../srv/personal/fuj-management/scripts/match_payments.py#L131)): if ANY auto exists, drop ALL review. Two-pass — don't try to fuse with the loop.
**`InferTransactionDetails`** ([match_payments.py:144-184](../../srv/personal/fuj-management/scripts/match_payments.py#L144)): `search_text = sender + " " + message + " " + user_id`; month parse uses `message + " " + user_id` (excludes sender); fallback 1 retries members on sender alone; fallback 2 derives months from `tx.Date` (Sheets serial or `YYYY-MM-DD`).
**`FormatDate`** ([match_payments.py:187-206](../../srv/personal/fuj-management/scripts/match_payments.py#L187)): nil/empty → `""`; int/float → Sheets serial since 1899-12-30 formatted `YYYY-MM-DD`; pre-formatted `YYYY-MM-DD` (length 10, dashes at idx 4/7) → as-is; else `strings.TrimSpace(fmt.Sprint(v))`. **No raise on bad input** — parity contract.
## Parity concerns
- **RE2 `\b`**: Equivalent to Python `\b` on ASCII-folded input (`Normalize` strips diacritics + lowercases). Use `regexp.QuoteMeta` for `re.escape`.
- **Sheets epoch**: 1899-12-30 (NOT 1900-01-01). `time.Date(1899, 12, 30, 0, 0, 0, 0, time.UTC)`.
- **Fractional serials**: Python `timedelta(days=44197.5)` adds 12 hours, then `.strftime("%Y-%m-%d")` discards time. To match exactly use `base.Add(time.Duration(val * 24 * float64(time.Hour)))` then `Format("2006-01-02")`. **Do NOT** use `base.AddDate(0, 0, int(val))` — that silently drops fractional days from real Sheets exports of timestamped cells.
- **`Transaction.Date any`**: Python `tx["date"]` accepts int/float/string transparently. Sheets API returns serial dates as `float64` from JSON; FIO scraper returns `string`. `any` is the faithful port; type-switch inside `FormatDate` and the date fallback in `InferTransactionDetails`.
- **`SearchText` vs `MatchedText`**: Python docstring says `matched_text`, code returns `"search_text"`. Port the code, not the docstring.
- **Default year plumbing**: Go's `czech.ParseMonthReferences(text, defaultYear)` requires explicit year. Python defaults to 2026. Plumb `defaultYear` as the third arg to `InferTransactionDetails`.
- **Empty slices not nil**: Python `match_members` returns `[]` when nothing matches; ensure Go returns `[]Match{}` not `nil` so consumers don't have to nil-check (matches `synch` package style).
## Tests
Port all 6 cases from [tests/test_match_members.py](../../srv/personal/fuj-management/tests/test_match_members.py) verbatim into `match_members_test.go` as one table-driven `TestMatchMembers`. Each row: `name`, `text`, `wantContains []string`, `wantExcludes []string`, `wantAllAuto bool`.
Add table cases for:
- `BuildNameVariants` — docstring example `František Vrbík (Štrúdl)` → 4 variants; nickname filtered (len<3); single-part name; whitespace inside parens
- `FormatDate``nil``""`, `""``""`, `int(44197)``"2020-12-31"`, `float64(44197.5)``"2020-12-31"`, `"2026-04-15"``"2026-04-15"`, `"garbage"``"garbage"`, `" 2026-04-15 "``"2026-04-15"`
- `InferTransactionDetails` — members from search_text, members from sender fallback, months from date-string fallback, months from serial-date fallback, both-paths-fail returns empty slices
Verify expectations against live Python and quote the one-liner in a top-of-file comment, e.g.:
```
PYTHONPATH=scripts:. python -c '
from match_payments import format_date
for v in [None, "", 44197, 44197.5, "2026-04-15", "garbage", " 2026-04-15 "]: print(repr(format_date(v)))
'
```
## Critical files
- **Read for parity** — [scripts/match_payments.py:33-206](../../srv/personal/fuj-management/scripts/match_payments.py#L33), [tests/test_match_members.py](../../srv/personal/fuj-management/tests/test_match_members.py)
- **Reuse** — `czech.Normalize` ([go/internal/domain/czech/normalize.go](../../srv/personal/fuj-management/go/internal/domain/czech/normalize.go#L15)), `czech.ParseMonthReferences` ([parse_month_references.go:61](../../srv/personal/fuj-management/go/internal/domain/czech/parse_month_references.go#L61))
- **Mirror conventions** — [go/internal/domain/synch/synch.go](../../srv/personal/fuj-management/go/internal/domain/synch/synch.go), [go/internal/domain/synch/synch_test.go](../../srv/personal/fuj-management/go/internal/domain/synch/synch_test.go)
- **New** — `go/internal/domain/matching/{doc,name_variants,match_members,infer,format_date}.go` + `*_test.go`
## Out of scope (M2.10 / M4 territory — DO NOT touch)
- `canonical_member_key` ([match_payments.py:20](../../srv/personal/fuj-management/scripts/match_payments.py#L20))
- `reconcile`, `fetch_sheet_data`, `fetch_exceptions` — M2.10 / M4
- Sheets/Drive/FIO I/O glue
- Fixture capture (`tests/fixtures/pure/`) — M3.3 separately
## Verification
1. `cd go && make go-build` — clean build.
2. `cd go && make go-test ./internal/domain/matching/...` — all table tests green.
3. `cd go && make go-lint` — clean (govet, staticcheck, errcheck, gofumpt, unused).
4. Spot-check: pick 23 random non-trivial cases (e.g. `MatchMembers` with mixed auto/review, `FormatDate(44197.5)`) and run the live Python one-liner from each test's comment block to confirm bytes match.
5. Append CHANGELOG entry per [CLAUDE.md](../../srv/personal/fuj-management/CLAUDE.md) (timestamp via `date "+%Y-%m-%d %H:%M %Z"`).
6. Tick M2.7, M2.8, M2.9 in [docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md](../../srv/personal/fuj-management/docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md) with the merge SHA.
7. Push branch, open MR via `tea pr create --title "feat(go): port matching helpers (M2.7-2.9)" --base main --head feat/m2-7-2-9-matching-package`, print URL, leave merge to user.