New go/internal/domain/matching package porting three helpers from scripts/match_payments.py: - BuildNameVariants: normalized ASCII variants from a member name (nickname in parens, last/first split, len<3 filtered); variants[0] is always the full base name — MatchMembers relies on this invariant. - MatchMembers: auto/review confidence matching with an exact-name short-circuit pass that prevents nickname substrings (tov) from firing inside longer surnames (ottova); common-surname filter for review tier. - FormatDate: nil/empty/""/serial int/float64 (since 1899-12-30, fractional days supported)/YYYY-MM-DD passthrough/garbage → never errors. - InferTransactionDetails: composes BuildNameVariants+MatchMembers+ ParseMonthReferences; falls back to sender-only member match and date-derived month when text carries no signal. 21 table-driven tests; all expected values verified against live Python on 2026-05-06. go-build, go-test, go-lint all clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
127 lines
10 KiB
Markdown
127 lines
10 KiB
Markdown
# M2.7 + M2.8 + M2.9 — Port `matching` package to Go
|
||
|
||
> On approval: copy this plan to `docs/plans/2026-05-06-1305-go-m2-7-2-9-matching.md` per [CLAUDE.md](../../srv/personal/fuj-management/CLAUDE.md) plan-location convention.
|
||
|
||
## Context
|
||
|
||
The Go rewrite (tracked in [docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md](../../srv/personal/fuj-management/docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md)) is in milestone M2 — porting pure-domain helpers leaf-first from Python to Go. M2.1 through M2.6 are complete (`czech.Normalize`, `czech.ParseMonthReferences`, `fees.CalculateFee`, `fees.CalculateJuniorFee`, `money.ParseCZK`, `synch.GenerateSyncID`).
|
||
|
||
M2.7, M2.8, and M2.9 cover three helpers from [scripts/match_payments.py](../../srv/personal/fuj-management/scripts/match_payments.py) that form a tight chain: `InferTransactionDetails` calls `MatchMembers` which calls `BuildNameVariants` and the same Sheets-serial date logic that `FormatDate` uses. The user requested they be done together because the dependency graph makes per-milestone commits awkward — `MatchMembers` would either reference an unexported helper not yet committed or commit dead code.
|
||
|
||
This unblocks M2.10 (`reconcile`, the load-bearing function) and M5 parity tests, since reconciliation consumes `InferTransactionDetails` output.
|
||
|
||
## Approach
|
||
|
||
**One commit, one branch, one MR.** Branch: `feat/m2-7-2-9-matching-package`. The three milestone checkboxes get ticked together on merge.
|
||
|
||
### Package layout
|
||
|
||
New package `go/internal/domain/matching/` mirroring the existing `go/internal/domain/{czech,fees,money,synch}` convention (one file per public symbol, tests alongside as `*_test.go`):
|
||
|
||
| File | Contents |
|
||
|---|---|
|
||
| `doc.go` | `// Package matching ports name/member matching from scripts/match_payments.py.` |
|
||
| `name_variants.go` | `BuildNameVariants` + unexported `wordIn` helper (mirrors Python's `_word_in` co-location at [match_payments.py:60-62](../../srv/personal/fuj-management/scripts/match_payments.py#L60)) |
|
||
| `match_members.go` | `Confidence` typed string + constants, `Match` struct, `MatchMembers` |
|
||
| `infer.go` | `Transaction`, `InferredDetails`, `InferTransactionDetails` |
|
||
| `format_date.go` | `FormatDate` |
|
||
| `name_variants_test.go`, `match_members_test.go`, `infer_test.go`, `format_date_test.go` | table-driven tests, each with a top-of-file comment quoting the live Python one-liner used to verify expected values (mirrors [synch_test.go:7-20](../../srv/personal/fuj-management/go/internal/domain/synch/synch_test.go#L7)) |
|
||
|
||
### Public API
|
||
|
||
```go
|
||
type Confidence string
|
||
const (
|
||
ConfidenceAuto Confidence = "auto"
|
||
ConfidenceReview Confidence = "review"
|
||
)
|
||
type Match struct {
|
||
Name string
|
||
Confidence Confidence
|
||
}
|
||
|
||
func BuildNameVariants(name string) []string
|
||
func MatchMembers(text string, memberNames []string) []Match
|
||
|
||
type Transaction struct {
|
||
Sender string
|
||
Message string
|
||
UserID string
|
||
Date any // string | int | float64 — see "Parity concerns"
|
||
}
|
||
type InferredDetails struct {
|
||
Members []Match
|
||
Months []string
|
||
SearchText string // matches Python's "search_text" key, not the misleading "matched_text" docstring
|
||
}
|
||
func InferTransactionDetails(tx Transaction, memberNames []string, defaultYear int) InferredDetails
|
||
|
||
func FormatDate(val any) string
|
||
```
|
||
|
||
### Algorithms (port verbatim — these are the load-bearing details)
|
||
|
||
**`BuildNameVariants`** ([match_payments.py:33-57](../../srv/personal/fuj-management/scripts/match_payments.py#L33)): extract `(nickname)` regex, strip parens for `base`, normalize via `czech.Normalize`, append last + first when ≥2 parts, **filter <3 chars**. `variants[0]` must always be the full normalized base — `MatchMembers` relies on this.
|
||
|
||
**`MatchMembers`** ([match_payments.py:65-137](../../srv/personal/fuj-management/scripts/match_payments.py#L65)):
|
||
1. **Exact short-circuit** ([:77-84](../../srv/personal/fuj-management/scripts/match_payments.py#L77)): if any member's `variants[0]` whole-word matches in `Normalize(text)`, return ONLY those `(name, auto)`. Prevents nickname `tov` matching inside `ottova`.
|
||
2. Otherwise per-member first-match-wins: full-name substring → `\b first \b` AND `\b last \b` (any order) → `\b nickname \b` — each yields `auto` and continues.
|
||
3. **Review tier** ([:113-129](../../srv/personal/fuj-management/scripts/match_payments.py#L113)): ≥2-part names → last name `len ≥ 4` AND not in `{"novak","novakova","prach"}` → review; else first name `len ≥ 3` → review. 1-part names → `len ≥ 4` → review.
|
||
4. **Final filter** ([:131-137](../../srv/personal/fuj-management/scripts/match_payments.py#L131)): if ANY auto exists, drop ALL review. Two-pass — don't try to fuse with the loop.
|
||
|
||
**`InferTransactionDetails`** ([match_payments.py:144-184](../../srv/personal/fuj-management/scripts/match_payments.py#L144)): `search_text = sender + " " + message + " " + user_id`; month parse uses `message + " " + user_id` (excludes sender); fallback 1 retries members on sender alone; fallback 2 derives months from `tx.Date` (Sheets serial or `YYYY-MM-DD`).
|
||
|
||
**`FormatDate`** ([match_payments.py:187-206](../../srv/personal/fuj-management/scripts/match_payments.py#L187)): nil/empty → `""`; int/float → Sheets serial since 1899-12-30 formatted `YYYY-MM-DD`; pre-formatted `YYYY-MM-DD` (length 10, dashes at idx 4/7) → as-is; else `strings.TrimSpace(fmt.Sprint(v))`. **No raise on bad input** — parity contract.
|
||
|
||
## Parity concerns
|
||
|
||
- **RE2 `\b`**: Equivalent to Python `\b` on ASCII-folded input (`Normalize` strips diacritics + lowercases). Use `regexp.QuoteMeta` for `re.escape`.
|
||
- **Sheets epoch**: 1899-12-30 (NOT 1900-01-01). `time.Date(1899, 12, 30, 0, 0, 0, 0, time.UTC)`.
|
||
- **Fractional serials**: Python `timedelta(days=44197.5)` adds 12 hours, then `.strftime("%Y-%m-%d")` discards time. To match exactly use `base.Add(time.Duration(val * 24 * float64(time.Hour)))` then `Format("2006-01-02")`. **Do NOT** use `base.AddDate(0, 0, int(val))` — that silently drops fractional days from real Sheets exports of timestamped cells.
|
||
- **`Transaction.Date any`**: Python `tx["date"]` accepts int/float/string transparently. Sheets API returns serial dates as `float64` from JSON; FIO scraper returns `string`. `any` is the faithful port; type-switch inside `FormatDate` and the date fallback in `InferTransactionDetails`.
|
||
- **`SearchText` vs `MatchedText`**: Python docstring says `matched_text`, code returns `"search_text"`. Port the code, not the docstring.
|
||
- **Default year plumbing**: Go's `czech.ParseMonthReferences(text, defaultYear)` requires explicit year. Python defaults to 2026. Plumb `defaultYear` as the third arg to `InferTransactionDetails`.
|
||
- **Empty slices not nil**: Python `match_members` returns `[]` when nothing matches; ensure Go returns `[]Match{}` not `nil` so consumers don't have to nil-check (matches `synch` package style).
|
||
|
||
## Tests
|
||
|
||
Port all 6 cases from [tests/test_match_members.py](../../srv/personal/fuj-management/tests/test_match_members.py) verbatim into `match_members_test.go` as one table-driven `TestMatchMembers`. Each row: `name`, `text`, `wantContains []string`, `wantExcludes []string`, `wantAllAuto bool`.
|
||
|
||
Add table cases for:
|
||
- `BuildNameVariants` — docstring example `František Vrbík (Štrúdl)` → 4 variants; nickname filtered (len<3); single-part name; whitespace inside parens
|
||
- `FormatDate` — `nil` → `""`, `""` → `""`, `int(44197)` → `"2020-12-31"`, `float64(44197.5)` → `"2020-12-31"`, `"2026-04-15"` → `"2026-04-15"`, `"garbage"` → `"garbage"`, `" 2026-04-15 "` → `"2026-04-15"`
|
||
- `InferTransactionDetails` — members from search_text, members from sender fallback, months from date-string fallback, months from serial-date fallback, both-paths-fail returns empty slices
|
||
|
||
Verify expectations against live Python and quote the one-liner in a top-of-file comment, e.g.:
|
||
|
||
```
|
||
PYTHONPATH=scripts:. python -c '
|
||
from match_payments import format_date
|
||
for v in [None, "", 44197, 44197.5, "2026-04-15", "garbage", " 2026-04-15 "]: print(repr(format_date(v)))
|
||
'
|
||
```
|
||
|
||
## Critical files
|
||
|
||
- **Read for parity** — [scripts/match_payments.py:33-206](../../srv/personal/fuj-management/scripts/match_payments.py#L33), [tests/test_match_members.py](../../srv/personal/fuj-management/tests/test_match_members.py)
|
||
- **Reuse** — `czech.Normalize` ([go/internal/domain/czech/normalize.go](../../srv/personal/fuj-management/go/internal/domain/czech/normalize.go#L15)), `czech.ParseMonthReferences` ([parse_month_references.go:61](../../srv/personal/fuj-management/go/internal/domain/czech/parse_month_references.go#L61))
|
||
- **Mirror conventions** — [go/internal/domain/synch/synch.go](../../srv/personal/fuj-management/go/internal/domain/synch/synch.go), [go/internal/domain/synch/synch_test.go](../../srv/personal/fuj-management/go/internal/domain/synch/synch_test.go)
|
||
- **New** — `go/internal/domain/matching/{doc,name_variants,match_members,infer,format_date}.go` + `*_test.go`
|
||
|
||
## Out of scope (M2.10 / M4 territory — DO NOT touch)
|
||
|
||
- `canonical_member_key` ([match_payments.py:20](../../srv/personal/fuj-management/scripts/match_payments.py#L20))
|
||
- `reconcile`, `fetch_sheet_data`, `fetch_exceptions` — M2.10 / M4
|
||
- Sheets/Drive/FIO I/O glue
|
||
- Fixture capture (`tests/fixtures/pure/`) — M3.3 separately
|
||
|
||
## Verification
|
||
|
||
1. `cd go && make go-build` — clean build.
|
||
2. `cd go && make go-test ./internal/domain/matching/...` — all table tests green.
|
||
3. `cd go && make go-lint` — clean (govet, staticcheck, errcheck, gofumpt, unused).
|
||
4. Spot-check: pick 2–3 random non-trivial cases (e.g. `MatchMembers` with mixed auto/review, `FormatDate(44197.5)`) and run the live Python one-liner from each test's comment block to confirm bytes match.
|
||
5. Append CHANGELOG entry per [CLAUDE.md](../../srv/personal/fuj-management/CLAUDE.md) (timestamp via `date "+%Y-%m-%d %H:%M %Z"`).
|
||
6. Tick M2.7, M2.8, M2.9 in [docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md](../../srv/personal/fuj-management/docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md) with the merge SHA.
|
||
7. Push branch, open MR via `tea pr create --title "feat(go): port matching helpers (M2.7-2.9)" --base main --head feat/m2-7-2-9-matching-package`, print URL, leave merge to user.
|