fuj-management/docs/plans/2026-05-06-1305-go-m2-7-2-9-matching.md
Jan Novak e596f0000e feat(go/M2.7-2.9): port domain/matching package
New go/internal/domain/matching package porting three helpers from
scripts/match_payments.py:

- BuildNameVariants: normalized ASCII variants from a member name (nickname
  in parens, last/first split, len<3 filtered); variants[0] is always the
  full base name — MatchMembers relies on this invariant.
- MatchMembers: auto/review confidence matching with an exact-name
  short-circuit pass that prevents nickname substrings (tov) from firing
  inside longer surnames (ottova); common-surname filter for review tier.
- FormatDate: nil/empty/""/serial int/float64 (since 1899-12-30, fractional
  days supported)/YYYY-MM-DD passthrough/garbage → never errors.
- InferTransactionDetails: composes BuildNameVariants+MatchMembers+
  ParseMonthReferences; falls back to sender-only member match and
  date-derived month when text carries no signal.

21 table-driven tests; all expected values verified against live Python
on 2026-05-06. go-build, go-test, go-lint all clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 13:19:42 +02:00


M2.7 + M2.8 + M2.9 — Port matching package to Go

On approval: copy this plan to docs/plans/2026-05-06-1305-go-m2-7-2-9-matching.md per CLAUDE.md plan-location convention.

Context

The Go rewrite (tracked in docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md) is in milestone M2 — porting pure-domain helpers leaf-first from Python to Go. M2.1 through M2.6 are complete (czech.Normalize, czech.ParseMonthReferences, fees.CalculateFee, fees.CalculateJuniorFee, money.ParseCZK, synch.GenerateSyncID).

M2.7, M2.8, and M2.9 cover three helpers from scripts/match_payments.py that form a tight chain: InferTransactionDetails calls MatchMembers, which calls BuildNameVariants, and its date fallback uses the same Sheets-serial date logic as FormatDate. The user requested they be done together because the dependency graph makes per-milestone commits awkward: MatchMembers would either reference an unexported helper not yet committed, or an earlier commit would have to carry dead code.

This unblocks M2.10 (reconcile, the load-bearing function) and M5 parity tests, since reconciliation consumes InferTransactionDetails output.

Approach

One commit, one branch, one MR. Branch: feat/m2-7-2-9-matching-package. The three milestone checkboxes get ticked together on merge.

Package layout

New package go/internal/domain/matching/ mirroring the existing go/internal/domain/{czech,fees,money,synch} convention (one file per public symbol, tests alongside as *_test.go):

| File | Contents |
| --- | --- |
| doc.go | // Package matching ports name/member matching from scripts/match_payments.py. |
| name_variants.go | BuildNameVariants + unexported wordIn helper (mirrors Python's _word_in co-location at match_payments.py:60-62) |
| match_members.go | Confidence typed string + constants, Match struct, MatchMembers |
| infer.go | Transaction, InferredDetails, InferTransactionDetails |
| format_date.go | FormatDate |
| name_variants_test.go, match_members_test.go, infer_test.go, format_date_test.go | table-driven tests, each with a top-of-file comment quoting the live Python one-liner used to verify expected values (mirrors synch_test.go:7-20) |

Public API

type Confidence string
const (
    ConfidenceAuto   Confidence = "auto"
    ConfidenceReview Confidence = "review"
)
type Match struct {
    Name       string
    Confidence Confidence
}

func BuildNameVariants(name string) []string
func MatchMembers(text string, memberNames []string) []Match

type Transaction struct {
    Sender  string
    Message string
    UserID  string
    Date    any // string | int | float64 — see "Parity concerns"
}
type InferredDetails struct {
    Members    []Match
    Months     []string
    SearchText string // matches Python's "search_text" key, not the misleading "matched_text" docstring
}
func InferTransactionDetails(tx Transaction, memberNames []string, defaultYear int) InferredDetails

func FormatDate(val any) string

Algorithms (port verbatim — these are the load-bearing details)

BuildNameVariants (match_payments.py:33-57): extract (nickname) regex, strip parens for base, normalize via czech.Normalize, append last + first when ≥2 parts, filter <3 chars. variants[0] must always be the full normalized base — MatchMembers relies on this.
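The variant-building steps above can be sketched as follows. This is a minimal illustration, not the ported code: normalize and foldCzech are hypothetical stand-ins for czech.Normalize, the order of the variants after variants[0] is an assumption, and the sketch assumes the base name is kept even if it is shorter than 3 characters.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nicknameRe captures the text inside a pair of parentheses.
var nicknameRe = regexp.MustCompile(`\(([^)]*)\)`)

// foldCzech is a hypothetical stand-in for czech.Normalize. It only covers
// lowercase Czech letters, which is enough for this example.
var foldCzech = strings.NewReplacer(
	"á", "a", "č", "c", "ď", "d", "é", "e", "ě", "e", "í", "i", "ň", "n", "ó", "o",
	"ř", "r", "š", "s", "ť", "t", "ú", "u", "ů", "u", "ý", "y", "ž", "z",
)

// normalize lowercases, strips diacritics, and collapses whitespace.
func normalize(s string) string {
	s = foldCzech.Replace(strings.ToLower(s))
	return strings.Join(strings.Fields(s), " ")
}

// BuildNameVariants returns the full base name first, then last name,
// first name, and nickname (order after index 0 is assumed here).
func BuildNameVariants(name string) []string {
	nickname := ""
	if m := nicknameRe.FindStringSubmatch(name); m != nil {
		nickname = normalize(m[1])
	}
	base := normalize(nicknameRe.ReplaceAllString(name, " "))

	variants := []string{base} // variants[0]: always the full base name
	candidates := []string{}
	if parts := strings.Fields(base); len(parts) >= 2 {
		candidates = append(candidates, parts[len(parts)-1], parts[0])
	}
	candidates = append(candidates, nickname)
	for _, c := range candidates {
		if len(c) >= 3 { // drop variants shorter than 3 characters
			variants = append(variants, c)
		}
	}
	return variants
}

func main() {
	fmt.Println(BuildNameVariants("František Vrbík (Štrúdl)"))
	fmt.Println(BuildNameVariants("Jan Novák (Jn)"))
}
```

With these stand-ins the first call yields four variants (frantisek vrbik, vrbik, frantisek, strudl) and the second drops the two-letter nickname.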

MatchMembers (match_payments.py:65-137):

  1. Exact short-circuit (:77-84): if any member's variants[0] whole-word matches in Normalize(text), return ONLY those (name, auto). Prevents nickname tov matching inside ottova.
  2. Otherwise per-member first-match-wins: full-name substring → \b first \b AND \b last \b (any order) → \b nickname \b — each yields auto and continues.
  3. Review tier (:113-129): ≥2-part names → last name len ≥ 4 AND not in {"novak","novakova","prach"} → review; else first name len ≥ 3 → review. 1-part names → len ≥ 4 → review.
  4. Final filter (:131-137): if ANY auto exists, drop ALL review. Two-pass — don't try to fuse with the loop.
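Steps 1 and 4 are the two places where the control flow matters most, so here is a hedged sketch of just those two passes. The member type, exactPass, and dropReviewIfAuto are hypothetical names introduced for illustration; the per-member tiers in steps 2 and 3 are left out.

```go
package main

import (
	"fmt"
	"regexp"
)

type Confidence string

const (
	ConfidenceAuto   Confidence = "auto"
	ConfidenceReview Confidence = "review"
)

type Match struct {
	Name       string
	Confidence Confidence
}

// member pairs a display name with precomputed variants (hypothetical type).
type member struct {
	Name     string
	Variants []string // Variants[0] is the full normalized base name
}

// wordIn reports whether word occurs in text on word boundaries.
func wordIn(word, text string) bool {
	return regexp.MustCompile(`\b` + regexp.QuoteMeta(word) + `\b`).MatchString(text)
}

// exactPass is step 1: collect members whose full base name appears in the
// already normalized text. A non-empty result is returned to the caller as-is.
func exactPass(text string, members []member) []Match {
	out := []Match{}
	for _, m := range members {
		if len(m.Variants) > 0 && wordIn(m.Variants[0], text) {
			out = append(out, Match{Name: m.Name, Confidence: ConfidenceAuto})
		}
	}
	return out
}

// dropReviewIfAuto is step 4: a separate pass over the finished result.
func dropReviewIfAuto(matches []Match) []Match {
	hasAuto := false
	for _, m := range matches {
		if m.Confidence == ConfidenceAuto {
			hasAuto = true
			break
		}
	}
	if !hasAuto {
		return matches
	}
	out := []Match{}
	for _, m := range matches {
		if m.Confidence == ConfidenceAuto {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	members := []member{
		{Name: "Jana Ottová", Variants: []string{"jana ottova", "ottova", "jana"}},
		{Name: "Tomáš Veselý (Tov)", Variants: []string{"tomas vesely", "vesely", "tomas", "tov"}},
	}
	fmt.Println(exactPass("jana ottova clenske prispevky", members))
	fmt.Println(dropReviewIfAuto([]Match{
		{Name: "A", Confidence: ConfidenceAuto},
		{Name: "B", Confidence: ConfidenceReview},
	}))
}
```

Keeping the final filter as its own function makes the "if ANY auto exists" rule independent of the order in which members were visited, which is the reason the plan asks for two passes.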

InferTransactionDetails (match_payments.py:144-184): search_text = sender + " " + message + " " + user_id; month parse uses message + " " + user_id (excludes sender); fallback 1 retries members on sender alone; fallback 2 derives months from tx.Date (Sheets serial or YYYY-MM-DD).
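The composition and fallback order can be sketched with the collaborators injected as plain functions, so the example runs without the real packages. Everything in deps and inferred is hypothetical, members are reduced to plain names, and the month strings returned by the stubs are placeholders rather than the real format.

```go
package main

import "fmt"

type Transaction struct {
	Sender  string
	Message string
	UserID  string
	Date    any
}

// deps holds hypothetical stand-ins for MatchMembers,
// czech.ParseMonthReferences, and the date-derived month fallback.
type deps struct {
	matchMembers   func(text string) []string
	parseMonths    func(text string) []string
	monthsFromDate func(date any) []string
}

// inferred is a simplified result shape for this sketch.
type inferred struct {
	Members    []string
	Months     []string
	SearchText string
}

func infer(tx Transaction, d deps) inferred {
	searchText := tx.Sender + " " + tx.Message + " " + tx.UserID

	members := d.matchMembers(searchText)
	if len(members) == 0 {
		members = d.matchMembers(tx.Sender) // fallback 1: sender alone
	}

	months := d.parseMonths(tx.Message + " " + tx.UserID) // sender excluded
	if len(months) == 0 {
		months = d.monthsFromDate(tx.Date) // fallback 2: derive from the date
	}
	return inferred{Members: members, Months: months, SearchText: searchText}
}

func main() {
	d := deps{
		// Stub: only recognizes the bare sender, to exercise fallback 1.
		matchMembers: func(text string) []string {
			if text == "Jana Ottova" {
				return []string{"Jana Ottová"}
			}
			return []string{}
		},
		// Stub: finds no month in the text, to exercise fallback 2.
		parseMonths:    func(string) []string { return []string{} },
		monthsFromDate: func(any) []string { return []string{"month-from-date"} },
	}
	tx := Transaction{Sender: "Jana Ottova", Message: "platba", UserID: "123", Date: "2026-04-15"}
	fmt.Printf("%+v\n", infer(tx, d))
}
```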

FormatDate (match_payments.py:187-206): nil/empty → ""; int/float → Sheets serial since 1899-12-30 formatted YYYY-MM-DD; pre-formatted YYYY-MM-DD (length 10, dashes at idx 4/7) → as-is; else strings.TrimSpace(fmt.Sprint(v)). No raise on bad input — parity contract.
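A minimal sketch of these rules as a type switch, assuming only int, float64, and string need dedicated branches. The helper names serialToDate and looksLikeISODate are hypothetical. The example uses serial 25569, which corresponds to 1970-01-01 under the 1899-12-30 epoch.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// sheetsEpoch is day zero for Sheets serial dates.
var sheetsEpoch = time.Date(1899, 12, 30, 0, 0, 0, 0, time.UTC)

// serialToDate converts a serial day count, possibly fractional, to
// YYYY-MM-DD. time.Duration spans roughly 292 years, which is enough for
// present-day serials but would overflow for very large values.
func serialToDate(days float64) string {
	d := time.Duration(days * 24 * float64(time.Hour))
	return sheetsEpoch.Add(d).Format("2006-01-02")
}

// looksLikeISODate checks shape only: length 10 with dashes at 4 and 7.
func looksLikeISODate(s string) bool {
	return len(s) == 10 && s[4] == '-' && s[7] == '-'
}

// FormatDate never returns an error; unknown input is stringified and trimmed.
func FormatDate(val any) string {
	switch v := val.(type) {
	case nil:
		return ""
	case int:
		return serialToDate(float64(v))
	case float64:
		return serialToDate(v)
	case string:
		if looksLikeISODate(v) {
			return v
		}
		return strings.TrimSpace(v)
	default:
		return strings.TrimSpace(fmt.Sprint(v))
	}
}

func main() {
	inputs := []any{nil, "", 25569, 25569.5, "2026-04-15", "garbage", "  2026-04-15  "}
	for _, v := range inputs {
		fmt.Printf("%q\n", FormatDate(v))
	}
}
```

The fractional serial 25569.5 lands at noon on the same day, so formatting with a date-only layout discards the time component just as the Python strftime call does.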

Parity concerns

  • RE2 \b: Equivalent to Python \b on ASCII-folded input (Normalize strips diacritics + lowercases). Use regexp.QuoteMeta for re.escape.
  • Sheets epoch: 1899-12-30 (NOT 1900-01-01). time.Date(1899, 12, 30, 0, 0, 0, 0, time.UTC).
  • Fractional serials: Python timedelta(days=44197.5) adds 12 hours, then .strftime("%Y-%m-%d") discards time. To match exactly use base.Add(time.Duration(val * 24 * float64(time.Hour))) then Format("2006-01-02"). Do NOT use base.AddDate(0, 0, int(val)) — that silently drops fractional days from real Sheets exports of timestamped cells.
  • Transaction.Date any: Python tx["date"] accepts int/float/string transparently. Sheets API returns serial dates as float64 from JSON; FIO scraper returns string. any is the faithful port; type-switch inside FormatDate and the date fallback in InferTransactionDetails.
  • SearchText vs MatchedText: Python docstring says matched_text, code returns "search_text". Port the code, not the docstring.
  • Default year plumbing: Go's czech.ParseMonthReferences(text, defaultYear) requires explicit year. Python defaults to 2026. Plumb defaultYear as the third arg to InferTransactionDetails.
  • Empty slices not nil: Python match_members returns [] when nothing matches; ensure Go returns []Match{} not nil so consumers don't have to nil-check (matches synch package style).
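The first bullet depends on the input being ASCII-folded, and a small experiment shows why. Go's regexp package defines \b as an ASCII word boundary, while Python 3's re treats accented letters as word characters for str patterns, so the two only agree once diacritics are stripped. The helper name wholeWord is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// wholeWord reports whether word occurs in text on \b boundaries.
// regexp.QuoteMeta plays the role of Python's re.escape.
func wholeWord(word, text string) bool {
	re := regexp.MustCompile(`\b` + regexp.QuoteMeta(word) + `\b`)
	return re.MatchString(text)
}

func main() {
	// "tov" sits inside a longer word, so there is no boundary before it.
	fmt.Println(wholeWord("tov", "platba ottova")) // false

	// Standalone token: boundaries on both sides.
	fmt.Println(wholeWord("tov", "platba tov 2026")) // true

	// Folded input: "kova" is preceded by the word character "a".
	fmt.Println(wholeWord("kova", "novakova")) // false

	// Unfolded input: RE2 sees "á" as a non-word character, so a boundary
	// appears in front of "kova". This is the divergence to avoid.
	fmt.Println(wholeWord("kova", "novákova")) // true

	// QuoteMeta makes the dot literal, so it cannot match "a".
	fmt.Println(wholeWord("j.n", "jan")) // false
}
```

The fourth call is the case that normalizing both the text and the variants before matching is meant to rule out.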

Tests

Port all 6 cases from tests/test_match_members.py verbatim into match_members_test.go as one table-driven TestMatchMembers. Each row: name, text, wantContains []string, wantExcludes []string, wantAllAuto bool.
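One possible shape for the row type and its checks is sketched below. It is written as a plain program so it runs on its own; in the real test file the loop body would sit inside t.Run and report through t.Errorf. The names matchCase and check, and the example row, are hypothetical.

```go
package main

import "fmt"

type Confidence string

const (
	ConfidenceAuto   Confidence = "auto"
	ConfidenceReview Confidence = "review"
)

type Match struct {
	Name       string
	Confidence Confidence
}

// matchCase mirrors the row shape described above.
type matchCase struct {
	name         string
	text         string
	wantContains []string
	wantExcludes []string
	wantAllAuto  bool
}

// check returns one message per failed expectation; empty means the row passes.
func check(tc matchCase, got []Match) []string {
	seen := map[string]bool{}
	for _, m := range got {
		seen[m.Name] = true
	}
	fails := []string{}
	for _, n := range tc.wantContains {
		if !seen[n] {
			fails = append(fails, "missing "+n)
		}
	}
	for _, n := range tc.wantExcludes {
		if seen[n] {
			fails = append(fails, "unexpected "+n)
		}
	}
	if tc.wantAllAuto {
		for _, m := range got {
			if m.Confidence != ConfidenceAuto {
				fails = append(fails, m.Name+" is not auto")
			}
		}
	}
	return fails
}

func main() {
	tc := matchCase{
		name:         "exact name only",
		text:         "jana ottova",
		wantContains: []string{"Jana Ottová"},
		wantExcludes: []string{"Tomáš Veselý (Tov)"},
		wantAllAuto:  true,
	}
	// Stand-in for the MatchMembers(tc.text, memberNames) call.
	got := []Match{{Name: "Jana Ottová", Confidence: ConfidenceAuto}}
	fmt.Println(tc.name, check(tc, got))
}
```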

Add table cases for:

  • BuildNameVariants — docstring example František Vrbík (Štrúdl) → 4 variants; nickname filtered (len<3); single-part name; whitespace inside parens
  • FormatDate: nil → "", "" → "", int(44197) → "2021-01-01", float64(44197.5) → "2021-01-01", "2026-04-15" → "2026-04-15", "garbage" → "garbage", " 2026-04-15 " → "2026-04-15"
  • InferTransactionDetails — members from search_text, members from sender fallback, months from date-string fallback, months from serial-date fallback, both-paths-fail returns empty slices

Verify expectations against live Python and quote the one-liner in a top-of-file comment, e.g.:

PYTHONPATH=scripts:. python -c '
from match_payments import format_date
for v in [None, "", 44197, 44197.5, "2026-04-15", "garbage", "  2026-04-15  "]: print(repr(format_date(v)))
'

Critical files

Out of scope (M2.10 / M4 territory — DO NOT touch)

  • canonical_member_key (match_payments.py:20)
  • reconcile, fetch_sheet_data, fetch_exceptions — M2.10 / M4
  • Sheets/Drive/FIO I/O glue
  • Fixture capture (tests/fixtures/pure/) — M3.3 separately

Verification

  1. cd go && make go-build — clean build.
  2. cd go && make go-test ./internal/domain/matching/... — all table tests green.
  3. cd go && make go-lint — clean (govet, staticcheck, errcheck, gofumpt, unused).
  4. Spot-check: pick 2-3 random non-trivial cases (e.g. MatchMembers with mixed auto/review, FormatDate(44197.5)) and run the live Python one-liner from each test's comment block to confirm bytes match.
  5. Append CHANGELOG entry per CLAUDE.md (timestamp via date "+%Y-%m-%d %H:%M %Z").
  6. Tick M2.7, M2.8, M2.9 in docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md with the merge SHA.
  7. Push branch, open MR via tea pr create --title "feat(go): port matching helpers (M2.7-2.9)" --base main --head feat/m2-7-2-9-matching-package, print URL, leave merge to user.