match_members() now short-circuits on whole-word full-name hits and uses word-boundary regex everywhere else, so a nickname that is a substring of another member's surname (e.g. "tov" inside "ottova") no longer produces false positives. Adds tests/test_match_members.py.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Exact full-name match for payment inference
Context
A bank payment with the message Henrietta Ottová (Heny): 04/2026 is being attributed to two members: the correct Henrietta Ottová and the unrelated Tomáš Němeček (Tov). As a result, reconcile() splits the amount 50/50 between them, producing wrong balances.
Root cause (scripts/match_payments.py:51-115): match_members runs four substring checks using Python's raw in operator, with no word boundaries. Tomáš's nickname Tov normalizes to tov, which is literally a substring of ottova. Check #3 (match_payments.py:79-85) treats the bare presence of a nickname as an auto-confidence match, so Tomáš is appended even though no part of his name is actually in the message. There is also no short-circuit when a member's full canonical name appears verbatim: every other member is still scored against the same haystack.
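A minimal reproduction of the collision, assuming czech_utils.normalize behaves as described under "Files to read" (NFKD decomposition, combining marks stripped, lowercased). The normalize below is a hypothetical stand-in for illustration, not the project's function.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    # Hypothetical stand-in for czech_utils.normalize: NFKD, strip marks, lowercase.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


message = normalize("Henrietta Ottová (Heny): 04/2026")
nickname = normalize("Tov")

# Raw substring containment: "tov" is found inside "ottova", a false positive.
substring_hit = nickname in message
# Whole-word search: \b needs a word boundary on both sides, so there is no hit.
word_hit = re.search(rf"\b{re.escape(nickname)}\b", message) is not None

print(message)        # henrietta ottova (heny): 04/2026
print(substring_hit)  # True
print(word_hit)       # False
```

The raw check reports a hit even though tov never appears as a word, which is the behavior Check #3 turns into an auto match.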
Goal: when a member's full canonical name (diacritics-insensitive) appears in the message as whole words, return only the full-name hit(s) and skip nickname/partial scoring entirely. Additionally, harden the remaining checks with word boundaries so future substring collisions (any nickname or short name part that happens to live inside another member's surname) can't reproduce this class of bug.
Approach
Single-file change in scripts/match_payments.py. Two coordinated edits to match_members (match_payments.py:51-115):
1. Add an exact-canonical-name short-circuit (new, before the existing loop)
After computing normalized_text, do a first pass that collects every member whose normalized_base (the full name minus the parenthesized nickname, normalized) appears in the haystack as whole words. If at least one is found, return only those as auto matches and skip the rest of the function.
Implementation sketch (inserted between match_payments.py:58 and match_payments.py:61):
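```python
# Requires `re` to be imported at module level.
# First pass: collect every member whose full canonical name (variants[0])
# appears in the normalized message as whole words.
exact_matches = []
for name in member_names:
    variants = _build_name_variants(name)
    full_name = variants[0] if variants else ""
    if full_name and re.search(rf"\b{re.escape(full_name)}\b", normalized_text):
        exact_matches.append((name, "auto"))
if exact_matches:
    # Full-name hits win outright; nickname/partial scoring is skipped.
    return exact_matches
```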
This satisfies the user's primary ask: when the message literally contains the canonical name, that wins outright. Multi-member messages still work — every full-name occurrence is collected.
2. Replace the remaining "x in normalized_text" substring checks with word-boundary (\b…\b) regex
For the three checks that survive the short-circuit (and the review-tier partials), swap raw in for whole-word regex so tov cannot match inside ottova, dan cannot match inside bohdan, etc. Affected lines:
- first + last name both present (match_payments.py:73)
- nickname presence (match_payments.py:82)
- last-name partial, review tier (match_payments.py:94)
- first-name partial, review tier (match_payments.py:99)
- single-name member partial (match_payments.py:104)
Helper to keep the call sites tidy:
```python
def _word_in(needle: str, haystack: str) -> bool:
    """True if needle occurs in haystack as whole words (both already normalized)."""
    return bool(re.search(rf"\b{re.escape(needle)}\b", haystack))
```
Check #1 (line 67) becomes redundant once the short-circuit is in place, but leave it untouched as a defensive fallback in case _build_name_variants ever returns a full_name shorter than the 3-char filter would allow. (No code change there.)
3. Why this is sufficient
- The reported message Henrietta Ottová (Heny): 04/2026 hits the new short-circuit on henrietta ottova, returns [("Henrietta Ottová", "auto")], and never even evaluates Tomáš.
- Bare-nickname messages (e.g. Heny 04/2026) skip the short-circuit (no full name present) and fall through to the existing nickname check, which is now word-bounded, so tov can no longer collide with ottova when a message carries the surname without the full name.
- Combined-payment messages listing two full names continue to work: both are collected by the short-circuit.
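The scenarios above can be checked against a deliberately reduced model of match_members: the full-name short-circuit followed by a word-bounded nickname check (auto) and a word-bounded last-name partial (review). Everything here, including match_members_sketch and _split_member, is hypothetical and omits the real function's other checks and thresholds. The extra "Ottová Heny 04/2026" message carries the surname without the full name, which is the shape where the old raw in check let tov match.

```python
import re
import unicodedata
from typing import List, Optional, Tuple


def normalize(text: str) -> str:
    # Hypothetical stand-in for czech_utils.normalize: NFKD, strip marks, lowercase.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def _word_in(needle: str, haystack: str) -> bool:
    return bool(re.search(rf"\b{re.escape(needle)}\b", haystack))


def _split_member(name: str) -> Tuple[str, Optional[str]]:
    # Hypothetical parser: "Tomáš Němeček (Tov)" -> ("tomas nemecek", "tov").
    nick = re.search(r"\((.*?)\)", name)
    base = re.sub(r"\s*\(.*?\)", "", name).strip()
    nickname = normalize(nick.group(1)) if nick else None
    return normalize(base), nickname


def match_members_sketch(text: str, member_names: List[str]) -> List[Tuple[str, str]]:
    normalized_text = normalize(text)

    # Pass 1: whole-word full-name hits win outright.
    exact = [(name, "auto") for name in member_names
             if _word_in(_split_member(name)[0], normalized_text)]
    if exact:
        return exact

    # Pass 2: reduced, word-bounded stand-ins for the remaining checks.
    matches = []
    for name in member_names:
        base, nickname = _split_member(name)
        parts = base.split()
        if nickname and _word_in(nickname, normalized_text):
            matches.append((name, "auto"))
        elif len(parts) > 1 and _word_in(parts[-1], normalized_text):
            matches.append((name, "review"))
    return matches


members = ["Henrietta Ottová", "Tomáš Němeček (Tov)"]
for message in [
    "Henrietta Ottová (Heny): 04/2026",
    "Heny 04/2026",
    "Ottová Heny 04/2026",
    "Henrietta Ottová a Tomáš Němeček 04/2026",
]:
    print(message, "->", match_members_sketch(message, members))

# The old raw check would have flagged Tomáš on the surname-only message:
print("tov" in normalize("Ottová Heny 04/2026"))  # True
```

In this model the surname-only message lands on Henrietta as a review-tier partial and Tomáš is not matched at all.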
Files to modify
- scripts/match_payments.py: only match_members (lines 51-115), plus the new _word_in helper added just above it.
Files to read for confidence (no edits)
- scripts/czech_utils.py: confirm normalize() semantics (NFKD strip + lowercase). Already understood; relevant because re.escape on already-normalized lowercase ASCII is safe.
- scripts/infer_payments.py: confirm it just consumes the match_members output verbatim and writes comma-joined names. No change needed; the upstream fix propagates.
- scripts/match_payments.py:336-362: reconcile() only re-runs inference when Person is empty, so existing wrong rows in the sheet must be cleared by hand (or via the manual fix/blank-cell workflow) before re-running make infer.
Verification
1. Unit test: add tests/test_match_members.py (new file, mirroring the style of tests/test_reconcile_exceptions.py). Cases:
   - match_members("Henrietta Ottová (Heny): 04/2026", ["Henrietta Ottová", "Tomáš Němeček (Tov)"]) → [("Henrietta Ottová", "auto")] only.
   - match_members("Heny 04/2026", ["Tomáš Němeček (Tov)", "Henrietta Ottová"]) → no match for Tomáš; whatever the legitimate behavior for "Heny" is, document it. This message does not itself contain ottova, so also add a surname-without-full-name variant such as "Ottová Heny 04/2026" to confirm the substring trap is closed.
   - Combined payment: match_members("Henrietta Ottová a Tomáš Němeček 04/2026", ["Henrietta Ottová", "Tomáš Němeček (Tov)"]) → both as auto.
   - Sanity: match_members("VS 1234 Tomáš Němeček", [...]) still returns Tomáš.
2. Run the suite: make test.
3. End-to-end: clear the buggy row's Person/Purpose cells in the payments sheet, then run make infer, then make reconcile. Confirm the payment now allocates fully to Henrietta and that her balance reflects it.
4. Changelog: per CLAUDE.md, append an entry to CHANGELOG.md once the user confirms the fix works in production. Format: ## 2026-05-04 HH:MM TZ — fix: payment inference exact-match short-circuit.