kacerr/fuj-management

Fork 0

Files

Jan Novak 81b36878b3

Deploy to K8s / deploy (push) Successful in 11s

Details

Build and Push / build (push) Successful in 7s

Details

Build and Push / build-go (push) Failing after 5s

Details

fix: Payment inference returns only exact-name matches when present

match_members() now short-circuits on whole-word full-name hits and
uses word-boundary regex everywhere else, so a nickname that is a
substring of another member's surname (e.g. "tov" inside "ottova")
no longer produces false positives. Adds tests/test_match_members.py.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-04 23:08:59 +02:00

6.2 KiB

Raw Blame History

Exact full-name match for payment inference

Context

A bank payment with the message Henrietta Ottová (Heny): 04/2026 is being inferred to two members: the correct Henrietta Ottová and the unrelated Tomáš Němeček (Tov). As a result, reconcile() splits the amount 50/50 between them, producing wrong balances.

Root cause (scripts/match_payments.py:51-115): match_members runs four substring checks via raw Python in, with no word boundaries. Tomáš's nickname Tov normalizes to tov, which is literally a substring of ottova. Check #3 (match_payments.py:79-85) treats bare nickname presence as an auto-confidence match, so Tomáš is appended even though no part of his name is actually in the message. There is also no short-circuit when a member's full canonical name appears verbatim — every other member is still scored against the same haystack.

Goal: when a member's full canonical name (diacritics-insensitive) appears in the message as whole words, return only the full-name hit(s) and skip nickname/partial scoring entirely. Additionally, harden the remaining checks with word boundaries so future substring collisions (any nickname or short name part that happens to live inside another member's surname) can't reproduce this class of bug.

Approach

Single-file change in scripts/match_payments.py. Two coordinated edits to match_members (match_payments.py:51-115):

1. Add an exact-canonical-name short-circuit (new, before the existing loop)

After computing normalized_text, do a first pass that collects every member whose normalized_base (the full name minus the parenthesized nickname, normalized) appears in the haystack as whole words. If at least one is found, return only those as auto matches and skip the rest of the function.

Implementation sketch (inserted between match_payments.py:58 and match_payments.py:61):

exact_matches = []
for name in member_names:
    variants = _build_name_variants(name)
    full_name = variants[0] if variants else ""
    if full_name and re.search(rf"\b{re.escape(full_name)}\b", normalized_text):
        exact_matches.append((name, "auto"))
if exact_matches:
    return exact_matches

This satisfies the user's primary ask: when the message literally contains the canonical name, that wins outright. Multi-member messages still work — every full-name occurrence is collected.

2. Replace remaining `in normalized_text` checks with `\b…\b` regex

For the three checks that survive the short-circuit (and the review-tier partials), swap raw in for whole-word regex so tov cannot match inside ottova, dan cannot match inside bohdan, etc. Affected lines:

match_payments.py:73 — first+last name both present
match_payments.py:82 — nickname presence
match_payments.py:94 — last-name partial (review)
match_payments.py:99 — first-name partial (review)
match_payments.py:104 — single-name member partial

Helper to keep the call sites tidy:

def _word_in(needle: str, haystack: str) -> bool:
    return bool(re.search(rf"\b{re.escape(needle)}\b", haystack))

Check #1 (line 67) becomes redundant once the short-circuit is in place, but leave it untouched as a defensive fallback in case _build_name_variants ever returns a full_name shorter than the 3-char filter would allow. (No code change there.)

3. Why this is sufficient

The reported message Henrietta Ottová (Heny): 04/2026 hits the new short-circuit on henrietta ottova, returns [("Henrietta Ottová", "auto")], and never even evaluates Tomáš.
Bare-nickname messages (e.g. Heny 04/2026) skip the short-circuit (no full name present) and fall into the existing nickname check — now word-bounded, so tov no longer collides with ottova even there.
Combined-payment messages listing two full names continue to work: both are collected by the short-circuit.

Files to modify

scripts/match_payments.py — only match_members (lines 51-115). Add _word_in helper just above it.

Files to read for confidence (no edits)

scripts/czech_utils.py — confirm normalize() semantics (NFKD strip + lowercase). Already understood; relevant because re.escape on already-normalized lowercase ASCII is safe.
scripts/infer_payments.py — confirm it just consumes the match_members output verbatim and writes comma-joined names. No change needed; the upstream fix propagates.
scripts/match_payments.py:336-362 — reconcile() only re-runs inference when Person is empty, so existing wrong rows in the sheet must be cleared by hand or via the manual fix/blank-cell workflow before re-running make infer.

Verification

Unit test — add tests/test_match_members.py (new file, mirroring tests/test_reconcile_exceptions.py style). Cases:
- match_members("Henrietta Ottová (Heny): 04/2026", ["Henrietta Ottová", "Tomáš Němeček (Tov)"]) → [("Henrietta Ottová", "auto")] only.
- match_members("Heny 04/2026", ["Tomáš Němeček (Tov)", "Henrietta Ottová"]) → no match for Tomáš (the substring trap is closed); whatever the legitimate behavior for "Heny" is, document it.
- Combined payment: match_members("Henrietta Ottová a Tomáš Němeček 04/2026", ["Henrietta Ottová", "Tomáš Němeček (Tov)"]) → both as auto.
- Sanity: match_members("VS 1234 Tomáš Němeček", [...]) still returns Tomáš.
Run the suite: make test.
End-to-end: clear the buggy row's Person/Purpose cells in the payments sheet, then make infer, then make reconcile. Confirm the payment now allocates fully to Henrietta and balance reflects it.
Changelog: per CLAUDE.md, append an entry to CHANGELOG.md once the user confirms the fix works in production. Format: ## 2026-05-04 HH:MM TZ — fix: payment inference exact-match short-circuit.

6.2 KiB Raw Blame History

Exact full-name match for payment inference

Context

Approach

1. Add an exact-canonical-name short-circuit (new, before the existing loop)

2. Replace remaining in normalized_text checks with \b…\b regex

3. Why this is sufficient

Files to modify

Files to read for confidence (no edits)

Verification

6.2 KiB

Raw Blame History

2. Replace remaining `in normalized_text` checks with `\b…\b` regex