fuj-management/docs/plans/2026-05-05-1717-payment-person-name-canonicalization.md
Jan Novak 394da2e6b8
fix: Tolerate diacritic/case/whitespace mismatches in Person column matching
- Add canonical_member_key() in match_payments.py to normalize names via
  NFKD + lowercase + whitespace-collapse before ledger lookup; resolves
  payments attributed to e.g. "Maria Maco" to canonical "Mária Maco".
  Emits logger.info when a non-canonical cell is rescued so sheet typos
  are visible in logs without losing the payment allocation.
- Extend group_payments_by_person() in app.py to accept member_names and
  re-key raw-payment groups under the canonical attendance-sheet name so
  the modal's Raw Payments debug section also finds the row correctly.
- Add raw payments collapsible section to member detail modal in adults.html
  and juniors.html for debugging payment attribution issues.
- Remove 4 obsolete tests targeting routes /fees, /fees-juniors, /reconcile,
  /reconcile-juniors that no longer exist; add test_match_payments.py
  covering canonical key equivalence and reconcile() tolerance end-to-end.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-05 17:22:54 +02:00


Tolerate diacritic / case / whitespace mismatches between Person column and member names

Context

For "Mária Maco" there is a payment row in the payments sheet with Purpose = 2026-04, but the modal for that member shows neither a paid 2026-04 cell nor a row in payment history. Both symptoms collapse to a single root cause in reconcile(), confirmed by reading the code:

  • scripts/match_payments.py:404: the check if member_name not in ledger: is a byte-exact comparison. member_name is the Person cell from the payments sheet with only .strip() applied and [?] markers removed (:349-353). ledger keys are the canonical names from the attendance sheet. There is no diacritic, case, or whitespace normalization on this path. (czech_utils.normalize is imported and used for the exceptions lookup at :282-283 / :321-322, but not for member-name matching.)
  • When a row falls through that check, it is appended to unmatched and never reaches ledger[member_name][m]['paid'] or ['transactions']. The dashboard's per-month "paid" cell stays unpaid, and because the modal's payment history is built from data.months[m].transactions (templates/adults.html:772-776), the row also disappears from the modal's history list.
  • The new "Raw Payments" debug section (templates/adults.html:861) uses rawPaymentsByPerson[name]. Its keys come from group_payments_by_person() in app.py:60-73, which also stores the literal Person string (only .strip() and [?] stripped). So if the attendance-sheet name and the Person cell differ at the byte level, that section also returns an empty list — which is why the user does not see the row anywhere in the modal.

The most likely cause for "Mária Maco" specifically: the Person cell was typed (or pasted) without the á diacritic — Maria Maco vs Mária Maco. Other plausible variants the current code silently drops: case differences (mária maco), trailing/embedded extra whitespace, and NBSP characters.
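
The failure mode can be reproduced in isolation. The sketch below is not the project's code: the ledger shape is a simplified assumption, and the decomposed (NFD) variant is an extra case not listed in the plan. It only illustrates that an exact dictionary lookup after .strip() misses every one of these Person values.

```python
# Hypothetical ledger keyed by the canonical attendance-sheet name.
ledger = {"Mária Maco": {"2026-04": {"paid": 0, "transactions": []}}}

# Variants a human might type or paste into the Person cell.
variants = {
    "missing diacritic": "Maria Maco",
    "lowercase": "mária maco",
    "double space": "Mária  Maco",
    "NBSP instead of space": "Mária\u00a0Maco",
    "decomposed accent (NFD)": "Ma\u0301ria Maco",  # assumption: not listed in the plan
}

# Mirrors the described path: .strip() only, then a byte-exact membership test.
missed = [label for label, cell in variants.items() if cell.strip() not in ledger]
print(len(missed), "of", len(variants), "variants fall through to unmatched")
```

Only the exact string "Mária Maco" passes the membership test, which matches the symptom described above.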

The fix is to make the matching tolerant via the existing czech_utils.normalize() helper (NFKD + lowercase), with a small whitespace-collapse on top, and apply the same canonicalization in group_payments_by_person() so the modal's raw-payments lookup uses the canonical attendance-sheet name as the key.
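
A minimal, self-contained sketch of that canonicalization follows. The normalize() function here is a stand-in written for illustration; it assumes czech_utils.normalize() decomposes with NFKD, drops combining marks, and lowercases, which the plan implies but does not show.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Stand-in for czech_utils.normalize (assumed behaviour, not the real helper)."""
    decomposed = unicodedata.normalize("NFKD", text)
    # NFKD splits "á" into "a" + U+0301; dropping combining marks leaves "a".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def canonical_member_key(name: str) -> str:
    """Collapse whitespace runs on top of normalize() and trim the ends."""
    return re.sub(r"\s+", " ", normalize(name)).strip()


cells = ["Mária Maco", "Maria Maco", "mária maco", "  Mária   Maco ", "Mária\u00a0Maco"]
keys = {canonical_member_key(cell) for cell in cells}
print(keys)  # {'maria maco'}
```

NFKD also maps the no-break space (U+00A0) to a regular space, so the NBSP variant needs no special handling beyond the whitespace collapse.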

Approach

1. scripts/match_payments.py — tolerant Person-to-ledger resolution in reconcile()

  • Add a small private helper at module scope:

    def _canonical_key(name: str) -> str:
        return re.sub(r"\s+", " ", normalize(name)).strip()
    

    Uses the existing normalize() from czech_utils (:22-25) and additionally collapses whitespace runs to a single space, so "Mária Maco" and "Mária  Maco" (double space) both reduce to "maria maco".

  • Inside reconcile(), right after member_names is computed (:308), build a lookup dict once:

    canonical_by_key: dict[str, str] = {}
    for name in member_names:
        key = _canonical_key(name)
        canonical_by_key.setdefault(key, name)  # first wins; ambiguity handled below
    
  • Replace the byte-exact check at :404. Resolve each member_name from matched_members to the canonical attendance-sheet name before any ledger / credits access:

    for raw_member_name, confidence in matched_members:
        member_name = canonical_by_key.get(_canonical_key(raw_member_name))
        if member_name is None:
            logger.warning(
                "Payment matched to unknown member %r (tx: %s, %s) — adding to unmatched",
                raw_member_name, tx.get("date", "?"), tx.get("message", "?"),
            )
            unmatched.append(tx)
            continue
        if member_name != raw_member_name:
            logger.info(
                "Person cell %r resolved to canonical member %r — consider fixing the sheet",
                raw_member_name, member_name,
            )
        # ... rest of the loop body unchanged: ledger[member_name], credits[member_name], …
    

    The logger.info line lets the user see (in make web-debug logs) which sheet rows have a non-canonical Person value, so they can clean them up at their own pace — without breaking allocation in the meantime.

  • Leave the rest of the function untouched. Once member_name is the canonical name, every downstream key (ledger[member_name], credits[member_name], other_ledger[member_name], the tx["person"] echo into transactions) is already correct.
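
The lookup and resolution steps above can be exercised outside reconcile() with a sketch like the following. The names build_lookup() and resolve() are hypothetical, the key function is a stand-in for the real helper, and "Petr Dvořák" is an invented member. The collision warning is one possible way to surface the ambiguity that the "first wins" comment mentions; the plan does not specify it.

```python
import logging
import re
import unicodedata

logger = logging.getLogger("match_payments_sketch")


def canonical_member_key(name: str) -> str:
    """Stand-in key: NFKD, drop combining marks, lowercase, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_marks.lower()).strip()


def build_lookup(member_names):
    """Map canonical key -> attendance-sheet name; the first name wins."""
    lookup = {}
    for name in member_names:
        key = canonical_member_key(name)
        kept = lookup.setdefault(key, name)
        if kept != name:
            # Assumption: log collisions so that two members sharing a key are visible.
            logger.warning("Members %r and %r share key %r; keeping %r", kept, name, key, kept)
    return lookup


def resolve(raw_person, lookup):
    """Return the canonical member name, or None when no member matches."""
    return lookup.get(canonical_member_key(raw_person))


lookup = build_lookup(["Mária Maco", "Petr Dvořák"])
resolved = [resolve(p, lookup) for p in ["Maria Maco", "PETR  DVOŘÁK", "Někdo Neznámý"]]
print(resolved)  # ['Mária Maco', 'Petr Dvořák', None]
```

A None result corresponds to the branch that appends the transaction to unmatched.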

2. app.py — canonicalize the raw-payments grouping key

  • The current group_payments_by_person() cannot canonicalize on its own because it does not know the attendance-sheet member list. Extend its signature to accept the member list and reuse _canonical_key:

    from match_payments import _canonical_key  # or re-export via a tiny public name
    
    def group_payments_by_person(transactions, member_names=None):
        canonical_by_key = {}
        for n in member_names or []:
            canonical_by_key.setdefault(_canonical_key(n), n)  # first wins, as in reconcile()
        grouped = {}
        for tx in transactions:
            person = str(tx.get("person", "")).strip()
            if not person:
                continue
            for p in person.split(","):
                p = re.sub(r"\[\?\]\s*", "", p).strip()
                if not p:
                    continue
                key = canonical_by_key.get(_canonical_key(p), p)  # fallback: keep raw
                grouped.setdefault(key, []).append(tx)
        for rows in grouped.values():
            rows.sort(key=lambda t: str(t.get("date", "")), reverse=True)
        return grouped
    
  • Update the three call sites to pass member_names:

    • adults_view() around app.py:333, where members is already in scope; pass [name for name, _, _ in members].
    • juniors_view() around app.py:539 — same.
    • payments() around app.py:549 — same; needs the adult+junior member names so the /payments per-person grouping is consistent.
  • Naming: _canonical_key starts with an underscore inside match_payments.py. To avoid leaking a private symbol, expose it as canonical_member_key (no underscore) in match_payments.py and import that name from app.py.
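
To illustrate the effect of the re-keying, here is a compact, self-contained variant of the grouping function with a usage example. It is a sketch under assumptions: the key function is a stand-in, the date sort is omitted for brevity, and the members tuples and transactions are invented sample data.

```python
import re
import unicodedata


def canonical_member_key(name: str) -> str:
    """Stand-in key: NFKD, drop combining marks, lowercase, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_marks.lower()).strip()


def group_payments_by_person(transactions, member_names=None):
    """Group transactions under the canonical member name where one exists."""
    lookup = {}
    for name in member_names or []:
        lookup.setdefault(canonical_member_key(name), name)
    grouped = {}
    for tx in transactions:
        for part in str(tx.get("person", "")).split(","):
            person = re.sub(r"\[\?\]\s*", "", part).strip()
            if person:
                # Fall back to the raw string when no member matches.
                key = lookup.get(canonical_member_key(person), person)
                grouped.setdefault(key, []).append(tx)
    return grouped


members = [("Mária Maco", "A", 750)]  # hypothetical 3-tuples, as unpacked at the call sites
txs = [
    {"person": "Maria Maco", "date": "2026-04-03"},
    {"person": "[?] mária  maco, Někdo Neznámý", "date": "2026-04-10"},
]
grouped = group_payments_by_person(txs, [name for name, _, _ in members])
print({name: len(rows) for name, rows in grouped.items()})
# {'Mária Maco': 2, 'Někdo Neznámý': 1}
```

Both spellings land under the attendance-sheet name, while the unknown person keeps its raw key, so rawPaymentsByPerson[name] finds the rows for the canonical name.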

3. Why not also touch infer_payments.py

infer_payments.py already writes canonical attendance-sheet names into the Person column (it picks from member_names). The bug only manifests when the cell was filled in manually by a human (typed without diacritics, different case) or was written by an older inference that has since drifted from a renamed attendance row. Making reconcile() tolerant fixes the symptom for both cases without changing inference. The logger.info line is sufficient signal for the user to clean up the sheet on their own schedule.

4. Tests

4a. Delete obsolete route tests in tests/test_app.py. Four tests target Flask routes that no longer exist (the old fee/reconcile pages were merged into /adults and /juniors); they currently fail with 404. Their coverage is already provided by test_adults_route, test_juniors_route, and test_payments_route. Delete:

The two tests that reference junior-only formatting (? / 1 (J) and 500 CZK / 4 (1A+3J)) are testing a retired template, not the live /juniors page — no need to migrate those assertions; the live /juniors format is already covered by test_juniors_route.

4b. Add tests/test_match_payments.py (new file) covering the resolution helper and reconcile() end-to-end for the canonicalization fix:

  • _canonical_key("Mária Maco") == _canonical_key("maria maco")
  • reconcile() with member "Mária Maco" and a tx {person: "Maria Maco", purpose: "2026-04", amount: 750, ...} produces:
    • result['members']['Mária Maco']['months']['2026-04']['paid'] == 750
    • the tx appears in result['members']['Mária Maco']['months']['2026-04']['transactions']
    • result['unmatched'] is empty
  • reconcile() with Person = "Někdo Neznámý" (no match in members) still routes to unmatched.
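
The helper-level cases could look roughly like this in pytest style. The key function below is a stand-in so the sketch runs on its own; in the real test file it would be imported from match_payments. The reconcile() cases are left out because the plan does not show that function's signature.

```python
import re
import unicodedata


def canonical_member_key(name: str) -> str:
    """Stand-in for the helper under test (assumed behaviour)."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_marks.lower()).strip()


def test_diacritics_and_case_are_ignored():
    assert canonical_member_key("Mária Maco") == canonical_member_key("maria maco")


def test_whitespace_runs_and_nbsp_collapse():
    assert canonical_member_key("  Mária\u00a0 Maco ") == "maria maco"


def test_distinct_names_stay_distinct():
    # Tolerance must not merge genuinely different members.
    assert canonical_member_key("Mária Maco") != canonical_member_key("Marie Maco")
```

The third case guards against the normalization being so loose that two different members collapse onto one key.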

Critical files

  • scripts/match_payments.py — add canonical_member_key() helper; build canonical_by_key once in reconcile(); resolve raw_member_name to member_name before ledger access at :404.
  • app.py — extend group_payments_by_person() to accept member_names and key the grouped dict by canonical attendance-sheet name; update three call sites.
  • tests/test_app.py — delete the four obsolete route tests listed in §4a.
  • tests/test_match_payments.py — add the cases above (create the file if missing).
  • docs/plans/ — per project CLAUDE.md, move this plan file to docs/plans/2026-05-05-1640-payment-person-name-canonicalization.md once execution starts (the plan-mode harness writes to ~/.claude/plans/ by default).

Verification

  1. Reproduce first. Before touching code, open /adults, click [i] next to "Mária Maco", and confirm both: 2026-04 is unpaid and the payment is missing from history. Inspect the actual Person cell value in the payments sheet for the 2026-04 row — confirm it differs from "Mária Maco" (likely missing the á). Record the exact string for the test case.
  2. make test — new tests pass; existing tests still green.
  3. make web-debug and reload /adults. The 2026-04 cell for "Mária Maco" turns green (cell-ok); the modal's payment history shows the row; the "Raw Payments" section also shows the row. Server log emits Person cell 'Maria Maco' resolved to canonical member 'Mária Maco' — consider fixing the sheet.
  4. Cross-check /payments — the row appears under the Mária Maco group (canonical key), not under a separate Maria Maco group.
  5. Spot-check one member with the conventionally-correct Person value (e.g. one of the recent payers visible on the dashboard) — paid cells and history are unchanged, no spurious resolution log line.
  6. Confirm a payment with a genuinely unknown Person (typo of a non-member) still ends up in the dashboard's Unmatched block and emits the existing Payment matched to unknown member … warning.
  7. Append a CHANGELOG.md entry per CLAUDE.md once the user confirms the fix works.
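
For steps 3 and 5, the shape of the resolution log line can be checked without running the app. This sketch captures log output with a small list-backed handler; the logger name and the ListHandler class are hypothetical, and the format string is shortened to the part before the "consider fixing the sheet" suffix.

```python
import logging

records = []


class ListHandler(logging.Handler):
    """Collects formatted messages in a list instead of writing them out."""

    def emit(self, record):
        records.append(record.getMessage())


logger = logging.getLogger("canonicalization_log_sketch")
logger.setLevel(logging.INFO)
logger.addHandler(ListHandler())
logger.propagate = False

# One non-canonical cell and one already-canonical cell.
for raw_member_name, member_name in [("Maria Maco", "Mária Maco"), ("Mária Maco", "Mária Maco")]:
    if member_name != raw_member_name:
        logger.info(
            "Person cell %r resolved to canonical member %r",
            raw_member_name, member_name,
        )

print(records)
# ["Person cell 'Maria Maco' resolved to canonical member 'Mária Maco'"]
```

Only the non-canonical cell produces a line, which is the behaviour step 5 expects for members whose Person value is already correct.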