# Tolerate diacritic / case / whitespace mismatches between `Person` column and member names ## Context For "Mária Maco" there is a payment row in the payments sheet with `Purpose = 2026-04`, but the modal for that member shows neither a paid 2026-04 cell **nor** a row in payment history. Both symptoms collapse to a single root cause in [`reconcile()`](scripts/match_payments.py#L295), confirmed by reading the code: - [`scripts/match_payments.py:404`](scripts/match_payments.py#L404) — `if member_name not in ledger:` is a **byte-exact** comparison. `member_name` is the `Person` cell from the payments sheet with only `.strip()` and `[?]` markers removed ([:349-353](scripts/match_payments.py#L349-L353)). `ledger` keys are the canonical names from the attendance sheet. There is no diacritic, case, or whitespace normalization on this path. (`czech_utils.normalize` is imported and used for the `exceptions` lookup at [:282-283 / :321-322](scripts/match_payments.py#L282-L322), but **not** for member-name matching.) - When a row falls through that check, it is appended to `unmatched` and never reaches `ledger[member_name][m]['paid']` or `['transactions']`. The dashboard's per-month "paid" cell stays unpaid, and because the modal's payment history is built from `data.months[m].transactions` ([`templates/adults.html:772-776`](templates/adults.html#L772-L776)), the row also disappears from the modal's history list. - The new "Raw Payments" debug section ([`templates/adults.html:861`](templates/adults.html#L861)) uses `rawPaymentsByPerson[name]`. Its keys come from [`group_payments_by_person()` in `app.py:60-73`](app.py#L60-L73), which also stores the **literal** `Person` string (only `.strip()` and `[?]` stripped). So if the attendance-sheet name and the `Person` cell differ at the byte level, that section also returns an empty list — which is why the user does not see the row anywhere in the modal. The most likely cause for "Mária Maco" specifically: the `Person` cell was typed (or pasted) without the `á` diacritic — `Maria Maco` vs `Mária Maco`. Other plausible variants the current code silently drops: case differences (`mária maco`), trailing/embedded extra whitespace, and NBSP characters. The fix is to make the matching tolerant via the existing [`czech_utils.normalize()`](scripts/czech_utils.py#L22-L25) helper (NFKD + lowercase), with a small whitespace-collapse on top, and apply the same canonicalization in `group_payments_by_person()` so the modal's raw-payments lookup uses the canonical attendance-sheet name as the key. ## Approach ### 1. `scripts/match_payments.py` — tolerant `Person` → `ledger` resolution in `reconcile()` - Add a small private helper at module scope: ```python def _canonical_key(name: str) -> str: return re.sub(r"\s+", " ", normalize(name)).strip() ``` Uses the existing `normalize()` from `czech_utils` ([:22-25](scripts/czech_utils.py#L22-L25)) and additionally collapses whitespace runs to a single space so `"Mária Maco"` and `"Mária Maco"` both reduce to `"maria maco"`. - Inside [`reconcile()`](scripts/match_payments.py#L295), right after `member_names` is computed ([:308](scripts/match_payments.py#L308)), build a lookup dict once: ```python canonical_by_key: dict[str, str] = {} for name in member_names: key = _canonical_key(name) canonical_by_key.setdefault(key, name) # first wins; ambiguity handled below ``` - Replace the byte-exact check at [:404](scripts/match_payments.py#L404). Resolve each `member_name` from `matched_members` to the canonical attendance-sheet name before any ledger / credits access: ```python for raw_member_name, confidence in matched_members: member_name = canonical_by_key.get(_canonical_key(raw_member_name)) if member_name is None: logger.warning( "Payment matched to unknown member %r (tx: %s, %s) — adding to unmatched", raw_member_name, tx.get("date", "?"), tx.get("message", "?"), ) unmatched.append(tx) continue if member_name != raw_member_name: logger.info( "Person cell %r resolved to canonical member %r — consider fixing the sheet", raw_member_name, member_name, ) # ... rest of the loop body unchanged: ledger[member_name], credits[member_name], … ``` The `logger.info` line lets the user see (in `make web-debug` logs) which sheet rows have a non-canonical `Person` value, so they can clean them up at their own pace — without breaking allocation in the meantime. - Leave the rest of the function untouched. Once `member_name` is the canonical name, every downstream key (`ledger[member_name]`, `credits[member_name]`, `other_ledger[member_name]`, the `tx["person"]` echo into `transactions`) is already correct. ### 2. `app.py` — canonicalize the raw-payments grouping key - The current [`group_payments_by_person()`](app.py#L60-L73) cannot canonicalize on its own because it does not know the attendance-sheet member list. Extend its signature to accept the member list and reuse `_canonical_key`: ```python from match_payments import _canonical_key # or re-export via a tiny public name def group_payments_by_person(transactions, member_names=None): canonical_by_key = ( {_canonical_key(n): n for n in member_names} if member_names else {} ) grouped = {} for tx in transactions: person = str(tx.get("person", "")).strip() if not person: continue for p in person.split(","): p = re.sub(r"\[\?\]\s*", "", p).strip() if not p: continue key = canonical_by_key.get(_canonical_key(p), p) # fallback: keep raw grouped.setdefault(key, []).append(tx) for rows in grouped.values(): rows.sort(key=lambda t: str(t.get("date", "")), reverse=True) return grouped ``` - Update the three call sites to pass `member_names`: - `adults_view()` around [`app.py:333`](app.py#L333) — `members` is already in scope; pass `[name for name, _, _ in members]`. - `juniors_view()` around [`app.py:539`](app.py#L539) — same. - `payments()` around [`app.py:549`](app.py#L549) — same; needs the adult+junior member names so the `/payments` per-person grouping is consistent. - Naming: `_canonical_key` starts with an underscore inside `match_payments.py`. To avoid leaking a private symbol, expose it as `canonical_member_key` (no underscore) in `match_payments.py` and import that name from `app.py`. ### 3. Why not also touch `infer_payments.py` `infer_payments.py` already writes canonical attendance-sheet names into the `Person` column (it picks from `member_names`). The bug only manifests when the cell was filled in **manually** by a human (typed without diacritics, different case) or was written by an older inference that has since drifted from a renamed attendance row. Making `reconcile()` tolerant fixes the symptom for both cases without changing inference. The `logger.info` line is sufficient signal for the user to clean up the sheet on their own schedule. ### 4. Tests **4a. Delete obsolete route tests in [tests/test_app.py](tests/test_app.py).** Four tests target Flask routes that no longer exist (the old fee/reconcile pages were merged into `/adults` and `/juniors`); they currently fail with 404. Their coverage is already provided by `test_adults_route`, `test_juniors_route`, and `test_payments_route`. Delete: - `test_fees_route` ([tests/test_app.py:22-35](tests/test_app.py#L22-L35)) — hits `/fees` - `test_fees_juniors_route` ([tests/test_app.py:37-55](tests/test_app.py#L37-L55)) — hits `/fees-juniors` - `test_reconcile_route` ([tests/test_app.py:57-81](tests/test_app.py#L57-L81)) — hits `/reconcile`; also asserts a literal `OK` string the merged dashboard no longer renders - `test_reconcile_juniors_route` ([tests/test_app.py:101-131](tests/test_app.py#L101-L131)) — hits `/reconcile-juniors`; same `OK` assertion mismatch The two tests that reference junior-only formatting (`? / 1 (J)` and `500 CZK / 4 (1A+3J)`) are testing a retired template, not the live `/juniors` page — no need to migrate those assertions; the live `/juniors` format is already covered by `test_juniors_route`. **4b. Add `tests/test_match_payments.py`** (new file) covering the resolution helper and `reconcile()` end-to-end for the canonicalization fix: - `_canonical_key("Mária Maco") == _canonical_key("maria maco")` - `reconcile()` with member `"Mária Maco"` and a tx `{person: "Maria Maco", purpose: "2026-04", amount: 750, ...}` produces: - `result['members']['Mária Maco']['months']['2026-04']['paid'] == 750` - the tx appears in `result['members']['Mária Maco']['months']['2026-04']['transactions']` - `result['unmatched']` is empty - `reconcile()` with `Person = "Někdo Neznámý"` (no match in members) still routes to `unmatched`. ## Critical files - [scripts/match_payments.py](scripts/match_payments.py) — add `canonical_member_key()` helper; build `canonical_by_key` once in `reconcile()`; resolve `raw_member_name` → `member_name` before ledger access at [:404](scripts/match_payments.py#L404). - [app.py](app.py) — extend `group_payments_by_person()` to accept `member_names` and key the grouped dict by canonical attendance-sheet name; update three call sites. - [tests/test_app.py](tests/test_app.py) — delete the four obsolete route tests listed in §4a. - [tests/test_match_payments.py](tests/test_match_payments.py) — add the cases above (create the file if missing). - [docs/plans/](docs/plans/) — per project [CLAUDE.md](CLAUDE.md), move this plan file to `docs/plans/2026-05-05-1640-payment-person-name-canonicalization.md` once execution starts (the plan-mode harness writes to `~/.claude/plans/` by default). ## Verification 1. **Reproduce first.** Before touching code, open `/adults`, click `[i]` next to "Mária Maco", and confirm both: 2026-04 is unpaid and the payment is missing from history. Inspect the actual `Person` cell value in the payments sheet for the 2026-04 row — confirm it differs from `"Mária Maco"` (likely missing the `á`). Record the exact string for the test case. 2. `make test` — new tests pass; existing tests still green. 3. `make web-debug` and reload `/adults`. The 2026-04 cell for "Mária Maco" turns green (`cell-ok`); the modal's payment history shows the row; the "Raw Payments" section also shows the row. Server log emits `Person cell 'Maria Maco' resolved to canonical member 'Mária Maco' — consider fixing the sheet`. 4. Cross-check `/payments` — the row appears under the `Mária Maco` group (canonical key), not under a separate `Maria Maco` group. 5. Spot-check one member with the conventionally-correct `Person` value (e.g. one of the recent payers visible on the dashboard) — paid cells and history are unchanged, no spurious resolution log line. 6. Confirm a payment with a genuinely unknown `Person` (typo of a non-member) still ends up in the dashboard's `Unmatched` block and emits the existing `Payment matched to unknown member …` warning. 7. Append a `CHANGELOG.md` entry per [CLAUDE.md](CLAUDE.md) once the user confirms the fix works.