fix: Tolerate diacritic/case/whitespace mismatches in Person column matching
Some checks failed
Deploy to K8s / deploy (push) Successful in 11s
Build and Push / build (push) Successful in 6s
Build and Push / build-go (push) Failing after 6s

- Add canonical_member_key() in match_payments.py to normalize names via
  NFKD + lowercase + whitespace-collapse before ledger lookup; resolves
  payments attributed to e.g. "Maria Maco" to canonical "Mária Maco".
  Emits logger.info when a non-canonical cell is rescued so sheet typos
  are visible in logs without losing the payment allocation.
- Extend group_payments_by_person() in app.py to accept member_names and
  re-key raw-payment groups under the canonical attendance-sheet name so
  the modal's Raw Payments debug section also finds the row correctly.
- Add raw payments collapsible section to member detail modal in adults.html
  and juniors.html for debugging payment attribution issues.
- Remove 4 obsolete tests targeting routes /fees, /fees-juniors, /reconcile,
  /reconcile-juniors that no longer exist; add test_match_payments.py
  covering canonical key equivalence and reconcile() tolerance end-to-end.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-05-05 17:22:54 +02:00
parent 81b36878b3
commit 394da2e6b8
8 changed files with 498 additions and 120 deletions

View File

@@ -0,0 +1,135 @@
# Tolerate diacritic / case / whitespace mismatches between `Person` column and member names
## Context
For "Mária Maco" there is a payment row in the payments sheet with `Purpose = 2026-04`, but the modal for that member shows neither a paid 2026-04 cell **nor** a row in payment history. Both symptoms collapse to a single root cause in [`reconcile()`](scripts/match_payments.py#L295), confirmed by reading the code:
- [`scripts/match_payments.py:404`](scripts/match_payments.py#L404) — `if member_name not in ledger:` is a **byte-exact** comparison. `member_name` is the `Person` cell from the payments sheet with only `.strip()` and `[?]` markers removed ([:349-353](scripts/match_payments.py#L349-L353)). `ledger` keys are the canonical names from the attendance sheet. There is no diacritic, case, or whitespace normalization on this path. (`czech_utils.normalize` is imported and used for the `exceptions` lookup at [:282-283 / :321-322](scripts/match_payments.py#L282-L322), but **not** for member-name matching.)
- When a row falls through that check, it is appended to `unmatched` and never reaches `ledger[member_name][m]['paid']` or `['transactions']`. The dashboard's per-month "paid" cell stays unpaid, and because the modal's payment history is built from `data.months[m].transactions` ([`templates/adults.html:772-776`](templates/adults.html#L772-L776)), the row also disappears from the modal's history list.
- The new "Raw Payments" debug section ([`templates/adults.html:861`](templates/adults.html#L861)) uses `rawPaymentsByPerson[name]`. Its keys come from [`group_payments_by_person()` in `app.py:60-73`](app.py#L60-L73), which also stores the **literal** `Person` string (only `.strip()` and `[?]` stripped). So if the attendance-sheet name and the `Person` cell differ at the byte level, that section also returns an empty list — which is why the user does not see the row anywhere in the modal.
The most likely cause for "Mária Maco" specifically: the `Person` cell was typed (or pasted) without the `á` diacritic — `Maria Maco` vs `Mária Maco`. Other plausible variants the current code silently drops: case differences (`mária maco`), trailing/embedded extra whitespace, and NBSP characters.
The fix is to make the matching tolerant via the existing [`czech_utils.normalize()`](scripts/czech_utils.py#L22-L25) helper (NFKD + lowercase), with a small whitespace-collapse on top, and apply the same canonicalization in `group_payments_by_person()` so the modal's raw-payments lookup uses the canonical attendance-sheet name as the key.
## Approach
### 1. `scripts/match_payments.py` — tolerant `Person` → `ledger` resolution in `reconcile()`
- Add a small private helper at module scope:
```python
def _canonical_key(name: str) -> str:
return re.sub(r"\s+", " ", normalize(name)).strip()
```
Uses the existing `normalize()` from `czech_utils` ([:22-25](scripts/czech_utils.py#L22-L25)) and additionally collapses whitespace runs to a single space so `"Mária Maco"` and `"Mária Maco"` both reduce to `"maria maco"`.
- Inside [`reconcile()`](scripts/match_payments.py#L295), right after `member_names` is computed ([:308](scripts/match_payments.py#L308)), build a lookup dict once:
```python
canonical_by_key: dict[str, str] = {}
for name in member_names:
key = _canonical_key(name)
canonical_by_key.setdefault(key, name) # first wins; ambiguity handled below
```
- Replace the byte-exact check at [:404](scripts/match_payments.py#L404). Resolve each `member_name` from `matched_members` to the canonical attendance-sheet name before any ledger / credits access:
```python
for raw_member_name, confidence in matched_members:
member_name = canonical_by_key.get(_canonical_key(raw_member_name))
if member_name is None:
logger.warning(
"Payment matched to unknown member %r (tx: %s, %s) — adding to unmatched",
raw_member_name, tx.get("date", "?"), tx.get("message", "?"),
)
unmatched.append(tx)
continue
if member_name != raw_member_name:
logger.info(
"Person cell %r resolved to canonical member %r — consider fixing the sheet",
raw_member_name, member_name,
)
# ... rest of the loop body unchanged: ledger[member_name], credits[member_name], …
```
The `logger.info` line lets the user see (in `make web-debug` logs) which sheet rows have a non-canonical `Person` value, so they can clean them up at their own pace — without breaking allocation in the meantime.
- Leave the rest of the function untouched. Once `member_name` is the canonical name, every downstream key (`ledger[member_name]`, `credits[member_name]`, `other_ledger[member_name]`, the `tx["person"]` echo into `transactions`) is already correct.
### 2. `app.py` — canonicalize the raw-payments grouping key
- The current [`group_payments_by_person()`](app.py#L60-L73) cannot canonicalize on its own because it does not know the attendance-sheet member list. Extend its signature to accept the member list and reuse `_canonical_key`:
```python
from match_payments import _canonical_key # or re-export via a tiny public name
def group_payments_by_person(transactions, member_names=None):
canonical_by_key = (
{_canonical_key(n): n for n in member_names} if member_names else {}
)
grouped = {}
for tx in transactions:
person = str(tx.get("person", "")).strip()
if not person:
continue
for p in person.split(","):
p = re.sub(r"\[\?\]\s*", "", p).strip()
if not p:
continue
key = canonical_by_key.get(_canonical_key(p), p) # fallback: keep raw
grouped.setdefault(key, []).append(tx)
for rows in grouped.values():
rows.sort(key=lambda t: str(t.get("date", "")), reverse=True)
return grouped
```
- Update the three call sites to pass `member_names`:
- `adults_view()` around [`app.py:333`](app.py#L333) — `members` is already in scope; pass `[name for name, _, _ in members]`.
- `juniors_view()` around [`app.py:539`](app.py#L539) — same.
- `payments()` around [`app.py:549`](app.py#L549) — same; needs the adult+junior member names so the `/payments` per-person grouping is consistent.
- Naming: `_canonical_key` starts with an underscore inside `match_payments.py`. To avoid leaking a private symbol, expose it as `canonical_member_key` (no underscore) in `match_payments.py` and import that name from `app.py`.
### 3. Why not also touch `infer_payments.py`
`infer_payments.py` already writes canonical attendance-sheet names into the `Person` column (it picks from `member_names`). The bug only manifests when the cell was filled in **manually** by a human (typed without diacritics, different case) or was written by an older inference that has since drifted from a renamed attendance row. Making `reconcile()` tolerant fixes the symptom for both cases without changing inference. The `logger.info` line is sufficient signal for the user to clean up the sheet on their own schedule.
### 4. Tests
**4a. Delete obsolete route tests in [tests/test_app.py](tests/test_app.py).** Four tests target Flask routes that no longer exist (the old fee/reconcile pages were merged into `/adults` and `/juniors`); they currently fail with 404. Their coverage is already provided by `test_adults_route`, `test_juniors_route`, and `test_payments_route`. Delete:
- `test_fees_route` ([tests/test_app.py:22-35](tests/test_app.py#L22-L35)) — hits `/fees`
- `test_fees_juniors_route` ([tests/test_app.py:37-55](tests/test_app.py#L37-L55)) — hits `/fees-juniors`
- `test_reconcile_route` ([tests/test_app.py:57-81](tests/test_app.py#L57-L81)) — hits `/reconcile`; also asserts a literal `OK` string the merged dashboard no longer renders
- `test_reconcile_juniors_route` ([tests/test_app.py:101-131](tests/test_app.py#L101-L131)) — hits `/reconcile-juniors`; same `OK` assertion mismatch
The two tests that reference junior-only formatting (`? / 1 (J)` and `500 CZK / 4 (1A+3J)`) are testing a retired template, not the live `/juniors` page — no need to migrate those assertions; the live `/juniors` format is already covered by `test_juniors_route`.
**4b. Add `tests/test_match_payments.py`** (new file) covering the resolution helper and `reconcile()` end-to-end for the canonicalization fix:
- `_canonical_key("Mária Maco") == _canonical_key("maria maco")`
- `reconcile()` with member `"Mária Maco"` and a tx `{person: "Maria Maco", purpose: "2026-04", amount: 750, ...}` produces:
- `result['members']['Mária Maco']['months']['2026-04']['paid'] == 750`
- the tx appears in `result['members']['Mária Maco']['months']['2026-04']['transactions']`
- `result['unmatched']` is empty
- `reconcile()` with `Person = "Někdo Neznámý"` (no match in members) still routes to `unmatched`.
## Critical files
- [scripts/match_payments.py](scripts/match_payments.py) — add `canonical_member_key()` helper; build `canonical_by_key` once in `reconcile()`; resolve `raw_member_name` → `member_name` before ledger access at [:404](scripts/match_payments.py#L404).
- [app.py](app.py) — extend `group_payments_by_person()` to accept `member_names` and key the grouped dict by canonical attendance-sheet name; update three call sites.
- [tests/test_app.py](tests/test_app.py) — delete the four obsolete route tests listed in §4a.
- [tests/test_match_payments.py](tests/test_match_payments.py) — add the cases above (create the file if missing).
- [docs/plans/](docs/plans/) — per project [CLAUDE.md](CLAUDE.md), move this plan file to `docs/plans/2026-05-05-1640-payment-person-name-canonicalization.md` once execution starts (the plan-mode harness writes to `~/.claude/plans/` by default).
## Verification
1. **Reproduce first.** Before touching code, open `/adults`, click `[i]` next to "Mária Maco", and confirm both: 2026-04 is unpaid and the payment is missing from history. Inspect the actual `Person` cell value in the payments sheet for the 2026-04 row — confirm it differs from `"Mária Maco"` (likely missing the `á`). Record the exact string for the test case.
2. `make test` — new tests pass; existing tests still green.
3. `make web-debug` and reload `/adults`. The 2026-04 cell for "Mária Maco" turns green (`cell-ok`); the modal's payment history shows the row; the "Raw Payments" section also shows the row. Server log emits `Person cell 'Maria Maco' resolved to canonical member 'Mária Maco' — consider fixing the sheet`.
4. Cross-check `/payments` — the row appears under the `Mária Maco` group (canonical key), not under a separate `Maria Maco` group.
5. Spot-check one member with the conventionally-correct `Person` value (e.g. one of the recent payers visible on the dashboard) — paid cells and history are unchanged, no spurious resolution log line.
6. Confirm a payment with a genuinely unknown `Person` (typo of a non-member) still ends up in the dashboard's `Unmatched` block and emits the existing `Payment matched to unknown member …` warning.
7. Append a `CHANGELOG.md` entry per [CLAUDE.md](CLAUDE.md) once the user confirms the fix works.