- Add canonical_member_key() in match_payments.py to normalize names via NFKD + lowercase + whitespace-collapse before ledger lookup; resolves payments attributed to e.g. "Maria Maco" to canonical "Mária Maco". Emits logger.info when a non-canonical cell is rescued so sheet typos are visible in logs without losing the payment allocation.
- Extend group_payments_by_person() in app.py to accept member_names and re-key raw-payment groups under the canonical attendance-sheet name so the modal's Raw Payments debug section also finds the row correctly.
- Add raw payments collapsible section to member detail modal in adults.html and juniors.html for debugging payment attribution issues.
- Remove 4 obsolete tests targeting routes /fees, /fees-juniors, /reconcile, /reconcile-juniors that no longer exist; add test_match_payments.py covering canonical key equivalence and reconcile() tolerance end-to-end.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

# Tolerate diacritic / case / whitespace mismatches between `Person` column and member names

## Context

For "Mária Maco" there is a payment row in the payments sheet with `Purpose = 2026-04`, but the modal for that member shows neither a paid 2026-04 cell **nor** a row in payment history. Both symptoms trace back to a single root cause in [`reconcile()`](scripts/match_payments.py#L295), confirmed by reading the code:

- [`scripts/match_payments.py:404`](scripts/match_payments.py#L404): `if member_name not in ledger:` is a **byte-exact** comparison. `member_name` is the `Person` cell from the payments sheet with only `.strip()` applied and `[?]` markers removed ([:349-353](scripts/match_payments.py#L349-L353)). `ledger` keys are the canonical names from the attendance sheet. There is no diacritic, case, or whitespace normalization on this path. (`czech_utils.normalize` is imported and used for the `exceptions` lookup at [:282-283 / :321-322](scripts/match_payments.py#L282-L322), but **not** for member-name matching.)
- When a row fails that lookup, it is appended to `unmatched` and never reaches `ledger[member_name][m]['paid']` or `['transactions']`. The dashboard's per-month "paid" cell stays unpaid, and because the modal's payment history is built from `data.months[m].transactions` ([`templates/adults.html:772-776`](templates/adults.html#L772-L776)), the row also disappears from the modal's history list.
- The new "Raw Payments" debug section ([`templates/adults.html:861`](templates/adults.html#L861)) uses `rawPaymentsByPerson[name]`. Its keys come from [`group_payments_by_person()` in `app.py:60-73`](app.py#L60-L73), which also stores the **literal** `Person` string (only `.strip()` applied and `[?]` removed). So if the attendance-sheet name and the `Person` cell differ at the byte level, that section also returns an empty list, which is why the user does not see the row anywhere in the modal.

The most likely cause for "Mária Maco" specifically: the `Person` cell was typed (or pasted) without the `á` diacritic, i.e. `Maria Maco` instead of `Mária Maco`. Other plausible variants the current code silently drops: case differences (`mária maco`), trailing or embedded extra whitespace, and NBSP characters.

The fix is to make the matching tolerant via the existing [`czech_utils.normalize()`](scripts/czech_utils.py#L22-L25) helper (NFKD + lowercase), with a small whitespace-collapse on top, and apply the same canonicalization in `group_payments_by_person()` so the modal's raw-payments lookup uses the canonical attendance-sheet name as the key.

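The failure is easy to reproduce in isolation. As a minimal sketch, assuming a plain dict stands in for `ledger` (the real structure in `reconcile()` is richer) and using illustrative sample strings, each variant named above misses a byte-exact membership test:

```python
# Stand-in for the real ledger: keys are canonical attendance-sheet names.
CANONICAL = "M\u00e1ria Maco"  # "Mária Maco" with a precomposed á
ledger = {CANONICAL: {"2026-04": {"paid": 0, "transactions": []}}}

# Person-cell variants a human might type or paste (illustrative values).
variants = {
    "missing diacritic": "Maria Maco",
    "lowercase": "m\u00e1ria maco",
    "doubled space": "M\u00e1ria  Maco",
    "non-breaking space": "M\u00e1ria\u00a0Maco",
}

# Mirrors the `member_name not in ledger` check: True means the row is dropped.
missed = {label: value not in ledger for label, value in variants.items()}

for label, is_missed in missed.items():
    print(f"{label:<20} missed={is_missed}")
```

Every variant prints `missed=True`, while the canonical string itself is still found.
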
## Approach

### 1. `scripts/match_payments.py`: tolerant `Person` → `ledger` resolution in `reconcile()`

- Add a small private helper at module scope:

  ```python
  def _canonical_key(name: str) -> str:
      # normalize() does NFKD + lowercase; then collapse whitespace runs to one space.
      return re.sub(r"\s+", " ", normalize(name)).strip()
  ```

  Uses the existing `normalize()` from `czech_utils` ([:22-25](scripts/czech_utils.py#L22-L25)) and additionally collapses whitespace runs to a single space, so `"Mária  Maco"` (doubled space) and `"Mária Maco"` both reduce to `"maria maco"`.

- Inside [`reconcile()`](scripts/match_payments.py#L295), right after `member_names` is computed ([:308](scripts/match_payments.py#L308)), build a lookup dict once:

  ```python
  canonical_by_key: dict[str, str] = {}
  for name in member_names:
      key = _canonical_key(name)
      canonical_by_key.setdefault(key, name)  # first wins if two names share a key
  ```

- Replace the byte-exact check at [:404](scripts/match_payments.py#L404). Resolve each `member_name` from `matched_members` to the canonical attendance-sheet name before any ledger / credits access:

  ```python
  for raw_member_name, confidence in matched_members:
      member_name = canonical_by_key.get(_canonical_key(raw_member_name))
      if member_name is None:
          logger.warning(
              "Payment matched to unknown member %r (tx: %s, %s) — adding to unmatched",
              raw_member_name, tx.get("date", "?"), tx.get("message", "?"),
          )
          unmatched.append(tx)
          continue
      if member_name != raw_member_name:
          logger.info(
              "Person cell %r resolved to canonical member %r — consider fixing the sheet",
              raw_member_name, member_name,
          )
      # ... rest of the loop body unchanged: ledger[member_name], credits[member_name], …
  ```

  The `logger.info` line lets the user see (in `make web-debug` logs) which sheet rows have a non-canonical `Person` value, so they can clean them up at their own pace without breaking allocation in the meantime.

- Leave the rest of the function untouched. Once `member_name` is the canonical name, every downstream key (`ledger[member_name]`, `credits[member_name]`, `other_ledger[member_name]`, the `tx["person"]` echo into `transactions`) is already correct.

### 2. `app.py`: canonicalize the raw-payments grouping key

- The current [`group_payments_by_person()`](app.py#L60-L73) cannot canonicalize on its own because it does not know the attendance-sheet member list. Extend its signature to accept the member list and reuse `_canonical_key`:

  ```python
  import re

  from match_payments import _canonical_key  # or re-export via a tiny public name


  def group_payments_by_person(transactions, member_names=None):
      # Map canonical key -> attendance-sheet name; first wins, matching reconcile().
      canonical_by_key = {}
      for n in member_names or ():
          canonical_by_key.setdefault(_canonical_key(n), n)
      grouped = {}
      for tx in transactions:
          person = str(tx.get("person", "")).strip()
          if not person:
              continue
          # A Person cell may list several comma-separated names.
          for p in person.split(","):
              p = re.sub(r"\[\?\]\s*", "", p).strip()
              if not p:
                  continue
              key = canonical_by_key.get(_canonical_key(p), p)  # fallback: keep raw
              grouped.setdefault(key, []).append(tx)
      # Newest first within each person's group.
      for rows in grouped.values():
          rows.sort(key=lambda t: str(t.get("date", "")), reverse=True)
      return grouped
  ```

- Update the three call sites to pass `member_names`:
  - `adults_view()` around [`app.py:333`](app.py#L333): `members` is already in scope; pass `[name for name, _, _ in members]`.
  - `juniors_view()` around [`app.py:539`](app.py#L539): same.
  - `payments()` around [`app.py:549`](app.py#L549): same; needs the adult+junior member names so the `/payments` per-person grouping is consistent.

- Naming: `_canonical_key` starts with an underscore inside `match_payments.py`. To avoid leaking a private symbol, expose it as `canonical_member_key` (no underscore) in `match_payments.py` and import that name from `app.py`.

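To illustrate how the re-keying behaves on a multi-name `Person` cell, here is a small self-contained sketch. It makes several assumptions: `canonical_member_key` is a stand-in for the helper exported from `match_payments.py`, `split_person_cell` is a hypothetical helper that mirrors the inner loop of `group_payments_by_person()`, and the member tuples and names are invented sample data that only follow the `(name, _, _)` shape mentioned for the call sites.

```python
import re
import unicodedata


def canonical_member_key(name: str) -> str:
    # ASSUMPTION: stand-in for the helper exported from match_payments.py.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", stripped.lower()).strip()


def split_person_cell(cell: str) -> list[str]:
    # Hypothetical helper: split on commas and drop "[?]" markers.
    parts = (re.sub(r"\[\?\]\s*", "", p).strip() for p in str(cell).split(","))
    return [p for p in parts if p]


# Illustrative (name, _, _) tuples; the real tuples carry actual data.
adults = [("Mária Maco", None, None), ("Jan Novák", None, None)]
juniors = [("Eva Malá", None, None)]

# Order-preserving union of adult and junior names, e.g. for payments().
member_names = list(dict.fromkeys(name for name, _, _ in adults + juniors))

canonical_by_key = {}
for n in member_names:
    canonical_by_key.setdefault(canonical_member_key(n), n)

cell = "[?] Maria Maco, jan  novák, Stranger"
keys = [canonical_by_key.get(canonical_member_key(p), p)
        for p in split_person_cell(cell)]
print(keys)  # ['Mária Maco', 'Jan Novák', 'Stranger']
```

Names that do not resolve keep their raw spelling as the group key, matching the `# fallback: keep raw` branch.
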
### 3. Why not also touch `infer_payments.py`

`infer_payments.py` already writes canonical attendance-sheet names into the `Person` column (it picks from `member_names`). The bug only manifests when the cell was filled in **manually** by a human (typed without diacritics, different case) or was written by an older inference that has since drifted from a renamed attendance row. Making `reconcile()` tolerant fixes the symptom for both cases without changing inference. The `logger.info` line is sufficient signal for the user to clean up the sheet on their own schedule.

### 4. Tests

**4a. Delete obsolete route tests in [tests/test_app.py](tests/test_app.py).** Four tests target Flask routes that no longer exist (the old fee/reconcile pages were merged into `/adults` and `/juniors`); they currently fail with 404. Their coverage is already provided by `test_adults_route`, `test_juniors_route`, and `test_payments_route`. Delete:

- `test_fees_route` ([tests/test_app.py:22-35](tests/test_app.py#L22-L35)): hits `/fees`
- `test_fees_juniors_route` ([tests/test_app.py:37-55](tests/test_app.py#L37-L55)): hits `/fees-juniors`
- `test_reconcile_route` ([tests/test_app.py:57-81](tests/test_app.py#L57-L81)): hits `/reconcile`; also asserts a literal `OK` string the merged dashboard no longer renders
- `test_reconcile_juniors_route` ([tests/test_app.py:101-131](tests/test_app.py#L101-L131)): hits `/reconcile-juniors`; same `OK` assertion mismatch

The two tests that reference junior-only formatting (`? / 1 (J)` and `500 CZK / 4 (1A+3J)`) are testing a retired template, not the live `/juniors` page, so there is no need to migrate those assertions; the live `/juniors` format is already covered by `test_juniors_route`.

**4b. Add `tests/test_match_payments.py`** (new file) covering the resolution helper and `reconcile()` end-to-end for the canonicalization fix:

- `_canonical_key("Mária Maco") == _canonical_key("maria maco")`
- `reconcile()` with member `"Mária Maco"` and a tx `{person: "Maria Maco", purpose: "2026-04", amount: 750, ...}` produces:
  - `result['members']['Mária Maco']['months']['2026-04']['paid'] == 750`
  - the tx appears in `result['members']['Mária Maco']['months']['2026-04']['transactions']`
  - `result['unmatched']` is empty
- `reconcile()` with `Person = "Někdo Neznámý"` (no match in members) still routes to `unmatched`.

## Critical files

- [scripts/match_payments.py](scripts/match_payments.py): add `canonical_member_key()` helper; build `canonical_by_key` once in `reconcile()`; resolve `raw_member_name` → `member_name` before ledger access at [:404](scripts/match_payments.py#L404).
- [app.py](app.py): extend `group_payments_by_person()` to accept `member_names` and key the grouped dict by canonical attendance-sheet name; update three call sites.
- [tests/test_app.py](tests/test_app.py): delete the four obsolete route tests listed in §4a.
- [tests/test_match_payments.py](tests/test_match_payments.py): add the cases above (create the file if missing).
- [docs/plans/](docs/plans/): per project [CLAUDE.md](CLAUDE.md), move this plan file to `docs/plans/2026-05-05-1640-payment-person-name-canonicalization.md` once execution starts (the plan-mode harness writes to `~/.claude/plans/` by default).

## Verification

1. **Reproduce first.** Before touching code, open `/adults`, click `[i]` next to "Mária Maco", and confirm both symptoms: 2026-04 is unpaid and the payment is missing from history. Inspect the actual `Person` cell value in the payments sheet for the 2026-04 row and confirm it differs from `"Mária Maco"` (likely missing the `á`). Record the exact string for the test case.
2. `make test`: new tests pass; existing tests still green.
3. `make web-debug` and reload `/adults`. The 2026-04 cell for "Mária Maco" turns green (`cell-ok`); the modal's payment history shows the row; the "Raw Payments" section also shows the row. Server log emits `Person cell 'Maria Maco' resolved to canonical member 'Mária Maco' — consider fixing the sheet`.
4. Cross-check `/payments`: the row appears under the `Mária Maco` group (canonical key), not under a separate `Maria Maco` group.
5. Spot-check one member whose `Person` value is already canonical (e.g. one of the recent payers visible on the dashboard): paid cells and history are unchanged, and no spurious resolution log line appears.
6. Confirm a payment with a genuinely unknown `Person` (typo of a non-member) still ends up in the dashboard's `Unmatched` block and emits the existing `Payment matched to unknown member …` warning.
7. Append a `CHANGELOG.md` entry per [CLAUDE.md](CLAUDE.md) once the user confirms the fix works.