Files
fuj-management/docs/plans/2026-05-05-1717-payment-person-name-canonicalization.md
Jan Novak 394da2e6b8
Some checks failed
Deploy to K8s / deploy (push) Successful in 11s
Build and Push / build (push) Successful in 6s
Build and Push / build-go (push) Failing after 6s
fix: Tolerate diacritic/case/whitespace mismatches in Person column matching
- Add canonical_member_key() in match_payments.py to normalize names via
  NFKD + lowercase + whitespace-collapse before ledger lookup; resolves
  payments attributed to e.g. "Maria Maco" to canonical "Mária Maco".
  Emits logger.info when a non-canonical cell is rescued so sheet typos
  are visible in logs without losing the payment allocation.
- Extend group_payments_by_person() in app.py to accept member_names and
  re-key raw-payment groups under the canonical attendance-sheet name so
  the modal's Raw Payments debug section also finds the row correctly.
- Add raw payments collapsible section to member detail modal in adults.html
  and juniors.html for debugging payment attribution issues.
- Remove 4 obsolete tests targeting routes /fees, /fees-juniors, /reconcile,
  /reconcile-juniors that no longer exist; add test_match_payments.py
  covering canonical key equivalence and reconcile() tolerance end-to-end.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-05 17:22:54 +02:00

136 lines
11 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters
This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Tolerate diacritic / case / whitespace mismatches between `Person` column and member names
## Context
For "Mária Maco" there is a payment row in the payments sheet with `Purpose = 2026-04`, but the modal for that member shows neither a paid 2026-04 cell **nor** a row in payment history. Both symptoms collapse to a single root cause in [`reconcile()`](scripts/match_payments.py#L295), confirmed by reading the code:
- [`scripts/match_payments.py:404`](scripts/match_payments.py#L404) — `if member_name not in ledger:` is a **byte-exact** comparison. `member_name` is the `Person` cell from the payments sheet with only `.strip()` and `[?]` markers removed ([:349-353](scripts/match_payments.py#L349-L353)). `ledger` keys are the canonical names from the attendance sheet. There is no diacritic, case, or whitespace normalization on this path. (`czech_utils.normalize` is imported and used for the `exceptions` lookup at [:282-283 / :321-322](scripts/match_payments.py#L282-L322), but **not** for member-name matching.)
- When a row falls through that check, it is appended to `unmatched` and never reaches `ledger[member_name][m]['paid']` or `['transactions']`. The dashboard's per-month "paid" cell stays unpaid, and because the modal's payment history is built from `data.months[m].transactions` ([`templates/adults.html:772-776`](templates/adults.html#L772-L776)), the row also disappears from the modal's history list.
- The new "Raw Payments" debug section ([`templates/adults.html:861`](templates/adults.html#L861)) uses `rawPaymentsByPerson[name]`. Its keys come from [`group_payments_by_person()` in `app.py:60-73`](app.py#L60-L73), which also stores the **literal** `Person` string (only `.strip()` and `[?]` stripped). So if the attendance-sheet name and the `Person` cell differ at the byte level, that section also returns an empty list — which is why the user does not see the row anywhere in the modal.
The most likely cause for "Mária Maco" specifically: the `Person` cell was typed (or pasted) without the `á` diacritic — `Maria Maco` vs `Mária Maco`. Other plausible variants the current code silently drops: case differences (`mária maco`), trailing/embedded extra whitespace, and NBSP characters.
The fix is to make the matching tolerant via the existing [`czech_utils.normalize()`](scripts/czech_utils.py#L22-L25) helper (NFKD + lowercase), with a small whitespace-collapse on top, and apply the same canonicalization in `group_payments_by_person()` so the modal's raw-payments lookup uses the canonical attendance-sheet name as the key.
## Approach
### 1. `scripts/match_payments.py` — tolerant `Person` → `ledger` resolution in `reconcile()`
- Add a small private helper at module scope:
```python
def _canonical_key(name: str) -> str:
return re.sub(r"\s+", " ", normalize(name)).strip()
```
Uses the existing `normalize()` from `czech_utils` ([:22-25](scripts/czech_utils.py#L22-L25)) and additionally collapses whitespace runs to a single space so `"Mária Maco"` and `"Mária Maco"` both reduce to `"maria maco"`.
- Inside [`reconcile()`](scripts/match_payments.py#L295), right after `member_names` is computed ([:308](scripts/match_payments.py#L308)), build a lookup dict once:
```python
canonical_by_key: dict[str, str] = {}
for name in member_names:
key = _canonical_key(name)
canonical_by_key.setdefault(key, name) # first wins; ambiguity handled below
```
- Replace the byte-exact check at [:404](scripts/match_payments.py#L404). Resolve each `member_name` from `matched_members` to the canonical attendance-sheet name before any ledger / credits access:
```python
for raw_member_name, confidence in matched_members:
member_name = canonical_by_key.get(_canonical_key(raw_member_name))
if member_name is None:
logger.warning(
"Payment matched to unknown member %r (tx: %s, %s) — adding to unmatched",
raw_member_name, tx.get("date", "?"), tx.get("message", "?"),
)
unmatched.append(tx)
continue
if member_name != raw_member_name:
logger.info(
"Person cell %r resolved to canonical member %r — consider fixing the sheet",
raw_member_name, member_name,
)
# ... rest of the loop body unchanged: ledger[member_name], credits[member_name], …
```
The `logger.info` line lets the user see (in `make web-debug` logs) which sheet rows have a non-canonical `Person` value, so they can clean them up at their own pace — without breaking allocation in the meantime.
- Leave the rest of the function untouched. Once `member_name` is the canonical name, every downstream key (`ledger[member_name]`, `credits[member_name]`, `other_ledger[member_name]`, the `tx["person"]` echo into `transactions`) is already correct.
### 2. `app.py` — canonicalize the raw-payments grouping key
- The current [`group_payments_by_person()`](app.py#L60-L73) cannot canonicalize on its own because it does not know the attendance-sheet member list. Extend its signature to accept the member list and reuse `_canonical_key`:
```python
from match_payments import _canonical_key # or re-export via a tiny public name
def group_payments_by_person(transactions, member_names=None):
canonical_by_key = (
{_canonical_key(n): n for n in member_names} if member_names else {}
)
grouped = {}
for tx in transactions:
person = str(tx.get("person", "")).strip()
if not person:
continue
for p in person.split(","):
p = re.sub(r"\[\?\]\s*", "", p).strip()
if not p:
continue
key = canonical_by_key.get(_canonical_key(p), p) # fallback: keep raw
grouped.setdefault(key, []).append(tx)
for rows in grouped.values():
rows.sort(key=lambda t: str(t.get("date", "")), reverse=True)
return grouped
```
- Update the three call sites to pass `member_names`:
- `adults_view()` around [`app.py:333`](app.py#L333) — `members` is already in scope; pass `[name for name, _, _ in members]`.
- `juniors_view()` around [`app.py:539`](app.py#L539) — same.
- `payments()` around [`app.py:549`](app.py#L549) — same; needs the adult+junior member names so the `/payments` per-person grouping is consistent.
- Naming: `_canonical_key` starts with an underscore inside `match_payments.py`. To avoid leaking a private symbol, expose it as `canonical_member_key` (no underscore) in `match_payments.py` and import that name from `app.py`.
### 3. Why not also touch `infer_payments.py`
`infer_payments.py` already writes canonical attendance-sheet names into the `Person` column (it picks from `member_names`). The bug only manifests when the cell was filled in **manually** by a human (typed without diacritics, different case) or was written by an older inference that has since drifted from a renamed attendance row. Making `reconcile()` tolerant fixes the symptom for both cases without changing inference. The `logger.info` line is sufficient signal for the user to clean up the sheet on their own schedule.
### 4. Tests
**4a. Delete obsolete route tests in [tests/test_app.py](tests/test_app.py).** Four tests target Flask routes that no longer exist (the old fee/reconcile pages were merged into `/adults` and `/juniors`); they currently fail with 404. Their coverage is already provided by `test_adults_route`, `test_juniors_route`, and `test_payments_route`. Delete:
- `test_fees_route` ([tests/test_app.py:22-35](tests/test_app.py#L22-L35)) — hits `/fees`
- `test_fees_juniors_route` ([tests/test_app.py:37-55](tests/test_app.py#L37-L55)) — hits `/fees-juniors`
- `test_reconcile_route` ([tests/test_app.py:57-81](tests/test_app.py#L57-L81)) — hits `/reconcile`; also asserts a literal `OK` string the merged dashboard no longer renders
- `test_reconcile_juniors_route` ([tests/test_app.py:101-131](tests/test_app.py#L101-L131)) — hits `/reconcile-juniors`; same `OK` assertion mismatch
The two tests that reference junior-only formatting (`? / 1 (J)` and `500 CZK / 4 (1A+3J)`) are testing a retired template, not the live `/juniors` page — no need to migrate those assertions; the live `/juniors` format is already covered by `test_juniors_route`.
**4b. Add `tests/test_match_payments.py`** (new file) covering the resolution helper and `reconcile()` end-to-end for the canonicalization fix:
- `_canonical_key("Mária Maco") == _canonical_key("maria maco")`
- `reconcile()` with member `"Mária Maco"` and a tx `{person: "Maria Maco", purpose: "2026-04", amount: 750, ...}` produces:
- `result['members']['Mária Maco']['months']['2026-04']['paid'] == 750`
- the tx appears in `result['members']['Mária Maco']['months']['2026-04']['transactions']`
- `result['unmatched']` is empty
- `reconcile()` with `Person = "Někdo Neznámý"` (no match in members) still routes to `unmatched`.
## Critical files
- [scripts/match_payments.py](scripts/match_payments.py) — add `canonical_member_key()` helper; build `canonical_by_key` once in `reconcile()`; resolve `raw_member_name` → `member_name` before ledger access at [:404](scripts/match_payments.py#L404).
- [app.py](app.py) — extend `group_payments_by_person()` to accept `member_names` and key the grouped dict by canonical attendance-sheet name; update three call sites.
- [tests/test_app.py](tests/test_app.py) — delete the four obsolete route tests listed in §4a.
- [tests/test_match_payments.py](tests/test_match_payments.py) — add the cases above (create the file if missing).
- [docs/plans/](docs/plans/) — per project [CLAUDE.md](CLAUDE.md), move this plan file to `docs/plans/2026-05-05-1640-payment-person-name-canonicalization.md` once execution starts (the plan-mode harness writes to `~/.claude/plans/` by default).
## Verification
1. **Reproduce first.** Before touching code, open `/adults`, click `[i]` next to "Mária Maco", and confirm both: 2026-04 is unpaid and the payment is missing from history. Inspect the actual `Person` cell value in the payments sheet for the 2026-04 row — confirm it differs from `"Mária Maco"` (likely missing the `á`). Record the exact string for the test case.
2. `make test` — new tests pass; existing tests still green.
3. `make web-debug` and reload `/adults`. The 2026-04 cell for "Mária Maco" turns green (`cell-ok`); the modal's payment history shows the row; the "Raw Payments" section also shows the row. Server log emits `Person cell 'Maria Maco' resolved to canonical member 'Mária Maco' — consider fixing the sheet`.
4. Cross-check `/payments` — the row appears under the `Mária Maco` group (canonical key), not under a separate `Maria Maco` group.
5. Spot-check one member with the conventionally-correct `Person` value (e.g. one of the recent payers visible on the dashboard) — paid cells and history are unchanged, no spurious resolution log line.
6. Confirm a payment with a genuinely unknown `Person` (typo of a non-member) still ends up in the dashboard's `Unmatched` block and emits the existing `Payment matched to unknown member …` warning.
7. Append a `CHANGELOG.md` entry per [CLAUDE.md](CLAUDE.md) once the user confirms the fix works.