docs: experiment with generated documentation, let's keep it in git for now
2026-03-11 11:57:30 +01:00
parent e83d6af1f5
commit 9b99f6d33b
17 changed files with 2367 additions and 0 deletions

@@ -0,0 +1,325 @@
# Scripts Reference
All scripts live in the `scripts/` directory and are invoked via `make` targets or directly with Python.
## Pipeline Scripts
These scripts form the core data processing pipeline and are typically run in the order listed below.
### `sync_fio_to_sheets.py` — Bank → Google Sheet
Syncs incoming Fio bank transactions to the Payments Google Sheet. Implements an append-only, deduplicated sync — re-running is always safe.
**Usage**:
```bash
make sync # Last 30 days
make sync-2026 # Full year 2026 (Jan 1 to Dec 31, sorted)
# Direct invocation with options:
python scripts/sync_fio_to_sheets.py \
--credentials .secret/fuj-management-bot-credentials.json \
--from 2026-01-01 --to 2026-03-01 \
--sort-by-date
```
**Arguments**:
| Argument | Default | Description |
|----------|---------|-------------|
| `--days` | `30` | Days to look back (ignored if `--from`/`--to` set) |
| `--sheet-id` | Built-in ID | Target Google Sheet |
| `--credentials` | `credentials.json` | Path to Google API credentials |
| `--from` | *(auto)* | Start date (YYYY-MM-DD) |
| `--to` | *(auto)* | End date (YYYY-MM-DD) |
| `--sort-by-date` | `false` | Sort the entire sheet by date after sync |
**How it works**:
1. Reads existing Sync IDs (column K) from the Google Sheet
2. Fetches transactions from Fio bank (API or transparent page scraping)
3. For each transaction, generates a SHA-256 hash: `sha256(date|amount|currency|sender|vs|message|bank_id)`
4. Appends only transactions whose hash doesn't exist in the sheet
5. Optionally sorts the sheet by date
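The deduplication in steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the script's actual code: the field order comes from the formula in step 3, the dict keys are assumed to match the transaction dict format, and `filter_new` is a hypothetical helper name.

```python
import hashlib

# Field order follows the formula in step 3. The keys are assumed to match
# the transaction dicts returned by the fetch functions.
SYNC_ID_FIELDS = ("date", "amount", "currency", "sender", "vs", "message", "bank_id")


def generate_sync_id(tx: dict) -> str:
    """Return a SHA-256 hex digest over the pipe-joined transaction fields."""
    raw = "|".join(str(tx.get(field, "")) for field in SYNC_ID_FIELDS)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def filter_new(transactions: list[dict], existing_ids: set[str]) -> list[dict]:
    """Keep only transactions whose sync ID is not already known (hypothetical helper)."""
    seen = set(existing_ids)
    fresh = []
    for tx in transactions:
        sync_id = generate_sync_id(tx)
        if sync_id not in seen:
            seen.add(sync_id)  # also drops repeats inside the same batch
            fresh.append(tx)
    return fresh
```

Because the ID depends only on the transaction's own fields, running the sync twice over the same date range appends nothing the second time. Whether the real script also deduplicates within a single batch, as this sketch does, is an assumption.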
**Key functions**:
| Function | Signature | Description |
|----------|-----------|-------------|
| `get_sheets_service` | `(credentials_path: str) → Resource` | Authenticates with Google Sheets API. Supports both service accounts and OAuth2 flows. |
| `generate_sync_id` | `(tx: dict) → str` | Creates the SHA-256 deduplication hash for a transaction. |
| `sort_sheet_by_date` | `(service, spreadsheet_id)` | Sorts all rows (excluding header) by the Date column. |
| `sync_to_sheets` | `(spreadsheet_id, credentials_path, ...)` | Main sync logic — read existing, fetch new, deduplicate, append. |
**Output example**:
```
Connecting to Google Sheets using .secret/fuj-management-bot-credentials.json...
Reading existing sync IDs from sheet...
Fetching Fio transactions from 2026-02-01 to 2026-03-03...
Found 15 transactions.
Appending 3 new transactions to the sheet...
Sync completed successfully.
Sheet sorted by date.
```
---
### `infer_payments.py` — Auto-Fill Person/Purpose
Scans the Payments Google Sheet for rows with empty Person/Purpose columns and uses name matching and Czech month parsing to fill them automatically.
**Usage**:
```bash
make infer
# Dry run (preview without writing):
python scripts/infer_payments.py \
--credentials .secret/fuj-management-bot-credentials.json \
--dry-run
```
**Arguments**:
| Argument | Default | Description |
|----------|---------|-------------|
| `--sheet-id` | Built-in ID | Target Google Sheet |
| `--credentials` | `credentials.json` | Path to Google API credentials |
| `--dry-run` | `false` | Print inferences without writing to the sheet |
**How it works**:
1. Reads all rows from the Payments Google Sheet
2. Fetches the member list from the Attendance Sheet
3. For each row where Person and Purpose are both empty and the `manual fix` column is blank:
- Combines sender name + message text
- Attempts to match against member names (using name variants and diacritics normalization)
- Parses Czech month references from the message
- Writes inferred Person, Purpose, and Amount back to the sheet
4. Low-confidence matches are prefixed with `[?]` for manual review
**Skipping rules**:
- If `manual fix` column has any value → skip
- If `Person` column already has a value → skip
- If `Purpose` column already has a value → skip
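The skipping rules amount to a single predicate over a row. The sketch below assumes each row is a dict keyed by the column headers quoted above and that whitespace-only cells count as empty; `should_skip` is a hypothetical name, not necessarily one used in the script.

```python
def should_skip(row: dict) -> bool:
    """Return True when inference must leave this row untouched."""
    # Column names are assumed to match the sheet headers quoted in the rules.
    for column in ("manual fix", "Person", "Purpose"):
        if str(row.get(column, "")).strip():
            return True
    return False
```

Any one filled column is enough to skip the row, so a manual correction is never overwritten by a later run.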
**Output example**:
```
Connecting to Google Sheets...
Reading sheet data...
Fetching member list for matching...
Inffering details for empty rows...
Row 45: Inferred Jan Novák for 2026-02 (750 CZK)
Row 46: Inferred [?] František Vrbík for 2026-01, 2026-02 (1500 CZK)
Applying 2 updates to the sheet...
Update completed successfully.
```
---
### `match_payments.py` — Reconciliation Engine + CLI Report
The core reconciliation engine. Matches payment transactions against expected fees and generates a detailed report. Also used as a library by `app.py` and `infer_payments.py`.
**Usage**:
```bash
make reconcile
# Direct invocation:
python scripts/match_payments.py \
--credentials .secret/fuj-management-bot-credentials.json \
--sheet-id YOUR_SHEET_ID
```
**Arguments**:
| Argument | Default | Description |
|----------|---------|-------------|
| `--sheet-id` | Built-in ID | Payments Google Sheet |
| `--credentials` | `.secret/fuj-management-bot-credentials.json` | Google API credentials |
| `--bank` | `false` | Fetch directly from Fio bank instead of the Google Sheet |
**Key functions**:
| Function | Description |
|----------|-------------|
| `_build_name_variants(name)` | Generates searchable name variants from a member name. E.g., "František Vrbík (Štrúdl)" → `["frantisek vrbik", "strudl", "vrbik", "frantisek"]` |
| `match_members(text, member_names)` | Finds members mentioned in text. Returns `(name, confidence)` tuples where confidence is `auto` or `review`. |
| `infer_transaction_details(tx, member_names)` | Infers member(s) and month(s) for a single transaction. |
| `format_date(val)` | Normalizes dates from Google Sheets (handles serial numbers and strings). |
| `fetch_sheet_data(spreadsheet_id, credentials_path)` | Reads all rows from the Payments sheet as a list of dicts. |
| `fetch_exceptions(spreadsheet_id, credentials_path)` | Reads fee overrides from the `exceptions` sheet tab. |
| `reconcile(members, sorted_months, transactions, exceptions)` | **Core engine**: matches transactions to members/months, calculates balances. |
| `print_report(result, sorted_months)` | Prints the CLI reconciliation report. |
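As an illustration of what `format_date` has to deal with: Google Sheets can return a date either as a string or as a serial number counting days from 1899-12-30. The sketch below shows one way to normalize both. It is an assumption about the approach, not the function's actual body, and it passes strings through unchanged.

```python
from datetime import date, timedelta

# Google Sheets serial dates count days from 1899-12-30.
SHEETS_EPOCH = date(1899, 12, 30)


def format_date(val) -> str:
    """Normalize a Sheets cell value to YYYY-MM-DD where possible (sketch)."""
    if isinstance(val, (int, float)) and not isinstance(val, bool):
        # Drop any fractional part, which encodes the time of day.
        return (SHEETS_EPOCH + timedelta(days=int(val))).isoformat()
    return str(val).strip()
```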
**Name matching strategy**:
The matching algorithm uses multiple tiers, in order of confidence:
| Priority | What it checks | Confidence |
|----------|---------------|-----------|
| 1 | Full name (normalized) found in text | `auto` |
| 2 | Both first and last name present (any order) | `auto` |
| 3 | Nickname from parentheses matches | `auto` |
| 4 | Last name only (≥4 chars, not in common surname list) | `review` |
| 5 | First name only (≥3 chars) | `review` |
**Common surnames excluded from last-name-only matching**: `novak`, `novakova`, `prach`
If any `auto`-confidence match exists, all `review` matches are discarded.
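The tiers can be sketched as a chain of checks per member. This is a simplified illustration under several assumptions: it inlines the variant generation that `_build_name_variants` performs, treats the first and last tokens of a name as first name and surname, and matches whole words only. The real `match_members` may differ in all of these details.

```python
import re
import unicodedata

COMMON_SURNAMES = {"novak", "novakova", "prach"}


def normalize(text: str) -> str:
    """Strip diacritics and lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def match_members(text: str, member_names: list[str]) -> list[tuple[str, str]]:
    haystack = normalize(text)
    words = set(re.findall(r"\w+", haystack))
    auto, review = [], []
    for name in member_names:
        nick = re.search(r"\(([^)]*)\)", name)
        nickname = normalize(nick.group(1)).strip() if nick else ""
        base = normalize(re.sub(r"\([^)]*\)", "", name)).split()
        if not base:
            continue
        first, last = base[0], base[-1]
        has_surname = len(base) > 1
        if " ".join(base) in haystack:                           # tier 1
            auto.append((name, "auto"))
        elif has_surname and first in words and last in words:   # tier 2
            auto.append((name, "auto"))
        elif nickname and nickname in words:                     # tier 3
            auto.append((name, "auto"))
        elif (has_surname and len(last) >= 4
              and last not in COMMON_SURNAMES and last in words):  # tier 4
            review.append((name, "review"))
        elif len(first) >= 3 and first in words:                 # tier 5
            review.append((name, "review"))
    # Any auto match suppresses all review matches.
    return auto if auto else review
```

With this sketch, a message containing only "Vrbik" yields a `review` match for "František Vrbík (Štrúdl)", while a message containing only "Novak" matches nobody because the surname is on the exclusion list.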
**Payment allocation**:
When a transaction matches multiple members and/or multiple months, the amount is split **evenly** across all allocations:
```
per_allocation = amount / (num_members × num_months)
```
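A minimal sketch of the even split, assuming allocations are keyed by `(member, month)` pairs; `split_evenly` is a hypothetical helper name used only for illustration.

```python
def split_evenly(amount: float, members: list[str], months: list[str]) -> dict[tuple[str, str], float]:
    """Divide one payment equally across every (member, month) pair."""
    pairs = [(member, month) for member in members for month in months]
    if not pairs:
        return {}
    share = amount / len(pairs)
    return {pair: share for pair in pairs}
```

For example, a 1500 CZK payment matched to one member and the months `2026-01` and `2026-02` yields two allocations of 750 CZK each.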
**CLI report sections**:
1. **Summary table** — Per-member, per-month grid: `OK`, `UNPAID {amount}`, `{paid}/{expected}`, balance
2. **Credits** — Members with positive total balance
3. **Debts** — Members with negative total balance
4. **Unmatched transactions** — Payments that couldn't be assigned
5. **Matched transaction details** — Full breakdown with `[REVIEW]` flags
---
### `calculate_fees.py` — Fee Calculation
Calculates and prints monthly fees in a simple table format.
**Usage**:
```bash
make fees
```
**Output example**:
```
Member | Jan 2026 | Feb 2026
-------------------------------------------------------
Jan Novák | 750 CZK (4) | 200 CZK (1)
Alice Testová | - | 750 CZK (3)
-------------------------------------------------------
TOTAL | 750 CZK | 950 CZK
```
This is a simpler CLI version of the `/fees` web page. It only shows adults (tier A).
---
## Shared Modules
### `attendance.py` — Attendance Data & Fee Logic
Shared module that fetches attendance data from the Google Sheet and computes fees.
**Constants**:
| Constant | Value | Description |
|----------|-------|-------------|
| `SHEET_ID` | `1E2e_gT_K5AwSRCDLDTa2UetZTkHmBOcz0kFbBUNUNBA` | Attendance Google Sheet ID |
| `FEE_FULL` | `750` | Monthly fee for 2+ practices |
| `FEE_SINGLE` | `200` | Monthly fee for exactly 1 practice |
| `COL_NAME` | `0` | Column index for member name |
| `COL_TIER` | `1` | Column index for member tier |
| `FIRST_DATE_COL` | `3` | First column with date headers |
**Functions**:
| Function | Signature | Description |
|----------|-----------|-------------|
| `fetch_csv` | `() → list[list[str]]` | Downloads the attendance sheet as CSV via its public export URL. No authentication needed. |
| `parse_dates` | `(header_row) → list[tuple[int, datetime]]` | Parses `M/D/YYYY` dates from the header row and returns `(column_index, date)` pairs. |
| `group_by_month` | `(dates) → dict[str, list[int]]` | Groups column indices by `YYYY-MM` month key. |
| `calculate_fee` | `(count: int) → int` | Applies fee rules: 0→0, 1→200, 2+→750 CZK. |
| `get_members` | `(rows) → list[tuple[str, str, list[str]]]` | Parses member rows. Stops at `# last line` sentinel. Skips comment rows (starting with `#`). |
| `get_members_with_fees` | `() → tuple[list, list[str]]` | Full pipeline: fetch → parse → compute. Returns `(members, sorted_months)` where each member is `(name, tier, {month: (fee, count)})`. |
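The fee rule and the month grouping are small enough to sketch directly from the table. The bodies below are assumptions that follow the documented signatures and constants. In particular, this `parse_dates` tries every header cell and skips the ones that do not parse, whereas the real one may start at `FIRST_DATE_COL`.

```python
from collections import defaultdict
from datetime import datetime

FEE_FULL = 750    # 2+ practices in a month
FEE_SINGLE = 200  # exactly 1 practice


def calculate_fee(count: int) -> int:
    """Apply the documented rule: 0 -> 0, 1 -> 200, 2+ -> 750 CZK."""
    if count <= 0:
        return 0
    return FEE_SINGLE if count == 1 else FEE_FULL


def parse_dates(header_row: list[str]) -> list[tuple[int, datetime]]:
    """Return (column_index, date) for every header cell in M/D/YYYY form."""
    dates = []
    for idx, cell in enumerate(header_row):
        try:
            dates.append((idx, datetime.strptime(cell.strip(), "%m/%d/%Y")))
        except ValueError:
            continue  # not a date column (name, tier, notes, ...)
    return dates


def group_by_month(dates: list[tuple[int, datetime]]) -> dict[str, list[int]]:
    """Group column indices by YYYY-MM key."""
    months = defaultdict(list)
    for idx, dt in dates:
        months[dt.strftime("%Y-%m")].append(idx)
    return dict(months)
```

A member's monthly fee is then `calculate_fee` applied to the number of attended columns within one month's index list.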
**Member tier codes**:
| Tier | Meaning | Fees? |
|------|---------|-------|
| `A` | Adult | Yes (200 or 750 CZK) |
| `J` | Junior | No (separate sheet) |
| `X` | Exempt | No |
---
### `fio_utils.py` — Fio Bank Integration
Handles fetching transactions from Fio bank, supporting both API and HTML scraping modes.
**Functions**:
| Function | Description |
|----------|-------------|
| `fetch_transactions(date_from, date_to)` | Main entry point. Uses API if `FIO_API_TOKEN` is set, falls back to transparent page scraping. |
| `fetch_transactions_api(token, date_from, date_to)` | Fetches via Fio REST API (JSON). Returns richer data including sender account and stable bank IDs. |
| `fetch_transactions_transparent(date_from, date_to, account_id)` | Scrapes the public Fio transparent account HTML page. |
| `parse_czech_amount(s)` | Parses Czech currency strings like `"1 500,00 CZK"` to float. |
| `parse_czech_date(s)` | Parses `DD.MM.YYYY` or `DD/MM/YYYY` to `YYYY-MM-DD`. |
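The two parsing helpers might look roughly like this. The sketch assumes amounts use a space (or non-breaking space) as the thousands separator and a comma as the decimal separator, and that an unrecognized date should raise `ValueError`. The error behavior of the real functions is not documented here.

```python
from datetime import datetime


def parse_czech_amount(s: str) -> float:
    """Parse strings like '1 500,00 CZK' into a float."""
    cleaned = s.replace("\xa0", " ").replace("CZK", "").replace(" ", "")
    return float(cleaned.replace(",", "."))


def parse_czech_date(s: str) -> str:
    """Convert DD.MM.YYYY or DD/MM/YYYY to YYYY-MM-DD."""
    text = s.strip()
    for fmt in ("%d.%m.%Y", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {s!r}")
```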
**FioTableParser** — A custom `HTMLParser` subclass that extracts transaction rows from the second `<table class="table">` on the Fio transparent page. Column mapping:
| Index | Column |
|-------|--------|
| 0 | Date (Datum) |
| 1 | Amount (Částka) |
| 2 | Type (Typ) |
| 3 | Sender name (Název protiúčtu) |
| 4 | Message (Zpráva pro příjemce) |
| 5 | KS (constant symbol) |
| 6 | VS (variable symbol) |
| 7 | SS (specific symbol) |
| 8 | Note (Poznámka) |
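For readers unfamiliar with `html.parser`, the following sketch shows the general technique of pulling cell text out of the second `<table class="table">`. `TableRowParser` is a hypothetical stand-in, not the actual `FioTableParser`. It ignores nested tables and header (`<th>`) cells, which is an assumption about the page layout.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect <td> text from rows of the second <table class="table">."""

    def __init__(self):
        super().__init__()
        self.table_index = 0   # number of matching tables opened so far
        self.in_target = False
        self.in_cell = False
        self.current_row = []
        self.rows = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "table" in classes:
            self.table_index += 1
            self.in_target = self.table_index == 2
        elif self.in_target and tag == "tr":
            self.current_row = []
        elif self.in_target and tag == "td":
            self.in_cell = True
            self.current_row.append("")

    def handle_data(self, data):
        if self.in_cell:
            self.current_row[-1] += data

    def handle_endtag(self, tag):
        if not self.in_target:
            return
        if tag == "td" and self.in_cell:
            self.in_cell = False
            self.current_row[-1] = self.current_row[-1].strip()
        elif tag == "tr" and self.current_row:
            self.rows.append(self.current_row)
            self.current_row = []
        elif tag == "table":
            self.in_target = False
```

After `parser.feed(html)`, each entry in `parser.rows` is a list of cell strings whose positions correspond to the column mapping above.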
**Transaction dict format** (returned by all fetch functions):
```python
{
"date": "2026-01-15", # YYYY-MM-DD
"amount": 750.0, # Float, always positive (outgoing filtered)
"sender": "Jan Novák", # Sender name
"message": "příspěvek", # Message for recipient
"vs": "12345", # Variable symbol
"ks": "", # Constant symbol
"ss": "", # Specific symbol
"bank_id": "abc123", # Bank operation ID (API only)
"user_id": "...", # User identification (API only)
"sender_account": "...", # Sender account number (API only)
"currency": "CZK" # Currency (API only)
}
```
---
### `czech_utils.py` — Czech Language Utilities
Text processing utilities for Czech language content, critical for matching payment messages.
**`normalize(text: str) → str`**
Strips diacritics and lowercases text using Unicode NFKD normalization:
- `"Štrúdl"` → `"strudl"`
- `"František Vrbík"` → `"frantisek vrbik"`
- `"LEDEN 2026"` → `"leden 2026"`
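A minimal sketch of this technique using the standard library: NFKD decomposition splits each accented letter into a base letter plus combining marks, and the marks are then dropped. The body is an assumption consistent with the description, not a copy of the module's code.

```python
import unicodedata


def normalize(text: str) -> str:
    """Strip diacritics and lowercase via NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()
```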
**`parse_month_references(text: str, default_year=2026) → list[str]`**
Extracts `YYYY-MM` month references from Czech free text. The supported formats are:
| Input | Output | Pattern |
|-------|--------|---------|
| `"leden"` | `["2026-01"]` | Czech month name |
| `"ledna"` | `["2026-01"]` | Czech month declension |
| `"01/26"` | `["2026-01"]` | Numeric short year |
| `"1/2026"` | `["2026-01"]` | Numeric full year |
| `"11+12/2025"` | `["2025-11", "2025-12"]` | Multiple slash-separated |
| `"12.2025"` | `["2025-12"]` | Dot notation |
| `"listopad-leden"` | `["2025-11", "2025-12", "2026-01"]` | Range with year wrap |
| `"říjen"` | `["2025-10"]` | Bare month name from October onward is assigned to the year before `default_year` |
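A reduced sketch of the parsing idea, covering only bare month names and the single numeric forms (`01/26`, `1/2026`, `12.2025`). It deliberately omits the `11+12/2025` and range patterns, uses a small hypothetical `MONTH_NAMES` subset in place of the full `CZECH_MONTHS` table, and `month_refs` is an illustrative name rather than the real function.

```python
import re
import unicodedata

# Small illustrative subset; keys are diacritics-free, lowercase forms.
MONTH_NAMES = {"leden": 1, "ledna": 1, "rijen": 10, "rijna": 10,
               "listopad": 11, "prosinec": 12}

# Month, then '/' or '.', then a 4-digit or 2-digit year.
NUMERIC = re.compile(r"\b(\d{1,2})[/.](\d{4}|\d{2})\b")


def month_refs(text: str, default_year: int = 2026) -> list[str]:
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c)).lower()
    found = set()
    for month, year in NUMERIC.findall(plain):
        m, y = int(month), int(year)
        if 1 <= m <= 12:
            found.add(f"{y + 2000 if y < 100 else y}-{m:02d}")
    for word in re.findall(r"[a-z]+", plain):
        m = MONTH_NAMES.get(word)
        if m:
            # Bare names from October onward fall into the previous year.
            year = default_year - 1 if m >= 10 else default_year
            found.add(f"{year}-{m:02d}")
    return sorted(found)
```

The year rule for bare names reflects a season that starts in autumn: with `default_year=2026`, "říjen" refers to October 2025 while "leden" refers to January 2026.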
**`CZECH_MONTHS`** — Dictionary mapping all Czech month name forms (nominative, genitive, locative) to month numbers. 35 entries covering all 12 months in multiple declensions.
---
*Scripts reference generated from comprehensive code analysis on 2026-03-03.*