docs: experiment with generated documentation, let's keep it in git for now
325
docs/by-claude-opus/scripts.md
Normal file
@@ -0,0 +1,325 @@
# Scripts Reference

All scripts live in the `scripts/` directory and are invoked via `make` targets or directly with Python.

## Pipeline Scripts

These scripts form the core data processing pipeline. They are typically run in sequence.

### `sync_fio_to_sheets.py` — Bank → Google Sheet

Syncs incoming Fio bank transactions to the Payments Google Sheet. Implements an append-only, deduplicated sync — re-running is always safe.

**Usage**:
```bash
make sync          # Last 30 days
make sync-2026     # Full year 2026 (Jan 1 – Dec 31, sorted)

# Direct invocation with options:
python scripts/sync_fio_to_sheets.py \
    --credentials .secret/fuj-management-bot-credentials.json \
    --from 2026-01-01 --to 2026-03-01 \
    --sort-by-date
```

**Arguments**:

| Argument | Default | Description |
|----------|---------|-------------|
| `--days` | `30` | Days to look back (ignored if `--from`/`--to` set) |
| `--sheet-id` | Built-in ID | Target Google Sheet |
| `--credentials` | `credentials.json` | Path to Google API credentials |
| `--from` | *(auto)* | Start date (YYYY-MM-DD) |
| `--to` | *(auto)* | End date (YYYY-MM-DD) |
| `--sort-by-date` | `false` | Sort the entire sheet by date after sync |

**How it works**:

1. Reads existing Sync IDs (column K) from the Google Sheet
2. Fetches transactions from Fio bank (API or transparent page scraping)
3. For each transaction, generates a SHA-256 hash: `sha256(date|amount|currency|sender|vs|message|bank_id)`
4. Appends only transactions whose hash doesn't exist in the sheet
5. Optionally sorts the sheet by date
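
The deduplication in steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the script's actual code: the field order follows the formula above, but the dict keys, the handling of missing fields, and the helper names `generate_sync_id_sketch` and `select_new` are assumptions.

```python
import hashlib

# Field order taken from the formula above; the dict keys are assumed.
SYNC_ID_FIELDS = ("date", "amount", "currency", "sender", "vs", "message", "bank_id")


def generate_sync_id_sketch(tx: dict) -> str:
    """Return the hex SHA-256 digest of the pipe-joined transaction fields."""
    raw = "|".join(str(tx.get(field, "")) for field in SYNC_ID_FIELDS)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def select_new(transactions: list, existing_ids: set) -> list:
    """Keep only transactions whose sync ID is not already known."""
    seen = set(existing_ids)
    fresh = []
    for tx in transactions:
        sync_id = generate_sync_id_sketch(tx)
        if sync_id not in seen:
            seen.add(sync_id)  # also guards against duplicates within one batch
            fresh.append(tx)
    return fresh
```

Because the ID depends only on the transaction's own fields, fetching an overlapping date range twice yields the same IDs, which is what makes re-running the sync safe.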

**Key functions**:

| Function | Signature | Description |
|----------|-----------|-------------|
| `get_sheets_service` | `(credentials_path: str) → Resource` | Authenticates with Google Sheets API. Supports both service accounts and OAuth2 flows. |
| `generate_sync_id` | `(tx: dict) → str` | Creates the SHA-256 deduplication hash for a transaction. |
| `sort_sheet_by_date` | `(service, spreadsheet_id)` | Sorts all rows (excluding header) by the Date column. |
| `sync_to_sheets` | `(spreadsheet_id, credentials_path, ...)` | Main sync logic — read existing, fetch new, deduplicate, append. |

**Output example**:
```
Connecting to Google Sheets using .secret/fuj-management-bot-credentials.json...
Reading existing sync IDs from sheet...
Fetching Fio transactions from 2026-02-01 to 2026-03-03...
Found 15 transactions.
Appending 3 new transactions to the sheet...
Sync completed successfully.
Sheet sorted by date.
```

---

### `infer_payments.py` — Auto-Fill Person/Purpose

Scans the Payments Google Sheet for rows with empty Person/Purpose columns and uses name matching and Czech month parsing to fill them automatically.

**Usage**:
```bash
make infer

# Dry run (preview without writing):
python scripts/infer_payments.py \
    --credentials .secret/fuj-management-bot-credentials.json \
    --dry-run
```

**Arguments**:

| Argument | Default | Description |
|----------|---------|-------------|
| `--sheet-id` | Built-in ID | Target Google Sheet |
| `--credentials` | `credentials.json` | Path to Google API credentials |
| `--dry-run` | `false` | Print inferences without writing to the sheet |

**How it works**:

1. Reads all rows from the Payments Google Sheet
2. Fetches the member list from the Attendance Sheet
3. For each row where Person AND Purpose are both empty AND the `manual fix` column is empty:
   - Combines sender name + message text
   - Attempts to match against member names (using name variants and diacritics normalization)
   - Parses Czech month references from the message
   - Writes inferred Person, Purpose, and Amount back to the sheet
4. Low-confidence matches are prefixed with `[?]` for manual review

**Skipping rules**:
- If `manual fix` column has any value → skip
- If `Person` column already has a value → skip
- If `Purpose` column already has a value → skip
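
Taken together, the three rules mean a row is processed only when all three columns are empty. A minimal sketch of that check, assuming each row is a dict keyed by the column names used above (the helper name `needs_inference` and the whitespace handling are hypothetical):

```python
SKIP_COLUMNS = ("manual fix", "Person", "Purpose")


def needs_inference(row: dict) -> bool:
    """True only when none of the skip columns holds a non-blank value."""
    return not any(str(row.get(col, "")).strip() for col in SKIP_COLUMNS)
```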

**Output example**:
```
Connecting to Google Sheets...
Reading sheet data...
Fetching member list for matching...
Inffering details for empty rows...
Row 45: Inferred Jan Novák for 2026-02 (750 CZK)
Row 46: Inferred [?] František Vrbík for 2026-01, 2026-02 (1500 CZK)
Applying 2 updates to the sheet...
Update completed successfully.
```

---

### `match_payments.py` — Reconciliation Engine + CLI Report

The core reconciliation engine. Matches payment transactions against expected fees and generates a detailed report. Also used as a library by `app.py` and `infer_payments.py`.

**Usage**:
```bash
make reconcile

# Direct invocation:
python scripts/match_payments.py \
    --credentials .secret/fuj-management-bot-credentials.json \
    --sheet-id YOUR_SHEET_ID
```

**Arguments**:

| Argument | Default | Description |
|----------|---------|-------------|
| `--sheet-id` | Built-in ID | Payments Google Sheet |
| `--credentials` | `.secret/fuj-management-bot-credentials.json` | Google API credentials |
| `--bank` | `false` | Fetch directly from Fio bank instead of the Google Sheet |

**Key functions**:

| Function | Description |
|----------|-------------|
| `_build_name_variants(name)` | Generates searchable name variants from a member name. E.g., "František Vrbík (Štrúdl)" → `["frantisek vrbik", "strudl", "vrbik", "frantisek"]` |
| `match_members(text, member_names)` | Finds members mentioned in text. Returns `(name, confidence)` tuples where confidence is `auto` or `review`. |
| `infer_transaction_details(tx, member_names)` | Infers member(s) and month(s) for a single transaction. |
| `format_date(val)` | Normalizes dates from Google Sheets (handles serial numbers and strings). |
| `fetch_sheet_data(spreadsheet_id, credentials_path)` | Reads all rows from the Payments sheet as a list of dicts. |
| `fetch_exceptions(spreadsheet_id, credentials_path)` | Reads fee overrides from the `exceptions` sheet tab. |
| `reconcile(members, sorted_months, transactions, exceptions)` | **Core engine**: matches transactions to members/months, calculates balances. |
| `print_report(result, sorted_months)` | Prints the CLI reconciliation report. |

**Name matching strategy**:

The matching algorithm uses multiple tiers, in order of confidence:

| Priority | What it checks | Confidence |
|----------|---------------|-----------|
| 1 | Full name (normalized) found in text | `auto` |
| 2 | Both first and last name present (any order) | `auto` |
| 3 | Nickname from parentheses matches | `auto` |
| 4 | Last name only (≥4 chars, not in common surname list) | `review` |
| 5 | First name only (≥3 chars) | `review` |

**Common surnames excluded from last-name-only matching**: `novak`, `novakova`, `prach`

If any `auto`-confidence match exists, all `review` matches are discarded.
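
The tiers above can be illustrated with a simplified sketch. It is not the code in `match_payments.py`: the whole-word matching, the way a member name is split into first name, last name, and nickname, and the function name `match_members_sketch` are all assumptions made for the example.

```python
import re
import unicodedata

COMMON_SURNAMES = {"novak", "novakova", "prach"}


def normalize(text: str) -> str:
    """Strip diacritics (NFKD) and lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def match_members_sketch(text: str, member_names: list) -> list:
    norm_text = normalize(text)
    words = set(re.findall(r"\w+", norm_text))
    auto, review = [], []
    for name in member_names:
        nickname = re.search(r"\(([^)]+)\)", name)
        parts = normalize(re.sub(r"\([^)]*\)", "", name)).split()
        if not parts:
            continue
        first, last = parts[0], parts[-1]
        if " ".join(parts) in norm_text:                      # tier 1
            auto.append((name, "auto"))
        elif first in words and last in words:                # tier 2
            auto.append((name, "auto"))
        elif nickname and normalize(nickname.group(1)) in words:  # tier 3
            auto.append((name, "auto"))
        elif len(last) >= 4 and last not in COMMON_SURNAMES and last in words:  # tier 4
            review.append((name, "review"))
        elif len(first) >= 3 and first in words:              # tier 5
            review.append((name, "review"))
    # Any auto match discards all review matches.
    return auto or review
```

With the members `"František Vrbík (Štrúdl)"` and `"Jan Novák"`, the text `"Jan Novak a Vrbik"` yields only the `auto` match for Jan Novák, because the surname-only `review` match for Vrbík is dropped once an `auto` match exists.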

**Payment allocation**:

When a transaction matches multiple members and/or multiple months, the amount is split **evenly** across all allocations:
```
per_allocation = amount / (num_members × num_months)
```
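
As a worked example of the formula, a 1500 CZK payment matched to one member and two months gives 1500 / (1 × 2) = 750 CZK per month. The helper below is a hypothetical sketch of that split; its name, return shape, and the empty-input behaviour are assumptions.

```python
def split_evenly(amount: float, members: list, months: list) -> dict:
    """Divide amount equally over every (member, month) pair."""
    if not members or not months:
        return {}  # assumption: nothing to allocate without both
    share = amount / (len(members) * len(months))
    return {(member, month): share for member in members for month in months}


allocations = split_evenly(1500, ["František Vrbík"], ["2026-01", "2026-02"])
# Each of the two (member, month) pairs receives 750.0
```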

**CLI report sections**:

1. **Summary table** — Per-member, per-month grid: `OK`, `UNPAID {amount}`, `{paid}/{expected}`, balance
2. **Credits** — Members with positive total balance
3. **Debts** — Members with negative total balance
4. **Unmatched transactions** — Payments that couldn't be assigned
5. **Matched transaction details** — Full breakdown with `[REVIEW]` flags

---

### `calculate_fees.py` — Fee Calculation

Calculates and prints monthly fees in a simple table format.

**Usage**:
```bash
make fees
```

**Output example**:
```
Member | Jan 2026 | Feb 2026
-------------------------------------------------------
Jan Novák | 750 CZK (4) | 200 CZK (1)
Alice Testová | - | 750 CZK (3)
-------------------------------------------------------
TOTAL | 750 CZK | 950 CZK
```

The number in parentheses is the count of practices attended that month. This is a simpler CLI version of the `/fees` web page. It only shows adults (tier A).

---

## Shared Modules

### `attendance.py` — Attendance Data & Fee Logic

Shared module that fetches attendance data from the Google Sheet and computes fees.

**Constants**:

| Constant | Value | Description |
|----------|-------|-------------|
| `SHEET_ID` | `1E2e_gT_K5AwSRCDLDTa2UetZTkHmBOcz0kFbBUNUNBA` | Attendance Google Sheet ID |
| `FEE_FULL` | `750` | Monthly fee (CZK) for 2 or more practices |
| `FEE_SINGLE` | `200` | Monthly fee (CZK) for exactly 1 practice |
| `COL_NAME` | `0` | Column index for member name |
| `COL_TIER` | `1` | Column index for member tier |
| `FIRST_DATE_COL` | `3` | Index of the first column with a date header |

**Functions**:

| Function | Signature | Description |
|----------|-----------|-------------|
| `fetch_csv` | `() → list[list[str]]` | Downloads the attendance sheet as CSV via its public export URL. No authentication needed. |
| `parse_dates` | `(header_row) → list[tuple[int, datetime]]` | Parses `M/D/YYYY` dates from the header row and returns `(column_index, date)` pairs. |
| `group_by_month` | `(dates) → dict[str, list[int]]` | Groups column indices by `YYYY-MM` month key. |
| `calculate_fee` | `(count: int) → int` | Applies fee rules: 0→0, 1→200, 2+→750 CZK. |
| `get_members` | `(rows) → list[tuple[str, str, list[str]]]` | Parses member rows. Stops at `# last line` sentinel. Skips comment rows (starting with `#`). |
| `get_members_with_fees` | `() → tuple[list, list[str]]` | Full pipeline: fetch → parse → compute. Returns `(members, sorted_months)` where each member is `(name, tier, {month: (fee, count)})`. |
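
The date parsing, month grouping, and fee rule described in the table can be sketched together. The signatures and constants follow the tables above; the function bodies are assumptions, in particular the choice to silently skip header cells that are not valid dates.

```python
from collections import defaultdict
from datetime import datetime

FEE_FULL = 750
FEE_SINGLE = 200
FIRST_DATE_COL = 3


def parse_dates(header_row):
    """Return (column_index, date) pairs for M/D/YYYY header cells."""
    dates = []
    for idx, cell in enumerate(header_row):
        if idx < FIRST_DATE_COL:
            continue
        try:
            dates.append((idx, datetime.strptime(cell.strip(), "%m/%d/%Y")))
        except ValueError:
            continue  # assumption: non-date header cells are ignored
    return dates


def group_by_month(dates):
    """Group column indices under their YYYY-MM key."""
    months = defaultdict(list)
    for idx, day in dates:
        months[day.strftime("%Y-%m")].append(idx)
    return dict(months)


def calculate_fee(count):
    """0 practices -> 0, exactly 1 -> FEE_SINGLE, 2 or more -> FEE_FULL."""
    if count <= 0:
        return 0
    return FEE_SINGLE if count == 1 else FEE_FULL
```

For a header row such as `["Name", "Tier", "Note", "1/5/2026", "1/12/2026", "2/2/2026"]`, the grouping is `{"2026-01": [3, 4], "2026-02": [5]}`; a member who attended both January practices owes 750 CZK for that month.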

**Member tier codes**:

| Tier | Meaning | Fees? |
|------|---------|-------|
| `A` | Adult | Yes (200 or 750 CZK) |
| `J` | Junior | No (separate sheet) |
| `X` | Exempt | No |

---

### `fio_utils.py` — Fio Bank Integration

Handles fetching transactions from Fio bank, supporting both API and HTML scraping modes.

**Functions**:

| Function | Description |
|----------|-------------|
| `fetch_transactions(date_from, date_to)` | Main entry point. Uses the API if `FIO_API_TOKEN` is set; otherwise falls back to scraping the transparent account page. |
| `fetch_transactions_api(token, date_from, date_to)` | Fetches via Fio REST API (JSON). Returns richer data including sender account and stable bank IDs. |
| `fetch_transactions_transparent(date_from, date_to, account_id)` | Scrapes the public Fio transparent account HTML page. |
| `parse_czech_amount(s)` | Parses Czech currency strings like `"1 500,00 CZK"` to float. |
| `parse_czech_date(s)` | Parses `DD.MM.YYYY` or `DD/MM/YYYY` to `YYYY-MM-DD`. |
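
The two parsing helpers can be approximated as below. This is a sketch under assumptions, not the module's code: it treats the comma as the decimal separator and drops everything except digits, commas, and a minus sign, so an amount using a dot as a thousands separator would not parse correctly. The `_sketch` names are hypothetical.

```python
import re
from datetime import datetime


def parse_czech_amount_sketch(s: str) -> float:
    """'1 500,00 CZK' -> 1500.0 (spaces and currency code removed)."""
    cleaned = re.sub(r"[^\d,\-]", "", s)
    return float(cleaned.replace(",", "."))


def parse_czech_date_sketch(s: str) -> str:
    """Accept DD.MM.YYYY or DD/MM/YYYY and return YYYY-MM-DD."""
    for fmt in ("%d.%m.%Y", "%d/%m/%Y"):
        try:
            return datetime.strptime(s.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {s!r}")
```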

**FioTableParser** — A custom `HTMLParser` subclass that extracts transaction rows from the second `<table class="table">` on the Fio transparent page. Column mapping:

| Index | Column |
|-------|--------|
| 0 | Date (Datum) |
| 1 | Amount (Částka) |
| 2 | Type (Typ) |
| 3 | Sender name (Název protiúčtu) |
| 4 | Message (Zpráva pro příjemce) |
| 5 | KS (constant symbol) |
| 6 | VS (variable symbol) |
| 7 | SS (specific symbol) |
| 8 | Note (Poznámka) |

**Transaction dict format** (returned by all fetch functions):

```python
{
    "date": "2026-01-15",        # YYYY-MM-DD
    "amount": 750.0,             # Float, always positive (outgoing payments are filtered out)
    "sender": "Jan Novák",       # Sender name
    "message": "příspěvek",      # Message for recipient
    "vs": "12345",               # Variable symbol
    "ks": "",                    # Constant symbol
    "ss": "",                    # Specific symbol
    "bank_id": "abc123",         # Bank operation ID (API only)
    "user_id": "...",            # User identification (API only)
    "sender_account": "...",     # Sender account number (API only)
    "currency": "CZK"            # Currency (API only)
}
```

---

### `czech_utils.py` — Czech Language Utilities

Text processing utilities for Czech language content, critical for matching payment messages.

**`normalize(text: str) → str`**

Strips diacritics and lowercases text using Unicode NFKD normalization:
- `"Štrúdl"` → `"strudl"`
- `"František Vrbík"` → `"frantisek vrbik"`
- `"LEDEN 2026"` → `"leden 2026"`
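
A common way to implement this with the standard library is to decompose each character with NFKD and drop the combining marks. The sketch below reproduces the three examples above; whether `czech_utils.py` does exactly this is an assumption, and the name `normalize_sketch` is hypothetical.

```python
import unicodedata


def normalize_sketch(text: str) -> str:
    """Decompose accented letters, drop the combining marks, lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()
```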

**`parse_month_references(text: str, default_year=2026) → list[str]`**

Extracts YYYY-MM month references from Czech free text. The examples below assume `default_year=2026` and cover the supported formats:

| Input | Output | Pattern |
|-------|--------|---------|
| `"leden"` | `["2026-01"]` | Czech month name |
| `"ledna"` | `["2026-01"]` | Czech month declension |
| `"01/26"` | `["2026-01"]` | Numeric short year |
| `"1/2026"` | `["2026-01"]` | Numeric full year |
| `"11+12/2025"` | `["2025-11", "2025-12"]` | Multiple months joined with `+`, sharing one year |
| `"12.2025"` | `["2025-12"]` | Dot notation |
| `"listopad-leden"` | `["2025-11", "2025-12", "2026-01"]` | Range with year wrap |
| `"říjen"` | `["2025-10"]` | Month ≥ October without a year is assigned to the previous year |
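
A reduced sketch of two of these patterns, the month-name lookup with the "October or later means previous year" rule and the numeric `M/YY` or `M/YYYY` form, is shown below. It deliberately does not handle ranges, `+` lists, or dot notation, expects text that has already been normalized (lowercase, no diacritics), and uses a small hypothetical subset of the month table; none of it is taken from `czech_utils.py`.

```python
import re

# Hypothetical subset of month-name forms, already stripped of diacritics.
MONTHS = {
    "leden": 1, "ledna": 1, "unor": 2, "unora": 2,
    "rijen": 10, "rijna": 10, "listopad": 11, "listopadu": 11,
    "prosinec": 12, "prosince": 12,
}


def month_refs_sketch(text: str, default_year: int = 2026) -> list:
    found = []
    # Numeric form with an explicit year: "1/2026" or "01/26".
    for month, year in re.findall(r"\b(\d{1,2})/(\d{4}|\d{2})\b", text):
        full_year = int(year) if len(year) == 4 else 2000 + int(year)
        if 1 <= int(month) <= 12:
            found.append(f"{full_year}-{int(month):02d}")
    # Month names without a year: October to December fall in the previous year.
    for word in re.findall(r"[a-z]+", text.lower()):
        month = MONTHS.get(word)
        if month:
            year = default_year - 1 if month >= 10 else default_year
            found.append(f"{year}-{month:02d}")
    return list(dict.fromkeys(found))  # de-duplicate, keep first-seen order
```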

**`CZECH_MONTHS`** — Dictionary mapping all Czech month name forms (nominative, genitive, locative) to month numbers. 35 entries covering all 12 months in multiple declensions.

---

*Scripts reference generated from comprehensive code analysis on 2026-03-03.*