Scripts Reference

All scripts live in the scripts/ directory and are invoked via make targets or directly with Python.

Pipeline Scripts

These scripts form the core data processing pipeline. They are typically run in sequence:

sync_fio_to_sheets.py — Bank → Google Sheet

Syncs incoming Fio bank transactions to the Payments Google Sheet. Implements an append-only, deduplicated sync — re-running is always safe.

Usage:

make sync                # Last 30 days
make sync-2026           # Full year 2026 (Jan 1 to Dec 31, sorted)

# Direct invocation with options:
python scripts/sync_fio_to_sheets.py \
  --credentials .secret/fuj-management-bot-credentials.json \
  --from 2026-01-01 --to 2026-03-01 \
  --sort-by-date

Arguments:

| Argument | Default | Description |
| --- | --- | --- |
| --days | 30 | Days to look back (ignored if --from/--to set) |
| --sheet-id | Built-in ID | Target Google Sheet |
| --credentials | credentials.json | Path to Google API credentials |
| --from | (auto) | Start date (YYYY-MM-DD) |
| --to | (auto) | End date (YYYY-MM-DD) |
| --sort-by-date | false | Sort the entire sheet by date after sync |

How it works:

  1. Reads existing Sync IDs (column K) from the Google Sheet
  2. Fetches transactions from Fio bank (API or transparent page scraping)
  3. For each transaction, generates a SHA-256 hash: sha256(date|amount|currency|sender|vs|message|bank_id)
  4. Appends only transactions whose hash doesn't exist in the sheet
  5. Optionally sorts the sheet by date
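
Steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the script's actual code: the field order follows the formula above, but how each value is stringified before hashing is an assumption, and `select_new_transactions` is a hypothetical helper name.

```python
import hashlib

# Field order taken from the formula in step 3. The exact string formatting
# of each value (for example the amount) is an assumption of this sketch.
SYNC_ID_FIELDS = ("date", "amount", "currency", "sender", "vs", "message", "bank_id")


def generate_sync_id(tx: dict) -> str:
    """Return a SHA-256 hex digest over the pipe-joined transaction fields."""
    raw = "|".join(str(tx.get(field, "")) for field in SYNC_ID_FIELDS)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def select_new_transactions(transactions: list[dict], existing_ids: set[str]) -> list[dict]:
    """Keep only transactions whose sync ID is not already in the sheet.

    Hypothetical helper: `existing_ids` stands for the values read from column K.
    """
    return [tx for tx in transactions if generate_sync_id(tx) not in existing_ids]
```

Because the ID is derived only from the transaction's own fields, re-running the sync over an overlapping date range yields the same IDs, so already-synced rows are filtered out.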

Key functions:

| Function | Signature | Description |
| --- | --- | --- |
| get_sheets_service | (credentials_path: str) → Resource | Authenticates with Google Sheets API. Supports both service accounts and OAuth2 flows. |
| generate_sync_id | (tx: dict) → str | Creates the SHA-256 deduplication hash for a transaction. |
| sort_sheet_by_date | (service, spreadsheet_id) | Sorts all rows (excluding header) by the Date column. |
| sync_to_sheets | (spreadsheet_id, credentials_path, ...) | Main sync logic: read existing, fetch new, deduplicate, append. |

Output example:

Connecting to Google Sheets using .secret/fuj-management-bot-credentials.json...
Reading existing sync IDs from sheet...
Fetching Fio transactions from 2026-02-01 to 2026-03-03...
Found 15 transactions.
Appending 3 new transactions to the sheet...
Sync completed successfully.
Sheet sorted by date.

infer_payments.py — Auto-Fill Person/Purpose

Scans the Payments Google Sheet for rows with empty Person/Purpose columns and uses name matching and Czech month parsing to fill them automatically.

Usage:

make infer

# Dry run (preview without writing):
python scripts/infer_payments.py \
  --credentials .secret/fuj-management-bot-credentials.json \
  --dry-run

Arguments:

| Argument | Default | Description |
| --- | --- | --- |
| --sheet-id | Built-in ID | Target Google Sheet |
| --credentials | credentials.json | Path to Google API credentials |
| --dry-run | false | Print inferences without writing to the sheet |

How it works:

  1. Reads all rows from the Payments Google Sheet
  2. Fetches the member list from the Attendance Sheet
  3. For each row where Person AND Purpose are empty AND there's no "manual fix":
    • Combines sender name + message text
    • Attempts to match against member names (using name variants and diacritics normalization)
    • Parses Czech month references from the message
    • Writes inferred Person, Purpose, and Amount back to the sheet
  4. Low-confidence matches are prefixed with [?] for manual review

Skipping rules:

  • If manual fix column has any value → skip
  • If Person column already has a value → skip
  • If Purpose column already has a value → skip
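
The skipping rules amount to a single guard. The sketch below assumes each row is a dict; the keys `"manual fix"`, `"person"`, and `"purpose"` are hypothetical, and treating whitespace-only cells as empty is also an assumption.

```python
# Hypothetical column keys; the real sheet headers may differ.
GUARDED_COLUMNS = ("manual fix", "person", "purpose")


def should_infer(row: dict) -> bool:
    """Return True only when none of the guarded columns holds a value."""
    return not any(str(row.get(key) or "").strip() for key in GUARDED_COLUMNS)
```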

Output example:

Connecting to Google Sheets...
Reading sheet data...
Fetching member list for matching...
Inferring details for empty rows...
Row 45: Inferred Jan Novák for 2026-02 (750 CZK)
Row 46: Inferred [?] František Vrbík for 2026-01, 2026-02 (1500 CZK)
Applying 2 updates to the sheet...
Update completed successfully.

match_payments.py — Reconciliation Engine + CLI Report

The core reconciliation engine. Matches payment transactions against expected fees and generates a detailed report. Also used as a library by app.py and infer_payments.py.

Usage:

make reconcile

# Direct invocation:
python scripts/match_payments.py \
  --credentials .secret/fuj-management-bot-credentials.json \
  --sheet-id YOUR_SHEET_ID

Arguments:

| Argument | Default | Description |
| --- | --- | --- |
| --sheet-id | Built-in ID | Payments Google Sheet |
| --credentials | .secret/fuj-management-bot-credentials.json | Google API credentials |
| --bank | false | Fetch directly from Fio bank instead of the Google Sheet |

Key functions:

| Function | Description |
| --- | --- |
| _build_name_variants(name) | Generates searchable name variants from a member name. E.g., "František Vrbík (Štrúdl)" → ["frantisek vrbik", "strudl", "vrbik", "frantisek"] |
| match_members(text, member_names) | Finds members mentioned in text. Returns (name, confidence) tuples where confidence is auto or review. |
| infer_transaction_details(tx, member_names) | Infers member(s) and month(s) for a single transaction. |
| format_date(val) | Normalizes dates from Google Sheets (handles serial numbers and strings). |
| fetch_sheet_data(spreadsheet_id, credentials_path) | Reads all rows from the Payments sheet as a list of dicts. |
| fetch_exceptions(spreadsheet_id, credentials_path) | Reads fee overrides from the exceptions sheet tab. |
| reconcile(members, sorted_months, transactions, exceptions) | Core engine: matches transactions to members/months, calculates balances. |
| print_report(result, sorted_months) | Prints the CLI reconciliation report. |
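
As an illustration of what format_date has to deal with: Google Sheets represents dates as serial numbers counting days from 1899-12-30. The sketch below converts such a serial to YYYY-MM-DD. Passing strings through unchanged apart from trimming is an assumption; the real function may parse other string formats.

```python
from datetime import date, timedelta

# Google Sheets serial dates count days from this epoch.
SHEETS_EPOCH = date(1899, 12, 30)


def format_date(val) -> str:
    """Normalize a Sheets cell value to a date string (sketch only)."""
    if isinstance(val, (int, float)) and not isinstance(val, bool):
        # Drop any fractional part, which encodes the time of day.
        return (SHEETS_EPOCH + timedelta(days=int(val))).isoformat()
    # Assumption: string values are already usable and only need trimming.
    return str(val).strip()
```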

Name matching strategy:

The matching algorithm uses multiple tiers, in order of confidence:

| Priority | What it checks | Confidence |
| --- | --- | --- |
| 1 | Full name (normalized) found in text | auto |
| 2 | Both first and last name present (any order) | auto |
| 3 | Nickname from parentheses matches | auto |
| 4 | Last name only (≥4 chars, not in common surname list) | review |
| 5 | First name only (≥3 chars) | review |

Common surnames excluded from last-name-only matching: novak, novakova, prach

If any auto-confidence match exists, all review matches are discarded.

Payment allocation:

When a transaction matches multiple members and/or multiple months, the amount is split evenly across all allocations:

per_allocation = amount / (num_members × num_months)
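
A minimal sketch of the even split, assuming members and months are plain lists; `split_payment` is a hypothetical name, not a function from the module.

```python
def split_payment(amount: float, members: list[str], months: list[str]) -> dict[tuple[str, str], float]:
    """Divide `amount` evenly across every (member, month) combination."""
    if not members or not months:
        return {}
    share = amount / (len(members) * len(months))
    return {(member, month): share for member in members for month in months}
```

For example, a 1500 CZK payment matched to one member and two months allocates 750 CZK to each month.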

CLI report sections:

  1. Summary table — Per-member, per-month grid: OK, UNPAID {amount}, {paid}/{expected}, balance
  2. Credits — Members with positive total balance
  3. Debts — Members with negative total balance
  4. Unmatched transactions — Payments that couldn't be assigned
  5. Matched transaction details — Full breakdown with [REVIEW] flags

calculate_fees.py — Fee Calculation

Calculates and prints monthly fees in a simple table format.

Usage:

make fees

Output example:

Member                  |        Jan 2026 |        Feb 2026
-------------------------------------------------------
Jan Novák               |    750 CZK (4)  |    200 CZK (1)
Alice Testová           |              -  |    750 CZK (3)
-------------------------------------------------------
TOTAL                   |       750 CZK   |       950 CZK

This is a simpler CLI version of the /fees web page. It only shows adults (tier A).


Shared Modules

attendance.py — Attendance Data & Fee Logic

Shared module that fetches attendance data from the Google Sheet and computes fees.

Constants:

| Constant | Value | Description |
| --- | --- | --- |
| SHEET_ID | 1E2e_gT_K5AwSRCDLDTa2UetZTkHmBOcz0kFbBUNUNBA | Attendance Google Sheet ID |
| FEE_FULL | 750 | Monthly fee for 2+ practices |
| FEE_SINGLE | 200 | Monthly fee for exactly 1 practice |
| COL_NAME | 0 | Column index for member name |
| COL_TIER | 1 | Column index for member tier |
| FIRST_DATE_COL | 3 | First column with date headers |

Functions:

| Function | Signature | Description |
| --- | --- | --- |
| fetch_csv | () → list[list[str]] | Downloads the attendance sheet as CSV via its public export URL. No authentication needed. |
| parse_dates | (header_row) → list[tuple[int, datetime]] | Parses M/D/YYYY dates from the header row and returns (column_index, date) pairs. |
| group_by_month | (dates) → dict[str, list[int]] | Groups column indices by YYYY-MM month key. |
| calculate_fee | (count: int) → int | Applies fee rules: 0→0, 1→200, 2+→750 CZK. |
| get_members | (rows) → list[tuple[str, str, list[str]]] | Parses member rows. Stops at # last line sentinel. Skips comment rows (starting with #). |
| get_members_with_fees | () → tuple[list, list[str]] | Full pipeline: fetch → parse → compute. Returns (members, sorted_months) where each member is (name, tier, {month: (fee, count)}). |
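
The date parsing, month grouping, and fee rule could look roughly like this. The constants come from the table above; silently skipping header cells that are not dates is an assumption of the sketch.

```python
from collections import defaultdict
from datetime import datetime

FEE_FULL = 750
FEE_SINGLE = 200
FIRST_DATE_COL = 3


def parse_dates(header_row: list[str]) -> list[tuple[int, datetime]]:
    """Return (column_index, date) pairs for M/D/YYYY header cells."""
    dates = []
    for idx, cell in enumerate(header_row):
        if idx < FIRST_DATE_COL:
            continue
        try:
            dates.append((idx, datetime.strptime(cell.strip(), "%m/%d/%Y")))
        except ValueError:
            continue  # assumption: non-date headers are ignored
    return dates


def group_by_month(dates: list[tuple[int, datetime]]) -> dict[str, list[int]]:
    """Group column indices under their YYYY-MM key."""
    months = defaultdict(list)
    for idx, day in dates:
        months[day.strftime("%Y-%m")].append(idx)
    return dict(months)


def calculate_fee(count: int) -> int:
    """0 practices → 0, exactly 1 → FEE_SINGLE, 2 or more → FEE_FULL."""
    if count <= 0:
        return 0
    return FEE_SINGLE if count == 1 else FEE_FULL
```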

Member tier codes:

| Tier | Meaning | Fees? |
| --- | --- | --- |
| A | Adult | Yes (200 or 750 CZK) |
| J | Junior | No (separate sheet) |
| X | Exempt | No |

fio_utils.py — Fio Bank Integration

Handles fetching transactions from Fio bank, supporting both API and HTML scraping modes.

Functions:

| Function | Description |
| --- | --- |
| fetch_transactions(date_from, date_to) | Main entry point. Uses API if FIO_API_TOKEN is set, falls back to transparent page scraping. |
| fetch_transactions_api(token, date_from, date_to) | Fetches via Fio REST API (JSON). Returns richer data including sender account and stable bank IDs. |
| fetch_transactions_transparent(date_from, date_to, account_id) | Scrapes the public Fio transparent account HTML page. |
| parse_czech_amount(s) | Parses Czech currency strings like "1 500,00 CZK" to float. |
| parse_czech_date(s) | Parses DD.MM.YYYY or DD/MM/YYYY to YYYY-MM-DD. |
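
The two parsing helpers might be implemented along these lines. The sketch assumes Czech formatting throughout (space as thousands separator, comma as decimal separator); inputs that use a dot as the decimal separator are not handled.

```python
import re
from datetime import datetime


def parse_czech_amount(s: str) -> float:
    """Parse a string such as "1 500,00 CZK" into a float."""
    # Keep digits, the decimal comma and a leading minus; this drops the
    # currency code and any kind of whitespace, including non-breaking spaces.
    cleaned = re.sub(r"[^\d,\-]", "", s)
    return float(cleaned.replace(",", "."))


def parse_czech_date(s: str) -> str:
    """Convert DD.MM.YYYY or DD/MM/YYYY to YYYY-MM-DD."""
    day, month, year = re.split(r"[./]", s.strip())
    # Building a datetime validates the calendar date.
    return datetime(int(year), int(month), int(day)).strftime("%Y-%m-%d")
```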

FioTableParser — A custom HTMLParser subclass that extracts transaction rows from the second <table class="table"> on the Fio transparent page. Column mapping:

| Index | Column |
| --- | --- |
| 0 | Date (Datum) |
| 1 | Amount (Částka) |
| 2 | Type (Typ) |
| 3 | Sender name (Název protiúčtu) |
| 4 | Message (Zpráva pro příjemce) |
| 5 | KS (constant symbol) |
| 6 | VS (variable symbol) |
| 7 | SS (specific symbol) |
| 8 | Note (Poznámka) |
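
A parser of this kind can be sketched with the standard library's `html.parser`. The class below is an illustration only: the attribute names, the `COLUMNS` keys, and the handling of header rows (rows without `<td>` cells are skipped) are assumptions, and nested tables are not considered.

```python
from html.parser import HTMLParser

# Keys follow the column mapping above; the key names are this sketch's choice.
COLUMNS = ("date", "amount", "type", "sender", "message", "ks", "vs", "ss", "note")


class FioTableParser(HTMLParser):
    """Collect <td> text from rows of the second <table class="table">."""

    def __init__(self, target_index: int = 1):
        super().__init__()
        self.target_index = target_index  # 0-based index among class="table" tables
        self.tables_seen = 0
        self.in_target = False
        self.in_cell = False
        self.current_row = None
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if "table" in classes:
                self.in_target = self.tables_seen == self.target_index
                self.tables_seen += 1
        elif self.in_target and tag == "tr":
            self.current_row = []
        elif self.in_target and tag == "td" and self.current_row is not None:
            self.in_cell = True
            self.current_row.append("")

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_target = False
        elif tag == "td":
            self.in_cell = False
        elif tag == "tr" and self.current_row is not None:
            if self.current_row:  # header rows contain only <th> and stay empty
                self.rows.append([cell.strip() for cell in self.current_row])
            self.current_row = None
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and self.current_row is not None:
            self.current_row[-1] += data
```

After `parser.feed(html)`, each entry of `parser.rows` can be turned into a dict with `dict(zip(COLUMNS, row))`.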

Transaction dict format (returned by all fetch functions):

{
    "date": "2026-01-15",      # YYYY-MM-DD
    "amount": 750.0,           # Float, always positive (outgoing filtered)
    "sender": "Jan Novák",     # Sender name
    "message": "příspěvek",    # Message for recipient
    "vs": "12345",             # Variable symbol
    "ks": "",                  # Constant symbol
    "ss": "",                  # Specific symbol
    "bank_id": "abc123",       # Bank operation ID (API only)
    "user_id": "...",          # User identification (API only)
    "sender_account": "...",   # Sender account number (API only)
    "currency": "CZK"          # Currency (API only)
}

czech_utils.py — Czech Language Utilities

Text processing utilities for Czech language content, critical for matching payment messages.

normalize(text: str) → str

Strips diacritics and lowercases text using Unicode NFKD normalization:

  • "Štrúdl" → "strudl"
  • "František Vrbík" → "frantisek vrbik"
  • "LEDEN 2026" → "leden 2026"
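
One common way to implement this with the standard library is to decompose the text with NFKD and drop the combining marks. The sketch below is an assumption about the approach, not a copy of the module's code.

```python
import unicodedata


def normalize(text: str) -> str:
    """Strip diacritics and lowercase the text."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()
```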

parse_month_references(text: str, default_year=2026) → list[str]

Extracts YYYY-MM month references from Czech free text. It handles the following formats:

| Input | Output | Pattern |
| --- | --- | --- |
| "leden" | ["2026-01"] | Czech month name |
| "ledna" | ["2026-01"] | Czech month declension |
| "01/26" | ["2026-01"] | Numeric short year |
| "1/2026" | ["2026-01"] | Numeric full year |
| "11+12/2025" | ["2025-11", "2025-12"] | Multiple slash-separated |
| "12.2025" | ["2025-12"] | Dot notation |
| "listopad-leden" | ["2025-11", "2025-12", "2026-01"] | Range with year wrap |
| "říjen" | ["2025-10"] | Months ≥ October assumed previous year |
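
The last two rows rely on year inference for month names given without a year. A minimal sketch of that logic follows, assuming month names have already been mapped to numbers; `resolve_year` and `expand_month_range` are hypothetical helper names, and anchoring a range's year on its first month is an assumption.

```python
def resolve_year(month: int, default_year: int = 2026) -> int:
    """A bare month from October onward is assumed to be in the previous year."""
    return default_year - 1 if month >= 10 else default_year


def expand_month_range(start: int, end: int, default_year: int = 2026) -> list[str]:
    """Expand an inclusive month range, rolling the year over after December."""
    if not (1 <= start <= 12 and 1 <= end <= 12):
        raise ValueError("months must be in 1..12")
    year = resolve_year(start, default_year)
    months = []
    month = start
    while True:
        months.append(f"{year}-{month:02d}")
        if month == end:
            break
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return months
```

With the default year 2026, `expand_month_range(11, 1)` reproduces the "listopad-leden" row and `expand_month_range(10, 10)` reproduces the "říjen" row.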

CZECH_MONTHS — Dictionary mapping all Czech month name forms (nominative, genitive, locative) to month numbers. 35 entries covering all 12 months in multiple declensions.


Scripts reference generated from comprehensive code analysis on 2026-03-03.