System Architecture

Overview

FUJ Management follows a pipeline architecture where data flows from external sources (Google Sheets, Fio Bank) through processing scripts into a web dashboard. There is no central database — Google Sheets serves as the persistent data store, and the Flask app renders views by fetching and processing data on every request.

Component Architecture

                  ┌─────────────────────────────────────────────────┐
                  │                EXTERNAL DATA SOURCES            │
                  │                                                 │
                  │  ┌──────────────────┐  ┌──────────────────────┐ │
                  │  │ Attendance Sheet  │  │   Fio Bank Account   │ │
                  │  │ (Google Sheets)   │  │                      │ │
                  │  │                   │  │  ┌────────────────┐  │ │
                  │  │ ID: 1E2e_gT...   │  │  │ REST API       │  │ │
                  │  │                   │  │  │ (JSON, w/token)│  │ │
                  │  │ CSV export (pub)  │  │  ├────────────────┤  │ │
                  │  │                   │  │  │ Transparent    │  │ │
                  │  └────────┬─────────┘  │  │ page (HTML)    │  │ │
                  │           │            │  └───────┬────────┘  │ │
                  │           │            └──────────┼──────────┘ │
                  └───────────┼───────────────────────┼────────────┘
                              │                       │
                 ─ ─ ─ ─ ─ ─ ┼ ─ ─ DATA INGESTION ─ ┼ ─ ─ ─ ─ ─
                              │                       │
                  ┌───────────▼──────┐    ┌───────────▼──────────┐
                  │  attendance.py   │    │    fio_utils.py       │
                  │                  │    │                       │
                  │ fetch_csv()      │    │ fetch_transactions()  │
                  │ parse_dates()    │    │ FioTableParser        │
                  │ group_by_month() │    │ parse_czech_amount()  │
                  │ calculate_fee()  │    │ parse_czech_date()    │
                  │ get_members()    │    │                       │
                  │ get_members_     │    │ API + HTML fallback   │
                  │   with_fees()    │    │                       │
                  └───────────┬──────┘    └───────────┬──────────┘
                              │                       │
                 ─ ─ ─ ─ ─ ─ ┼ ─ ─ PROCESSING ─ ─ ─ ┼ ─ ─ ─ ─ ─
                              │                       │
                              │         ┌─────────────▼──────────┐
                              │         │ sync_fio_to_sheets.py  │ ──▶ Payments Sheet
                              │         │                        │     (Google Sheets)
                              │         │ generate_sync_id()     │
                              │         │ sort_sheet_by_date()   │
                              │         │ get_sheets_service()   │
                              │         └────────────────────────┘
                              │                       │
                              │         ┌─────────────▼──────────┐
                              │         │   infer_payments.py    │ ──▶ Writes back to
                              │         │                        │     Payments Sheet
                              │         │ infer Person/Purpose/  │
                              │         │ Amount for empty rows  │
                              │         └────────────────────────┘
                              │                       │
                              │    ┌──────────────────▼──────────┐
                              │    │     czech_utils.py          │
                              │    │                             │
                              │    │ normalize()  — strip        │
                              │    │               diacritics    │
                              │    │ parse_month_references()    │
                              │    │ CZECH_MONTHS dict           │
                              │    └─────────────────────────────┘
                              │                       │
                 ─ ─ ─ ─ ─ ─ ┼ ─ RECONCILIATION  ─ ─┼ ─ ─ ─ ─ ─
                              │                       │
                    ┌─────────▼───────────────────────▼───────────┐
                    │           match_payments.py                  │
                    │                                              │
                    │  _build_name_variants()  — name matching    │
                    │  match_members()         — fuzzy match      │
                    │  infer_transaction_details()                │
                    │  fetch_sheet_data()      — read payments    │
                    │  fetch_exceptions()      — fee overrides    │
                    │  reconcile()            — CORE ENGINE       │
                    │  print_report()         — CLI output        │
                    └──────────────────────┬──────────────────────┘
                                           │
                 ─ ─ ─ ─ ─ ─ ─ PRESENTATION ┼ ─ ─ ─ ─ ─ ─ ─ ─ ─
                                           │
                    ┌──────────────────────▼──────────────────────┐
                    │              app.py (Flask)                  │
                    │                                              │
                    │  GET /         → redirect to /fees          │
                    │  GET /fees     → fees.html                  │
                    │  GET /reconcile → reconcile.html            │
                    │  GET /payments → payments.html              │
                    │  GET /qr       → PNG QR code (SPD format)  │
                    └─────────────────────────────────────────────┘

Module Dependency Graph

app.py
  ├── attendance.py
  │     └── (stdlib: csv, urllib, datetime)
  └── match_payments.py
        ├── attendance.py
        ├── czech_utils.py
        │     └── (stdlib: re, unicodedata)
        └── sync_fio_to_sheets.py  (for get_sheets_service, DEFAULT_SPREADSHEET_ID)
              └── fio_utils.py
                    └── (stdlib: json, urllib, html.parser, datetime)

infer_payments.py
  ├── sync_fio_to_sheets.py
  ├── match_payments.py
  └── attendance.py

calculate_fees.py
  └── attendance.py

Import Relationships

| Module | Imports from |
| --- | --- |
| app.py | attendance (get_members_with_fees, SHEET_ID), match_payments (reconcile, fetch_sheet_data, fetch_exceptions, normalize, DEFAULT_SPREADSHEET_ID) |
| match_payments.py | attendance (get_members_with_fees), czech_utils (normalize, parse_month_references), sync_fio_to_sheets (get_sheets_service, DEFAULT_SPREADSHEET_ID) |
| infer_payments.py | sync_fio_to_sheets (get_sheets_service, DEFAULT_SPREADSHEET_ID), match_payments (infer_transaction_details), attendance (get_members_with_fees) |
| sync_fio_to_sheets.py | fio_utils (fetch_transactions) |
| calculate_fees.py | attendance (get_members_with_fees) |

Data Flow Patterns

Pattern 1: Sync & Enrich (Batch Pipeline)

This is the primary workflow for keeping the payments ledger up to date:

1. make sync                     2. make infer
   ┌──────┐    ┌──────────┐         ┌──────────┐    ┌──────────┐
   │ Fio  │───▶│ Payments │         │ Payments │───▶│ Payments │
   │ Bank │    │  Sheet   │         │  Sheet   │    │  Sheet   │
   └──────┘    │ (append) │         │ (read)   │    │ (update) │
               └──────────┘         └──────────┘    └──────────┘
   
   - Fetches last 30 days            - Reads empty Person/Purpose rows
   - SHA-256 dedup prevents          - Uses name matching + Czech month
     duplicate entries                 parsing to auto-fill
                                     - Marks uncertain matches with [?]
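
The deduplication step in `make sync` can be sketched as follows. This is a minimal illustration, assuming the sync ID is a SHA-256 digest over a handful of transaction fields. The field names (`date`, `amount`, `sender`, `message`) and the helper `new_transactions` are hypothetical, and the real `generate_sync_id()` may hash a different set of fields.

```python
import hashlib


def generate_sync_id(tx: dict) -> str:
    """Return a stable SHA-256 hex digest for a transaction (hypothetical field set)."""
    # Join the fields with a separator so ("ab", "c") and ("a", "bc") hash differently.
    fields = ("date", "amount", "sender", "message")
    key = "|".join(str(tx.get(field, "")) for field in fields)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def new_transactions(fetched: list[dict], existing_ids: set[str]) -> list[dict]:
    """Keep only transactions whose sync ID is not already present in the sheet."""
    seen = set(existing_ids)
    fresh = []
    for tx in fetched:
        sync_id = generate_sync_id(tx)
        if sync_id not in seen:
            seen.add(sync_id)
            fresh.append({**tx, "sync_id": sync_id})
    return fresh
```

Because the ID depends only on the transaction content, re-running the sync over an overlapping 30-day window appends nothing for rows that are already in the sheet.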

Pattern 2: Real-Time Rendering (Web Dashboard)

Every web request triggers a fresh data fetch — no caching layer exists:

Browser Request → Flask Route → Fetch (Google Sheets API/CSV) → Process → Render HTML
                                    │                              │
                                    │  attendance.py               │  reconcile()
                                    │  fetch_sheet_data()          │  or direct
                                    │  fetch_exceptions()          │  formatting
                                    ▼                              ▼
                              ~1-3 seconds                    Template with
                              (network I/O)                   inline CSS + JS

Pattern 3: QR Code Generation (On-Demand)

Browser clicks "Pay" → GET /qr?account=...&amount=...&message=... → SPD QR PNG
                                                                         │
                                                                    qrcode lib
                                                                    generates
                                                                    in-memory PNG
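
The payload encoded in the QR image can be sketched as below. This is a minimal illustration, assuming the account is supplied as an IBAN, the currency is always CZK, and only the `ACC`, `AM`, `CC`, and `MSG` fields of the SPD (Short Payment Descriptor) format are needed. The function name `build_spd_payload` is hypothetical, and the actual `/qr` route may build the string differently.

```python
def build_spd_payload(account_iban: str, amount: float, message: str = "") -> str:
    """Build an SPD string such as 'SPD*1.0*ACC:...*AM:750.00*CC:CZK*MSG:...'."""
    parts = [
        "SPD",
        "1.0",
        f"ACC:{account_iban.replace(' ', '')}",  # IBAN without spaces
        f"AM:{amount:.2f}",                      # amount with two decimals
        "CC:CZK",
    ]
    if message:
        # '*' separates SPD fields, so it must not appear inside a value.
        # Truncating to 60 characters is an assumption about the message limit.
        safe = message.replace("*", "").strip()[:60]
        parts.append(f"MSG:{safe}")
    return "*".join(parts)
```

For example, `build_spd_payload("CZ0000000000000000000000", 750, "Novak 03/2026")` (a placeholder IBAN) yields a single string that could then be handed to the qrcode library, for instance via `qrcode.make(payload)`, and saved to an in-memory buffer as PNG.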

Key Design Patterns

1. Google Sheets as Database

Instead of a traditional database, the system uses two Google Sheets:

| Sheet | Purpose | Access Method |
| --- | --- | --- |
| Attendance Sheet (1E2e_gT...) | Member names, tiers, practice dates, attendance marks | Public CSV export (no auth needed) |
| Payments Sheet (1Om0YPo...) | Bank transactions with Person/Purpose annotations | Google Sheets API (service account auth) |

Trade-offs:

Advantages:

  • Non-technical users can view and edit data directly
  • No database setup or maintenance
  • Built-in audit trail (Google Sheets version history)

Drawbacks:

  • Every page load incurs 1-3s of API latency
  • No complex queries or indexing
  • Rate limits on Google Sheets API

2. Dual-Mode Bank Access

fio_utils.py implements a transparent fallback pattern:

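import os

def fetch_transactions(date_from, date_to):
    # Prefer the token-authenticated API when FIO_API_TOKEN is set and non-empty.
    token = os.environ.get("FIO_API_TOKEN", "").strip()
    if token:
        return fetch_transactions_api(token, date_from, date_to)    # Structured JSON
    # Otherwise fall back to scraping the public transparent page.
    return fetch_transactions_transparent(...)                       # HTML scraping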

The API provides richer data (sender account numbers, stable bank IDs) but requires a token. The transparent page is always available but lacks some fields.

3. Name Matching with Confidence Levels

The reconciliation engine uses a multi-tier matching strategy:

| Priority | Method | Confidence | Example |
| --- | --- | --- | --- |
| 1 | Full name match | auto | "František Vrbík" in message |
| 2 | Both first + last name (any order) | auto | "Vrbík František" |
| 3 | Nickname match | auto | "(Štrúdl)" from member list |
| 4 | Last name only (≥4 chars, not common) | review | "Vrbík" alone |
| 5 | First name only (≥3 chars) | review | "František" alone |

When both auto and review matches exist, review matches are discarded. This prevents false positives from generic first names.
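
The tiers can be illustrated with a simplified sketch. It assumes members are dictionaries with hypothetical `first`, `last`, and optional `nickname` keys, and it leaves out the "not common" surname check. The real `match_members()` and `_build_name_variants()` may be structured quite differently.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and strip diacritics, e.g. 'Vrbík' -> 'vrbik'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def match_members(message: str, members: list[dict]) -> list[tuple[str, str]]:
    """Return (member name, confidence) pairs for names found in a payment message."""
    text = normalize(message)
    words = set(re.findall(r"\w+", text))
    matches = []
    for member in members:
        first, last = normalize(member["first"]), normalize(member["last"])
        nickname = normalize(member.get("nickname", ""))
        name = f'{member["first"]} {member["last"]}'
        if f"{first} {last}" in text or (first in words and last in words):
            matches.append((name, "auto"))    # priorities 1 and 2
        elif nickname and nickname in words:
            matches.append((name, "auto"))    # priority 3
        elif len(last) >= 4 and last in words:
            matches.append((name, "review"))  # priority 4 (common-surname check omitted)
        elif len(first) >= 3 and first in words:
            matches.append((name, "review"))  # priority 5
    # Review-level matches are discarded as soon as any auto match exists.
    if any(confidence == "auto" for _, confidence in matches):
        matches = [m for m in matches if m[1] == "auto"]
    return matches
```

With two members named František, a message containing "Vrbik Frantisek" produces one auto match for František Vrbík; the first-name-only review match for the other František is dropped.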

4. Exception System

Fee overrides are managed through an exceptions sheet tab in the Payments Google Sheet:

| Column | Content |
| --- | --- |
| Name | Member name |
| Period | Month (YYYY-MM) |
| Amount | Overridden fee in CZK |
| Note | Reason for the exception |

Exceptions are applied during reconciliation, replacing the attendance-calculated fee with the manually specified amount.
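
The override step can be sketched as a dictionary lookup. This is a minimal illustration, assuming calculated fees are keyed by `(name, period)` and exception rows are dictionaries using the column names above. The function name `apply_exceptions` is hypothetical, and the real reconciliation code may normalize names before comparing them.

```python
def apply_exceptions(
    fees: dict[tuple[str, str], int],
    exceptions: list[dict],
) -> dict[tuple[str, str], int]:
    """Replace attendance-calculated fees with manually specified amounts."""
    # Index exception rows by (member name, YYYY-MM period) for direct lookup.
    overrides = {
        (row["Name"], row["Period"]): int(row["Amount"])
        for row in exceptions
    }
    # Keep the calculated fee unless an override exists for that member and month.
    return {key: overrides.get(key, fee) for key, fee in fees.items()}
```

Only member/month pairs listed in the exceptions tab change; every other fee stays at its attendance-based value.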

5. Render-Time Performance Tracking

Every page includes a performance breakdown:

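import time
from flask import Flask, g

app = Flask(__name__)

@app.before_request
def start_timer():
    # Reset the per-request timer and the list of recorded steps.
    g.start_time = time.perf_counter()
    g.steps = []

def record_step(name):
    # Store the step name together with the time at which it finished.
    g.steps.append((name, time.perf_counter()))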

The footer displays total render time and, on click, reveals a detailed breakdown (e.g., fetch_members:0.892s | fetch_payments:1.205s | reconcile:0.003s | render:0.015s).
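
Turning the recorded steps into that breakdown string can be sketched as follows. This is a minimal illustration, assuming each entry in `g.steps` holds the time at which the step finished, so a step's duration is the difference from the previous timestamp. The function name `format_breakdown` is hypothetical.

```python
def format_breakdown(start_time: float, steps: list[tuple[str, float]]) -> str:
    """Format per-step durations as 'name:0.123s | name:0.456s'."""
    parts = []
    previous = start_time
    for name, finished_at in steps:
        # Each duration runs from the end of the previous step (or the request start).
        parts.append(f"{name}:{finished_at - previous:.3f}s")
        previous = finished_at
    return " | ".join(parts)
```

Inside a request this would be called as `format_breakdown(g.start_time, g.steps)` just before the template is rendered.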

Security Considerations

| Concern | Mitigation |
| --- | --- |
| PII in git | .secret/ is gitignored; all data fetched at runtime |
| Google API credentials | Service account JSON stored in .secret/, mounted as Docker secret |
| Bank API token | Passed via FIO_API_TOKEN environment variable, never committed |
| Web app authentication | None currently; the app has no auth layer |
| CSRF protection | None currently (Flask default; no POST routes exist) |

Scalability Notes

This system is purpose-built for a small club (~20-40 members). It makes deliberate trade-offs favoring simplicity over scale:

  • No caching: Every page load fetches live data from Google Sheets (1-3s latency). For a single-user admin dashboard, this is acceptable.
  • No background workers: Sync and inference are manual make commands, not scheduled jobs.
  • No database: Google Sheets handles tens of members and hundreds of transactions with ease.
  • Single-process Flask: The built-in development server runs directly in production (via Docker). For this use case, this is intentional — it's a personal tool, not a public service.

Architecture documentation generated from comprehensive code analysis on 2026-03-03.