fuj-management/docs/plans/2026-05-06-2341-go-m4-io-layer.md
Jan Novak 6465e2a221
feat(go): IO layer behind interfaces (M4)
- io/attendance: CSV-over-public-URL client + Fake for adult/junior tabs
- io/drive: Drive v3 modifiedTime client + Fake
- io/sheets: Sheets v4 client (GetValues/AppendValues/BatchUpdateValues/
  WriteHeader/SortByDateColumn) + Fake with call-capture
- io/cache: Drive-modifiedTime-gated FileCache; two TTL knobs; atomic
  writes; generic Get[T]; Python-compatible JSON format; Flush()
- io/fio: Client interface backed by Fio REST API (apiClient) and HTML
  scraper (transparentClient); Fake; testdata fixtures
- membership/sources: NewSources wires attendance CSV + Sheets + cache
  into LoadAdults/LoadJuniors/LoadTransactions/LoadExceptions; Czech
  month parsing + merged-month maps
- banksync: SyncToSheets (SHA-256 dedup, optional sort) and
  InferPayments ([?] review prefix, dry-run) — tested with fakes
- cmd/fuj: sync and infer subcommands wired; fees and reconcile use
  real NewSources; go.mod gains google.golang.org/api + x/net
- gofumpt extra-rules applied across all packages; lint clean

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 01:05:59 +02:00

# Plan: Go rewrite — M4 IO layer behind interfaces
Companion to [2026-05-03-2349-go-backend-rewrite.md](2026-05-03-2349-go-backend-rewrite.md)
and [2026-05-03-2349-go-backend-rewrite-progress.md](2026-05-03-2349-go-backend-rewrite-progress.md).
## Context
M1-M3 are merged: skeleton + tooling, every pure-domain function ported and
parity-tested against PII-scrubbed fixtures, and the `fuj fees` / `fuj
reconcile` subcommands wired but stubbed (`membership.NewStubSources()`
returns `ErrIOPending` for every loader). M4's job is to replace that stub
with real IO: read attendance CSVs, read the payments sheet + exceptions
tab, fetch Drive `modifiedTime` for cache gating, fetch Fio bank
transactions, and append/update rows on the payments sheet — all behind
narrow Go interfaces that have in-memory fakes for tests.

Once M4 lands, `fuj fees`, `fuj reconcile`, `fuj sync`, and `fuj infer` all
work end-to-end against the real Google Sheets and the real Fio account, and
M5 can start porting the JSON API on top of that IO.

User-confirmed scope choices for this milestone:
- **No live integration tests.** Fakes-only at unit level; live
verification deferred to manual smoke during M7.
- **Three PRs** (sheets/drive/cache → fio/sync → infer), one per major
area, each independently reviewable.
- **Attendance stays on CSV-via-public-URL** — matches Python, no extra
service-account grant needed.
## Approach
### Layering
```
internal/io/                  ← raw, narrow clients (one per external system)
  sheets/                     ← typed wrapper around google.golang.org/api/sheets/v4
  drive/                      ← Drive v3, only ModifiedTime
  attendance/                 ← CSV-via-public-URL fetcher (no auth, no Sheets API)
  fio/                        ← FioClient interface + apiClient + transparentClient
  cache/                      ← FileCache: modifiedTime gate + two-TTL fallback + atomic write
internal/services/membership/ ← already exists; M4 adds adapters that satisfy
                                AttendanceLoader / TransactionLoader / ExceptionLoader
                                by composing io/sheets + io/drive + io/cache + io/attendance.
internal/services/banksync/   ← new: SyncToSheets (M4.7) + InferPayments (M4.8)
                                composing fio + sheets + attendance loaders.
```
The existing interfaces in [go/internal/services/membership/loader.go](../../go/internal/services/membership/loader.go)
(`AttendanceLoader`, `TransactionLoader`, `ExceptionLoader`, `Sources`) are
the seam: M4 adds a `NewSources(ctx context.Context, cfg config.Config) (Sources, error)`
constructor next to `NewStubSources()`, and `cmd/fuj/main.go` swaps the
stub for it.
### Auth — service-account only
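Drop the OAuth + `token.pickle` path entirely. Production already uses a
service account; the fallback only existed because the original Python
script ran from a developer laptop. Sheets and Drive both authenticate via
`option.WithCredentialsFile(cfg.CredentialsPath)` plus
`option.WithScopes(...)`. Each backend gets a single shared `*http.Client` with a
10s timeout (matches `DRIVE_TIMEOUT`). Because `option.WithHTTPClient` takes
precedence over all other client options, including the credential ones, the
timeout has to be set on the authenticated client itself rather than on a plain
client passed alongside `WithCredentialsFile`.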
### Cache shape
Match Python's wire format so the `tmp/*_cache.json` directory is shared
safely while both backends run side-by-side:
```json
{ "modifiedTime": "<RFC3339>", "data": <list|object>, "cachedAt": "<RFC3339>" }
```
Improvements over Python:
- Atomic write: marshal → `os.WriteFile(path+".tmp", ..., 0o600)` →
  `os.Rename`. Python's plain truncate-write stays as-is until M8.
- The two TTLs (`CacheTTL` and `CacheAPICheckTTL`) already live in
  `config.Config`; only the `CacheDir` field is new.

The four cache keys mirror Python's `CACHE_SHEET_MAP`:
`attendance_regular`, `attendance_juniors`, `exceptions_dict`, and
`payments_transactions`; each maps to either `AttendanceSheetID` or
`PaymentsSheetID`.

When Drive fails, fall back to a synthetic key
`fmt.Sprintf("ttl-5m-%d", time.Now().Unix()/300)` so the cache still keys
deterministically per 5-minute bucket (same as Python).
### Fio: two impls behind one interface
```go
type Client interface {
	FetchTransactions(ctx context.Context, from, to time.Time) ([]Transaction, error)
}
```
`apiClient` (when `cfg.FioAPIToken != ""`) hits
`https://fioapi.fio.cz/v1/rest/periods/{token}/{from}/{to}/transactions.json`,
unmarshals via a typed struct, and maps `column0..column22` to fields per
[scripts/fio_utils.py](../../scripts/fio_utils.py:90). Negative-amount rows are
dropped (matches Python).
`transparentClient` (fallback) GETs
`https://ib.fio.cz/ib/transparent?a={accountNum}&f={DD.MM.YYYY}&t={DD.MM.YYYY}`
and walks the response with a `golang.org/x/net/html` tokenizer, counting
`<table class="table">` tags and grabbing rows from the **second** one
(skipping `<thead>`). `bank_id`, `currency`, `user_id`, `sender_account`
are empty (matches Python — known limitation).
`accountNum` is derived from `cfg.BankAccount` by stripping the IBAN prefix
(`CZ85 2010 0000 0028 0035 9168` → `2800359168`); add a small helper in
`config` for this since both the API URL and the transparent URL need it.
### Fakes
In-memory fakes live next to each real impl: `sheets/fake.go`,
`drive/fake.go`, `fio/fake.go`, `attendance/fake.go`,
`cache/fake.go` (a passthrough). Each is exported as `Fake`, so tests call
`sheets.NewFake(rows)` and inject the result. The membership-adapter tests use these
fakes plus a couple of new raw-bytes fixtures under
`go/internal/io/<pkg>/testdata/`:
- `sheets/testdata/payments_minimal.json` — 2D-string array shaped like
`values.get` would return.
- `sheets/testdata/exceptions_minimal.json` — same, for the exceptions tab.
- `attendance/testdata/adults_minimal.csv` — small adult attendance CSV.
- `attendance/testdata/juniors_minimal.csv` — small junior CSV.
- `fio/testdata/api_response.json` — captured Fio API JSON shape.
- `fio/testdata/transparent.html` — captured transparent-page HTML.

Existing M3 domain fixtures under `go/tests/fixtures/` stay where they are
and continue to drive parity tests; they aren't reused for IO-layer tests
because they're at the wrong layer (post-parse domain types).
## Tasks (mapped to tracker)
Same 8 sub-milestones as the tracker, grouped into 3 PRs.
### PR 1 — sheets / drive / cache + membership wiring (M4.1, M4.2, M4.3, M4.6)
1. **Add deps** in [go/go.mod](../../go/go.mod):
`google.golang.org/api/{sheets/v4,drive/v3,option}`,
`golang.org/x/oauth2/google` (transitively pulled), `golang.org/x/net/html`.
2. **`internal/io/sheets/`**:
   - `client.go`: `Client` struct holding `*sheets.Service`; methods
     `GetValues(ctx, spreadsheetID, a1Range string) ([][]any, error)`,
     `AppendValues(ctx, spreadsheetID, a1Range string, rows [][]any) error`,
     `BatchUpdateValues(ctx, spreadsheetID, updates []ValueRange) error`,
     `SortByColumn(ctx, spreadsheetID, sheetGID int64, columnIndex int) error`.
   - `fake.go`: exported `Fake` with seedable `Values map[string][][]any`.
3. **`internal/io/drive/`**:
   - `client.go`: `Client.ModifiedTime(ctx, fileID string) (string, error)`
     using `drive.New(...).Files.Get(fileID).Fields("modifiedTime").SupportsAllDrives(true)`.
   - `fake.go` with seedable `Times map[string]string`.
4. **`internal/io/attendance/`** (new, public-URL CSV):
   - `client.go`: `Client.FetchAdults(ctx) ([][]string, error)` and
     `FetchJuniors(ctx) ([][]string, error)` using `http.Get` on
     `https://docs.google.com/spreadsheets/d/{ID}/export?format=csv&gid={GID}`,
     decoded via `encoding/csv`.
   - Add `AttendanceAdultSheetGID = "0"` constant in `internal/config`.
5. **`internal/io/cache/`**:
   - `filecache.go`: `FileCache` with
     `Get(ctx context.Context, key string, fetch func(context.Context) (any, error)) (any, error)`,
     wired through `Drive.ModifiedTime` and the two TTL knobs. Atomic write
     via tmp-file + rename.
   - Cache key → sheet ID map mirrors Python's `CACHE_SHEET_MAP`.
6. **`internal/services/membership/sources.go`** (new file in existing
   package):
   - `realSources struct { sheets *sheets.Client; drive *drive.Client; attendance *attendance.Client; cache *cache.FileCache }`.
   - Constructor `NewSources(ctx, cfg) (Sources, error)` builds all clients.
   - `LoadAdults` reads the cached attendance CSV, runs it through
     `domain/fees.CalculateFee` + merged-month logic (port of
     [scripts/attendance.py](../../scripts/attendance.py:170)
     `get_members_with_fees`), and returns `[]reconcile.Member`.
   - `LoadTransactions` reads payments sheet rows via cache, parses to
     `[]reconcile.Transaction` (port of
     [match_payments.py:208](../../scripts/match_payments.py:208)
     `fetch_sheet_data`).
   - `LoadExceptions` reads `'exceptions'!A2:D` via cache, builds
     `map[ExceptionKey]Exception` (port of `match_payments.py:266`).
7. **Add `LoadJuniors`** to the `AttendanceLoader` interface (Python infer
pulls both adult + junior member lists; needed for M4.8).
8. **Wire into [cmd/fuj/main.go](../../go/cmd/fuj/main.go)**: replace
`membership.NewStubSources()` in `feesCmd` and `reconcileCmd` with
`membership.NewSources(ctx, cfg)`.
9. **Tests** (default tag, no live IO):
   - `sheets/client_test.go`, `drive/client_test.go`,
     `cache/filecache_test.go`: exercise fakes + parsing logic with
     testdata fixtures.
   - `membership/sources_test.go`: adapter tests with sheets/drive/cache
     fakes verify CSV→Member, rows→Transaction, exceptions tab → map.
10. **Config additions**: `CacheDir` (default `tmp` relative to `$PWD`,
overridable via `CACHE_DIR` env), `DriveTimeout` (default 10s).
11. **Manual verification**: `make go-build && go run ./cmd/fuj fees` and
`... reconcile` print real reports against the live sheet (with valid
`.secret/...credentials.json`).
12. CHANGELOG entry; tick M4.1, M4.2, M4.3, M4.6 in the progress tracker.
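The attendance fetcher from step 4 can be sketched with the standard library. `exportURL`, `fetchCSV`, and `parseAttendanceCSV` are hypothetical names. The sketch swaps the plan's `http.Get` for `http.NewRequestWithContext` so the caller's `ctx` can cancel the request and a shared client can carry the configured timeout; setting `FieldsPerRecord = -1` is a defensive assumption so that rows of uneven length do not abort the whole parse.

```go
package main

import (
	"bytes"
	"context"
	"encoding/csv"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// exportURL builds the public CSV export URL for one tab (gid) of a sheet.
func exportURL(sheetID, gid string) string {
	return fmt.Sprintf("https://docs.google.com/spreadsheets/d/%s/export?format=csv&gid=%s",
		url.PathEscape(sheetID), url.QueryEscape(gid))
}

// parseAttendanceCSV reads all records; FieldsPerRecord = -1 allows ragged rows.
func parseAttendanceCSV(body []byte) ([][]string, error) {
	r := csv.NewReader(bytes.NewReader(body))
	r.FieldsPerRecord = -1
	return r.ReadAll()
}

// fetchCSV downloads one exported tab; hc should carry the configured timeout.
func fetchCSV(ctx context.Context, hc *http.Client, u string) ([][]string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	resp, err := hc.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("attendance export %s: %s", u, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseAttendanceCSV(body)
}
```

In `FetchAdults`, this would amount to `fetchCSV(ctx, hc, exportURL(cfg.AttendanceSheetID, config.AttendanceAdultSheetGID))`, with the junior tab using its own GID.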
### PR 2 — fio + bank sync (M4.4, M4.5, M4.7)
1. **`internal/io/fio/`**:
   - `client.go`: `Client` interface, `Transaction` struct.
   - `api.go`: `apiClient` impl + URL builder + JSON struct definitions
     for `accountStatement.transactionList.transaction[].column{N}.value`.
   - `transparent.go`: `transparentClient` impl using a
     `golang.org/x/net/html` tokenizer; helper functions
     `parseCzechAmount` (NBSP/space strip + comma→dot) and
     `parseCzechDate` (DD.MM.YYYY / DD/MM/YYYY).
   - `fake.go`.
   - `New(cfg) Client` chooses the impl based on `cfg.FioAPIToken`.
   - `accountNum(iban)` helper in `internal/config` strips the IBAN prefix.
2. **`internal/services/banksync/sync.go`** (new package):
   - `SyncToSheets(ctx, cfg, fio fio.Client, sheets *sheets.Client, opts SyncOpts) (added int, err error)`.
   - Reads existing rows via `sheets.GetValues(... "A1:K")`, validates the
     header against `COLUMN_LABELS`, writes the header if missing, and builds
     `existingIDs` from column K (`Sync ID`).
   - Computes the date window: explicit `from`/`to`, or `now - days*24h` (default 30d).
   - For each fetched tx, computes `domain/synch.GenerateSyncID` and skips the tx
     if that ID is already present; otherwise builds a row in `COLUMN_LABELS`
     order with empty manual/person/purpose/inferred slots.
   - `sheets.AppendValues(... "A2", rows)`.
   - Optional sort: `sheets.SortByColumn(... gid, 0)`; the sheet GID is resolved
     once via `spreadsheets.Get`.
3. **Wire `fuj sync` subcommand** in `cmd/fuj/main.go`:
   - Flags: `--days N` (default 30), `--from YYYY-MM-DD`, `--to YYYY-MM-DD`,
     `--sort` (default true, matching `make sync-2026`).
   - Replace the M4-stub error path.
4. **Tests** (default tag): `banksync/sync_test.go` with fakes — verify
header insertion, dedup against existing sync IDs, multi-row append,
sort call.
5. **Manual verification**: run a trial sync against the real Fio account
   into a throwaway test sheet. Alternatively, if a no-write flag is cheap to
   add, check the `--from`/`--to` window in stdout instead; if it is not, skip
   that option per the "no live integration tests" decision.
6. CHANGELOG entry; tick M4.4, M4.5, M4.7.
### PR 3 — infer (M4.8)
1. **`internal/services/banksync/infer.go`**:
   - `InferPayments(ctx, cfg, sheets *sheets.Client, attendanceLoader, juniorLoader, opts InferOpts) (updated int, err error)`.
   - Reads payments sheet `A1:Z` with case-insensitive header lookup.
   - Required columns: `Person, Purpose, Inferred Amount`. Optional input:
     `Date, Amount, Sender, Message, VS, manual fix`.
   - Skip rule (matches [scripts/infer_payments.py:127](../../scripts/infer_payments.py:127)):
     non-empty `manual fix` OR `Person` OR `Purpose` → leave row alone.
   - Member list = union of `LoadAdults` + `LoadJuniors` deduped via
     `domain/matching.CanonicalKey` (already exists from M2).
   - For each empty row: build tx dict, call
     `domain/matching.InferTransactionDetails`, prefix `[?] ` if
     confidence == "review", emit a `ValueRange` update with R1C1 range
     `R{i}C{personCol+1}:R{i}C{amountCol+1}`.
   - Single `sheets.BatchUpdateValues` call for all updates.
2. **Wire `fuj infer` subcommand**: flags `--dry-run` (prints planned
updates, no API write).
3. **Tests** (default tag): `banksync/infer_test.go` — fixture rows,
verify skip rule, verify `[?]` prefix on review matches, verify
batchUpdate payload shape, verify `--dry-run` is no-op.
4. CHANGELOG entry; tick M4.8 → milestone gate ✅.
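The row-level pieces of `InferPayments` can be sketched without any Sheets dependency. `headerIndex`, `cell`, `shouldSkip`, `personValue`, and `r1c1Range` are hypothetical helper names. Like the plan's single R1C1 range, the sketch assumes `Person`, `Purpose`, and `Inferred Amount` sit in adjacent columns. `sheetRow` is 1-based, so the data row at index `i` of the values slice (header at index 0) is sheet row `i+1`. The `"review"` confidence string comes from the plan; any other value is treated as a plain match.

```go
package main

import (
	"fmt"
	"strings"
)

// headerIndex maps trimmed, lower-cased header labels to zero-based columns.
func headerIndex(header []any) map[string]int {
	idx := make(map[string]int, len(header))
	for i, h := range header {
		idx[strings.ToLower(strings.TrimSpace(fmt.Sprint(h)))] = i
	}
	return idx
}

// cell returns the trimmed text at col, or "" when the row is short or nil.
func cell(row []any, col int) string {
	if col < 0 || col >= len(row) || row[col] == nil {
		return ""
	}
	return strings.TrimSpace(fmt.Sprint(row[col]))
}

// shouldSkip: a non-empty manual fix, Person, or Purpose leaves the row alone.
func shouldSkip(row []any, idx map[string]int) bool {
	for _, name := range []string{"manual fix", "person", "purpose"} {
		if col, ok := idx[name]; ok && cell(row, col) != "" {
			return true
		}
	}
	return false
}

// personValue marks review-confidence matches with the "[?] " prefix.
func personValue(name, confidence string) string {
	if confidence == "review" {
		return "[?] " + name
	}
	return name
}

// r1c1Range spans Person..Inferred Amount on one sheet row (1-based).
func r1c1Range(sheetRow, personCol, amountCol int) string {
	return fmt.Sprintf("R%dC%d:R%dC%d", sheetRow, personCol+1, sheetRow, amountCol+1)
}
```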
## Critical files
To modify:
- [go/internal/services/membership/loader.go](../../go/internal/services/membership/loader.go) — add `LoadJuniors` to `AttendanceLoader`, add `NewSources`.
- [go/cmd/fuj/main.go](../../go/cmd/fuj/main.go) — swap stub for real sources, add `sync`/`infer` subcommands.
- [go/internal/config/config.go](../../go/internal/config/config.go) — add `CacheDir`, `DriveTimeout`, `AttendanceAdultSheetGID` constant, IBAN→account-num helper.
- [go/go.mod](../../go/go.mod) / `go.sum` — google APIs + `x/net/html`.
- [docs/plans/2026-05-03-2349-go-backend-rewrite-progress.md](2026-05-03-2349-go-backend-rewrite-progress.md) — tick M4.x boxes after each PR.
- [CHANGELOG.md](../../CHANGELOG.md) — entry per PR.

To create:
- `go/internal/io/{sheets,drive,attendance,fio,cache}/{client,fake,*_test}.go`
- `go/internal/io/{sheets,attendance,fio}/testdata/*`
- `go/internal/services/membership/sources.go` (+ `sources_test.go`)
- `go/internal/services/banksync/{sync,infer}.go` (+ tests)
## Reused existing helpers
- `domain/fees.CalculateFee` / `CalculateJuniorFee` — fee math (M2.3, M2.4).
- `domain/matching.{BuildNameVariants,MatchMembers,InferTransactionDetails,FormatDate,CanonicalKey}` — match logic (M2.7-M2.9).
- `domain/synch.GenerateSyncID` — dedup hash (M2.6).
- `domain/reconcile.{Member,Transaction,Exception,ExceptionKey}` — domain types.
- `domain/czech.{Normalize,ParseMonthReferences}` — used inside the
attendance/exceptions parsers.
- `domain/money.ParseCZK` — for parsing transparent-scrape amounts.
## Verification
End-to-end checks once all three PRs land:
1. `make go-build && make go-lint && make go-test` — clean.
2. `make go-parity` — M3 fixtures still pass (no domain regressions).
3. `./bin/fuj fees` — prints adult fee report matching Python `make fees`
(visual diff acceptable for now; byte-equality enforced in M5).
4. `./bin/fuj reconcile` — prints balance report comparable to
[scripts/match_payments.py](../../scripts/match_payments.py) `print_balance_report`.
5. `./bin/fuj sync --days 7` — appends new Fio rows to the payments sheet
(run with a real but recent date window; verify by counting added rows
and confirming no duplicates on a second run).
6. `./bin/fuj infer --dry-run` — prints planned Person/Purpose/Inferred
Amount updates without modifying the sheet. Then `./bin/fuj infer`
applies them; second run is a no-op (skip rule).
7. **Cache check**: delete `tmp/*_cache.json`, run `fuj fees`, verify file
appears with `modifiedTime` matching Drive. Re-run within 5 min;
verify no Drive call (debug log).
8. **Cross-process cache safety**: while `make web-py` is running, run
`fuj reconcile`; verify Python's cache file isn't corrupted and Go
reads the same data.
Gate (per tracker):
> `go test -tags=integration ./internal/io/...` round-trips against test sheet; default-tag tests run on fakes.

Per the user's scope decision, **the integration-test gate is downgraded
to "default-tag tests on fakes" only**. Live verification is deferred to
manual smoke during M7's parallel-run watch period. The progress tracker's
M4 gate line will be amended in PR 1.