Adds internal/domain/czech.Normalize, the first pure-domain function in the Go rewrite (M2 milestone). Matches Python czech_utils.normalize byte-for-byte: NFKD decompose via golang.org/x/text/unicode/norm, drop Mn-category combining marks (unicode.Mn, not IsMark, to match Python's unicodedata.combining() semantics), then strings.ToLower.
Includes 13-case table-driven test; all inputs spot-checked against the Python implementation before locking. Adds golang.org/x/text v0.36.0 as first external dependency.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Plan: Go rewrite — M2.1 domain/czech.Normalize
Context
The Go rewrite finished M1 (skeleton, tooling, hello server) in commit
cf0f176 on 2026-05-04. The next milestone, M2 (pure-domain helpers),
is marked current in the progress tracker
but has no work landed yet (all 12 sub-tasks unchecked).
This plan covers only the first M2 task: porting Python's
normalize from scripts/czech_utils.py
to Go as internal/domain/czech.Normalize. It is the lowest-level helper
in the domain — parse_month_references, _build_name_variants,
match_members, exception keys, and reconcile all transitively depend
on it. Getting it byte-equivalent first removes a class of "why does my
match not fire" failures from every later M2 task.
Decision (confirmed in plan-mode Q): start with hand-written Go unit
tests for fresh Czech edge cases. Defer parity-fixture wiring until
M3.1/M3.2 land (separate task); add the parity test for Normalize
retroactively at that point.
Scope
- New package go/internal/domain/czech/ with Normalize and unit tests.
- Add golang.org/x/text dependency to go/go.mod (currently zero deps).
- Out of scope: ParseMonthReferences (M2.2), fixture tooling (M3.1/M3.2), CLI subcommand wiring (M2.11/M2.12), parity test runner.
Recommended approach
Python contract to match
import unicodedata

def normalize(text: str) -> str:
    nfkd = unicodedata.normalize("NFKD", text)
    return "".join(c for c in nfkd if not unicodedata.combining(c)).lower()
Three semantic operations:
- NFKD decompose
- Drop characters where unicodedata.combining(c) is non-zero
- Lowercase
Go implementation
go/internal/domain/czech/normalize.go:
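package czech

import (
	"strings"
	"unicode"

	"golang.org/x/text/unicode/norm"
)

// Normalize strips diacritics and lowercases s, mirroring Python's
// czech_utils.normalize: NFKD decompose, drop combining marks, lowercase.
func Normalize(s string) string {
	decomposed := norm.NFKD.String(s)
	var b strings.Builder
	b.Grow(len(decomposed))
	for _, r := range decomposed {
		// unicode.Mn, not unicode.IsMark: IsMark would also drop Mc/Me,
		// which Python's unicodedata.combining() filter keeps.
		if unicode.In(r, unicode.Mn) {
			continue
		}
		b.WriteRune(r)
	}
	return strings.ToLower(b.String())
}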
Two precision points worth flagging:
- unicode.Mn, not unicode.IsMark. The plan's library-choices table mentions unicode.IsMark, but that covers Mn + Mc + Me. Python's unicodedata.combining() returns the canonical combining class, which is 0 for almost all Mc/Me characters, so in practice it filters only Mn. Use unicode.In(r, unicode.Mn) for byte-equivalence with Python on Czech input. The two filters are not identical across all of Unicode (some Mn characters, such as variation selectors, have combining class 0 and so survive in Python), but those are not expected in Czech names. Cite this in a one-line code comment; it's the kind of thing a future reader will second-guess.
- strings.ToLower vs Go's locale-aware tools. Python's .lower() on already-decomposed Latin is straight ASCII lowercase for Czech. Stdlib strings.ToLower matches; do not pull in golang.org/x/text/cases.
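The difference between the two filters can be checked with the standard library alone. The following is a minimal sketch, not part of the planned package: markKind is a hypothetical helper introduced here for illustration, and the sample runes were picked to cover one representative of each mark category.

```go
package main

import (
	"fmt"
	"unicode"
)

// markKind is a hypothetical helper: it reports how a rune is treated by
// the two candidate filters discussed above.
func markKind(r rune) (isMn, isMark bool) {
	return unicode.In(r, unicode.Mn), unicode.IsMark(r)
}

func main() {
	samples := []struct {
		name string
		r    rune
	}{
		{"COMBINING ACUTE ACCENT (Mn)", '\u0301'},
		{"COMBINING CARON (Mn)", '\u030C'},
		{"DEVANAGARI SIGN VISARGA (Mc)", '\u0903'},
		{"COMBINING ENCLOSING CIRCLE (Me)", '\u20DD'},
		{"LATIN SMALL LETTER E", 'e'},
	}
	for _, s := range samples {
		isMn, isMark := markKind(s.r)
		fmt.Printf("U+%04X %-32s Mn=%-5t IsMark=%t\n", s.r, s.name, isMn, isMark)
	}
}
```

The Mc and Me samples are the rows where the two columns disagree: IsMark would drop them, the Mn filter keeps them.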
Tests
go/internal/domain/czech/normalize_test.go is table-driven and covers:
- ASCII passthrough: "Honza" → "honza"
- Czech lowercase diacritics: "žluťoučký" → "zlutoucky"
- Mixed case + diacritics: "Příliš" → "prilis"
- Czech caron + ring: "Dvořák" → "dvorak", "Růžena" → "ruzena"
- Hard letters: "Čeněk" → "cenek", "Kačer" → "kacer"
- Empty string: "" → ""
- Already-normalized: "prilis" → "prilis" (idempotence)
- Pre-composed vs decomposed input both produce the same output (NFC "é", U+00E9, and NFD "é", U+0065 U+0301, both → "e")
- Whitespace preserved: "Jan Novák" → "jan novak"
Run a one-shot cross-check against the live Python implementation for each test input before locking the table:
PYTHONPATH=scripts:. python -c \
'from czech_utils import normalize; print(repr(normalize("Dvořák")))'
This is the manual stand-in for the M3 parity fixtures.
Wire-up
- go get golang.org/x/text@latest (run from go/); go mod tidy.
- No CLI changes: cmd/fuj already stubs fees/reconcile with exit code 2; no need to touch the dispatcher for this task. Normalize is consumed by other domain code, not by users directly.
Critical files
- New: go/internal/domain/czech/normalize.go
- New: go/internal/domain/czech/normalize_test.go
- Modified: go/go.mod, go/go.sum (new)
- Reference (read-only): scripts/czech_utils.py (the porting source)
- Reference (read-only): docs/plans/2026-05-03-2349-go-backend-rewrite.md, risk #3 (NFKD edge cases)
Verification
End-to-end checks before marking M2.1 done:
- cd go && go build ./... (clean compile).
- cd go && go test ./internal/domain/czech/... (all table cases green).
- cd go && go test -race ./... (race-clean).
- cd go && golangci-lint run, or make go-lint from repo root (clean).
- Spot parity (manual, will be automated in M3): for each Go test input, run the Python normalize via PYTHONPATH=scripts:. python -c '...' and confirm bytes match. Capture the diff in the commit message if anything surprises.
- make go-build && make go-test && make go-lint from repo root (proves the existing M1 gate still passes).
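For the "confirm bytes match" step, printed strings can hide the difference between composed and decomposed forms, since both render the same glyph. A small byte dump makes the comparison unambiguous. This is a sketch only; hexBytes is a hypothetical helper, not part of the planned package.

```go
package main

import "fmt"

// hexBytes is a hypothetical helper: it renders the UTF-8 bytes of s as
// space-separated lowercase hex, so visually identical strings can be told apart.
func hexBytes(s string) string {
	return fmt.Sprintf("% x", []byte(s))
}

func main() {
	composed := "\u00e9"    // precomposed é (NFC)
	decomposed := "e\u0301" // e + combining acute accent (NFD)

	fmt.Println(hexBytes(composed))   // c3 a9
	fmt.Println(hexBytes(decomposed)) // 65 cc 81

	// %+q escapes non-ASCII runes, which is another way to see the difference.
	fmt.Printf("%+q %+q\n", composed, decomposed)
}
```

Dumping the Go output this way and comparing it with the bytes of the Python result avoids relying on how a terminal renders combining marks.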
Branching & follow-up
Per CLAUDE.md, this is feature work → branch + Gitea MR:
- Branch: feat/m2-1-czech-normalize off main.
- Single commit, Co-Authored-By trailer.
- Push with -u, print compare URL https://gitea.home.hrajfrisbee.cz/kacerr/fuj-management/compare/main...feat/m2-1-czech-normalize
- User opens/merges the MR.
- After merge: tick M2.1 in the progress tracker with the commit SHA; add a one-line CHANGELOG entry; record any porting surprise in the tracker's "Notes & decisions" section (e.g. the Mn-vs-IsMark precision point if it bears noting).
Next task after this lands is M2.2 ParseMonthReferences — the
larger, edge-case-heavier sibling. Whether to start it before or after
M3.1/M3.2 is a separate decision the user can make then.