feat(go/M2.1): port czech.Normalize — NFKD + Mn strip + lowercase
All checks were successful
Deploy to K8s / deploy (push) Successful in 8s

Adds internal/domain/czech.Normalize, the first pure-domain function in
the Go rewrite (M2 milestone). Matches Python czech_utils.normalize byte-
for-byte: NFKD decompose via golang.org/x/text/unicode/norm, drop Mn-
category combining marks (unicode.Mn, not IsMark, to match Python's
unicodedata.combining() semantics), then strings.ToLower.
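Here is a minimal stdlib-only sketch of the filter and lowercase steps that shows the Mn-versus-IsMark choice. It assumes the input is already NFKD-decomposed. The commit does that step with golang.org/x/text/unicode/norm, which is left out here so the snippet runs without dependencies. The helper names stripMnLower and stripAllMarksLower are hypothetical.

Both filters agree on Czech diacritics such as U+030C COMBINING CARON. unicode.IsMark, however, also matches enclosing (Me) and spacing (Mc) marks, for example U+20DD COMBINING ENCLOSING CIRCLE. Its canonical combining class is 0, so Python's unicodedata.combining() would keep it. Mn and a nonzero combining class are still not identical sets: U+034F COMBINING GRAPHEME JOINER is Mn with class 0. The correspondence is therefore close rather than exact outside Czech diacritics.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripMnLower drops every rune in general category Mn (nonspacing marks)
// and lowercases the result. Assumption: s is already NFKD-decomposed;
// this sketch deliberately omits the x/text decomposition step.
func stripMnLower(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	for _, r := range s {
		if unicode.Is(unicode.Mn, r) {
			continue // e.g. U+030C COMBINING CARON, U+0301 COMBINING ACUTE ACCENT
		}
		b.WriteRune(r)
	}
	return strings.ToLower(b.String())
}

// stripAllMarksLower is the rejected variant: unicode.IsMark matches
// Mn, Mc (spacing combining) and Me (enclosing) marks alike.
func stripAllMarksLower(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	for _, r := range s {
		if unicode.IsMark(r) {
			continue
		}
		b.WriteRune(r)
	}
	return strings.ToLower(b.String())
}

func main() {
	// "Dvořák" in decomposed form: r + caron, a + acute.
	czech := "Dvor\u030Ca\u0301k"
	fmt.Println(stripMnLower(czech), stripAllMarksLower(czech)) // dvorak dvorak

	// U+20DD COMBINING ENCLOSING CIRCLE is category Me, not Mn.
	enclosed := "A\u20DD"
	fmt.Printf("%+q %+q\n", stripMnLower(enclosed), stripAllMarksLower(enclosed))
	// "a\u20dd" "a"
}
```

On Czech input the two filters are indistinguishable. They diverge only on marks outside Mn, which is where matching Python's combining() behaviour matters.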

Includes a 13-case table-driven test; every case was spot-checked against
the Python implementation before the expectations were locked in. Adds
golang.org/x/text v0.36.0 as the first external dependency.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-05 22:23:40 +02:00
parent 91ac3b37cf
commit d9a61b338c
5 changed files with 215 additions and 0 deletions

@@ -0,0 +1,31 @@
package czech

import "testing"

func TestNormalize(t *testing.T) {
	cases := []struct {
		in   string
		want string
	}{
		{"Honza", "honza"},
		{"žluťoučký", "zlutoucky"},
		{"Příliš", "prilis"},
		{"Dvořák", "dvorak"},
		{"Růžena", "ruzena"},
		{"Čeněk", "cenek"},
		{"Kačer", "kacer"},
		{"", ""},
		{"prilis", "prilis"},                     // idempotent
		{"Jan Novák", "jan novak"},               // whitespace preserved
		{"\u00e9", "e"},                          // precomposed é (NFC)
		{"e\u0301", "e"},                         // decomposed e + combining acute
		{"Ondřej Procházka", "ondrej prochazka"}, // realistic full name
	}

	for _, tc := range cases {
		got := Normalize(tc.in)
		if got != tc.want {
			t.Errorf("Normalize(%q) = %q, want %q", tc.in, got, tc.want)
		}
	}
}
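The two é cases in the table exercise different code paths. The stdlib-only sketch below shows why. The dropMn helper is hypothetical and stands in for Normalize's filter step only, with no NFKD step in front of it. The decomposed input loses its combining acute to the Mn filter directly. The precomposed U+00E9, by contrast, contains no Mn rune and passes through the filter unchanged, so Normalize can only reduce it to "e" because NFKD splits it first.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// dropMn removes general-category Mn runes: the filter step alone,
// with no NFKD decomposition in front of it.
func dropMn(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.Is(unicode.Mn, r) {
			return -1 // a negative return value drops the rune
		}
		return r
	}, s)
}

func main() {
	precomposed := "\u00e9" // é as a single code point (NFC)
	decomposed := "e\u0301" // e followed by U+0301 COMBINING ACUTE ACCENT

	fmt.Println(precomposed == decomposed)         // false
	fmt.Println(len(precomposed), len(decomposed)) // 2 3
	fmt.Printf("%+q %+q\n", dropMn(precomposed), dropMn(decomposed))
	// "\u00e9" "e"
}
```

Writing the two literals as \u escapes, rather than pasting the characters, keeps the cases distinct even if an editor or tool silently normalizes the source file.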