feat(go/M2.1): port czech.Normalize — NFKD + Mn strip + lowercase

Adds internal/domain/czech.Normalize, the first pure-domain function in
the Go rewrite (M2 milestone). Matches Python czech_utils.normalize
byte-for-byte on the tested inputs: NFKD-decompose via
golang.org/x/text/unicode/norm, drop combining marks in category Mn
(unicode.Mn rather than unicode.IsMark, which would also drop Mc and Me
marks that Python's unicodedata.combining() check mostly leaves in
place), then strings.ToLower.

Includes a 13-case table-driven test; all inputs were spot-checked
against the Python implementation before locking. Adds
golang.org/x/text v0.36.0 as the first external dependency.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-05 22:23:40 +02:00
parent 91ac3b37cf
commit d9a61b338c
5 changed files with 215 additions and 0 deletions

@@ -0,0 +1,26 @@
package czech

import (
	"strings"
	"unicode"

	"golang.org/x/text/unicode/norm"
)

// Normalize strips diacritics and lowercases s.
//
// Matches Python: unicodedata.normalize("NFKD", s), then drop characters
// with a nonzero unicodedata.combining() class. Go filters on unicode.Mn
// (not Mc/Me), which agrees for Czech acute, caron, and ring marks.
func Normalize(s string) string {
	decomposed := norm.NFKD.String(s)
	var b strings.Builder
	b.Grow(len(decomposed))
	for _, r := range decomposed {
		if unicode.In(r, unicode.Mn) {
			continue
		}
		b.WriteRune(r)
	}
	return strings.ToLower(b.String())
}
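
The choice of unicode.Mn over unicode.IsMark can be illustrated with a small standalone sketch. It is not part of this commit: stripMn and stripMarks are hypothetical helper names, the inputs are written already decomposed so the NFKD step (and the golang.org/x/text dependency) is left out, and only the standard library is used.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripMn drops nonspacing marks (general category Mn) from s.
// Hypothetical helper: it assumes s is already decomposed and does
// not repeat the NFKD step or the lowercasing done by Normalize.
func stripMn(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	for _, r := range s {
		if unicode.Is(unicode.Mn, r) {
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

// stripMarks drops every mark (Mn, Mc and Me). Shown only for
// contrast with stripMn; also a hypothetical helper.
func stripMarks(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsMark(r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	cases := []struct {
		name  string
		input string
	}{
		{"caron (Mn)", "c\u030C"},            // c + COMBINING CARON
		{"visarga (Mc)", "\u0915\u0903"},     // DEVANAGARI KA + SIGN VISARGA
		{"enclosing circle (Me)", "a\u20DD"}, // a + COMBINING ENCLOSING CIRCLE
	}
	for _, tc := range cases {
		fmt.Printf("%-22s Mn-only=%q all-marks=%q\n",
			tc.name, stripMn(tc.input), stripMarks(tc.input))
	}
}
```

Both helpers remove the caron, so the difference does not show up on Czech input. It only appears for spacing and enclosing marks, which the Mn-only filter keeps.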