Merge pull request 'feat(go/M2.2): port czech.ParseMonthReferences' (#5) from feat/m2-2-parse-month-references into main
All checks were successful
Deploy to K8s / deploy (push) Successful in 10s
Reviewed-on: #5
This commit was merged in pull request #5.
@@ -0,0 +1,205 @@
# Plan: Go rewrite, M2.2 `domain/czech.ParseMonthReferences`

## Context

M2.1 (`domain/czech.Normalize`) merged via PR #4 (`d9a61b3`) on
2026-05-05. Per the [progress tracker](2026-05-03-2349-go-backend-rewrite-progress.md),
**M2.2** is next: port `parse_month_references` from
[scripts/czech_utils.py](../../scripts/czech_utils.py) to Go as
`internal/domain/czech.ParseMonthReferences`.

This function is the second-most-load-bearing pure helper after
`reconcile`: every payment-message → month inference goes through it.
Risk #4 in the [parent plan](2026-05-03-2349-go-backend-rewrite.md)
specifically calls out its semantics (wrap-around year inference and
the `m >= 10 → previous year` standalone heuristic) as easy to mis-port.

This plan locks the test table against the live Python implementation
*before* coding, so the Go port has a verified parity baseline even
before the M3.1/M3.2 fixture infrastructure exists.

## Scope

- New file `go/internal/domain/czech/parse_month_references.go` in the
  existing `czech` package (alongside [normalize.go](../../go/internal/domain/czech/normalize.go)).
- New file `go/internal/domain/czech/parse_month_references_test.go`
  with the test table below.
- **Out of scope:** parity-fixture wiring (M3.1/M3.2); CLI hook-up
  (M2.11/M2.12); any consumer call-sites.
- **No new dependencies**: stdlib `fmt`, `regexp`, `sort`, `strconv`, `strings`
  plus the existing `czech.Normalize` cover everything.

## Recommended approach

### Python contract to mirror

Three regex passes, all run on `normalize(text)`:

1. `([\d+]+)\s*/\s*(\d{2,4})` captures `"11+12/2025"`, `"01/26"`, `"1/26"`.
   Split the months part on `+`, keep digit-only tokens, validate `1..12`.
   Year < 100 → year + 2000.
2. `(\d{1,2})\s*\.\s*(\d{4})` captures `"12.2025"`. **4-digit year only**
   (so `"1.26"` does not match).
3. Czech month names. First the **range** sub-pass:
   `(name)\s*-\s*(name)` finds pairs; walk start→end with `m % 12 + 1`,
   stopping when `m == end_m`. Wrap rule: if `start_m > end_m`, months
   `>= start_m` are `defaultYear - 1`, the rest are `defaultYear`. Both
   matched names go into a `foundInRanges` set.
   Then the **standalone** sub-pass: `\b(name)\b`, skipping any name in
   `foundInRanges`. For each remaining match, `m >= 10 → defaultYear - 1`,
   else `defaultYear`.

Output: sorted, deduplicated `[]string` of `"YYYY-MM"`.

### Go signature

```go
package czech

// ParseMonthReferences extracts YYYY-MM month references from Czech
// free text. defaultYear seeds two heuristics: standalone month names
// with m >= 10 are treated as defaultYear-1 (out-of-year backfill), and
// wrap-around ranges (e.g. listopad-leden) place months >= start in
// defaultYear-1.
func ParseMonthReferences(text string, defaultYear int) []string
```

`defaultYear` is a required parameter: Go has no default argument values.

### Implementation sketch

```go
var czechMonths = map[string]int{
	"leden": 1, "ledna": 1, "lednu": 1,
	"unor": 2, "unora": 2, "unoru": 2,
	"brezen": 3, "brezna": 3, "breznu": 3,
	"duben": 4, "dubna": 4, "dubnu": 4,
	"kveten": 5, "kvetna": 5, "kvetnu": 5,
	"cerven": 6, "cervna": 6, "cervnu": 6,
	"cervenec": 7, "cervnce": 7, "cervenci": 7,
	"srpen": 8, "srpna": 8, "srpnu": 8,
	"zari": 9,
	"rijen": 10, "rijna": 10, "rijnu": 10,
	"listopad": 11, "listopadu": 11,
	"prosinec": 12, "prosince": 12, "prosinci": 12,
}

// monthNameAlt is the regex alternation of all czechMonths keys, sorted by
// descending length at init so longer alternatives win in the regex
// (e.g. "cervenec" beats "cerven"). Mirrors Python's
// sorted(..., key=len, reverse=True). buildMonthNameAlt is not shown in
// this sketch.
var monthNameAlt = buildMonthNameAlt()

var (
	numericRe = regexp.MustCompile(`([\d+]+)\s*/\s*(\d{2,4})`)
	dotRe     = regexp.MustCompile(`(\d{1,2})\s*\.\s*(\d{4})`)
	rangeRe   = regexp.MustCompile(`(` + monthNameAlt + `)\s*-\s*(` + monthNameAlt + `)`)
	standRe   = regexp.MustCompile(`\b(` + monthNameAlt + `)\b`)
)
```

Three Go-specific gotchas worth a code comment:

1. **Go `regexp` alternation is leftmost-first**, same as Python `re`. Sorting
   month names by descending length is therefore necessary: otherwise a
   pattern with no trailing `\b`, such as the range pattern, can capture
   only `"cerven"` from a trailing `"cervenec"` and leave `"ec"` behind.
   Mirror the Python sort's length ordering; the order among equal-length
   names cannot change a match, because two distinct names of the same
   length never match at the same position.
2. **Map iteration is randomized in Go.** Build the alternation list
   from a sorted slice of keys, not by iterating the map.
3. **`\d` and `\b`** in Go RE2 are ASCII-only, which matches the
   effective behavior on `Normalize`'d input (NFKD already collapsed
   any Unicode digits/letters that would matter; standalone Devanagari
   digits in member messages aren't a real-world concern).
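
To make gotchas 1 and 2 concrete, here is a small sketch using a three-name subset of the month table. The helper `buildAlt` is a hypothetical stand-in for whatever builds the alternation. The alphabetical tie-break is an assumption added only to make the output deterministic.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// buildAlt joins names into a regex alternation, longest first, with an
// alphabetical tie-break so the result does not depend on input order.
func buildAlt(names []string) string {
	sorted := append([]string(nil), names...) // copy; leave the caller's slice alone
	sort.Slice(sorted, func(i, j int) bool {
		if len(sorted[i]) != len(sorted[j]) {
			return len(sorted[i]) > len(sorted[j])
		}
		return sorted[i] < sorted[j]
	})
	return strings.Join(sorted, "|")
}

func main() {
	names := []string{"cerven", "srpen", "cervenec"}

	// Unsorted: "cerven" is listed before "cervenec", so it wins in end position.
	naive := strings.Join(names, "|")
	naiveRe := regexp.MustCompile(`(` + naive + `)\s*-\s*(` + naive + `)`)
	fmt.Println(naiveRe.FindStringSubmatch("srpen-cervenec")[2]) // cerven

	// Longest first: the full name is tried before its prefix.
	alt := buildAlt(names)
	sortedRe := regexp.MustCompile(`(` + alt + `)\s*-\s*(` + alt + `)`)
	fmt.Println(sortedRe.FindStringSubmatch("srpen-cervenec")[2]) // cervenec
}
```

With the unsorted alternation the range would end in June instead of July, which is the kind of silent mis-port the length sort is meant to prevent.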

In Go the walk loop uses a bounded counter (at most 12 iterations) as a
defensive measure. Python's `while True` is safe because every range
terminates within 12 hops, but the explicit bound makes that guarantee
visible to a future reader.

### Test table (verified against live Python, `default_year=2026`)

Locked outputs from `PYTHONPATH=scripts:. python -c 'from czech_utils import parse_month_references; print(parse_month_references(<input>, 2026))'`
on 2026-05-05.

| # | Input | Expected | Path exercised |
|---|---|---|---|
| 1 | `""` | `[]` | empty |
| 2 | `"11+12/2025"` | `["2025-11", "2025-12"]` | numeric, plus-split |
| 3 | `"1/2026"` | `["2026-01"]` | numeric, single |
| 4 | `"01/26"` | `["2026-01"]` | 2-digit year normalization |
| 5 | `"11+12/25"` | `["2025-11", "2025-12"]` | plus-split + 2-digit year |
| 6 | `"12+1+2/2026"` | `["2026-01", "2026-02", "2026-12"]` | sorting |
| 7 | `"12.2025"` | `["2025-12"]` | dot pattern |
| 8 | `"1.26"` | `[]` | dot pattern requires 4-digit year |
| 9 | `"leden"` | `["2026-01"]` | standalone, m<10 |
| 10 | `"prosinec"` | `["2025-12"]` | standalone, m≥10 → previous year |
| 11 | `"prosince"` | `["2025-12"]` | declension |
| 12 | `"lednu"` | `["2026-01"]` | declension |
| 13 | `"rijen"` | `["2025-10"]` | m≥10 boundary (10 itself) |
| 14 | `"zari"` | `["2026-09"]` | m<10 just below boundary |
| 15 | `"listopad-leden"` | `["2025-11", "2025-12", "2026-01"]` | wrap range Nov→Jan |
| 16 | `"rijen-leden"` | `["2025-10", "2025-11", "2025-12", "2026-01"]` | wrap from October |
| 17 | `"unor-kveten"` | `["2026-02", "2026-03", "2026-04", "2026-05"]` | non-wrap range |
| 18 | `"leden-leden"` | `["2026-01"]` | degenerate range |
| 19 | `"unor-listopad"` | `["2026-02", ..., "2026-11"]` (10 entries) | range spans m≥10; heuristic does NOT fire (range exclusion) |
| 20 | `"cervenec-srpen"` | `["2026-07", "2026-08"]` | longest-match alt (`cervenec` not `cerven`+`ec`) |
| 21 | `"listopad-leden, prosinec"` | `["2025-11", "2025-12", "2026-01"]` | range + standalone, dedup |
| 22 | `"prosinec leden"` | `["2025-12", "2026-01"]` | two standalones, no range |
| 23 | `"11+12/2025, leden-brezen"` | `["2025-11", "2025-12", "2026-01", "2026-02", "2026-03"]` | numeric + range mix |
| 24 | `"11+12/25 a listopad"` | `["2025-11", "2025-12"]` | dedup across passes |
| 25 | `"prosince/2025"` | `["2025-12"]` | numeric pattern fails (no digits before `/`); standalone fires |
| 26 | `"listopad-prosinec/2025"` | `["2026-11", "2026-12"]` | range wins; numeric pattern fails, so the `/2025` is ignored |
| 27 | `"01.2026 / 02.2026"` | `["2026-01", "2026-02"]` | dot pattern only; numeric matches `(2026, 02)` but month 2026 is out of range |
| 28 | `"/12/2025"` | `["2025-12"]` | numeric matches at second `/` |
| 29 | `"PROSINEC"` | `["2025-12"]` | normalize lowercases |
| 30 | `"Žluťoučký prosinec"` | `["2025-12"]` | normalize strips diacritics |
| 31 | `"Únor - květen"` | `["2026-02", ..., "2026-05"]` | range tolerates spaces around `-`; diacritics are stripped by normalize before matching |
| 32 | `"platba 11/2025 a leden"` | `["2025-11", "2026-01"]` | mixed natural-language |
| 33 | `"December"` | `[]` | English month names not recognized |
| 34 | `"11+12/2025 11+12/2025"` | `["2025-11", "2025-12"]` | dedup of repeated input |
| 35 | `"leden 2026"` | `["2026-01"]` | year after a space is ignored; the standalone heuristic uses `defaultYear` |

35 cases are enough to lock semantics; the M3.x corpus will pile on
real-message fixtures later.

### Wire-up

- No `go.mod` changes (stdlib only).
- No CLI changes.
- `Normalize` is in the same package, so call it directly.

## Critical files

- New: [go/internal/domain/czech/parse_month_references.go](../../go/internal/domain/czech/parse_month_references.go)
- New: [go/internal/domain/czech/parse_month_references_test.go](../../go/internal/domain/czech/parse_month_references_test.go)
- Reference (read-only): [scripts/czech_utils.py](../../scripts/czech_utils.py), the porting source
- Reference (read-only): [docs/plans/2026-05-03-2349-go-backend-rewrite.md](2026-05-03-2349-go-backend-rewrite.md), risk #4
- Reuses: [go/internal/domain/czech/normalize.go](../../go/internal/domain/czech/normalize.go); `Normalize` is called once at the top of `ParseMonthReferences`

## Verification

End-to-end checks before marking M2.2 done:

1. `cd go && go build ./...`: clean compile.
2. `cd go && go test ./internal/domain/czech/...`: all 35 table cases green.
3. `cd go && go test -race ./...`: race-clean (regex compiles are global; verify no init races).
4. `cd go && golangci-lint run` (or `make go-lint` from repo root): clean, gofumpt-formatted.
5. **Spot parity** (manual, will be automated in M3.x): each test input has its expected output captured from the live Python implementation on 2026-05-05; the test table itself is the parity record. If any case diverges during implementation, re-run Python with the exact input to confirm the truth and update either the Go code or the test entry.
6. `make go-build && make go-test && make go-lint` from repo root: proves the M1/M2.1 gate still passes.

## Branching & follow-up

Per [CLAUDE.md](../../CLAUDE.md), this is feature work → branch + Gitea MR via `tea`:

- Branch: `feat/m2-2-parse-month-references` off `main`.
- Single focused commit, Co-Authored-By trailer.
- Push with `-u`.
- Open MR with `tea pr create --title "feat(go/M2.2): port czech.ParseMonthReferences" --description ... --base main --head feat/m2-2-parse-month-references`. Print the MR URL for the user.
- User merges/deletes the branch in Gitea, never from the CLI.

After merge (small doc edits land straight on `main` per CLAUDE.md exception):

- Tick `M2.2` in the [progress tracker](2026-05-03-2349-go-backend-rewrite-progress.md) with the merge SHA.
- Add a one-line `CHANGELOG.md` entry (timestamp via `date "+%Y-%m-%d %H:%M %Z"`).
- Record any porting surprise (e.g. an unexpected diff between Go RE2 and Python `re`) in the tracker's "Notes & decisions" section.

Next task is **M2.3 `domain/fees.CalculateFee`**: straightforward constants table; no parser semantics to debate.

154
go/internal/domain/czech/parse_month_references.go
Normal file
@@ -0,0 +1,154 @@
package czech

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// czechMonths maps normalized (lowercase, diacritic-free) Czech month names
// and their declensions to month numbers.
//
// NOTE: "cervnce" looks like a typo for "cervence" (genitive of "cervenec").
// Check scripts/czech_utils.py before changing it, since parity with the
// Python source is the goal of this port.
var czechMonths = map[string]int{
	"leden": 1, "ledna": 1, "lednu": 1,
	"unor": 2, "unora": 2, "unoru": 2,
	"brezen": 3, "brezna": 3, "breznu": 3,
	"duben": 4, "dubna": 4, "dubnu": 4,
	"kveten": 5, "kvetna": 5, "kvetnu": 5,
	"cerven": 6, "cervna": 6, "cervnu": 6,
	"cervenec": 7, "cervnce": 7, "cervenci": 7,
	"srpen": 8, "srpna": 8, "srpnu": 8,
	"zari": 9,
	"rijen": 10, "rijna": 10, "rijnu": 10,
	"listopad": 11, "listopadu": 11,
	"prosinec": 12, "prosince": 12, "prosinci": 12,
}

// Compiled once in init; rangeRe and standRe depend on the sorted
// month-name alternation built there.
var (
	numericRe *regexp.Regexp
	dotRe     *regexp.Regexp
	rangeRe   *regexp.Regexp
	standRe   *regexp.Regexp
)

func init() {
	// Sort by descending length so longer alternatives win in RE2 leftmost-first
	// matching (e.g. "cervenec" is tried before "cerven"). The alphabetical
	// tie-break keeps the pattern independent of map iteration order.
	names := make([]string, 0, len(czechMonths))
	for name := range czechMonths {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		if len(names[i]) != len(names[j]) {
			return len(names[i]) > len(names[j])
		}
		return names[i] < names[j]
	})
	alt := strings.Join(names, "|")

	numericRe = regexp.MustCompile(`([\d+]+)\s*/\s*(\d{2,4})`)
	dotRe = regexp.MustCompile(`(\d{1,2})\s*\.\s*(\d{4})`)
	rangeRe = regexp.MustCompile(`(` + alt + `)\s*-\s*(` + alt + `)`)
	standRe = regexp.MustCompile(`\b(` + alt + `)\b`)
}

// ParseMonthReferences extracts YYYY-MM month references from Czech free text.
//
// defaultYear seeds two heuristics: standalone month names with m >= 10 are
// treated as defaultYear-1 (out-of-year backfill), and wrap-around ranges
// (e.g. listopad-leden) place months >= start_m in defaultYear-1.
//
// Returns a sorted, deduplicated slice of "YYYY-MM" strings.
func ParseMonthReferences(text string, defaultYear int) []string {
	normalized := Normalize(text)
	seen := map[string]struct{}{}

	// add records year-month in the result set, ignoring months outside 1..12.
	add := func(year, m int) {
		if m >= 1 && m <= 12 {
			seen[fmt.Sprintf("%04d-%02d", year, m)] = struct{}{}
		}
	}

	// Pass 1: numeric months, e.g. "11+12/2025", "01/26", "1/2026"
	for _, groups := range numericRe.FindAllStringSubmatch(normalized, -1) {
		monthsPart, yearStr := groups[1], groups[2]
		year, err := strconv.Atoi(yearStr)
		if err != nil {
			continue
		}
		if year < 100 {
			year += 2000
		}
		for mStr := range strings.SplitSeq(monthsPart, "+") {
			mStr = strings.TrimSpace(mStr)
			if mStr == "" {
				continue
			}
			allDigits := true
			for _, c := range mStr {
				if c < '0' || c > '9' {
					allDigits = false
					break
				}
			}
			if !allDigits {
				continue
			}
			m, err := strconv.Atoi(mStr)
			if err != nil {
				continue
			}
			add(year, m)
		}
	}

	// Pass 2: dot-separated month.year, e.g. "12.2025" (4-digit year only)
	for _, groups := range dotRe.FindAllStringSubmatch(normalized, -1) {
		m, _ := strconv.Atoi(groups[1])
		year, _ := strconv.Atoi(groups[2])
		add(year, m)
	}

	// Pass 3a: Czech month name ranges, e.g. "listopad-leden"
	foundInRanges := map[string]struct{}{}
	for _, groups := range rangeRe.FindAllStringSubmatch(normalized, -1) {
		startName, endName := groups[1], groups[2]
		foundInRanges[startName] = struct{}{}
		foundInRanges[endName] = struct{}{}
		startM := czechMonths[startName]
		endM := czechMonths[endName]
		wraps := startM > endM
		m := startM
		// Bounded walk: a range never needs more than 12 hops.
		for range 12 {
			year := defaultYear
			if wraps && m >= startM {
				year = defaultYear - 1
			}
			add(year, m)
			if m == endM {
				break
			}
			m = m%12 + 1
		}
	}

	// Pass 3b: standalone Czech month names (not part of a range)
	for _, groups := range standRe.FindAllStringSubmatch(normalized, -1) {
		name := groups[1]
		if _, inRange := foundInRanges[name]; inRange {
			continue
		}
		m := czechMonths[name]
		year := defaultYear
		if m >= 10 {
			year = defaultYear - 1
		}
		add(year, m)
	}

	result := make([]string, 0, len(seen))
	for k := range seen {
		result = append(result, k)
	}
	sort.Strings(result)
	return result
}
244
go/internal/domain/czech/parse_month_references_test.go
Normal file
@@ -0,0 +1,244 @@
package czech

import (
	"reflect"
	"testing"
)

func TestParseMonthReferences(t *testing.T) {
	t.Parallel()

	// All expected outputs verified against live Python implementation on 2026-05-05:
	// PYTHONPATH=scripts:. python -c 'from czech_utils import parse_month_references; print(parse_month_references("<input>", 2026))'
	tests := []struct {
		name        string
		input       string
		defaultYear int
		want        []string
	}{
		{
			name:        "empty",
			input:       "",
			defaultYear: 2026,
			want:        []string{},
		},
		{
			name:        "numeric plus-split two months full year",
			input:       "11+12/2025",
			defaultYear: 2026,
			want:        []string{"2025-11", "2025-12"},
		},
		{
			name:        "numeric single month full year",
			input:       "1/2026",
			defaultYear: 2026,
			want:        []string{"2026-01"},
		},
		{
			name:        "numeric 2-digit year",
			input:       "01/26",
			defaultYear: 2026,
			want:        []string{"2026-01"},
		},
		{
			name:        "numeric plus-split with 2-digit year",
			input:       "11+12/25",
			defaultYear: 2026,
			want:        []string{"2025-11", "2025-12"},
		},
		{
			name:        "numeric three months sorted",
			input:       "12+1+2/2026",
			defaultYear: 2026,
			want:        []string{"2026-01", "2026-02", "2026-12"},
		},
		{
			name:        "dot pattern",
			input:       "12.2025",
			defaultYear: 2026,
			want:        []string{"2025-12"},
		},
		{
			name:        "dot pattern requires 4-digit year",
			input:       "1.26",
			defaultYear: 2026,
			want:        []string{},
		},
		{
			name:        "standalone month below m10 threshold",
			input:       "leden",
			defaultYear: 2026,
			want:        []string{"2026-01"},
		},
		{
			name:        "standalone month m10 heuristic",
			input:       "prosinec",
			defaultYear: 2026,
			want:        []string{"2025-12"},
		},
		{
			name:        "declension prosince",
			input:       "prosince",
			defaultYear: 2026,
			want:        []string{"2025-12"},
		},
		{
			name:        "declension lednu",
			input:       "lednu",
			defaultYear: 2026,
			want:        []string{"2026-01"},
		},
		{
			name:        "standalone m10 boundary (rijen = October)",
			input:       "rijen",
			defaultYear: 2026,
			want:        []string{"2025-10"},
		},
		{
			name:        "standalone m9 just below boundary (zari = September)",
			input:       "zari",
			defaultYear: 2026,
			want:        []string{"2026-09"},
		},
		{
			name:        "range wrap Nov-Jan",
			input:       "listopad-leden",
			defaultYear: 2026,
			want:        []string{"2025-11", "2025-12", "2026-01"},
		},
		{
			name:        "range wrap starting at October",
			input:       "rijen-leden",
			defaultYear: 2026,
			want:        []string{"2025-10", "2025-11", "2025-12", "2026-01"},
		},
		{
			name:        "range no wrap",
			input:       "unor-kveten",
			defaultYear: 2026,
			want:        []string{"2026-02", "2026-03", "2026-04", "2026-05"},
		},
		{
			name:        "degenerate range same month",
			input:       "leden-leden",
			defaultYear: 2026,
			want:        []string{"2026-01"},
		},
		{
			name:        "range spanning m10: heuristic does NOT fire for range members",
			input:       "unor-listopad",
			defaultYear: 2026,
			want:        []string{"2026-02", "2026-03", "2026-04", "2026-05", "2026-06", "2026-07", "2026-08", "2026-09", "2026-10", "2026-11"},
		},
		{
			name:        "longest-match alternation cervenec beats cerven",
			input:       "cervenec-srpen",
			defaultYear: 2026,
			want:        []string{"2026-07", "2026-08"},
		},
		{
			name:        "range plus standalone: range excludes, dedup",
			input:       "listopad-leden, prosinec",
			defaultYear: 2026,
			want:        []string{"2025-11", "2025-12", "2026-01"},
		},
		{
			name:        "two standalones no range",
			input:       "prosinec leden",
			defaultYear: 2026,
			want:        []string{"2025-12", "2026-01"},
		},
		{
			name:        "numeric plus range mix",
			input:       "11+12/2025, leden-brezen",
			defaultYear: 2026,
			want:        []string{"2025-11", "2025-12", "2026-01", "2026-02", "2026-03"},
		},
		{
			name:        "dedup across numeric and standalone passes",
			input:       "11+12/25 a listopad",
			defaultYear: 2026,
			want:        []string{"2025-11", "2025-12"},
		},
		{
			name:        "no digits before slash: standalone fires instead",
			input:       "prosince/2025",
			defaultYear: 2026,
			want:        []string{"2025-12"},
		},
		{
			name:        "range with trailing slash-year: numeric fails, range wins",
			input:       "listopad-prosinec/2025",
			defaultYear: 2026,
			want:        []string{"2026-11", "2026-12"},
		},
		{
			name:        "dot pattern only: numeric matches but month out of 1-12 range",
			input:       "01.2026 / 02.2026",
			defaultYear: 2026,
			want:        []string{"2026-01", "2026-02"},
		},
		{
			name:        "leading slash: numeric matches at second slash",
			input:       "/12/2025",
			defaultYear: 2026,
			want:        []string{"2025-12"},
		},
		{
			name:        "uppercase input normalized",
			input:       "PROSINEC",
			defaultYear: 2026,
			want:        []string{"2025-12"},
		},
		{
			name:        "diacritics stripped by Normalize",
			input:       "Žluťoučký prosinec",
			defaultYear: 2026,
			want:        []string{"2025-12"},
		},
		{
			name:        "diacritics in range with spaces around dash",
			input:       "Únor - květen",
			defaultYear: 2026,
			want:        []string{"2026-02", "2026-03", "2026-04", "2026-05"},
		},
		{
			name:        "natural language mixed with numeric and standalone",
			input:       "platba 11/2025 a leden",
			defaultYear: 2026,
			want:        []string{"2025-11", "2026-01"},
		},
		{
			name:        "English month name not recognized",
			input:       "December",
			defaultYear: 2026,
			want:        []string{},
		},
		{
			name:        "duplicate input deduped",
			input:       "11+12/2025 11+12/2025",
			defaultYear: 2026,
			want:        []string{"2025-11", "2025-12"},
		},
		{
			name:        "trailing year without separator ignored",
			input:       "leden 2026",
			defaultYear: 2026,
			want:        []string{"2026-01"},
		},
	}

	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			t.Parallel()
			got := ParseMonthReferences(tc.input, tc.defaultYear)
			if got == nil {
				got = []string{}
			}
			if !reflect.DeepEqual(got, tc.want) {
				t.Errorf("ParseMonthReferences(%q, %d)\n got %v\n want %v",
					tc.input, tc.defaultYear, got, tc.want)
			}
		})
	}
}