Reliability improvements: retry logic, validation, ratings sync
Some checks failed
Build and Push / build (push) Failing after 4s
- Add 3-attempt retry with exponential backoff to Sreality, Realingo,
Bezrealitky, and PSN scrapers (CityHome and iDNES already had it)
- Add shared validate_listing() in scraper_stats.py; all 6 scrapers now
validate GPS bounds, price, area, and required fields before output
- Wire ratings to server /api/ratings on page load (merge with
localStorage) and save (async POST); ratings now persist across
browsers and devices
- Namespace JS hash IDs as {source}_{id} to prevent rating collisions
between listings from different portals with the same numeric ID
- Replace manual Czech diacritic table with unicodedata.normalize()
in merge_and_map.py so deduplication also handles accented
characters that the table did not list
- Correct README schedule docs: every 4 hours, not twice daily
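
The retry behaviour named in the first bullet can be sketched as follows. This is a minimal illustration, not the scrapers' actual code: the helper name `retry_with_backoff`, the 1-second base delay, and the injectable `sleep` parameter are all assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,  # assumed starting delay in seconds
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests
) -> T:
    """Call func up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2 * base_delay, and so on.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

With the defaults, a call that fails twice and then succeeds waits 1 s and 2 s between attempts; a call that fails three times re-raises the last exception.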
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
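
The shared validation step described in the commit message might look roughly like the sketch below. The real `validate_listing()` in scraper_stats.py is not shown in this diff, so the function name `validate_listing_sketch`, the field names, the return type, and the bounding-box values are all hypothetical.

```python
from __future__ import annotations

from typing import Any

# Assumed rough bounding box for the Czech Republic (illustrative values only).
LAT_RANGE = (48.5, 51.1)
LON_RANGE = (12.0, 18.9)
# Hypothetical field names; the real scrapers may use different keys.
REQUIRED_FIELDS = ("id", "price", "area", "lat", "lon")


def validate_listing_sketch(listing: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the listing passed."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if listing.get(name) is None
    ]
    if problems:
        return problems  # skip range checks when fields are absent
    if not LAT_RANGE[0] <= listing["lat"] <= LAT_RANGE[1]:
        problems.append("latitude out of bounds")
    if not LON_RANGE[0] <= listing["lon"] <= LON_RANGE[1]:
        problems.append("longitude out of bounds")
    if listing["price"] <= 0:
        problems.append("price must be positive")
    if listing["area"] <= 0:
        problems.append("area must be positive")
    return problems
```

Returning a list of problems rather than a boolean is a design choice for this sketch: it lets a caller log why a listing was dropped.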
@@ -9,6 +9,7 @@ from __future__ import annotations
 
 import json
 import re
+import unicodedata
 from pathlib import Path
 
 from scrape_and_map import generate_map, format_price
@@ -19,14 +20,8 @@ def normalize_street(locality: str) -> str:
     # "Studentská, Praha 6 - Dejvice" → "studentska"
     # "Rýnská, Praha" → "rynska"
     street = locality.split(",")[0].strip().lower()
-    # Remove diacritics (simple Czech)
-    replacements = {
-        "á": "a", "č": "c", "ď": "d", "é": "e", "ě": "e",
-        "í": "i", "ň": "n", "ó": "o", "ř": "r", "š": "s",
-        "ť": "t", "ú": "u", "ů": "u", "ý": "y", "ž": "z",
-    }
-    for src, dst in replacements.items():
-        street = street.replace(src, dst)
+    # Remove diacritics using Unicode decomposition (handles all Czech characters)
+    street = unicodedata.normalize("NFKD", street).encode("ascii", "ignore").decode("ascii")
     # Remove non-alphanumeric
     street = re.sub(r"[^a-z0-9]", "", street)
     return street
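
To see what the new code path does, the function can be reassembled from the hunk above and run on the two examples from its comments. This is a standalone copy for illustration; the third call uses an input with combining accents, which is an added test case and not something from the commit.

```python
import re
import unicodedata


def normalize_street(locality: str) -> str:
    # Keep only the part before the first comma, lowercased.
    street = locality.split(",")[0].strip().lower()
    # NFKD splits each accented letter into a base letter plus combining marks;
    # encoding to ASCII with "ignore" then drops the marks.
    street = unicodedata.normalize("NFKD", street).encode("ascii", "ignore").decode("ascii")
    # Remove everything that is not a lowercase letter or digit.
    return re.sub(r"[^a-z0-9]", "", street)


print(normalize_street("Studentská, Praha 6 - Dejvice"))  # studentska
print(normalize_street("Rýnská, Praha"))                  # rynska
# Input already in decomposed form (y + U+0301, a + U+0301).
print(normalize_street("Ry\u0301nska\u0301, Praha"))      # rynska
```

The decomposed input is a case the old lookup table could not match, since its keys were precomposed characters. Note that letters with no Unicode decomposition, such as "ł", are dropped by the ASCII encoding step instead of being mapped to a base letter.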