Each property record now carries two date fields:
- first_seen: date the listing first appeared (preserved across runs)
- last_updated: date of the most recent scrape that included it
All 6 scrapers (Sreality, Realingo, Bezrealitky, iDNES, PSN, CityHome)
set these fields during scraping. Cached results preserve first_seen and
refresh last_updated. PSN and CityHome gain a load_previous() helper to
track first_seen across runs (they lacked caching before).
The merge script keeps the earliest first_seen and latest last_updated
when deduplicating listings across sources.
The HTML map now shows dates in popups ("Přidáno: DD.MM.YYYY"), displays
a green "NOVÉ" badge on newly discovered listings, and adds a "Přidáno"
dropdown filter (24h / 3 days / 7 days / 14 days) for spotting new ones.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Triggers on tag push or manual dispatch. Builds the image using
build/Dockerfile and pushes to the Gitea container registry.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cover the full pipeline (scrapers, merge, map generation), all 6 data
sources with their parsing methods, filter criteria, CLI arguments,
Docker setup, caching, rate limiting, and project structure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace print() with Python logging module across all 6 scrapers
for configurable log levels (DEBUG/INFO/WARNING/ERROR)
- Add --max-pages, --max-properties, and --log-level CLI arguments
to each scraper via argparse for limiting scrape scope
- Add validation Make targets (validation, validation-local,
validation-local-debug) for quick test runs with limited data
- Update run_all.sh to parse and forward CLI args to all scrapers
- Update mapa_bytu.html with latest scrape results
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>