Reliability improvements: retry logic, validation, ratings sync
Some checks failed
Build and Push / build (push) Failing after 4s
- Add 3-attempt retry with exponential backoff to Sreality, Realingo,
  Bezrealitky, and PSN scrapers (CityHome and iDNES already had it)
- Add shared validate_listing() in scraper_stats.py; all 6 scrapers now
  validate GPS bounds, price, area, and required fields before output
- Wire ratings to server /api/ratings on page load (merge with
  localStorage) and save (async POST); ratings now persist across
  browsers and devices
- Namespace JS hash IDs as {source}_{id} to prevent rating collisions
  between listings from different portals with the same numeric ID
- Replace manual Czech diacritic table with unicodedata.normalize()
  in merge_and_map.py for correct deduplication of all edge cases
- Correct README schedule docs: every 4 hours, not twice daily
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
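
For readers unfamiliar with the pattern named in the first bullet, here is a minimal sketch of a 3-attempt retry with exponential backoff. It is not the code from this commit: the helper name `retry_with_backoff`, the `base_delay` of one second, the injectable `sleep` parameter, and the broad `except Exception` are all assumptions made for illustration; a real scraper would more likely catch only network-level errors.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    attempts: int = 3,          # matches the "3-attempt" wording of the commit
    base_delay: float = 1.0,    # hypothetical starting delay in seconds
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing
) -> T:
    """Call fetch() up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            # Waits base_delay, then 2x, then 4x, ... between attempts.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A scraper could wrap a single page request as `retry_with_backoff(lambda: fetch_page(url))`, where `fetch_page` stands in for whatever request function the scraper already uses.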
@@ -151,7 +151,7 @@ The project includes a Docker setup for unattended operation with a cron-based s
 │ PID 1: python3 -m http.server :8080 │
 │ serves /app/data/ │
 │ │
-│ crond: runs run_all.sh at 06:00/18:00 │
+│ crond: runs run_all.sh every 4 hours │
 │ Europe/Prague timezone │
 │ │
 │ /app/ -- scripts (.py, .sh) │
@@ -160,7 +160,7 @@ The project includes a Docker setup for unattended operation with a cron-based s
 └─────────────────────────────────────────┘
 ```
 
-On startup, the HTTP server starts immediately. The initial scrape runs in the background. Subsequent cron runs update data in-place twice daily at 06:00 and 18:00 CET/CEST.
+On startup, the HTTP server starts immediately. The initial scrape runs in the background. Subsequent cron runs update data in-place every 4 hours.
 
 ### Quick start
 
@@ -208,7 +208,7 @@ Validation targets run scrapers with `--max-pages 1 --max-properties 10` for a f
 ├── build/
 │ ├── Dockerfile # Container image definition (python:3.13-alpine)
 │ ├── entrypoint.sh # Container entrypoint (HTTP server + cron + initial scrape)
-│ ├── crontab # Cron schedule (06:00 and 18:00 CET)
+│ ├── crontab # Cron schedule (every 4 hours)
 │ └── CONTAINER.md # Container-specific documentation
 └── .gitignore # Ignores byty_*.json, __pycache__, .vscode
 ```
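
The commit message mentions replacing a hand-written Czech diacritic table with `unicodedata.normalize()` in merge_and_map.py. The sketch below shows one common way to do that with the standard library. The function names `strip_diacritics` and `dedup_key`, the choice of the NFKD form, and the case-folding step are assumptions for illustration; the commit does not say which normalization form or key format the script actually uses.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove accents by decomposing characters and dropping combining marks."""
    # NFKD splits e.g. "ř" into "r" plus a combining caron (assumed form).
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def dedup_key(street: str) -> str:
    """Hypothetical comparison key: accent-free, case-insensitive, trimmed."""
    return strip_diacritics(street).casefold().strip()
```

Because the decomposition comes from the Unicode database rather than a manual table, characters such as "ů", "ď", and "ť" are handled without listing each one explicitly.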