Add scraper status collection and presentation #3

Merged

kacerr merged 3 commits from add-scraper-statuses into main

2026-02-26 10:04:23 +01:00

Author	SHA1	Message	Date
Jan Novak	00c9144010	Fix DATA_DIR usage in stats/history paths, set env in Dockerfile, add validation docs	2026-02-26 09:46:16 +01:00
Jan Novak	44c02b45b4	Increase history retention to 20, run scrapers every 4 hours	2026-02-26 08:53:27 +01:00
Jan Novak	5fb3b984b6	Add status dashboard, server, scraper stats, and DATA_DIR support	2026-02-26 00:30:25 +01:00

Jan Novak

00c9144010

Fix DATA_DIR usage in stats/history paths, set env in Dockerfile, add validation docs

Build and Push / build (push) Successful in 5s

- scraper_stats.py: respect DATA_DIR env var when writing stats_*.json files
- generate_status.py: read stats files from DATA_DIR and write history there, instead of using HERE
- build/Dockerfile: set DATA_DIR=/app/data as default env var
- docs/validation.md: end-to-end Docker validation recipe
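
A minimal sketch of how a scraper could resolve DATA_DIR and write its stats_<source>.json file, assuming an unset DATA_DIR falls back to a caller-supplied directory (standing in for HERE) or the working directory. The helper names `data_dir` and `write_stats`, and the `duration_s` and `finished_at` keys, are hypothetical; only the fetched/accepted/excluded/duration fields come from the 5fb3b984b6 commit message.

```python
import json
import os
import time
from pathlib import Path
from typing import Optional


def data_dir(default: Optional[Path] = None) -> Path:
    """Resolve the data directory: $DATA_DIR if set, else `default` (or the CWD)."""
    raw = os.environ.get("DATA_DIR")
    path = Path(raw) if raw else (default or Path.cwd())
    path.mkdir(parents=True, exist_ok=True)
    return path


def write_stats(source: str, fetched: int, accepted: int, excluded: int,
                duration_s: float, default_dir: Optional[Path] = None) -> Path:
    """Write one run's stats to <data dir>/stats_<source>.json (hypothetical helper)."""
    stats = {
        "source": source,
        "fetched": fetched,
        "accepted": accepted,
        "excluded": excluded,
        "duration_s": duration_s,
        "finished_at": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
    }
    out = data_dir(default_dir) / f"stats_{source}.json"
    # Write a temp file in the same directory, then rename it over the target,
    # so the status page never reads a half-written stats file.
    tmp = out.with_name(out.name + ".tmp")
    tmp.write_text(json.dumps(stats, indent=2), encoding="utf-8")
    tmp.replace(out)
    return out
```

The write-then-rename step is a design choice added in this sketch, not something the commit describes.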

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-26 09:46:16 +01:00

Jan Novak

44c02b45b4

Increase history retention to 20, run scrapers every 4 hours

Build and Push / build (push) Successful in 7s

- generate_status.py: raise --keep default from 5 to 20 entries
- build/crontab: change schedule from 06:00/18:00 to every 4 hours (*/4);
  at 6 runs/day, the default retention of 20 entries covers ≈ 3.3 days of history
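
The retention arithmetic can be checked with a small sketch, assuming the history is a list of run summaries ordered oldest first. `append_history` and `days_covered` are hypothetical names, not functions from generate_status.py; the */4 schedule is assumed to mean a crontab hour field such as `0 */4 * * *`.

```python
from typing import Any, Dict, List

KEEP_DEFAULT = 20  # the new --keep default from this commit


def append_history(history: List[Dict[str, Any]], entry: Dict[str, Any],
                   keep: int = KEEP_DEFAULT) -> List[Dict[str, Any]]:
    """Append one run summary, keeping only the newest `keep` entries."""
    if keep <= 0:
        return []
    return (history + [entry])[-keep:]


def days_covered(keep: int, interval_hours: int) -> float:
    """Days of history retained when the scrapers run every `interval_hours`."""
    runs_per_day = 24 / interval_hours
    return keep / runs_per_day
```

With the new settings, days_covered(20, 4) gives 20 / 6 ≈ 3.33 days, matching the commit message; the previous settings (5 entries, two runs a day) give days_covered(5, 12) = 2.5 days.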

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-26 08:53:27 +01:00

Jan Novak

5fb3b984b6

Add status dashboard, server, scraper stats, and DATA_DIR support

Build and Push / build (push) Successful in 7s

Key changes:
- Replace ratings_server.py + status.html with a unified server.py that
  serves the map, scraper status dashboard, and ratings API in one process
- Add scraper_stats.py utility: each scraper writes per-run stats (fetched,
  accepted, excluded, duration) to stats_<source>.json for the status page
- generate_status.py: respect DATA_DIR env var so status.json lands in the
  configured data directory instead of always the project root
- run_all.sh: replace the {"status":"running"} overwrite of status.json with
  a dedicated scraper_running.json lock file; trap on EXIT ensures cleanup
  even on kill/error, preventing the previous run's results from being wiped
- server.py: detect running state via the existence of scraper_running.json
  instead of the status["status"] field, eliminating the dual-use race condition
- Makefile: add serve (local dev) and debug (Docker debug container) targets;
  add a SERVER_PORT variable
- build/Dockerfile + entrypoint.sh: switch to server.py, set DATA_DIR,
  adjust volume mounts
- .gitignore: add *.json and *.log to keep runtime data files out of VCS
- mapa_bytu.html: price-per-m² colouring, status link, UX tweaks
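
The lock-file handoff between run_all.sh and server.py can be sketched in Python as follows; here a try/finally plays the role of the EXIT trap. The lock contents (a PID) and the "running" key added to the status payload are assumptions for illustration; only the file names come from the commit message.

```python
import json
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Dict, Iterator

LOCK_NAME = "scraper_running.json"
STATUS_NAME = "status.json"


@contextmanager
def scraper_lock(data_dir: Path) -> Iterator[Path]:
    """Hold the lock file for the duration of a scraper run."""
    lock = data_dir / LOCK_NAME
    # Lock contents are an assumption; only the file's existence matters here.
    lock.write_text(json.dumps({"pid": os.getpid()}), encoding="utf-8")
    try:
        yield lock
    finally:
        # Runs on normal completion, exceptions and Ctrl-C (KeyboardInterrupt);
        # no cleanup can run if the process is killed with SIGKILL.
        lock.unlink(missing_ok=True)


def read_status(data_dir: Path) -> Dict[str, Any]:
    """Return the last completed run's status plus a lock-derived running flag."""
    status_file = data_dir / STATUS_NAME
    status: Dict[str, Any] = {}
    if status_file.exists():
        status = json.loads(status_file.read_text(encoding="utf-8"))
    # status.json is never overwritten with a "running" marker, so the previous
    # results stay visible; only the lock file signals an in-progress run.
    status["running"] = (data_dir / LOCK_NAME).exists()
    return status
```

Because status.json only ever holds finished results, the dashboard can keep showing the previous run while a new one is in progress, and the lock file is the single source of truth for "running".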

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-26 00:30:25 +01:00
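
A minimal sketch of the single-process layout, using only the standard library's http.server. The commit does not say which framework or URL scheme server.py uses, so the routes (/, /api/status, /api/ratings), the ratings.json file and the make_server helper are hypothetical, and the status dashboard page itself is omitted.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

# Hypothetical route table: URL path -> (directory attribute, file name, content type).
ROUTES = {
    "/": ("static_dir", "mapa_bytu.html", "text/html; charset=utf-8"),
    "/api/status": ("data_dir", "status.json", "application/json"),
    "/api/ratings": ("data_dir", "ratings.json", "application/json"),
}


class UnifiedHandler(BaseHTTPRequestHandler):
    """Serves the map page, status data and ratings from one process."""

    static_dir = Path(".")  # where mapa_bytu.html lives
    data_dir = Path(".")    # DATA_DIR: runtime JSON files

    def do_GET(self) -> None:
        route = ROUTES.get(self.path.split("?", 1)[0])
        if route is None:
            self.send_error(404, "unknown route")
            return
        dir_attr, name, content_type = route
        file = getattr(self, dir_attr) / name
        if not file.is_file():
            self.send_error(404, f"{name} not found")
            return
        body = file.read_bytes()
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging in this sketch


def make_server(port: int, static_dir: Path, data_dir: Path) -> ThreadingHTTPServer:
    """Bind a handler subclass to the given directories and return the server."""
    handler = type("BoundHandler", (UnifiedHandler,),
                   {"static_dir": static_dir, "data_dir": data_dir})
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

To run it on an example port: make_server(8000, Path("."), Path(os.environ.get("DATA_DIR", "."))).serve_forever(). Serving everything from one process means only one port has to be exposed, which is consistent with the Makefile's single SERVER_PORT variable.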
