Fix DATA_DIR usage in stats/history paths, set env in Dockerfile, add validation docs

- scraper_stats.py: respect DATA_DIR env var when writing stats_*.json files
- generate_status.py: read stats files and write history from DATA_DIR instead of HERE
- build/Dockerfile: set DATA_DIR=/app/data as default env var
- docs/validation.md: end-to-end Docker validation recipe
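
For illustration, the DATA_DIR lookup described above could be sketched as follows. This is a minimal sketch, not the code from this commit: the helper name `resolve_data_dir`, its `fallback` and `env` parameters, and the choice to treat an empty value as unset are all assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(fallback: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory for stats/history files (hypothetical helper).

    Uses the DATA_DIR environment variable when it is set and non-empty,
    otherwise falls back to `fallback` (e.g. the script's own directory).
    """
    env = os.environ if env is None else env
    raw = env.get("DATA_DIR", "")
    # Assumption: an empty DATA_DIR is treated the same as an unset one.
    return Path(raw) if raw else Path(fallback)


# Example: with DATA_DIR=/app/data (the Dockerfile default named above),
# files resolve under /app/data instead of next to the scripts.
data_dir = resolve_data_dir(Path.cwd(), {"DATA_DIR": "/app/data"})
history_path = data_dir / "scraper_history.json"
```

Passing the environment as a mapping keeps the lookup testable without mutating `os.environ`.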

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Jan Novak
2026-02-26 09:46:16 +01:00
parent 44c02b45b4
commit 00c9144010
4 changed files with 130 additions and 4 deletions


@@ -58,7 +58,7 @@ def read_scraper_stats(path: Path) -> dict:
def append_to_history(status: dict, keep: int) -> None:
"""Append the current status entry to scraper_history.json, keeping only `keep` latest."""
-history_path = HERE / HISTORY_FILE
+history_path = DATA_DIR / HISTORY_FILE
history: list = []
if history_path.exists():
try:
@@ -98,7 +98,7 @@ def main():
info["name"] = name
# Merge in stats from the per-scraper stats file (authoritative for run data)
-stats = read_scraper_stats(HERE / STATS_FILES[name])
+stats = read_scraper_stats(DATA_DIR / STATS_FILES[name])
for key in ("accepted", "fetched", "pages", "cache_hits", "excluded", "excluded_total",
"success", "duration_sec", "error"):
if key in stats: