Fix DATA_DIR usage in stats/history paths, set env in Dockerfile, add validation docs
All checks were successful
Build and Push / build (push) Successful in 5s
- scraper_stats.py: respect DATA_DIR env var when writing stats_*.json files
- generate_status.py: read stats files and write history from DATA_DIR instead of HERE
- build/Dockerfile: set DATA_DIR=/app/data as default env var
- docs/validation.md: end-to-end Docker validation recipe

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
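The commit message says both scripts now take their data directory from the DATA_DIR environment variable. A minimal sketch of how such a lookup could work is shown here. The helper name `resolve_data_dir`, its `default` parameter, and the fall-back-when-unset behaviour are assumptions for illustration and are not taken from the commit.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(default: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory used for stats and history files.

    Hypothetical helper: prefers the DATA_DIR environment variable and
    falls back to `default` when the variable is unset or blank.
    """
    env = os.environ if env is None else env
    raw = env.get("DATA_DIR", "").strip()
    return Path(raw) if raw else default


# Example: with DATA_DIR=/app/data (the default the Dockerfile is said to set),
# files resolve under /app/data; without it, under the supplied default.
DATA_DIR = resolve_data_dir(Path.cwd())
```

Passing the environment as a mapping keeps the lookup easy to test without changing the real process environment.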
@@ -58,7 +58,7 @@ def read_scraper_stats(path: Path) -> dict:

 def append_to_history(status: dict, keep: int) -> None:
     """Append the current status entry to scraper_history.json, keeping only `keep` latest."""
-    history_path = HERE / HISTORY_FILE
+    history_path = DATA_DIR / HISTORY_FILE
     history: list = []
     if history_path.exists():
         try:
@@ -98,7 +98,7 @@ def main():
         info["name"] = name

         # Merge in stats from the per-scraper stats file (authoritative for run data)
-        stats = read_scraper_stats(HERE / STATS_FILES[name])
+        stats = read_scraper_stats(DATA_DIR / STATS_FILES[name])
         for key in ("accepted", "fetched", "pages", "cache_hits", "excluded", "excluded_total",
                     "success", "duration_sec", "error"):
             if key in stats:
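The diff shows only the first lines of `append_to_history`. As a rough illustration of the behaviour its docstring describes (append one entry, keep only the `keep` latest), a self-contained sketch might look like the following. The function name `append_to_history_sketch`, the explicit `history_path` parameter, and the handling of unreadable files are assumptions and not the repository's actual code.

```python
import json
from pathlib import Path


def append_to_history_sketch(status: dict, keep: int, history_path: Path) -> None:
    """Append `status` to a JSON list on disk, retaining only the last `keep` entries."""
    history: list = []
    if history_path.exists():
        try:
            loaded = json.loads(history_path.read_text(encoding="utf-8"))
            if isinstance(loaded, list):
                history = loaded
        except (ValueError, OSError):
            # Assumption: an unreadable or corrupt file is treated as empty history.
            history = []
    history.append(status)
    # A plain history[-keep:] would return the whole list for keep == 0.
    history = history[-keep:] if keep > 0 else []
    history_path.parent.mkdir(parents=True, exist_ok=True)
    history_path.write_text(json.dumps(history, indent=2), encoding="utf-8")
```

With the change in this commit, the path passed in would be built from DATA_DIR rather than from the script directory, so the history file lands in the mounted data volume inside the container.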