maru-hleda-byt/build/CONTAINER.md

Container Setup

OCI container image for the apartment finder. Runs two processes:

  1. Web server (python3 -m http.server) serving mapa_bytu.html on port 8080
  2. Cron job running run_all.sh (all 6 scrapers + merge) every 12 hours

Architecture

┌─────────────────────────────────────────┐
│  Container (python:3.13-alpine)         │
│                                         │
│  PID 1: python3 -m http.server :8080    │
│         serves /app/data/               │
│                                         │
│  crond:  runs run_all.sh at 06:00/18:00 │
│          Europe/Prague timezone         │
│                                         │
│  /app/        ← scripts (.py, .sh)      │
│  /app/data/   ← volume (JSON + HTML)    │
│         ↑ symlinked from /app/byty_*    │
└─────────────────────────────────────────┘

On startup, the web server starts immediately. The initial scrape runs in the background and populates data as it completes. Subsequent cron runs update the data in-place.
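This startup sequence can be sketched as a minimal entrypoint script. This is a hypothetical reconstruction, not the actual file: the script name, the crond flags, and the use of `--directory` are assumptions based on the architecture described above.

```shell
#!/bin/sh
# entrypoint.sh -- hypothetical sketch of the container startup sequence

start_services() {
    crond -b -L /dev/stdout            # busybox cron in the background, logging to stdout
    sh /app/run_all.sh &               # initial scrape runs in the background
    # The web server replaces this shell as the foreground process (PID 1),
    # serving the volume-backed data directory.
    exec python3 -m http.server 8080 --directory /app/data
}

# Only start services when the container's data directory exists,
# so the script is a harmless no-op elsewhere.
if [ -d /app/data ]; then
    start_services
fi
```

Note that `exec` is what makes the web server available immediately: nothing waits for the scrape, which simply writes into /app/data as it finishes.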

Build and Run

# Build the image
docker build -t maru-hleda-byt .

# Run with persistent data volume
docker run -d --name maru-hleda-byt \
  -p 8080:8080 \
  -v maru-hleda-byt-data:/app/data \
  --restart unless-stopped \
  maru-hleda-byt

Access the map at http://localhost:8080/mapa_bytu.html

Volume Persistence

A named volume maru-hleda-byt-data stores:

  • byty_*.json — cached scraper data (6 source files + 1 merged)
  • mapa_bytu.html — the generated interactive map

The JSON cache is important: each scraper skips re-fetching properties that haven't changed. Without the volume, every container restart triggers a full re-scrape of all 6 portals (several minutes with rate limiting).

Cron Schedule

Scrapers run at 06:00 and 18:00 Europe/Prague time (CET/CEST).

Cron output is forwarded to the container's stdout/stderr, visible via docker logs.
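The crontab presumably looks something like the following (a sketch; the exact entry in the image may differ). Redirecting to `/proc/1/fd/1` and `/proc/1/fd/2` is a common way to forward cron output to PID 1's stdout/stderr so it appears in `docker logs`:

```crontab
# min  hour  day  month  weekday  command
0      6,18  *    *      *        sh /app/run_all.sh > /proc/1/fd/1 2> /proc/1/fd/2
```

The hours 6 and 18 are interpreted in the container's timezone, which tzdata plus `TZ=Europe/Prague` pins to Czech local time.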

Operations

# View logs (including cron and scraper output)
docker logs -f maru-hleda-byt

# Check cron schedule
docker exec maru-hleda-byt crontab -l

# Trigger a manual scrape
docker exec maru-hleda-byt bash /app/run_all.sh

# Stop / start (data persists in volume)
docker stop maru-hleda-byt
docker start maru-hleda-byt

# Rebuild after code changes
docker stop maru-hleda-byt && docker rm maru-hleda-byt
docker build -t maru-hleda-byt .
docker run -d --name maru-hleda-byt \
  -p 8080:8080 \
  -v maru-hleda-byt-data:/app/data \
  --restart unless-stopped \
  maru-hleda-byt

Troubleshooting

Map shows 404: The initial background scrape hasn't finished yet. Check docker logs for progress. First run takes a few minutes due to rate-limited API calls.
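One way to tell "still scraping" apart from a real failure is to check whether the generated map exists yet in the data directory. A small sketch (the helper name `map_ready` is made up here; run it inside the container via `docker exec`):

```shell
# Succeeds if the generated map exists under the given data dir
# (defaults to /app/data, the path used inside the container).
map_ready() {
    [ -e "${1:-/app/data}/mapa_bytu.html" ]
}

if map_ready; then
    echo "map generated"
else
    echo "map not generated yet -- initial scrape may still be running"
fi
```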

SSL errors from PSN scraper: scrape_psn.py uses curl (rather than Python's urllib) specifically for Cloudflare SSL compatibility. Alpine's curl is built with modern TLS support via OpenSSL, so this should work out of the box. If it doesn't, check that ca-certificates is installed (apk add ca-certificates).

Health check failing: The health check has a 5-minute start period to allow the initial scrape to complete. If it still fails, verify the HTTP server is running: docker exec maru-hleda-byt wget -q -O /dev/null http://localhost:8080/.

Timezone verification: docker exec maru-hleda-byt date should show Czech time.

Image Details

  • Base: python:3.13-alpine (~55 MB)
  • Added packages: curl, bash, tzdata (~10 MB)
  • No pip packages — all scrapers use Python standard library only
  • Approximate image size: ~70 MB
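Putting these details together, the Dockerfile is presumably along these lines. This is a hedged reconstruction, not the actual file: the COPY layout, the entrypoint.sh name, and the exact HEALTHCHECK flags are assumptions consistent with the behavior described above.

```dockerfile
FROM python:3.13-alpine

# curl for the PSN scraper, bash for run_all.sh, tzdata for Europe/Prague
RUN apk add --no-cache curl bash tzdata ca-certificates

ENV TZ=Europe/Prague

WORKDIR /app
COPY . /app

# JSON cache + generated map live on a named volume
VOLUME /app/data
EXPOSE 8080

# Give the initial background scrape time to finish before health checks count
HEALTHCHECK --start-period=5m --interval=1m \
  CMD wget -q -O /dev/null http://localhost:8080/ || exit 1

ENTRYPOINT ["/app/entrypoint.sh"]
```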