# Container Setup

OCI container image for the apartment finder. Runs two processes:

1. **Web server** (`python3 -m http.server`) serving `mapa_bytu.html` on port 8080
2. **Cron job** running `run_all.sh` (all 6 scrapers + merge) every 12 hours
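One way to wire the two processes together is an entrypoint that backgrounds the initial scrape and cron, then execs the web server in the foreground. The sketch below illustrates the idea; the script name and exact flags are assumptions, not the actual entrypoint:

```sh
#!/bin/sh
set -e

# Kick off the initial scrape in the background so the web server is up immediately.
bash /app/run_all.sh &

# Start BusyBox crond as a background daemon, logging to stdout.
crond -b -L /dev/stdout

# Serve /app/data on port 8080 in the foreground; exec makes it PID 1.
cd /app/data
exec python3 -m http.server 8080
```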
## Architecture

```
┌─────────────────────────────────────────┐
│ Container (python:3.13-alpine)          │
│                                         │
│ PID 1: python3 -m http.server :8080     │
│   serves /app/data/                     │
│                                         │
│ crond: runs run_all.sh at 06:00/18:00   │
│   Europe/Prague timezone                │
│                                         │
│ /app/      ← scripts (.py, .sh)         │
│ /app/data/ ← volume (JSON + HTML)       │
│            ↑ symlinked from /app/byty_* │
└─────────────────────────────────────────┘
```

On startup, the web server starts immediately. The initial scrape runs in the background and populates data as it completes. Subsequent cron runs update the data in-place.
## Build and Run

```bash
# Build the image
docker build -t maru-hleda-byt .

# Run with persistent data volume
docker run -d --name maru-hleda-byt \
  -p 8080:8080 \
  -v maru-hleda-byt-data:/app/data \
  --restart unless-stopped \
  maru-hleda-byt
```
Access the map at **http://localhost:8080/mapa_bytu.html**
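The same run configuration can be expressed as a Compose file. This is a sketch of an equivalent setup, not a file the project necessarily ships:

```yaml
# docker-compose.yml — equivalent to the docker run command above
services:
  maru-hleda-byt:
    build: .
    ports:
      - "8080:8080"
    volumes:
      - maru-hleda-byt-data:/app/data
    restart: unless-stopped

volumes:
  maru-hleda-byt-data:
```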
## Volume Persistence

A named volume `maru-hleda-byt-data` stores:

- `byty_*.json` — cached scraper data (6 source files + 1 merged)
- `mapa_bytu.html` — the generated interactive map

The JSON cache is important: each scraper skips re-fetching properties that haven't changed. Without the volume, every container restart triggers a full re-scrape of all 6 portals (several minutes with rate limiting).
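The cache-skip behavior can be sketched roughly like this. Function and field names are illustrative only, not the actual scraper code:

```python
import json
from pathlib import Path


def load_cache(path):
    """Return previously scraped listings keyed by listing ID, or {} on first run."""
    p = Path(path)
    if not p.exists():
        return {}
    return {item["id"]: item for item in json.loads(p.read_text())}


def merge_listings(cache, fresh_ids, fetch_detail):
    """Fetch details only for IDs missing from the cache; reuse the rest."""
    merged = {}
    for listing_id in fresh_ids:
        if listing_id in cache:
            merged[listing_id] = cache[listing_id]      # cache hit: skip re-fetch
        else:
            merged[listing_id] = fetch_detail(listing_id)  # new listing: fetch it
    return merged
```

On a warm cache, only newly listed properties incur a network call, which is why a fresh volume (or a deleted one) turns the next run into a full, slow re-scrape.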
## Cron Schedule

Scrapers run at **06:00** and **18:00 Europe/Prague time** (CET/CEST).

Cron output is forwarded to the container's stdout/stderr, visible via `docker logs`.
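In crontab form that schedule would look something like the line below. Redirecting to PID 1's file descriptors is one common way to surface cron output in `docker logs`; the actual crontab (see `crontab -l` under Operations) may differ:

```
# min hour dom mon dow  command
0 6,18 * * * bash /app/run_all.sh > /proc/1/fd/1 2> /proc/1/fd/2
```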
## Operations

```bash
# View logs (including cron and scraper output)
docker logs -f maru-hleda-byt

# Check cron schedule
docker exec maru-hleda-byt crontab -l

# Trigger a manual scrape
docker exec maru-hleda-byt bash /app/run_all.sh

# Stop / start (data persists in volume)
docker stop maru-hleda-byt
docker start maru-hleda-byt

# Rebuild after code changes
docker stop maru-hleda-byt && docker rm maru-hleda-byt
docker build -t maru-hleda-byt .
docker run -d --name maru-hleda-byt \
  -p 8080:8080 \
  -v maru-hleda-byt-data:/app/data \
  --restart unless-stopped \
  maru-hleda-byt
```
## Troubleshooting

**Map shows 404**: The initial background scrape hasn't finished yet. Check `docker logs` for progress. The first run takes a few minutes due to rate-limited API calls.

**SSL errors from PSN scraper**: `scrape_psn.py` uses `curl` (not Python's urllib) specifically for Cloudflare SSL compatibility. Alpine's curl includes modern TLS via OpenSSL, so this should work. If not, check that `ca-certificates` is installed (`apk add ca-certificates`).

**Health check failing**: The health check has a 5-minute start period to allow the initial scrape to complete. If it still fails, verify the HTTP server is running: `docker exec maru-hleda-byt wget -q -O /dev/null http://localhost:8080/`.

**Timezone verification**: `docker exec maru-hleda-byt date` should show Czech time.
## Image Details

- **Base**: `python:3.13-alpine` (~55 MB)
- **Added packages**: `curl`, `bash`, `tzdata` (~10 MB)
- **No pip packages** — all scrapers use Python standard library only
- **Approximate image size**: ~70 MB
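Putting the details above together, a Dockerfile along these lines would produce such an image. This is a sketch: the copied file names, the entrypoint script, and the exact health-check command are assumptions, not the project's actual build file:

```dockerfile
FROM python:3.13-alpine

# curl for the PSN scraper, bash for run_all.sh, tzdata for Europe/Prague
RUN apk add --no-cache curl bash tzdata ca-certificates
ENV TZ=Europe/Prague

WORKDIR /app
COPY *.py *.sh ./

VOLUME /app/data
EXPOSE 8080

# 5-minute start period so the initial background scrape can finish
HEALTHCHECK --start-period=5m --interval=1m \
  CMD wget -q -O /dev/null http://localhost:8080/ || exit 1

CMD ["/bin/sh", "/app/entrypoint.sh"]
```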