4 Commits

Author SHA1 Message Date
63663e4b6b Merge pull request 'Move Realingo scraper to run last' (#6) from fix/scraper-order into main
All checks were successful
Build and Push / build (push) Successful in 6s
Reviewed-on: #6
2026-02-27 21:19:29 +00:00
8c052840cd Move Realingo scraper to run last in pipeline
Reorder scrapers: Sreality → Bezrealitky → iDNES → PSN+CityHome → Realingo → Merge

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 21:35:54 +01:00
39e4b9ce2a Merge pull request 'Reliability improvements and cleanup' (#5) from improve/reliability-and-fixes into main
Reviewed-on: #5
2026-02-27 10:26:04 +00:00
Jan Novak
fd3991f8d6 Remove regen_map.py references from Dockerfile and README
All checks were successful
Build and Push / build (push) Successful in 6s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 10:44:08 +01:00
3 changed files with 4 additions and 9 deletions

View File

@@ -83,10 +83,6 @@ Merges all `byty_*.json` files into `byty_merged.json` and generates `mapa_bytu.
**Deduplication logic:** Two listings are considered duplicates if they share the same normalized street name + price + area. PSN and CityHome have priority during dedup (loaded first), so their listings are kept over duplicates from other portals. **Deduplication logic:** Two listings are considered duplicates if they share the same normalized street name + price + area. PSN and CityHome have priority during dedup (loaded first), so their listings are kept over duplicates from other portals.
### `regen_map.py`
Regenerates the map from existing `byty_sreality.json` data without re-scraping. Fetches missing area values from the Sreality API, fixes URLs, and re-applies the area filter. Useful for tweaking map output after data has already been collected.
## Interactive map (`mapa_bytu.html`) ## Interactive map (`mapa_bytu.html`)
The generated map is a standalone HTML file using Leaflet.js with CARTO basemap tiles. Features: The generated map is a standalone HTML file using Leaflet.js with CARTO basemap tiles. Features:
@@ -201,7 +197,6 @@ Validation targets run scrapers with `--max-pages 1 --max-properties 10` for a f
├── scrape_psn.py # PSN scraper ├── scrape_psn.py # PSN scraper
├── scrape_cityhome.py # CityHome scraper ├── scrape_cityhome.py # CityHome scraper
├── merge_and_map.py # Merge all sources + generate final map ├── merge_and_map.py # Merge all sources + generate final map
├── regen_map.py # Regenerate map from cached Sreality data
├── run_all.sh # Orchestrator script (runs all scrapers + merge) ├── run_all.sh # Orchestrator script (runs all scrapers + merge)
├── mapa_bytu.html # Generated interactive map (output) ├── mapa_bytu.html # Generated interactive map (output)
├── Makefile # Docker management + validation shortcuts ├── Makefile # Docker management + validation shortcuts

View File

@@ -11,7 +11,7 @@ WORKDIR /app
COPY scrape_and_map.py scrape_realingo.py scrape_bezrealitky.py \ COPY scrape_and_map.py scrape_realingo.py scrape_bezrealitky.py \
scrape_idnes.py scrape_psn.py scrape_cityhome.py \ scrape_idnes.py scrape_psn.py scrape_cityhome.py \
merge_and_map.py regen_map.py generate_status.py scraper_stats.py \ merge_and_map.py generate_status.py scraper_stats.py \
run_all.sh server.py ./ run_all.sh server.py ./
COPY build/crontab /etc/crontabs/root COPY build/crontab /etc/crontabs/root

View File

@@ -84,9 +84,6 @@ exec > >(tee -a "$LOG_FILE") 2>&1
step "Sreality" step "Sreality"
python3 scrape_and_map.py $SCRAPER_ARGS || { echo -e "${RED}✗ Sreality selhalo${NC}"; FAILED=$((FAILED + 1)); } python3 scrape_and_map.py $SCRAPER_ARGS || { echo -e "${RED}✗ Sreality selhalo${NC}"; FAILED=$((FAILED + 1)); }
step "Realingo"
python3 scrape_realingo.py $SCRAPER_ARGS || { echo -e "${RED}✗ Realingo selhalo${NC}"; FAILED=$((FAILED + 1)); }
step "Bezrealitky" step "Bezrealitky"
python3 scrape_bezrealitky.py $SCRAPER_ARGS || { echo -e "${RED}✗ Bezrealitky selhalo${NC}"; FAILED=$((FAILED + 1)); } python3 scrape_bezrealitky.py $SCRAPER_ARGS || { echo -e "${RED}✗ Bezrealitky selhalo${NC}"; FAILED=$((FAILED + 1)); }
@@ -101,6 +98,9 @@ PID_CH=$!
wait $PID_PSN || { echo -e "${RED}✗ PSN selhalo${NC}"; FAILED=$((FAILED + 1)); } wait $PID_PSN || { echo -e "${RED}✗ PSN selhalo${NC}"; FAILED=$((FAILED + 1)); }
wait $PID_CH || { echo -e "${RED}✗ CityHome selhalo${NC}"; FAILED=$((FAILED + 1)); } wait $PID_CH || { echo -e "${RED}✗ CityHome selhalo${NC}"; FAILED=$((FAILED + 1)); }
step "Realingo"
python3 scrape_realingo.py $SCRAPER_ARGS || { echo -e "${RED}✗ Realingo selhalo${NC}"; FAILED=$((FAILED + 1)); }
# ── Sloučení + mapa ────────────────────────────────────────── # ── Sloučení + mapa ──────────────────────────────────────────
step "Sloučení dat a generování mapy" step "Sloučení dat a generování mapy"