Compare commits: 0.09...63663e4b6b (4 commits)

| Author | SHA1 | Date |
|---|---|---|
| | 63663e4b6b | |
| | 8c052840cd | |
| | 39e4b9ce2a | |
| | fd3991f8d6 | |
@@ -83,10 +83,6 @@ Merges all `byty_*.json` files into `byty_merged.json` and generates `mapa_bytu.

 **Deduplication logic:** Two listings are considered duplicates if they share the same normalized street name + price + area. PSN and CityHome have priority during dedup (loaded first), so their listings are kept over duplicates from other portals.

-### `regen_map.py`
-
-Regenerates the map from existing `byty_sreality.json` data without re-scraping. Fetches missing area values from the Sreality API, fixes URLs, and re-applies the area filter. Useful for tweaking map output after data has already been collected.
-
 ## Interactive map (`mapa_bytu.html`)

 The generated map is a standalone HTML file using Leaflet.js with CARTO basemap tiles. Features:
@@ -201,7 +197,6 @@ Validation targets run scrapers with `--max-pages 1 --max-properties 10` for a f
 ├── scrape_psn.py # PSN scraper
 ├── scrape_cityhome.py # CityHome scraper
 ├── merge_and_map.py # Merge all sources + generate final map
-├── regen_map.py # Regenerate map from cached Sreality data
 ├── run_all.sh # Orchestrator script (runs all scrapers + merge)
 ├── mapa_bytu.html # Generated interactive map (output)
 ├── Makefile # Docker management + validation shortcuts
@@ -11,7 +11,7 @@ WORKDIR /app

 COPY scrape_and_map.py scrape_realingo.py scrape_bezrealitky.py \
 scrape_idnes.py scrape_psn.py scrape_cityhome.py \
-merge_and_map.py regen_map.py generate_status.py scraper_stats.py \
+merge_and_map.py generate_status.py scraper_stats.py \
 run_all.sh server.py ./

 COPY build/crontab /etc/crontabs/root
@@ -84,9 +84,6 @@ exec > >(tee -a "$LOG_FILE") 2>&1
 step "Sreality"
 python3 scrape_and_map.py $SCRAPER_ARGS || { echo -e "${RED}✗ Sreality selhalo${NC}"; FAILED=$((FAILED + 1)); }

-step "Realingo"
-python3 scrape_realingo.py $SCRAPER_ARGS || { echo -e "${RED}✗ Realingo selhalo${NC}"; FAILED=$((FAILED + 1)); }
-
 step "Bezrealitky"
 python3 scrape_bezrealitky.py $SCRAPER_ARGS || { echo -e "${RED}✗ Bezrealitky selhalo${NC}"; FAILED=$((FAILED + 1)); }

@@ -101,6 +98,9 @@ PID_CH=$!
 wait $PID_PSN || { echo -e "${RED}✗ PSN selhalo${NC}"; FAILED=$((FAILED + 1)); }
 wait $PID_CH || { echo -e "${RED}✗ CityHome selhalo${NC}"; FAILED=$((FAILED + 1)); }

+step "Realingo"
+python3 scrape_realingo.py $SCRAPER_ARGS || { echo -e "${RED}✗ Realingo selhalo${NC}"; FAILED=$((FAILED + 1)); }
+
 # ── Sloučení + mapa ──────────────────────────────────────────

 step "Sloučení dat a generování mapy"
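
Note on the deduplication rule quoted in the README hunk (same normalized street name + price + area, with PSN and CityHome loaded first so their listings win): a minimal Python sketch of that rule, not the actual code in `merge_and_map.py`. The field names `street`, `price`, and `area`, the source keys, the normalization (lowercasing, stripping diacritics, collapsing whitespace), and all function names are assumptions made for illustration.

```python
import re
import unicodedata

# Assumed source keys; the README only says PSN and CityHome are loaded first.
PRIORITY_SOURCES = ("psn", "cityhome")


def normalize_street(street: str) -> str:
    """Hypothetical normalization: strip diacritics, collapse whitespace, lowercase."""
    decomposed = unicodedata.normalize("NFKD", street)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_marks).strip().lower()


def dedup_key(listing: dict) -> tuple:
    """Two listings with the same key are treated as duplicates."""
    return (normalize_street(listing["street"]), listing["price"], listing["area"])


def merge_listings(by_source: dict) -> list:
    """Merge per-source listings, keeping the first occurrence of each key."""
    # Priority sources go first, so their copy of a duplicate is the one kept.
    ordered = [s for s in PRIORITY_SOURCES if s in by_source]
    ordered += [s for s in by_source if s not in PRIORITY_SOURCES]

    seen = set()
    merged = []
    for source in ordered:
        for listing in by_source[source]:
            key = dedup_key(listing)
            if key in seen:
                continue  # duplicate of an earlier (higher-priority) listing
            seen.add(key)
            merged.append({**listing, "source": source})
    return merged
```

Under these assumptions, a Sreality listing for "Vinohradská" and a PSN listing for " vinohradska " with the same price and area collapse into one entry, and the PSN copy is the one that survives.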