24 Commits

Author SHA1 Message Date
212a561e65 Merge pull request 'Add Bazoš.cz scraper + project docs' (#7) from feature/bazos-scraper into main
All checks were successful
Build and Push / build (push) Successful in 11s
Reviewed-on: #7
0.12
2026-03-09 10:28:32 +00:00
59ef3274b6 Add CLAUDE.md project documentation for session context
Provides automatic context loading for new Claude Code sessions,
documenting architecture, filters, sources, and conventions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 09:58:01 +01:00
27e5b05f88 Add Bazoš.cz as new apartment scraper source
New scraper for reality.bazos.cz with full HTML parsing (no API),
GPS extraction from Google Maps links, panel/sídliště (prefab housing estate) filtering,
floor/area parsing from free text, and pagination fix for Bazoš's
numeric locality codes. Integrated into merge pipeline and map
with purple (#7B1FA2) markers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 09:47:37 +01:00
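The GPS-from-Google-Maps-links step above can be sketched roughly as follows. The URL patterns and function name are assumptions about the reality.bazos.cz markup, not the project's actual code:

```python
import re

# Hypothetical link patterns: coordinates embedded either as a ?q= query
# or as an @lat,lon path segment in a Google Maps URL.
GPS_PATTERNS = [
    re.compile(r"maps\.google\.[a-z.]+/\?q=(-?\d+\.\d+),(-?\d+\.\d+)"),
    re.compile(r"google\.[a-z.]+/maps[^\"']*?@(-?\d+\.\d+),(-?\d+\.\d+)"),
]

def extract_gps(html: str):
    """Return (lat, lon) from the first Google Maps link found, else None."""
    for pattern in GPS_PATTERNS:
        m = pattern.search(html)
        if m:
            return float(m.group(1)), float(m.group(2))
    return None
```

Scraping the link out of the listing page avoids any geocoding call, at the cost of breaking silently if the portal changes its map-link format.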
63663e4b6b Merge pull request 'Move Realingo scraper to run last' (#6) from fix/scraper-order into main
All checks were successful
Build and Push / build (push) Successful in 6s
Reviewed-on: #6
0.11
2026-02-27 21:19:29 +00:00
8c052840cd Move Realingo scraper to run last in pipeline
Reorder scrapers: Sreality → Bezrealitky → iDNES → PSN+CityHome → Realingo → Merge

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 21:35:54 +01:00
39e4b9ce2a Merge pull request 'Reliability improvements and cleanup' (#5) from improve/reliability-and-fixes into main
Reviewed-on: #5
2026-02-27 10:26:04 +00:00
Jan Novak
fd3991f8d6 Remove regen_map.py references from Dockerfile and README
All checks were successful
Build and Push / build (push) Successful in 6s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0.10
2026-02-27 10:44:08 +01:00
Jan Novak
27a7834eb6 Reliability improvements: retry logic, validation, ratings sync
Some checks failed
Build and Push / build (push) Failing after 4s
- Add 3-attempt retry with exponential backoff to Sreality, Realingo,
  Bezrealitky, and PSN scrapers (CityHome and iDNES already had it)
- Add shared validate_listing() in scraper_stats.py; all 6 scrapers now
  validate GPS bounds, price, area, and required fields before output
- Wire ratings to server /api/ratings on page load (merge with
  localStorage) and save (async POST); ratings now persist across
  browsers and devices
- Namespace JS hash IDs as {source}_{id} to prevent rating collisions
  between listings from different portals with the same numeric ID
- Replace manual Czech diacritic table with unicodedata.normalize()
  in merge_and_map.py for correct deduplication of all edge cases
- Correct README schedule docs: every 4 hours, not twice daily

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0.09
2026-02-27 10:36:37 +01:00
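Two of the items above lend themselves to short sketches: the 3-attempt retry with exponential backoff, and the unicodedata-based diacritic stripping that replaced the manual table. Function names and delays here are illustrative, not the project's actual code:

```python
import logging
import time
import unicodedata

def fetch_with_retry(fetch, attempts=3, base_delay=1.0):
    """Call fetch() up to `attempts` times, doubling the pause between tries."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            delay = base_delay * (2 ** attempt)
            logging.warning("fetch failed (%s), retrying in %.1fs", exc, delay)
            time.sleep(delay)

def strip_diacritics(text: str) -> str:
    """ASCII-fold Czech text for deduplication: NFKD decomposes 'ř' into
    'r' plus a combining háček, and the combining marks are dropped."""
    return "".join(
        ch for ch in unicodedata.normalize("NFKD", text)
        if not unicodedata.combining(ch)
    )
```

With NFKD folding, an address spelled with and without diacritics collapses to the same dedup key, which a hand-written substitution table only achieves for the characters someone remembered to list.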
57a9f6f21a Add NEW badge for recent listings, text input for price filter, cleanup
- New listings (≤1 day) show yellow NEW badge instead of oversized marker
- Price filter changed from dropdown to text input (max 14M)
- Cap price filter at 14M in JS
- Remove unused regen_map.py
- Remove unused HTMLParser import in scrape_idnes.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 21:14:48 +01:00
0ea31d3013 Remove tracked generated/data files and fix map link on status page
- Remove byty_*.json, mapa_bytu.html, .DS_Store and settings.local.json from git tracking
  (already in .gitignore, files kept locally)
- Fix "Otevřít mapu" ("Open map") link on scraper status page: / → /mapa_bytu.html

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 20:42:35 +01:00
Jan Novak
4304a42776 Track first_seen/last_changed per property, add map filters and clickable legend
All checks were successful
Build and Push / build (push) Successful in 6s
Scraper changes (all 6 sources):
- Add first_seen: date the hash_id was first scraped, never overwritten
- Add last_changed: date the price last changed (= first_seen when new)
- PSN and CityHome load previous output as a lightweight cache to compute these fields
- merge_and_map.py preserves earliest first_seen when deduplicating cross-source duplicates

Map popup:
- Show "Přidáno: YYYY-MM-DD" (added) and "Změněno: YYYY-MM-DD" (changed) in each property popup
- The NOVÉ ("NEW") badge and pulsing marker are now driven by first_seen == today (more accurate than scraped_at)

Map filters (sidebar):
- New "Přidáno / změněno" (added / changed) dropdown: 1, 2, 3, 4, 5, 7, 14, 30 days or all
- Clickable price/m² legend bands: click to filter to that band, multi-select supported
- "✕ Zobrazit všechny ceny" ("show all prices") reset link appears when any band is active

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0.08
2026-02-26 16:58:46 +01:00
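The first_seen/last_changed bookkeeping described above amounts to a small merge against the previous run's output. A sketch using the field names from the commit; the helper name and cache shape are illustrative:

```python
import datetime

def update_dates(listing: dict, previous: dict) -> dict:
    """Carry first_seen forward; bump last_changed only when the price moved.
    `previous` maps hash_id -> last run's listing dict."""
    today = datetime.date.today().isoformat()
    old = previous.get(listing["hash_id"])
    if old is None:
        # Never seen before: both dates start today.
        listing["first_seen"] = today
        listing["last_changed"] = today
    else:
        listing["first_seen"] = old.get("first_seen", today)
        if listing.get("price") != old.get("price"):
            listing["last_changed"] = today
        else:
            listing["last_changed"] = old.get("last_changed", listing["first_seen"])
    return listing
```

Deduplication across sources then just keeps the minimum first_seen of the duplicates, so the badge reflects when the flat first appeared anywhere, not when a second portal picked it up.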
23d208a5b7 Merge pull request 'Add scraper status collection and presentation' (#3) from add-scraper-statuses into main
Reviewed-on: #3
2026-02-26 09:04:23 +00:00
Jan Novak
00c9144010 Fix DATA_DIR usage in stats/history paths, set env in Dockerfile, add validation docs
All checks were successful
Build and Push / build (push) Successful in 5s
- scraper_stats.py: respect DATA_DIR env var when writing stats_*.json files
- generate_status.py: read stats files and write history from DATA_DIR instead of HERE
- build/Dockerfile: set DATA_DIR=/app/data as default env var
- docs/validation.md: end-to-end Docker validation recipe

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0.07
2026-02-26 09:46:16 +01:00
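The DATA_DIR convention is simple enough to sketch. The helper names are illustrative; per the Dockerfile change, the container sets DATA_DIR=/app/data and local runs fall back to the working directory:

```python
import os
from pathlib import Path

def data_dir() -> Path:
    """Resolve the output directory from the DATA_DIR env var, else cwd."""
    return Path(os.environ.get("DATA_DIR", "."))

def stats_path(source: str) -> Path:
    """Per-scraper stats file, e.g. stats_sreality.json, under DATA_DIR."""
    return data_dir() / f"stats_{source}.json"
```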
Jan Novak
44c02b45b4 Increase history retention to 20, run scrapers every 4 hours
All checks were successful
Build and Push / build (push) Successful in 7s
- generate_status.py: raise --keep default from 5 to 20 entries
- build/crontab: change schedule from 06:00/18:00 to every 4 hours (*/4)
  covers 6 runs/day ≈ 3.3 days of history at default retention

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0.06
2026-02-26 08:53:27 +01:00
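The new schedule corresponds to a crontab entry along these lines; the script path and log redirection are assumptions, only the `*/4` field comes from the commit:

```crontab
# every 4 hours -> 6 runs/day; with --keep 20 that is ~3.3 days of history
0 */4 * * * /app/run_all.sh >> /app/data/run_all.log 2>&1
```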
Jan Novak
5fb3b984b6 Add status dashboard, server, scraper stats, and DATA_DIR support
All checks were successful
Build and Push / build (push) Successful in 7s
Key changes:
- Replace ratings_server.py + status.html with a unified server.py that
  serves the map, scraper status dashboard, and ratings API in one process
- Add scraper_stats.py utility: each scraper writes per-run stats (fetched,
  accepted, excluded, duration) to stats_<source>.json for the status page
- generate_status.py: respect DATA_DIR env var so status.json lands in the
  configured data directory instead of always the project root
- run_all.sh: replace the {"status":"running"} overwrite of status.json with
  a dedicated scraper_running.json lock file; trap on EXIT ensures cleanup
  even on kill/error, preventing the previous run's results from being wiped
- server.py: detect running state via scraper_running.json existence instead
  of status["status"] field, eliminating the dual-use race condition
- Makefile: add serve (local dev), debug (Docker debug container) targets;
  add SERVER_PORT variable
- build/Dockerfile + entrypoint.sh: switch to server.py, set DATA_DIR,
  adjust volume mounts
- .gitignore: add *.json and *.log to keep runtime data files out of VCS
- mapa_bytu.html: price-per-m² colouring, status link, UX tweaks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0.05
2026-02-26 00:30:25 +01:00
6f49533c94 Merge pull request 'Rewrite PSN + CityHome scrapers, add price/m² map coloring, ratings system, and status dashboard' (#2) from ui-tweaks/2026-02-17 into main
Reviewed-on: #2
2026-02-25 21:26:51 +00:00
b8d4d44164 Rewrite PSN + CityHome scrapers, add price/m² map coloring, ratings system, and status dashboard
- Rewrite PSN scraper to use /api/units-list endpoint (single API call, no HTML parsing)
- Fix CityHome scraper: GPS from multiple URL patterns, address from table cells, no 404 retries
- Color map markers by price/m² instead of disposition (blue→green→orange→red scale)
- Add persistent rating system (favorite/reject) with Flask ratings server and localStorage fallback
- Rejected markers show original color at reduced opacity with 🚫 SVG overlay
- Favorite markers shown as star icons with gold pulse animation
- Add "new today" marker logic (scraped_at == today) with larger pulsing green outline
- Add filter panel with floor, price, hide-rejected controls and ☰/✕ toggle buttons
- Add generate_status.py for scraper run statistics and status.html dashboard
- Add scraped_at field to all scrapers for freshness tracking
- Update run_all.sh with log capture and status generation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 15:15:25 +01:00
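The price/m² colouring is a threshold mapping over the blue→green→orange→red scale. The band edges below are made-up placeholders for illustration; the real bands live in mapa_bytu.html:

```python
def price_band_color(price_per_m2: float) -> str:
    """Map CZK price per m² to a marker colour bucket (illustrative edges)."""
    if price_per_m2 < 100_000:
        return "blue"
    if price_per_m2 < 130_000:
        return "green"
    if price_per_m2 < 160_000:
        return "orange"
    return "red"
```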
Jan Novak
c6089f0da9 Add Gitea Actions CI pipeline for Docker image builds
Triggers on tag push or manual dispatch. Builds the image using
build/Dockerfile and pushes to the Gitea container registry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 22:40:23 +00:00
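A tag-or-dispatch workflow of that shape might look like the fragment below. Everything here is an assumption except the trigger conditions, the "Build and Push"/build names visible in the check statuses above, and the use of build/Dockerfile:

```yaml
# .gitea/workflows/build.yml -- sketch; registry and image names are placeholders
name: Build and Push
on:
  push:
    tags: ["*"]
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push image
        run: |
          docker build -f build/Dockerfile -t "$REGISTRY/$IMAGE:$GITHUB_REF_NAME" .
          docker push "$REGISTRY/$IMAGE:$GITHUB_REF_NAME"
```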
Jan Novak
327688d9d2 Add comprehensive project documentation
Cover the full pipeline (scrapers, merge, map generation), all 6 data
sources with their parsing methods, filter criteria, CLI arguments,
Docker setup, caching, rate limiting, and project structure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 22:40:23 +00:00
Jan Novak
09a853aa05 Add validation mode, structured logging, and CLI args to all scrapers
- Replace print() with Python logging module across all 6 scrapers
  for configurable log levels (DEBUG/INFO/WARNING/ERROR)
- Add --max-pages, --max-properties, and --log-level CLI arguments
  to each scraper via argparse for limiting scrape scope
- Add validation Make targets (validation, validation-local,
  validation-local-debug) for quick test runs with limited data
- Update run_all.sh to parse and forward CLI args to all scrapers
- Update mapa_bytu.html with latest scrape results

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 22:40:23 +00:00
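The shared CLI surface (--max-pages, --max-properties, --log-level) can be sketched with argparse; defaults and help strings here are illustrative:

```python
import argparse

def parse_args(argv=None):
    """Common scraper arguments for limiting scope and tuning verbosity."""
    parser = argparse.ArgumentParser(description="Apartment scraper")
    parser.add_argument("--max-pages", type=int, default=None,
                        help="stop after this many listing pages")
    parser.add_argument("--max-properties", type=int, default=None,
                        help="stop after this many accepted properties")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"])
    return parser.parse_args(argv)
```

A validation run can then be as cheap as `parse_args(["--max-pages", "1", "--log-level", "DEBUG"])`, which is what the Make validation targets exploit.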
Jan Novak
5207c48890 Add Docker build, Makefile, and other groundwork before moving forward 2026-02-14 22:40:23 +00:00
215b51aadb Initial commit from Mac 2026-02-14 17:50:42 +01:00
846d0bd9f2 Upload files to "/"
v1 scrapers
2026-02-13 16:11:28 +00:00
82d1f94104 v1 2026-02-13 16:10:14 +00:00