Add first_seen/last_updated timestamps to track property freshness
Each property record now carries two date fields:
- first_seen: date the listing first appeared (preserved across runs)
- last_updated: date of the most recent scrape that included it
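The shape described above can be sketched as follows; the field names `first_seen` and `last_updated` come from this commit, while the other keys (`locality`, `price`) are illustrative values of the kind the merge script prints:

```python
from datetime import date

today = date.today().isoformat()

# Hypothetical property record; only the two date fields are prescribed here.
record = {
    "locality": "Praha 4",
    "price": 7500000,
    "first_seen": today,      # set once, preserved across later runs
    "last_updated": today,    # refreshed on every scrape that sees the listing
}
```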
All 6 scrapers (Sreality, Realingo, Bezrealitky, iDNES, PSN, CityHome)
set these fields during scraping. Cached results preserve first_seen and
refresh last_updated. PSN and CityHome gain a load_previous() helper to
track first_seen across runs (they lacked caching before).
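A minimal sketch of what such a `load_previous()` helper plus a stamping step could look like; the cache path, the dedup key (`url` here), and the function names beyond `load_previous` are assumptions, not the actual PSN/CityHome code:

```python
import json
import os
from datetime import date

def load_previous(path):
    """Return {dedup_key: record} from the previous run's JSON dump, if any."""
    # Hypothetical: assumes records are a JSON list keyed by a "url" field.
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return {r["url"]: r for r in json.load(f)}

def stamp(record, previous):
    """Preserve first_seen from a cached record; always refresh last_updated."""
    today = date.today().isoformat()
    cached = previous.get(record["url"])
    record["first_seen"] = cached["first_seen"] if cached else today
    record["last_updated"] = today
    return record
```

The point of the split is that `first_seen` only ever flows forward from the cache, while `last_updated` is unconditionally set to the current run's date.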
The merge script keeps the earliest first_seen and latest last_updated
when deduplicating listings across sources.
The HTML map now shows dates in popups ("Přidáno: DD.MM.YYYY"), displays
a green "NOVÉ" badge on newly discovered listings, and adds a "Přidáno"
dropdown filter (24h / 3 days / 7 days / 14 days) for spotting new ones.
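The dropdown filter itself runs as JavaScript in the generated map; the cutoff logic it needs can be illustrated in Python. The window keys below mirror the dropdown options; the function name is hypothetical:

```python
from datetime import date, timedelta

# Dropdown windows from the commit: 24h / 3 days / 7 days / 14 days.
WINDOWS = {"24h": 1, "3d": 3, "7d": 7, "14d": 14}

def added_within(first_seen, window, today=None):
    """True if an ISO-formatted first_seen date falls inside the window."""
    today = today or date.today()
    cutoff = today - timedelta(days=WINDOWS[window])
    return date.fromisoformat(first_seen) >= cutoff
```

Comparing against `first_seen` (not `last_updated`) is what makes the filter surface newly discovered listings rather than merely re-scraped ones.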
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -79,6 +79,19 @@ def main():
         if key in seen_keys:
             dupes += 1
             existing = seen_keys[key]
+            # Merge timestamps: keep earliest first_seen, latest last_updated
+            e_first = e.get("first_seen", "")
+            ex_first = existing.get("first_seen", "")
+            if e_first and ex_first:
+                existing["first_seen"] = min(e_first, ex_first)
+            elif e_first:
+                existing["first_seen"] = e_first
+            e_updated = e.get("last_updated", "")
+            ex_updated = existing.get("last_updated", "")
+            if e_updated and ex_updated:
+                existing["last_updated"] = max(e_updated, ex_updated)
+            elif e_updated:
+                existing["last_updated"] = e_updated
             # Log it
             print(f" Duplikát: {e['locality']} | {format_price(e['price'])} | {e.get('area', '?')} m² "
                   f"({e.get('source', '?')} vs {existing.get('source', '?')})")