Files
fuj-management/prompts/outcomes/2026-03-10-cache-data-from-google-sheets.md

3.5 KiB

Google Sheets Data Caching Implementation

Date: 2026-03-11 Objective: Optimize Flask application performance by heavily caching expensive Google Sheets data processing, avoiding redundant HTTP roundtrips to Google APIs, and ensuring rate limits are not exhausted during simple web app reloads.

Implemented Features

1. File-Based JSON Caching (cache_utils.py)

  • Mechanism: Implemented a new generic caching system that saves API responses and heavily calculated datasets as .json files directly to the local /tmp/ directory.
  • Drive Metadata Checks: The cache is validated by asking the Google Drive API (drive.files().get) for the remote modifiedTime of the target Sheet.
  • Cache Hit logic: If the cached version on disk matches the remote modifiedTime, the application skips downloading the full CSV payload and computing tuples—instead serving the instant static cache via json.load.

2. Global API Auth Object Reuse

  • The Problem: The _get_drive_service() and get_sheets_service() implementations were completely rebuilding googleapiclient.discovery objects for every single file check—re-seeking and exchanging Google Service Account tokens constantly.
  • The Fix: Service objects (_DRIVE_SERVICE, _SHEETS_SERVICE) are now globally cached in application memory. The server authenticates exactly once when it wakes up, dramatically saving milliseconds and network resources across every web request. The underlying httplib2 and google-auth intelligently handle silent token refreshes natively.

3. Graceful Configurable Rate Limiting

  • In-Memory Debouncing: Implemented an internal memory state (_LAST_CHECKED) inside cache_utils that forcefully prevents checking the Drive API modifiedTime for a specific file if we already explicitly checked it within the last 5 minutes. This prevents flooding the Google Drive API while clicking wildly around the app GUI.
  • Semantic Mappings: Created a CACHE_SHEET_MAP that maps friendly internal cache keys (e.g. attendance_regular) back to their raw 44-character Google Sheet IDs.

4. HTTP / Socket Timeout Safety Fix

  • The Bug: Originally, socket.setdefaulttimeout(10) was used to prevent Google Drive metadata checks from locking up the worker pool. However, this brutally mutated the underlying Werkzeug/Flask default sockets globally. If fetching thousands of lines from Google Sheets (the payload logic) took longer than 10 seconds, Flask would just kill the request with a random TimeoutError('timed out').
  • The Fix: Removed the global mutation. Instantiated a targeted, isolated httplib2.Http(timeout=10) injected specifically into only the Google Drive API build. The rest of the app can now download massive files without randomly timing out.

5. Developer Experience (DX) Enhancements

  • Logging Line Origins: Enriched the console logging format strings (logging.basicConfig) to output [%(funcName)s] and %(filename)s:%(lineno)d to easily trace exactly which exact file and function is executing on complex stack traces.
  • Improved VS Code Local Debugging:
    • Integrated debugpy launch profiles in .vscode/launch.json for "Python Debugger: Flask" (Launching) and "Python Debugger: Attach" (Connecting).
    • Implemented a standard make web-attach target inside the Makefile via uv run python -m debugpy --listen ... to allow the background web app to automatically halt and wait for external debuggers before bootstrapping caching layers.