3.5 KiB
3.5 KiB
Google Sheets Data Caching Implementation
Date: 2026-03-11 Objective: Optimize Flask application performance by heavily caching expensive Google Sheets data processing, avoiding redundant HTTP roundtrips to Google APIs, and ensuring rate limits are not exhausted during simple web app reloads.
Implemented Features
1. File-Based JSON Caching (cache_utils.py)
- Mechanism: Implemented a new generic caching system that saves API responses and heavily calculated datasets as
.jsonfiles directly to the local/tmp/directory. - Drive Metadata Checks: The cache is validated by asking the Google Drive API (
drive.files().get) for the remotemodifiedTimeof the target Sheet. - Cache Hit logic: If the cached version on disk matches the remote
modifiedTime, the application skips downloading the full CSV payload and computing tuples—instead serving the instant static cache viajson.load.
2. Global API Auth Object Reuse
- The Problem: The
_get_drive_service()andget_sheets_service()implementations were completely rebuildinggoogleapiclient.discoveryobjects for every single file check—re-seeking and exchanging Google Service Account tokens constantly. - The Fix: Service objects (
_DRIVE_SERVICE,_SHEETS_SERVICE) are now globally cached in application memory. The server authenticates exactly once when it wakes up, dramatically saving milliseconds and network resources across every web request. The underlyinghttplib2andgoogle-authintelligently handle silent token refreshes natively.
3. Graceful Configurable Rate Limiting
- In-Memory Debouncing: Implemented an internal memory state (
_LAST_CHECKED) insidecache_utilsthat forcefully prevents checking the Drive APImodifiedTimefor a specific file if we already explicitly checked it within the last 5 minutes. This prevents flooding the Google Drive API while clicking wildly around the app GUI. - Semantic Mappings: Created a
CACHE_SHEET_MAPthat maps friendly internal cache keys (e.g.attendance_regular) back to their raw 44-character Google Sheet IDs.
4. HTTP / Socket Timeout Safety Fix
- The Bug: Originally,
socket.setdefaulttimeout(10)was used to prevent Google Drive metadata checks from locking up the worker pool. However, this brutally mutated the underlying Werkzeug/Flask default sockets globally. If fetching thousands of lines from Google Sheets (the payload logic) took longer than 10 seconds, Flask would just kill the request with a randomTimeoutError('timed out'). - The Fix: Removed the global mutation. Instantiated a targeted, isolated
httplib2.Http(timeout=10)injected specifically into only the Google Drive API build. The rest of the app can now download massive files without randomly timing out.
5. Developer Experience (DX) Enhancements
- Logging Line Origins: Enriched the console logging format strings (
logging.basicConfig) to output[%(funcName)s]and%(filename)s:%(lineno)dto easily trace exactly which exact file and function is executing on complex stack traces. - Improved VS Code Local Debugging:
- Integrated
debugpylaunch profiles in.vscode/launch.jsonfor "Python Debugger: Flask" (Launching) and "Python Debugger: Attach" (Connecting). - Implemented a standard
make web-attachtarget inside the Makefile viauv run python -m debugpy --listen ...to allow the background web app to automatically halt and wait for external debuggers before bootstrapping caching layers.
- Integrated