# Caching
The CLI uses a two-layer caching system to speed up reruns and reduce API costs. Results are cached locally so you don't pay twice for the same processing.
## How It Works
The CLI maintains two types of cache:
| Layer | Purpose | What's Cached |
|---|---|---|
| Pipeline Cache | Per-run stage outputs | Parsed messages, candidates, classifications, geocoded results |
| API Cache | Deduplicate API calls | Embeddings, AI classifications, geocoding results, scraped URLs |
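Conceptually, the API layer is a content-addressed store: hash the request, look for a matching file on disk, and only call the API on a miss. A minimal sketch of the idea (hypothetical function and layout; the CLI's actual implementation may differ):

```python
import hashlib
import json
from pathlib import Path

def cached_request(cache_dir: Path, namespace: str, request: dict, call_api):
    """Return a cached API response if one exists; otherwise call the API and cache it."""
    # Canonical JSON (sorted keys) so equivalent requests hash identically.
    key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
    path = cache_dir / namespace / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())  # cache hit: no API call, no cost
    response = call_api(request)             # cache miss: pay for the call once
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(response))
    return response
```

Because the key depends only on the request content, repeating the same embedding, classification, or geocoding request never hits the API twice.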
## Cache Location
By default, the cache is stored in `~/.cache/chat-to-map/`:
```
~/.cache/chat-to-map/
├── chats/                               # Pipeline cache (per-run outputs)
│   └── WhatsApp_Chat/
│       └── 2025-01-15T10-30-45-abc123/  # datetime-filehash
│           ├── chat.txt
│           ├── messages.json
│           ├── candidates.heuristics.json
│           ├── classifications.json
│           └── ...
└── requests/                            # API cache (response deduplication)
    ├── ai/openai/text-embedding-3-large/{hash}.json
    ├── ai/anthropic/claude-haiku-4-5/{hash}.json
    ├── geo/google/{hash}.json
    └── web/https_example_com_{hash}.json
```

## Custom Cache Directory
You can change the cache location using an environment variable or command flag:
```bash
# Via environment variable
export CHAT_TO_MAP_CACHE_DIR="/custom/path"

# Or per-command
chat-to-map analyze chat.zip --cache-dir /tmp/cache
```

## Skipping the Cache
To regenerate all results from scratch (useful if you want to re-run classification with different settings):
```bash
chat-to-map analyze chat.zip --no-cache
```

This skips both layers of cache and makes fresh API calls. Note that this will incur additional API costs.
## Cache Persistence
Both caches store entries indefinitely; there is no automatic expiration or TTL. This is intentional: you shouldn't have to re-process a chat you already paid to analyze.
To manually clear the cache:
```bash
# Clear everything
rm -rf ~/.cache/chat-to-map

# Clear only the API response cache (keep pipeline results)
rm -rf ~/.cache/chat-to-map/requests

# Clear only a specific chat's pipeline cache
rm -rf ~/.cache/chat-to-map/chats/WhatsApp_Chat
```

## Pipeline Flow with Caching
Each CLI command checks for cached results from earlier stages. If available, it reuses them instead of re-running:
```
parse → scan → embed → filter → scrape-urls → classify → geocode → fetch-images → export
  ↓      ↓       ↓       ↓          ↓            ↓          ↓           ↓
cached cached  cached  cached     cached       cached     cached      cached
```

For example, if you run `geocode` on a chat you've already processed with `classify`, it skips parsing, filtering, and classification, jumping straight to geocoding the cached activities.
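The per-stage reuse described above amounts to checking whether an earlier run already wrote that stage's output file. A sketch of the pattern (hypothetical function and file names, following the run-directory layout shown earlier):

```python
import json
from pathlib import Path

def run_stage(run_dir: Path, name: str, compute):
    """Load a stage's cached output if present; otherwise compute and cache it."""
    out = run_dir / f"{name}.json"
    if out.exists():
        return json.loads(out.read_text())  # an earlier run already produced this stage
    result = compute()                      # only stages with no cached output re-run
    run_dir.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(result))
    return result
```

A later command such as `geocode` would call `run_stage(run_dir, "classifications", ...)` and get the cached file back instantly instead of re-running classification.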
## Viewing Cached Chats
Use the list command to see previously processed chats:
```bash
chat-to-map list
```

This shows all chats in your cache directory with their processing status and timestamps.
## Cache Key Generation
Cache keys are deterministic SHA256 hashes. The same input always produces the same cache key, regardless of property ordering in objects. This ensures cache hits even if you pass options in a different order.
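Order independence falls out of serializing the request to canonical JSON before hashing. A sketch of the idea (an assumed approach, not the CLI's actual code):

```python
import hashlib
import json

def cache_key(request: dict) -> str:
    # sort_keys=True canonicalizes property order, so logically identical
    # requests always serialize, and therefore hash, the same way.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

# The same options passed in a different order produce the same key:
a = cache_key({"model": "text-embedding-3-large", "input": "Joe's Pizza"})
b = cache_key({"input": "Joe's Pizza", "model": "text-embedding-3-large"})
assert a == b
```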
For URLs, the cache key includes a sanitized version of the URL plus a hash:
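A sanitizer along these lines would replace filesystem-unsafe characters with underscores and append a short hash so distinct URLs that sanitize identically still get distinct files (a sketch with assumed rules; the 8-character hash length matches the example below but is otherwise a guess):

```python
import hashlib
import re

def url_cache_key(url: str) -> str:
    # Collapse every run of non-alphanumeric characters into an underscore...
    sanitized = re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_")
    # ...and append a short hash of the full URL to avoid collisions.
    digest = hashlib.sha256(url.encode()).hexdigest()[:8]
    return f"web/{sanitized}_{digest}.json"
```

Keeping a readable sanitized prefix makes cache files easy to inspect by hand, while the hash suffix guarantees uniqueness.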
```
# URL: https://example.com/restaurant/joes-pizza
# Cache key: web/https_example_com_restaurant_joes_pizza_abc12345.json
```

## Cost Savings
The caching system can significantly reduce costs when:
- Re-running commands with different output options
- Adding new export formats to an existing analysis
- Re-processing after fixing an error mid-pipeline
- Running multiple analyses on overlapping chat exports
The API cache deduplicates at the request level, so even different chat exports will share cached results if they contain the same URLs or locations.