Caching

The CLI uses a two-layer caching system to speed up reruns and reduce API costs. Results are cached locally so you don't pay twice for the same processing.

How It Works

The CLI maintains two types of cache:

Layer           Purpose                  What's Cached
Pipeline Cache  Per-run stage outputs    Parsed messages, candidates, classifications, geocoded results
API Cache       Deduplicate API calls    Embeddings, AI classifications, geocoding results, scraped URLs

Cache Location

By default, the cache is stored in ~/.cache/chat-to-map/:

~/.cache/chat-to-map/
├── chats/                              # Pipeline cache (per-run outputs)
│   └── WhatsApp_Chat/
│       └── 2025-01-15T10-30-45-abc123/ # datetime-filehash
│           ├── chat.txt
│           ├── messages.json
│           ├── candidates.heuristics.json
│           ├── classifications.json
│           └── ...
└── requests/                           # API cache (response deduplication)
    ├── ai/openai/text-embedding-3-large/{hash}.json
    ├── ai/anthropic/claude-haiku-4-5/{hash}.json
    ├── geo/google/{hash}.json
    └── web/https_example_com_{hash}.json
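
The request-cache layout above can be sketched as a small path-building helper. This is a minimal illustration of the directory scheme, not the CLI's actual code; the function name and the payload-hashing details are assumptions, and the real layout also nests model names under AI providers.

```python
import hashlib
import json
from pathlib import Path

CACHE_ROOT = Path.home() / ".cache" / "chat-to-map"

def request_cache_path(category: str, provider: str, payload: dict) -> Path:
    """Build a path like requests/<category>/<provider>/<hash>.json.

    Hashing the canonical JSON of the request payload means identical
    requests always map to the same file, which is what makes
    deduplication across runs possible.
    """
    canonical = json.dumps(payload, sort_keys=True).encode()
    digest = hashlib.sha256(canonical).hexdigest()[:16]
    return CACHE_ROOT / "requests" / category / provider / f"{digest}.json"
```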

Custom Cache Directory

You can change the cache location using an environment variable or command flag:

# Via environment variable
export CHAT_TO_MAP_CACHE_DIR="/custom/path"

# Or per-command
chat-to-map analyze chat.zip --cache-dir /tmp/cache

Skipping the Cache

To regenerate all results from scratch (useful if you want to re-run classification with different settings):

chat-to-map analyze chat.zip --no-cache

This skips both layers of cache and makes fresh API calls. Note that this will incur additional API costs.

Cache Persistence

Both caches store entries forever — there's no automatic expiration or TTL. This is intentional: you shouldn't have to re-process a chat you already paid to analyze.

To manually clear the cache:

# Clear everything
rm -rf ~/.cache/chat-to-map

# Clear only API response cache (keep pipeline results)
rm -rf ~/.cache/chat-to-map/requests

# Clear only a specific chat's pipeline cache
rm -rf ~/.cache/chat-to-map/chats/WhatsApp_Chat

Pipeline Flow with Caching

Each CLI command checks for cached results from earlier stages. If available, it reuses them instead of re-running:

parse → scan → embed → filter → scrape-urls → classify → geocode → fetch-images → export
  ↓       ↓       ↓       ↓          ↓            ↓          ↓            ↓
cached  cached  cached  cached    cached       cached     cached       cached

For example, if you run geocode on a chat you've already processed with classify, it will skip parsing, filtering, and classification — jumping straight to geocoding the cached activities.
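The stage-reuse behavior described above can be sketched as a check-then-compute wrapper around each pipeline stage. This is a simplified model of the idea, not the CLI's internals; the function name and file naming are assumptions.

```python
import json
from pathlib import Path

def run_stage(run_dir: Path, stage: str, compute):
    """Return the cached output for a stage if it exists on disk;
    otherwise run `compute`, persist its result, and return it."""
    out = run_dir / f"{stage}.json"
    if out.exists():
        # Cache hit: reuse the earlier run's output, skip the work entirely.
        return json.loads(out.read_text())
    result = compute()
    out.write_text(json.dumps(result))
    return result
```

With a wrapper like this, running a late stage such as geocoding only triggers the expensive work for stages whose output files are missing.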

Viewing Cached Chats

Use the list command to see previously processed chats:

chat-to-map list

This shows all chats in your cache directory with their processing status and timestamps.

Cache Key Generation

Cache keys are deterministic SHA256 hashes. The same input always produces the same cache key, regardless of property ordering in objects. This ensures cache hits even if you pass options in a different order.
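Order-independent hashing is typically achieved by serializing to canonical JSON (sorted keys, fixed separators) before hashing. A minimal sketch of that idea, assuming SHA256 over canonical JSON as described above:

```python
import hashlib
import json

def cache_key(options: dict) -> str:
    """Hash options into a stable key: sorting keys first makes the
    result independent of the order properties were passed in."""
    canonical = json.dumps(options, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

# The same options in a different order produce the same key.
assert cache_key({"model": "x", "lang": "en"}) == cache_key({"lang": "en", "model": "x"})
```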

For URLs, the cache key includes a sanitized version of the URL plus a hash:

# URL: https://example.com/restaurant/joes-pizza
# Cache key: web/https_example_com_restaurant_joes_pizza_abc12345.json
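A sanitization step like the one above can be sketched by replacing every run of non-alphanumeric characters with an underscore and appending a short hash of the full URL (so sanitized collisions can't clash). The exact rules the CLI uses may differ; this is an illustration of the pattern.

```python
import hashlib
import re

def url_cache_key(url: str) -> str:
    """Turn a URL into a filesystem-safe cache filename.

    The readable prefix makes cache entries easy to inspect by hand;
    the hash suffix keeps keys unique even when sanitization collides.
    """
    sanitized = re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_")
    digest = hashlib.sha256(url.encode()).hexdigest()[:8]
    return f"web/{sanitized}_{digest}.json"
```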

Cost Savings

The caching system can significantly reduce costs when:

  • Re-running commands with different output options
  • Adding new export formats to an existing analysis
  • Re-processing after fixing an error mid-pipeline
  • Running multiple analyses on overlapping chat exports

The API cache deduplicates at the request level, so even different chat exports will share cached results if they contain the same URLs or locations.