This is exactly right. I've audited LLM costs for several automation clients and the pattern is always the same: they're not overpaying for tokens, they're making the same calls 50x because nobody implemented proper caching or deduplication.

My go-to stack for this: semantic caching with embeddings (so similar-but-not-identical prompts hit the cache), plus a Redis layer for exact matches. One client went from $800/month to under $200 just by caching classification results that were being re-computed on every page load.

The Node.js + Redis + Qdrant combo mentioned here is solid, and event-driven invalidation is the key piece most teams miss.
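For anyone who wants to see the shape of that two-tier setup, here's a rough TypeScript sketch using `ioredis` and `@qdrant/js-client-rest`. To be clear, the `embed()` helper, the `prompt_cache` collection name, and the 0.95 similarity threshold are placeholders of mine, not details from the setup described above; treat this as a starting point, not a drop-in implementation:

```typescript
import { createHash, randomUUID } from 'node:crypto';
import Redis from 'ioredis';
import { QdrantClient } from '@qdrant/js-client-rest';

// Placeholder: plug in your embedding provider here (hosted API or local model).
async function embed(text: string): Promise<number[]> {
  throw new Error(`embed() not implemented (called for: ${text.slice(0, 40)})`);
}

const redis = new Redis();                                  // localhost:6379
const qdrant = new QdrantClient({ url: 'http://localhost:6333' });

const COLLECTION = 'prompt_cache'; // assumed: pre-created with cosine distance
const SIM_THRESHOLD = 0.95;        // similarity cutoff; tune per workload
const TTL_SECONDS = 3600;          // exact-match entries expire after an hour

async function cachedCompletion(
  prompt: string,
  callLLM: (p: string) => Promise<string>,
): Promise<string> {
  // Tier 1: exact match in Redis, keyed by a hash of the prompt.
  const key = `llm:${createHash('sha256').update(prompt).digest('hex')}`;
  const exact = await redis.get(key);
  if (exact !== null) return exact;

  // Tier 2: semantic match in Qdrant, so near-duplicate prompts hit too.
  const vector = await embed(prompt);
  const [hit] = await qdrant.search(COLLECTION, {
    vector,
    limit: 1,
    score_threshold: SIM_THRESHOLD,
    with_payload: true,
  });
  if (hit) return hit.payload?.response as string;

  // Miss on both tiers: call the model once, then populate both caches.
  const response = await callLLM(prompt);
  await redis.set(key, response, 'EX', TTL_SECONDS);
  await qdrant.upsert(COLLECTION, {
    points: [{ id: randomUUID(), vector, payload: { prompt, response } }],
  });
  return response;
}

// Event-driven invalidation: when the underlying data changes, evict the entry
// from both tiers immediately instead of waiting for TTLs to expire.
async function invalidate(prompt: string): Promise<void> {
  await redis.del(`llm:${createHash('sha256').update(prompt).digest('hex')}`);
  await qdrant.delete(COLLECTION, {
    filter: { must: [{ key: 'prompt', match: { value: prompt } }] },
  });
}
```

The threshold is the part worth experimenting with: set it too low and you serve cached answers for genuinely different prompts, too high and the semantic tier never fires and you're back to exact-match-only savings.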