This is exactly right. I've audited LLM costs for several automation clients and the pattern is always the same: they're not overpaying for tokens, they're making the same calls 50x because nobody implemented proper caching or deduplication.

My go-to stack for this: semantic caching with embeddings (so similar-but-not-identical prompts hit the cache), plus a Redis layer for exact matches. One client went from $800/month to under $200 just by caching classification results that were being re-computed on every page load.

The Node.js + Redis + Qdrant combo mentioned here is solid, and event-driven invalidation is the key piece most teams miss.
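For anyone who wants to see the shape of that two-tier setup, here's a rough TypeScript sketch using `ioredis` and `@qdrant/js-client-rest`. To be clear, the `embed()` helper, the `prompt_cache` collection name, and the 0.95 similarity threshold are placeholders of mine, not details from the setup described above; treat this as a starting point, not a drop-in implementation:

```typescript
import { createHash, randomUUID } from 'node:crypto';
import Redis from 'ioredis';
import { QdrantClient } from '@qdrant/js-client-rest';

// Placeholder: plug in your embedding provider here (hosted API or local model).
async function embed(text: string): Promise<number[]> {
  throw new Error(`embed() not implemented (called for: ${text.slice(0, 40)})`);
}

const redis = new Redis();                                  // localhost:6379
const qdrant = new QdrantClient({ url: 'http://localhost:6333' });

const COLLECTION = 'prompt_cache'; // assumed: pre-created with cosine distance
const SIM_THRESHOLD = 0.95;        // similarity cutoff; tune per workload
const TTL_SECONDS = 3600;          // exact-match entries expire after an hour

async function cachedCompletion(
  prompt: string,
  callLLM: (p: string) => Promise<string>,
): Promise<string> {
  // Tier 1: exact match in Redis, keyed by a hash of the prompt.
  const key = `llm:${createHash('sha256').update(prompt).digest('hex')}`;
  const exact = await redis.get(key);
  if (exact !== null) return exact;

  // Tier 2: semantic match in Qdrant, so near-duplicate prompts hit too.
  const vector = await embed(prompt);
  const [hit] = await qdrant.search(COLLECTION, {
    vector,
    limit: 1,
    score_threshold: SIM_THRESHOLD,
    with_payload: true,
  });
  if (hit) return hit.payload?.response as string;

  // Miss on both tiers: call the model once, then populate both caches.
  const response = await callLLM(prompt);
  await redis.set(key, response, 'EX', TTL_SECONDS);
  await qdrant.upsert(COLLECTION, {
    points: [{ id: randomUUID(), vector, payload: { prompt, response } }],
  });
  return response;
}

// Event-driven invalidation: when the underlying data changes, evict the entry
// from both tiers immediately instead of waiting for TTLs to expire.
async function invalidate(prompt: string): Promise<void> {
  await redis.del(`llm:${createHash('sha256').update(prompt).digest('hex')}`);
  await qdrant.delete(COLLECTION, {
    filter: { must: [{ key: 'prompt', match: { value: prompt } }] },
  });
}
```

The threshold is the part worth experimenting with: set it too low and you serve cached answers for genuinely different prompts, too high and the semantic tier never fires and you're back to exact-match-only savings.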