How I cached intention, not queries
While building a project for a client recently, I worked on a system that runs an LLM pipeline with multiple LLM calls. I ran into two obvious problems: latency and high token usage. I needed a way to kill both birds
precious-o.hashnode.dev · 3 min read