This is an underrated architecture choice. Not every LLM decision needs to go back through the model every time. If the context, user intent, and constraints haven’t changed, caching can reduce latency and cost without hurting quality.
The tricky part is knowing what is safe to cache and when a cached decision should expire.
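A minimal sketch of one way to do this, assuming the decision inputs (context, user intent, constraints) can be serialized and hashed into a cache key, and using a simple TTL as the expiry policy. The `DecisionCache` class and `get_or_call` method are hypothetical names for illustration, not from any particular library.

```python
import hashlib
import json
import time
from typing import Any, Callable


class DecisionCache:
    """TTL cache keyed on the inputs that make an LLM decision reusable."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, Any]] = {}

    def _key(self, context: str, intent: str, constraints: dict) -> str:
        # Hash every input that could change the decision; if any of them
        # differs, the lookup misses and we fall through to the model.
        payload = json.dumps(
            {"context": context, "intent": intent, "constraints": constraints},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(
        self,
        context: str,
        intent: str,
        constraints: dict,
        call_model: Callable[[], Any],
    ) -> Any:
        key = self._key(context, intent, constraints)
        hit = self._store.get(key)
        if hit is not None:
            stored_at, decision = hit
            if time.monotonic() - stored_at < self.ttl:
                return decision      # still fresh: skip the model call
            del self._store[key]     # expired: re-decide below
        decision = call_model()      # only pay for the model on a miss
        self._store[key] = (time.monotonic(), decision)
        return decision
```

A fixed TTL is only the simplest expiry policy; in practice you might also invalidate explicitly when the underlying context changes (for example, when a document is edited), so stale decisions never outlive the inputs that produced them.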