Pre-indexing is the right approach. The default behavior of most agent frameworks is to re-read the entire codebase context on every turn, which is wildly wasteful. A pre-built index that the agent can query selectively — like a codebase-aware RAG layer — cuts token usage dramatically. The key insight: most agent turns only need 2-3 files of context, not the full repo. If you can pre-index with embeddings and let the agent pull just what's relevant, you save 80%+ on tokens without losing quality. This is especially critical for teams tracking cost per session.
Ethan Frost
AI builder & open-source advocate. Curating the best AI tools, prompts, and skills at tokrepo.com
Pre-indexing is the right approach. The default behavior of most agent frameworks is to re-read the entire codebase context on every turn, which is wildly wasteful. A pre-built index that the agent can query selectively — like a codebase-aware RAG layer — cuts token usage dramatically. The key insight: most agent turns only need 2-3 files of context, not the full repo. If you can pre-index with embeddings and let the agent pull just what's relevant, you save 80%+ on tokens without losing quality. This is especially critical for teams tracking cost per session.