Great point, Archit. That tradeoff triangle is very real, and I completely agree on query rewriting: improving the query often has more impact than upgrading embeddings, especially when it comes to capturing intent correctly. Also +1 on cross-encoder reranking with caching in place; the extra latency is usually a fair trade for the gain in precision. For chunking I'm using a hybrid approach: fixed-size 512-token chunks with 10% overlap, combined with a semantic splitter, since fixed-size chunks alone tend to break context, especially in technical documentation. Curious how you're handling caching for reranking. Are you caching at the query level or closer to the embeddings?
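To make the hybrid concrete, here's a minimal sketch of just the fixed-size half (in my setup the semantic splitter runs first and this handles oversized segments; the function name and raw token-list input are only illustrative):

```python
def chunk_fixed(tokens, size=512, overlap_frac=0.10):
    """Split a pre-tokenized sequence into fixed-size chunks with ~10% overlap."""
    step = max(1, int(size * (1 - overlap_frac)))  # stride of 460 tokens for 512 / 10%
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # final chunk already reaches the end
            break
    return chunks
```

With these defaults consecutive chunks share roughly 50 tokens of context, which is what softens the hard cuts that pure fixed-size chunking makes mid-sentence.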
