Great point, Archit. That tradeoff triangle is very real, and I completely agree on query rewriting: improving the query often has more impact than upgrading embeddings, especially when it comes to capturing intent correctly. Also +1 on cross-encoder reranking with caching in place; the extra latency is usually a fair trade for the gain in precision. For chunking I'm using a hybrid approach: fixed-size 512-token chunks with 10% overlap, combined with a semantic splitter, since fixed-size chunks alone tend to break context, especially in technical documentation. Curious how you're handling caching for reranking. Are you caching at the query level or closer to the embeddings?
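To make the hybrid concrete, here's a minimal sketch of just the fixed-size half (in my setup the semantic splitter runs first and this handles oversized segments; the function name and raw token-list input are only illustrative):

```python
def chunk_fixed(tokens, size=512, overlap_frac=0.10):
    """Split a pre-tokenized sequence into fixed-size chunks with ~10% overlap."""
    step = max(1, int(size * (1 - overlap_frac)))  # stride of 460 tokens for 512 / 10%
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # final chunk already reaches the end
            break
    return chunks
```

With these defaults consecutive chunks share roughly 50 tokens of context, which is what softens the hard cuts that pure fixed-size chunking makes mid-sentence.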
