Manas Vardhan · intheagentstack.hashnode.dev · Apr 24 · 8 min read
The MCP Tax Is Real, and It Is Quietly Killing Your Agent's Reasoning
Every time your AI agent makes a tool call through MCP, it pays a tax. Not in dollars (though that too), but in tokens. Tens of thousands of them. Silently injected. Every single turn. A new paper dropped yesterday that puts hard numbers on this prob...

Manas Vardhan · intheagentstack.hashnode.dev · Apr 16 · 7 min read
RL in the Pre-train Space: Why Training on P(y) Beats Training on P(y|x)
RLVR (Reinforcement Learning with Verifiable Rewards) has been the go-to recipe for boosting LLM reasoning since DeepSeek-R1 made it mainstream. The formula is simple: give the model math problems, check the answers, reward correct reasoning chains. ...

Manas Vardhan · intheagentstack.hashnode.dev · Apr 4 · 8 min read
The Three Walls Your AI Research Agent Keeps Hitting
Everyone's building AI research agents. Feed them a Kaggle problem, let them explore, iterate, and submit. The promise: autonomous AI that does your ML engineering while you sleep. The reality: most of these systems plateau around 65-70% on benchmark...

Manas Vardhan · intheagentstack.hashnode.dev · Apr 2 · 6 min read
Chain-of-Thought Was Supposed to Be Our Window Into AI Reasoning. Optimization Is Slamming It Shut.
Here's the deal we thought we had with chain-of-thought prompting: let the model show its work, and we can watch the reasoning unfold. If something goes wrong, we'd see it in the chain. CoT was our audit trail, our interpretability shortcut, our free...

Manas Vardhan · intheagentstack.hashnode.dev · Apr 2 · 7 min read
Tucker Attention: GQA, MLA, and MHA Were the Same Thing All Along
For the last two years, the LLM inference community has been playing a game of architectural bingo. Multi-Head Attention (MHA)? Too expensive at scale. Grouped-Query Attention (GQA)? Better KV cache, but you lose expressiveness. Multi-Head Latent Att...