Leena Malhotra · intechwithleena.hashnode.dev · Mar 25 · 8 min read
Claude Opus 4.6 vs GPT-5 on Multi-Step Reasoning: Where Each One Starts to Fail
Both models handle simple reasoning well. The gap opens when tasks have multiple dependent steps, and the failure type is different for each one. Key Takeaways: Neither model dominates multi-step re...
Leena Malhotra · intechwithleena.hashnode.dev · Mar 18 · 11 min read
Debugging AI-Generated Code Across Different Models
The bug was invisible to three different AI models before a human finally spotted it. I had asked Claude Opus 4.6 to write a function that parsed user-uploaded CSV files and extracted email addresses.
Leena Malhotra · intechwithleena.hashnode.dev · Jan 19 · 6 min read
Why Consensus Matters More Than Confidence in AI Systems
We are building our digital infrastructure on a fault line. The current generation of Large Language Models (LLMs) suffers from a specific, dangerous pathology: they are programmed to be confident, not correct. When you ask an AI a question, it does ...
Leena Malhotra · intechwithleena.hashnode.dev · Jan 16 · 5 min read
The Failure Boundary Where LLM Reasoning Quietly Collapses
Large language models feel impressive right up until they do not. The responses still look fluent. The structure still appears logical. But somewhere beneath the surface, reasoning quality drops. Assumptions blur. Constraints leak. The model keeps ta...
Leena Malhotra · intechwithleena.hashnode.dev · Jan 15 · 5 min read
A Production Rule for Handling Model Uncertainty
You are shipping gambling algorithms, not software. I look at the codebases of "AI-native" startups, and I see the same terrifying pattern. A developer makes an API call to an LLM. They get a response. They JSON.parse() it. And they push it to the fr...