Comment by Mateo Ruiz on "Why ChatGPT Gets Confident and Wrong at the Same Time — A Technical Breakdown"

One of the biggest misconceptions about LLMs is that confidence is a signal of correctness. It isn't. A confident tone often just reflects how statistically likely a sequence of words is under the model, not whether the claim it makes is true.
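To make that concrete, here is a minimal sketch of how a sequence's likelihood is built from per-token probabilities. The numbers are invented for illustration and do not come from any real model; `sequence_logprob` is a hypothetical helper name.

```python
import math

def sequence_logprob(token_probs):
    """Log-probability of a sequence: the sum of its per-token log-probabilities."""
    return sum(math.log(p) for p in token_probs)

# Hypothetical per-token probabilities, made up for this example.
fluent_but_wrong = [0.9, 0.8, 0.9, 0.85]   # common phrasing, false claim
awkward_but_right = [0.6, 0.4, 0.5, 0.7]   # rarer phrasing, true claim

wrong_lp = sequence_logprob(fluent_but_wrong)
right_lp = sequence_logprob(awkward_but_right)

# The score measures how typical the wording is, not whether it is true,
# so the false sentence ends up with the higher likelihood here.
print(wrong_lp > right_lp)
```

Nothing in that score looks at facts. It only rewards word sequences the model has seen patterns for.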

The section on RLHF is especially important. Users naturally prefer answers that sound complete and authoritative, so models are incentivized to be helpful and decisive even when the underlying certainty isn't there. That's why hallucinations can be so convincing.

I'd add that this becomes even more critical with agentic systems. A hallucinated answer is one thing; a hallucinated action is another. When AI starts making API calls, modifying data, or triggering workflows, uncertainty handling becomes just as important as model capability.
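One way to picture uncertainty handling for actions is a gate that sits between the model's proposal and its execution. This is only a sketch under my own assumptions: `ProposedAction`, `decide`, and the three outcomes are hypothetical names, not part of any agent framework. It deliberately ignores the model's self-reported confidence and relies on an independent validation step instead.

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    name: str
    has_side_effects: bool   # e.g. writes data, calls an external API
    validated: bool          # passed an independent check (schema, allowlist, dry run)

def decide(action: ProposedAction) -> str:
    """Return what to do with an action the model proposed."""
    if not action.has_side_effects:
        return "execute"         # read-only: a wrong call is cheap to discard
    if not action.validated:
        return "reject"          # side effects without an independent check
    return "needs_approval"      # validated, but a human still signs off

print(decide(ProposedAction("search_docs", has_side_effects=False, validated=False)))
print(decide(ProposedAction("delete_rows", has_side_effects=True, validated=False)))
print(decide(ProposedAction("send_invoice", has_side_effects=True, validated=True)))
```

The design choice is that the gate's strictness depends on what the action can break, not on how sure the model sounds.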

The engineers who get the most value from AI aren't the ones who trust it blindly. They're the ones who know exactly where its failure modes are.
