Comment by AskReal on "Harness Engineering: The Part of Agentic AI Nobody Writes About"

This is the conversation that actually needs to happen. Everyone optimizes the 20% (model + prompt) and ignores the 80% (harness) — and then wonders why their agent works in demos but fails in production.

The harness is where all the real decisions live: how context is managed across turns, when to stop and ask for human input, how errors surface, what gets retried vs. escalated. These aren't model problems; they're systems design problems. And most teams aren't treating them that way.
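To make "retried vs. escalated" concrete, here's a minimal Python sketch of that decision living in the harness rather than the model. Every name in it (`TransientToolError`, `PermanentToolError`, `Escalation`, `call_with_policy`) is hypothetical and not from any particular agent framework. It also assumes your tool wrappers can tell transient failures from permanent ones, which in practice is half the work.

```python
import time
from dataclasses import dataclass
from typing import Callable, TypeVar, Union

T = TypeVar("T")


class TransientToolError(Exception):
    """Hypothetical: a failure that may succeed on retry (timeout, rate limit)."""


class PermanentToolError(Exception):
    """Hypothetical: a failure that retrying will not fix (bad arguments, permission denied)."""


@dataclass
class Escalation:
    """Returned instead of a result when a human has to decide what happens next."""
    tool_name: str
    reason: str
    attempts: int


def call_with_policy(
    tool_name: str,
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
) -> Union[T, Escalation]:
    """Retry transient failures with exponential backoff; escalate permanent ones.

    Any exception that is neither transient nor permanent propagates unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except PermanentToolError as exc:
            # Retrying cannot help, so surface the failure right away.
            return Escalation(tool_name, f"permanent failure: {exc}", attempt)
        except TransientToolError as exc:
            if attempt == max_attempts:
                return Escalation(tool_name, f"retries exhausted: {exc}", attempt)
            # Exponential backoff: with the default base delay this waits 0.5s, then 1s, ...
            time.sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The design choice worth arguing about is that unclassified exceptions propagate instead of being retried: an error the harness doesn't understand shouldn't be silently looped on.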

What I've seen break repeatedly in enterprise deployments:

- Context window mismanagement (stuffing in too much, losing critical state)
- No graceful degradation when a tool call fails mid-chain
- Approval gates that exist on paper but never actually halt execution
- Logging that tells you what happened but not why the agent made a particular branch decision
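On the last point, the fix is usually structural: log the decision itself (options, choice, reason, evidence), not just the action that followed. A minimal Python sketch, assuming a hypothetical `DecisionLog` that emits JSON Lines; the field names are my own choice, not a standard schema. The usage line also shows a mid-chain tool failure being degraded to a fallback with the reason recorded.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class DecisionRecord:
    step: int
    options: list[str]
    chosen: str
    reason: str
    evidence: dict[str, Any] = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


class DecisionLog:
    """Hypothetical append-only log of branch decisions, not just actions."""

    def __init__(self) -> None:
        self._records: list[DecisionRecord] = []

    def record(self, options: list[str], chosen: str, reason: str, **evidence: Any) -> DecisionRecord:
        # Reject a branch that was never on the table; that's a harness bug worth catching early.
        if chosen not in options:
            raise ValueError(f"chosen branch {chosen!r} not among options {options}")
        rec = DecisionRecord(len(self._records), list(options), chosen, reason, dict(evidence))
        self._records.append(rec)
        return rec

    def to_jsonl(self) -> str:
        # One JSON object per decision; evidence values must be JSON-serializable.
        return "\n".join(json.dumps(asdict(r)) for r in self._records)


log = DecisionLog()
log.record(
    ["retry", "use_cached_result", "escalate"],
    "use_cached_result",
    "search tool timed out mid-chain; cached result is recent enough",
    error="timeout",
    cache_age_s=120,
)
print(log.to_jsonl())
```

When something goes wrong in production, the `reason` and `evidence` fields are what let you answer "why did it do that?" instead of only "what did it do?".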

The harness is also where you encode domain judgment — what the agent is allowed to do autonomously vs. what requires a human decision. Get that boundary wrong and you either build an agent that's too timid to be useful or one that takes consequential actions no one intended to delegate.
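One way to make that boundary explicit, and to make approval gates actually halt execution, is a policy table the harness checks before every action. A minimal Python sketch; the action names, the three autonomy levels, and the `execute` helper are all hypothetical. The key property is that the gate raises instead of logging a warning, so nothing downstream runs until a human signs off.

```python
from enum import Enum
from typing import Callable, Optional


class Autonomy(Enum):
    AUTONOMOUS = "autonomous"
    NEEDS_APPROVAL = "needs_approval"
    FORBIDDEN = "forbidden"


# Hypothetical policy: what the agent may do on its own vs. what needs a human.
POLICY = {
    "read_ticket": Autonomy.AUTONOMOUS,
    "draft_reply": Autonomy.AUTONOMOUS,
    "send_email": Autonomy.NEEDS_APPROVAL,
    "issue_refund": Autonomy.NEEDS_APPROVAL,
    "delete_account": Autonomy.FORBIDDEN,
}


class ApprovalRequired(Exception):
    """Raised to halt the chain until a human approves the action."""

    def __init__(self, action: str) -> None:
        super().__init__(f"action {action!r} requires human approval")
        self.action = action


class ActionForbidden(Exception):
    """Raised for actions the agent may never take, approved or not."""


def execute(action: str, run: Callable[[], str], approved_by: Optional[str] = None) -> str:
    # Unknown actions default to needing approval rather than running autonomously.
    level = POLICY.get(action, Autonomy.NEEDS_APPROVAL)
    if level is Autonomy.FORBIDDEN:
        raise ActionForbidden(action)
    if level is Autonomy.NEEDS_APPROVAL and approved_by is None:
        # Raising (not warning) is what makes the gate real: `run` is never called.
        raise ApprovalRequired(action)
    return run()
```

Defaulting unknown actions to "needs approval" errs on the timid side; flipping that default is exactly the "consequential actions no one intended to delegate" failure.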

Solid framing. This deserves more attention than another "which model wins the benchmark" post.
