I have been building an API that AI sales agents call to research a prospect. Send a person or a company, get back a briefing: company overview, decision-makers, likely pain points, opening plays.
I expected the hard part to be the research quality. It was not. The hard part was something far more boring and far more important: returning the exact same JSON shape on every single call.
Here is why that matters more than it sounds.
When a human uses your API and one field comes back null, or a list comes back as a string, or a key is missing this time, they shrug and handle it. They are in the loop. They adapt.
An autonomous agent is not in the loop. It calls you on lead 1, lead 50, lead 500, unattended, and feeds your output straight into the next step. If call 437 returns decision_makers as an object instead of an array, the agent does not shrug. It crashes, or worse, it silently does the wrong thing and keeps going. You find out days later when the outreach looks insane.
So the contract is not "return good data." The contract is "return identically shaped data, every time, no exceptions." Consistency is the product. The research is table stakes.
The interesting failures were never the obvious ones. A few that cost me real time:
LLM-shaped output drift. Part of the pipeline uses a model to synthesize the briefing. Models are probabilistic. Ask for "up to 3 opening plays" and sometimes you get 3, sometimes 4, sometimes a single string with bullets inside it. The fix was to stop trusting prose and force a strict schema, then validate hard and repair or drop anything that does not conform before it ever reaches the response.
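A minimal sketch of what that repair step could look like for the opening plays, in Python. The function name `repair_opening_plays` and the bullet-stripping rules are assumptions for illustration, not the actual pipeline code; the point is only that the response gets a list of at most three strings no matter what the model produced.

```python
import re

MAX_PLAYS = 3  # the "up to 3 opening plays" limit from the prompt


def repair_opening_plays(raw: object, limit: int = MAX_PLAYS) -> list[str]:
    """Coerce whatever the model returned into a list of at most `limit` strings."""
    if isinstance(raw, str):
        # A single string with bullets inside it: treat each line as one item.
        candidates = raw.splitlines()
    elif isinstance(raw, list):
        candidates = raw
    else:
        # None, dicts, numbers: nothing salvageable, so drop it entirely.
        candidates = []

    plays = []
    for item in candidates:
        if not isinstance(item, str):
            continue  # drop non-string entries rather than guess at them
        # Strip a leading bullet ("-", "*", "•") or a number like "1." / "2)".
        text = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", item).strip()
        if text:
            plays.append(text)
    return plays[:limit]  # a fourth play is truncated, never passed through
```

Running every model output through one function like this, right before the response is built, means the agent only ever sees a list of strings for that field, whether the model returned three items, four, or a bulleted blob.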
Empty is not the same as missing. If a company has no recent news, is that field null, [], an absent key, or the string "No recent news found"? I had all four in the wild at one point. Every one of them breaks a different agent. Pick one normalization rule per field and enforce it in one place, not scattered across the code.
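Here is a rough sketch of a single normalization rule for the news field, in Python. The key name `recent_news` and the list of placeholder strings are assumptions for the example; the rule it encodes is "always a list of strings, and empty means `[]`".

```python
# Placeholder strings that really mean "nothing here".
# This set is illustrative, not exhaustive.
EMPTY_MARKERS = {"", "no recent news found", "n/a", "none"}


def normalize_news(payload: dict) -> list[str]:
    """Rule for recent_news: always a list of strings; empty is always []."""
    value = payload.get("recent_news")  # absent key and null both become None
    if value is None:
        return []
    if isinstance(value, str):
        text = value.strip()
        # A lone string is either a placeholder or a single news item.
        return [] if text.lower() in EMPTY_MARKERS else [text]
    if isinstance(value, list):
        return [v.strip() for v in value if isinstance(v, str) and v.strip()]
    return []  # any other type is treated as empty rather than leaked
```

All four variants from the wild (`null`, `[]`, an absent key, and the "No recent news found" string) collapse to the same `[]`, and because the rule lives in one function there is exactly one place to change it.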
Partial data is worse than no data. When one upstream source times out, the tempting move is to return what you have. For an agent, that is a trap, because the response shape now depends on which sources happened to answer in time. Decide up front which fields are guaranteed and which are best-effort, and make the guaranteed ones actually guaranteed, even if that means a slower call.
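One way to sketch the guaranteed versus best-effort split in Python. Everything named here is hypothetical: the field names, the `fetchers` mapping of field name to a zero-argument callable, the retry count, and the choice to fail the whole call with an explicit error when a guaranteed field cannot be filled. Treat it as an illustration of the policy, not the real implementation.

```python
GUARANTEED = ("company_overview",)  # retried; the call fails if still missing
BEST_EFFORT = {"recent_news": list, "pain_points": list}  # field -> empty factory


class BriefingUnavailable(Exception):
    """Raised when a guaranteed field cannot be produced."""


def assemble(fetchers: dict, retries: int = 2) -> dict:
    briefing = {}
    for field in GUARANTEED:
        for attempt in range(retries + 1):
            try:
                briefing[field] = fetchers[field]()
                break
            except TimeoutError:
                if attempt == retries:
                    # Slower and louder beats returning a different shape.
                    raise BriefingUnavailable(field)
    for field, empty in BEST_EFFORT.items():
        try:
            briefing[field] = fetchers[field]()
        except TimeoutError:
            briefing[field] = empty()  # key still present, type still stable
    return briefing
```

With this split, a timeout on a best-effort source changes a value to its typed empty form but never removes the key, and a timeout on a guaranteed source costs extra latency or an explicit failure instead of a silently thinner response.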
Treat the response shape as a hard contract and the content as best-effort within that contract. The shape never varies. The values can be empty, but the keys and their types never move. An agent can reason about "this field is empty today." It cannot reason about "this field exists on Tuesdays."
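A small sketch of a last-line check for that contract, in Python. The `SHAPE` table is an assumption built from the briefing sections mentioned at the top of this post; only `decision_makers` is a key name I have actually used above, the rest are illustrative.

```python
# Expected top-level keys and their types. Values may be empty; keys and
# types may not change. Key names other than decision_makers are hypothetical.
SHAPE = {
    "company_overview": str,
    "decision_makers": list,
    "pain_points": list,
    "opening_plays": list,
}


def shape_errors(response: dict) -> list[str]:
    """Return every contract violation; an empty list means the shape holds."""
    errors = []
    for key, expected in SHAPE.items():
        if key not in response:
            errors.append(f"missing key: {key}")
        elif not isinstance(response[key], expected):
            got = type(response[key]).__name__
            errors.append(f"{key}: expected {expected.__name__}, got {got}")
    # Keys that appear only sometimes are a contract violation too.
    for key in sorted(response.keys() - SHAPE.keys()):
        errors.append(f"unexpected key: {key}")
    return errors
```

A response full of empty strings and empty lists passes, while `decision_makers` arriving as an object is reported before it ever reaches an agent. The same function can run as a final gate on the server and as a test against recorded responses.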
This is obvious in hindsight, and I learned it the slow way. If you are building anything an autonomous agent will call in a loop, obsess over output stability before you obsess over output quality. The quality earns you the first call. The stability earns you the 500th.
I am building this in public. If you have shipped an API that agents hammer unattended, I would love to hear what shape-stability tricks you use, especially around LLM-in-the-pipeline endpoints. What broke for you that you did not see coming?
(If you want to poke at the thing I am describing, the sandbox is free: callprep.app/api. But I am mostly here for the war stories.)