100% agree. This matches what I see building automation systems for clients daily. The model is usually the most reliable part of the stack. What breaks first? State handling and context management, every time. Specifically: agents lose track of what they've already done in multi-step workflows, and context windows get polluted with irrelevant tool outputs. The fix that's worked best for me is treating each agent step as a discrete function with explicit inputs and outputs, rather than letting agents freestyle through a chain. Structured handoffs between steps, with validation at each boundary, catch most failures before they cascade. The "librarian problem" framing above is spot on.
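
For anyone wondering what "discrete function with explicit inputs/outputs" looks like in practice, here's a rough sketch in Python. Everything in it is made up for illustration and not taken from any particular agent framework: `Step`, `WorkflowState`, `HandoffError`, `run_workflow`, and the two toy steps are hypothetical names, and the step bodies are plain functions standing in for real model or tool calls. The idea is just that each step declares the keys it reads and writes, the runner checks both sides of every handoff, and a `completed` list records what has already been done.

```python
from dataclasses import dataclass, field
from typing import Callable


class HandoffError(Exception):
    """Raised when a step's input or output fails validation at a boundary."""


@dataclass(frozen=True)
class Step:
    name: str
    run: Callable[[dict], dict]         # explicit input dict -> explicit output dict
    required_inputs: tuple[str, ...]    # keys the step is allowed to read
    required_outputs: tuple[str, ...]   # keys the step must produce


@dataclass
class WorkflowState:
    data: dict = field(default_factory=dict)
    completed: list[str] = field(default_factory=list)  # steps already done


def check_keys(step_name: str, payload: dict, required: tuple[str, ...], kind: str) -> None:
    missing = [key for key in required if key not in payload]
    if missing:
        raise HandoffError(f"{step_name}: missing {kind} keys {missing}")


def run_workflow(steps: list[Step], state: WorkflowState) -> WorkflowState:
    for step in steps:
        if step.name in state.completed:
            continue  # already done (e.g. when resuming), so don't repeat it
        check_keys(step.name, state.data, step.required_inputs, "input")
        # Hand the step only its declared inputs, not the whole accumulated state.
        inputs = {key: state.data[key] for key in step.required_inputs}
        outputs = step.run(inputs)
        check_keys(step.name, outputs, step.required_outputs, "output")
        # Keep only declared outputs so stray tool output never reaches later steps.
        state.data.update({key: outputs[key] for key in step.required_outputs})
        state.completed.append(step.name)
    return state


# Toy steps (assumptions for the example; a real step would call a model or tool).
def fetch_ticket(inputs: dict) -> dict:
    return {
        "ticket_text": f"Ticket {inputs['ticket_id']}: printer offline",
        "raw_debug": "verbose tool log that later steps should never see",
    }


def classify(inputs: dict) -> dict:
    label = "hardware" if "printer" in inputs["ticket_text"] else "other"
    return {"category": label}


STEPS = [
    Step("fetch_ticket", fetch_ticket, ("ticket_id",), ("ticket_text",)),
    Step("classify", classify, ("ticket_text",), ("category",)),
]

state = run_workflow(STEPS, WorkflowState(data={"ticket_id": 42}))
```

After this runs, `state.data` holds `ticket_id`, `ticket_text`, and `category`, while the undeclared `raw_debug` output is dropped at the boundary. A step that returns the wrong shape raises `HandoffError` right there instead of quietly feeding garbage to the next step. In a real system you'd probably want proper schema validation rather than key checks, plus persistence for `WorkflowState`, but the shape is the same.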