This resonates. A lot of teams focus on the model and prompts, but reliability usually comes from everything around the model: validation, observability, state management, fallback paths, and clear contracts between components.
The point about "a probabilistic component inside a deterministic system" is especially important. We've seen similar patterns at IT Path Solutions when moving AI agents from demos to production. The biggest gains rarely come from prompt tweaks; they come from building the scaffolding that keeps behavior predictable when real users, edge cases, and scale enter the picture.
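To make that concrete, here's a minimal sketch of what that scaffolding can look like: the model's raw output is treated as untrusted, checked against an explicit contract, and replaced with a safe default if it keeps failing. Everything here is hypothetical and only for illustration (`call_model`, the action names, the fallback payload), not a description of any particular stack.

```python
import json
import logging
from typing import Callable

log = logging.getLogger("agent")

# Hypothetical contract: the model must return one of these actions plus a reason.
ALLOWED_ACTIONS = {"refund", "escalate", "reply"}
FALLBACK = {"action": "escalate", "reason": "model output failed validation"}


def validate(raw: str) -> dict:
    """Enforce the contract; raise ValueError on anything outside it."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if not isinstance(data.get("reason"), str):
        raise ValueError("missing or non-string reason")
    return data


def decide(call_model: Callable[[str], str], prompt: str, max_attempts: int = 2) -> dict:
    """Call the model, validate its output, retry, then fall back deterministically."""
    for attempt in range(1, max_attempts + 1):
        try:
            return validate(call_model(prompt))
        except ValueError as exc:
            # Observability: record every rejected output instead of hiding it.
            log.warning("attempt %d rejected: %s", attempt, exc)
    return dict(FALLBACK)
```

Whatever the model returns, the caller only ever sees a dict that satisfies the contract, so the downstream code stays deterministic even when the model isn't.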
Great reminder that production AI is ultimately a systems engineering challenge, not just a model selection challenge.