The workflow/agent distinction is the real insight here. What I keep seeing is teams hitting the reliability wall exactly when they realize their agent was always a deterministic pipeline with a model embedded, and that's fine.
The pattern you describe (decision-making buried across prompts, application code, tool schemas, and retry logic) maps directly to why these systems become opaque. Each layer adds coupling that nobody owns.
One thing I'd add: the agentic slop problem compounds when teams inherit agent scaffolding from demos. Demo code optimizes for wow factor over failure handling. Production systems need the opposite bias.
The contract-driven approach you show is exactly right. Explicit inputs, guaranteed outputs, clear conditions: boring is the feature. The teams I've seen ship reliable agents started with the boring parts first.
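
To make "explicit inputs, guaranteed outputs, clear conditions" concrete, here is a rough sketch of a contract wrapped around a single model call. This is not the post's code. `TicketInput`, `TicketLabel`, `classify`, `ALLOWED`, and `fake_model` are all hypothetical names for illustration, and the model is a plain callable standing in for whatever client you actually use.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical label set; the contract guarantees output is one of these.
ALLOWED = ("billing", "bug", "other")


@dataclass(frozen=True)
class TicketInput:
    """Explicit input: the only thing the step is allowed to see."""
    text: str


@dataclass(frozen=True)
class TicketLabel:
    """Guaranteed output: category is always a member of ALLOWED."""
    category: str


def classify(ticket: TicketInput, model: Callable[[str], str]) -> TicketLabel:
    # Clear precondition: reject empty input before spending a model call.
    if not ticket.text.strip():
        raise ValueError("ticket text must be non-empty")

    # The model is the only non-deterministic part of the step.
    raw = model(ticket.text).strip().lower()

    # Postcondition: anything outside the allowed set collapses to "other",
    # so downstream code never sees a surprise label.
    category = raw if raw in ALLOWED else "other"
    return TicketLabel(category)


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption, for demonstration only)."""
    return " Billing " if "invoice" in prompt else "no idea"


if __name__ == "__main__":
    print(classify(TicketInput("My invoice is wrong"), fake_model))
    print(classify(TicketInput("Something else"), fake_model))
```

The point of the shape is that the model can return anything and the caller still gets one of three known values or an explicit error, which is the boring part worth building first.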