This really resonates, especially the point about prompts describing behavior but not enforcing it.
I have seen a few teams try to solve workflow problems by continually refining prompts, and it works right up until there's a partial failure or a workflow that needs to resume mid-execution. That's usually when the distinction between "the model should do this" and "the system guarantees this" becomes obvious.
The queue processing example is a good one because the output can look completely reasonable even when the workflow itself has failed. Those are often the hardest issues to catch because nothing technically crashes.
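To make the "looks fine but actually failed" case concrete, here is a rough sketch of what I mean by the system owning the completion rule. It is plain Python, not LangGraph's API, and `QueueRun`, `mark_done`, and `finalize` are names I made up for illustration. The idea is that the run keeps its own record of which items were handled, and the model's summary is only accepted if that record agrees.

```python
from dataclasses import dataclass, field


@dataclass
class QueueRun:
    """Hypothetical record of one queue-processing run, kept outside the model."""
    expected_ids: set[str]
    processed_ids: set[str] = field(default_factory=set)

    def mark_done(self, item_id: str) -> None:
        # Only items that were actually queued can be marked as processed.
        if item_id not in self.expected_ids:
            raise ValueError(f"unknown item: {item_id}")
        self.processed_ids.add(item_id)

    def remaining(self) -> set[str]:
        return self.expected_ids - self.processed_ids

    def is_complete(self) -> bool:
        return not self.remaining()


def finalize(run: QueueRun, model_summary: str) -> str:
    """Accept the model's summary only if the run's own record says it is done."""
    if not run.is_complete():
        pending = sorted(run.remaining())
        raise RuntimeError(f"incomplete run, still pending: {pending}")
    return model_summary
```

With three queued items and only two marked done, `finalize` raises even if the model's summary says everything succeeded, which is exactly the failure that otherwise slips through. Because `processed_ids` is ordinary data, it could also be persisted and reloaded, so a resumed run would know what is left without asking the model.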
The line about LangGraph becoming the owner of completion rules rather than the model is probably the clearest explanation of when to introduce it that I have seen.