Completely agree with the system-problem framing.
The pattern I keep seeing is that the model gets blamed for failures that really belong to the runtime: weak stop rules, unclear tool boundaries, or retries that keep going after the useful signal is gone.
Once you make the loop prove what changed before another attempt, a lot of the fake progress disappears.