The distinction you're making is the right one. AI tools are genuinely good at the happy path — the first screen, the main flow, the obvious API shape. Where they struggle is with the unglamorous stuff that makes production software actually work: idempotency, retry logic, permission boundaries, cascading failures, state consistency under load.

I used AI extensively on a project that went well past the prototype stage, and the pattern was consistent. AI got us to a working demo in days. The next six months were spent hardening everything the AI confidently generated but got subtly wrong: auth edge cases, race conditions in async flows, error states the happy-path tests never caught.

So I'd say it's less about weak tools and more about a mismatch in what the tools optimize for. They're trained on code that demonstrates concepts, not code that handles production reality. The system-level thinking still has to come from the developer; AI just makes it easier to defer that thinking until it's more expensive to fix.
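To make the idempotency/retry point concrete, here's a minimal sketch of the pattern AI-generated code tends to skip. All names are hypothetical, and the in-memory key store is a stand-in: a real service would back it with a database unique constraint so concurrent retries can't both commit the side effect.

```python
import time

# Hypothetical in-memory idempotency-key store (stand-in for a DB table
# with a unique constraint on the key).
_processed = {}

def handle_payment(idempotency_key, amount):
    """Perform the side effect at most once per idempotency key."""
    if idempotency_key in _processed:
        # Replay of a retried request: return the original result,
        # don't charge again.
        return _processed[idempotency_key]
    result = f"charged {amount}"  # stand-in for the real side effect
    _processed[idempotency_key] = result
    return result

def with_retries(fn, attempts=3, base_delay=0.01):
    """Retry a flaky call with exponential backoff.

    Only safe because the underlying operation is idempotent — retrying
    a non-idempotent call is exactly the kind of bug that works in the
    demo and double-charges in production.
    """
    for i in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)
```

The point of pairing the two: the retry wrapper is the part AI writes happily; the idempotency key that makes the retry *safe* is the part that tends to be missing.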