This connects to something I keep running into with health software: the default architecture often decides the trust model before the user ever gets a real choice.
AI is very good at the happy path. It generates auth flows, CRUD shapes, and readable error messages. Where it consistently underperforms is at the failure edge: what happens to the user when the database is unavailable, when the session expires mid-action, when the export fails silently, when recovery requires contacting support.
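To make that concrete, here's a minimal sketch of what failure-aware design looks like for one of those cases, the silently failing export. Assumes a browser context; `exportRecords`, `upload`, and the `"pending-export"` key are all hypothetical names, not from any particular codebase. The happy-path version of this function would just upload and show "Done"; this one makes every failure an explicit, user-visible state.

```typescript
// Every outcome is a named state the UI must render; failure can't vanish.
type ExportResult =
  | { status: "ok" }
  | { status: "retryable"; reason: string } // transient: network blip, 503, 429
  | { status: "failed"; reason: string };   // permanent, but data preserved locally

async function exportRecords(
  records: unknown[],
  upload: (body: string) => Promise<Response>, // hypothetical upload callback
): Promise<ExportResult> {
  const body = JSON.stringify(records);

  // Preserve the user's data *before* attempting the upload, so a crash
  // mid-export never costs them anything.
  localStorage.setItem("pending-export", body);

  try {
    const res = await upload(body);
    if (res.ok) {
      localStorage.removeItem("pending-export");
      return { status: "ok" };
    }
    // Server-side failure: surface the status instead of swallowing it,
    // and tell the user their data is still safe on-device.
    const retryable = res.status >= 500 || res.status === 429;
    return retryable
      ? { status: "retryable", reason: `server returned ${res.status}` }
      : { status: "failed", reason: `export rejected (${res.status}); data kept locally` };
  } catch (err) {
    // Network down or database unavailable upstream: same contract applies.
    return { status: "retryable", reason: `network error: ${String(err)}` };
  }
}
```

The point of the discriminated union is that the caller is forced by the type checker to handle every branch; there is no code path where an export quietly disappears and the user finds out weeks later, which is exactly the failure edge generated code tends to skip.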
For apps handling sensitive data, those failure modes are not edge cases. They are the moment the architecture either protects the user or transfers cost onto them.
A lot of teams treat "stable" as equivalent to "works in demos." The real test is: what does the app do to a vulnerable user when it fails? That is a question AI tooling almost never gets asked to answer.