Interesting breakdown. I've noticed the same pattern while experimenting with a few of these tools: they're great for getting something working quickly, especially for prototypes or small utilities, but once you move toward production the real challenge becomes reliability and testing. The speed of generating code has definitely outpaced how we validate it.

I've had cases where everything looked fine during development, but weird edge cases started showing up once real users interacted with it. It feels like the workflow is now less about writing code and more about verifying what the AI produced. I'm curious what tools or practices people are using to close that testing gap, because that seems like the next big bottleneck.