Had the opposite experience. Built a metrics pipeline doing 100k+ requests/sec and yeah, unbounded goroutines killed us. Not memory - goroutine scheduling itself becomes the bottleneck. Scheduler starts thrashing around 50k concurrent goroutines.
The trick is you don't notice until production load. Local testing with 1k req/sec looks fine. Then you hit real traffic and p99 latencies crater.
Worker pools still matter, but you're right that the boilerplate is worse than it needs to be. I just use a buffered channel and range over it. Context cancellation is the only part that's genuinely annoying to wire up correctly.
Database being the real limit is true for CRUD apps. Not true if you're CPU-bound or doing heavy I/O coordination.