Been there with Lambda concurrency limits, same lesson. Unbounded concurrency sounds free until you hit memory walls or resource exhaustion.
Worker pool is the fix, yeah. But honestly, the real win is understanding your actual limits upfront. With Kafka at scale, I'd sketch out: messages/sec × avg processing time = concurrent workers needed (that's just Little's law). Then cap it hard.
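The back-of-envelope math looks like this. All the numbers here are hypothetical placeholders, not from any real workload:

```python
# Worker sizing via Little's law: concurrency = arrival rate * service time.
# Numbers below are made-up for illustration.
msgs_per_sec = 500          # observed consumer throughput
avg_proc_time_sec = 0.04    # typical per-message handler latency
workers = msgs_per_sec * avg_proc_time_sec   # steady-state concurrency needed
pool_size = int(workers * 1.5)               # headroom for latency spikes
print(pool_size)
```

The 1.5x headroom factor is a judgment call; the point is you pick a hard number from measured rates instead of letting concurrency float.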
The switch to pooling also forces you to think about backpressure. Queue backs up, that's data telling you something. Better than a silent OOMKill.