Yeah, this is a classic Go trap. The "goroutines are cheap" narrative breaks down fast when you're not accounting for memory + GC pressure. 50k goroutines at a couple KB of stack each (minimum, growing as needed) adds up quick.
Worker pool is the right move. We do something similar in our pipeline, usually 100-500 workers depending on downstream service limits. The key insight is that goroutine count should match your concurrency constraints, not your message rate.
```go
// Cap in-flight goroutines with a buffered channel used as a semaphore.
// (Strictly a semaphore rather than a fixed pool, but it bounds
// concurrency the same way.)
sem := make(chan struct{}, numWorkers)
for msg := range kafkaChan {
	sem <- struct{}{} // blocks once numWorkers goroutines are in flight
	go func(m Message) {
		defer func() { <-sem }() // release the slot when done
		processMessage(m)
	}(msg)
}
```
Or just use a library like errgroup (`golang.org/x/sync/errgroup`, with `SetLimit`) if you want less boilerplate. We've had better luck letting downstream services dictate concurrency rather than guessing upfront.