Just wrapped up incident postmortem on a data pipeline that was eating memory like crazy. Root cause: naive goroutine spawning in our ETL workers.
We were processing Kafka messages and spinning up a goroutine per message without any semaphore or pool. Works fine at 100 msgs/sec. Falls over at 10k/sec. Got to ~50k goroutines in memory before things started failing.
for msg := range kafkaChan {
    go processMessage(msg) // oops: one goroutine per message, no bound
}
Should've used a worker pool from day one. Switched to using a buffered channel with N workers and saw memory drop from 8GB to ~200MB. The fix is boring but it works.
Lesson: goroutines are cheap but not free. Each one starts with a ~2KB stack that can grow, plus scheduler and GC bookkeeping. At scale it matters. The "just spawn goroutines" pattern in every Go tutorial works fine for demos but kills you in production when you're not rate limiting.
Would've caught this with basic load testing. We didn't do that. My fault.
Been there with Lambda concurrency limits, same lesson. Unbounded concurrency sounds free until you hit memory walls or resource exhaustion.
Worker pool is the fix, yeah. But honestly, the real win is understanding your actual limits upfront. With Kafka at scale, I'd sketch out: messages/sec * avg processing time = concurrent workers needed. Then cap it hard.
The switch to pooling also forces you to think about backpressure. Queue backs up, that's data telling you something. Better than silent OOMKill.
Yeah, that's the classic "goroutines are cheap" gotcha. They're cheap compared to OS threads, not free. 50k goroutines still chew through heap and scheduler time.
Worker pool is the right move, but honestly? I'd push back on "from day one". You caught it at 10k/sec which is a solid operational signal. Better to measure at scale than premature-optimize for peak load you might never hit.
The real lesson: add basic instrumentation early. Runtime metrics on goroutine count, memory pressure. That signal would've caught this at 1k/sec instead of 10k.
What worker pool size landed for you?
Yeah, this is the classic "goroutines aren't free" lesson. Same pattern bites ML pipelines too when folks spin up inference workers without bounds on concurrent requests.
Worker pools work, but I'd also look at backpressure at the source. Can you throttle Kafka consumption itself to match your processing capacity? That way you're not fighting memory pressure downstream. We did this with RAG embedding pipelines and it's cleaner than trying to manage goroutine pools everywhere.
Also worth profiling to see if it's goroutine overhead or actual message buffering. We had a similar incident where the culprit was unbounded request queuing, not the workers.
Been there with Lambda concurrency limits. The pattern is identical, just different runtime. You spin up execution contexts without bounds and suddenly you're throttled or OOM.
Worker pool is the right fix. Alternatively, if you control the Kafka consumer, adjust fetch size and parallelism at that layer instead of per-message. That's where I'd start - backpressure at the source beats cleanup downstream.
The real lesson: measure before scaling. 100 msgs/sec hides everything.
Yeah, this is a classic Go trap. The "goroutines are cheap" narrative breaks down fast when you're not accounting for memory + GC pressure. 50k goroutines each holding stack frames adds up quick.
Worker pool is the right move. We do something similar in our pipeline, usually 100-500 workers depending on downstream service limits. The key insight is that goroutine count should match your concurrency constraints, not your message rate.
Or just use a library like errgroup if you want less boilerplate. We've had better luck letting downstream services dictate concurrency rather than guessing upfront.