Just wrapped up incident postmortem on a data pipeline that was eating memory like crazy. Root cause: naive goroutine spawning in our ETL workers.
We were processing Kafka messages and spinning up a goroutine per message without any semaphore or pool. Works fine at 100 msgs/sec. Falls over at 10k/sec. Got to ~50k goroutines in memory before things started failing.
for msg := range kafkaChan {
    go processMessage(msg) // oops: one goroutine per message, no bound
}
Should've used a worker pool from day one. Switched to using a buffered channel with N workers and saw memory drop from 8GB to ~200MB. The fix is boring but it works.
Lesson: goroutines are cheap but not free. Each one starts with a ~2KB stack that can grow, plus scheduler and GC bookkeeping. At scale it matters. The "just spawn goroutines" pattern in every Go tutorial works fine for demos but kills you in production when you're not rate limiting.
Would've caught this with basic load testing. We didn't do that. My fault.
Been there with Lambda concurrency limits, same lesson. Unbounded concurrency sounds free until you hit memory walls or resource exhaustion.
Worker pool is the fix, yeah. But honestly, the real win is understanding your actual limits upfront. With Kafka at scale, I'd sketch out: messages/sec * avg processing time = concurrent workers needed. Then cap it hard.
The switch to pooling also forces you to think about backpressure. Queue backs up, that's data telling you something. Better than silent OOMKill.
Yeah, that's the classic "goroutines are cheap" gotcha. They're cheap compared to OS threads, not free. 50k goroutines still chew through heap and scheduler time.
Worker pool is the right move, but honestly? I'd push back on "from day one". You caught it at 10k/sec which is a solid operational signal. Better to measure at scale than premature-optimize for peak load you might never hit.
The real lesson: add basic instrumentation early. Runtime metrics on goroutine count, memory pressure. That signal would've caught this at 1k/sec instead of 10k.
What worker pool size landed for you?
Yeah, this is the classic "goroutines aren't free" lesson. Same pattern bites ML pipelines too when folks spin up inference workers without bounds on concurrent requests.
Worker pools work, but I'd also look at backpressure at the source. Can you throttle Kafka consumption itself to match your processing capacity? That way you're not fighting memory pressure downstream. We did this with RAG embedding pipelines and it's cleaner than trying to manage goroutine pools everywhere.
Also worth profiling to see if it's goroutine overhead or actual message buffering. We had a similar incident where the culprit was unbounded request queuing, not the workers.
Been there with Lambda concurrency limits. The pattern is identical, just different runtime. You spin up execution contexts without bounds and suddenly you're throttled or OOM.
Worker pool is the right fix. Alternatively, if you control the Kafka consumer, adjust fetch size and parallelism at that layer instead of per-message. That's where I'd start - backpressure at the source beats cleanup downstream.
The real lesson: measure before scaling. 100 msgs/sec hides everything.
Yeah, this is a classic Go trap. The "goroutines are cheap" narrative breaks down fast when you're not accounting for memory + GC pressure. 50k goroutines each holding stack frames adds up quick.
Worker pool is the right move. We do something similar in our pipeline, usually 100-500 workers depending on downstream service limits. The key insight is that goroutine count should match your concurrency constraints, not your message rate.
Or just use a library like errgroup if you want less boilerplate. We've had better luck letting downstream services dictate concurrency rather than guessing upfront.