Yeah, that's the classic "goroutines are cheap" gotcha. They're cheap compared to OS threads, not free. 50k goroutines still chew through memory (each one starts with a couple KB of stack) and scheduler time.
Worker pool is the right move, but honestly? I'd push back on "from day one". You caught it at 10k/sec, which is a solid operational signal. Better to measure at scale than to prematurely optimize for peak load you might never hit.
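For anyone following along, here's a minimal sketch of the bounded pool idea: a fixed number of workers pull from a channel instead of one goroutine per request. `runPool`, the `int` job type, and the `handle` callback are made-up names for illustration, not anything from your codebase.

```go
package main

import (
	"fmt"
	"sync"
)

// runPool (hypothetical helper) processes jobs with a fixed number of
// worker goroutines and returns the results in input order.
func runPool(workers int, jobs []int, handle func(int) int) []int {
	if workers < 1 {
		workers = 1 // zero workers would block forever on the send below
	}
	results := make([]int, len(jobs))
	idx := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is received by exactly one worker, so the
			// writes to results never touch the same element.
			for i := range idx {
				results[i] = handle(jobs[i])
			}
		}()
	}

	// The unbuffered send blocks while all workers are busy,
	// which is what caps concurrency at `workers`.
	for i := range jobs {
		idx <- i
	}
	close(idx)
	wg.Wait()
	return results
}

func main() {
	out := runPool(4, []int{1, 2, 3, 4, 5}, func(n int) int { return n * n })
	fmt.Println(out) // [1 4 9 16 25]
}
```

The goroutine count stays at `workers` no matter how many jobs come in, so the pool size becomes the one knob to tune against real measurements.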
The real lesson: add basic instrumentation early. Runtime metrics on goroutine count, memory pressure. That signal would've caught this at 1k/sec instead of 10k.
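Something like this is roughly what I mean by basic instrumentation. It's a sketch using only the standard `runtime` package; `snapshot`, `takeSnapshot`, and `logEvery` are hypothetical names, and in a real service you'd likely export these numbers to whatever metrics system you already run rather than print them.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// snapshot (hypothetical type) holds the few numbers worth watching.
type snapshot struct {
	Goroutines      int
	HeapAllocBytes  uint64
	StackInuseBytes uint64
}

// takeSnapshot reads the current goroutine count and memory stats.
// runtime.ReadMemStats briefly stops the world, so call it on a
// timer, not on every request.
func takeSnapshot() snapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return snapshot{
		Goroutines:      runtime.NumGoroutine(),
		HeapAllocBytes:  m.HeapAlloc,
		StackInuseBytes: m.StackInuse,
	}
}

// logEvery prints n snapshots, one per interval.
func logEvery(interval time.Duration, n int) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for i := 0; i < n; i++ {
		<-ticker.C
		s := takeSnapshot()
		fmt.Printf("goroutines=%d heap_alloc=%d stack_inuse=%d\n",
			s.Goroutines, s.HeapAllocBytes, s.StackInuseBytes)
	}
}

func main() {
	// Short interval just for the demo; something like 10s is
	// probably more sensible in production.
	logEvery(100*time.Millisecond, 3)
}
```

A goroutine count that climbs with request rate and never comes back down is the early warning here, well before memory becomes a problem.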
What worker pool size did you land on?