Why We Moved an AudioLLM to Megatron
Mar 20 · 11 min read
We trained our 10B-parameter AudioLLM, a Whisper speech encoder fused with a Gemma2 9B text decoder, using Megatron, with Mosaic Streaming handling the training data.
The wall
The architecture is a Whis