Why We Moved an AudioLLM to Megatron
We trained our 10B-parameter AudioLLM (a Whisper speech encoder fused with a Gemma2 9B text decoder) using Megatron, with Mosaic Streaming handling the training data.
The wall
The architecture is a Whis
cliolabs.hashnode.dev · 11 min read