We trained our 10B-parameter AudioLLM, a Whisper speech encoder fused with a Gemma2 9B text decoder, using Megatron with Mosaic Streaming to handle training data.

The wall

The architecture is a Whis…
cliolabs.hashnode.dev · 11 min read