The MDS Shim — Zero-Conversion Data Loading for 800+ Datasets
Lewis Won in cliolabs.hashnode.dev · Mar 27 · 12 min read
We have about 800 datasets in Mosaic MDS format, with tens of millions of multimodal samples (each one an audio clip, an instruction, and a target response) spread across thousands of compressed sha…

Why We Moved an AudioLLM to Megatron
Lewis Won in cliolabs.hashnode.dev · Mar 20 · 11 min read
We trained our 10B-parameter AudioLLM (a Whisper speech encoder fused with a Gemma2 9B text decoder) using Megatron with Mosaic Streaming to handle training data. The wall: The architecture is a Whis…