The MDS Shim — Zero-Conversion Data Loading for 800+ Datasets
Mar 27 · 12 min read

We have about 800 datasets in Mosaic MDS format, with tens of millions of multimodal samples (each one an audio clip, an instruction, and a target response) spread across thousands of compressed sha