



14h ago · 5 min read · We’ve spent the last decade teaching machines to understand pixels, paragraphs, and polygons. But audio? We still treat it like a sealed container. You get one file. One mix. If you want just the bass
19h ago · 6 min read · Recently, I set out to reproduce a standard CNN baseline on CIFAR-10. It was supposed to be a warm-up. A "hello world" for my semester project. I wrote the code, grabbed the dataset, and queued up fou
1d ago · 3 min read · Introduction As the field of machine learning continues to blossom, the intricacies of neural network performance must be scrutinized more closely than ever. One phenomenon that has gained attention is gradient misalignment, which can severely affect...
3d ago · 9 min read · From Loss=36 to Convergence: Integrating Whisper+Gemma2 into Megatron's TransformerEngine When we started debugging our AudioLLM on the Megatron trainer, our loss started at 36. This did not make sens
Apr 23 · 4 min read · When I first started learning PyTorch, one concept felt surprisingly confusing: The training loop — especially forward pass, loss, backpropagation, and optimizer steps. Over time, I realized somethi
Apr 22 · 3 min read · Introduction In the world of machine learning, normalization is a crucial step that can significantly impact model performance and training speed. Traditional normalization techniques, while effective, can be inadequate for large datasets or complex ...