From Loss=36 to Convergence: Integrating Whisper+Gemma2 into Megatron's TransformerEngine
1h ago · 9 min read

When we started debugging our AudioLLM on the Megatron trainer, our loss started at 36. This did not make sense...
