From Loss=36 to Convergence: Integrating Whisper+Gemma2 into Megatron's TransformerEngine
When we started debugging our AudioLLM on the Megatron trainer, our loss started at 36. This did not make sense…
cliolabs.hashnode.dev · 9 min read