From Thresholds to Probabilities
vedant in gradientlore.hashnode.dev · Jun 25, 2025 · 4 min read
In the previous post, we looked at Softmax and NLL loss, both critical for output interpretation and learning in Transformers. Now let's dive into what happens within the network: activation functions. Specifically, GELU. What is GELU? GELU, or, Gau...

Logits and Likelihoods
vedant in gradientlore.hashnode.dev · May 30, 2025 · 6 min read
At the heart of Transformers, the modern neural networks capable of understanding sentence semantics and generating thousands of words per second, lie two core mathematical operations that often go unnoticed, namely the Softmax and Neg...