Discussion

vedant

Deep Learning Enthusiast

Jun 25, 2025

From Thresholds to Probabilities

In the previous post, we looked at Softmax and NLL loss, both critical for output interpretation and learning in Transformers. Now let’s dive into what happens within the network: activation functions, specifically GELU. What is GELU? GELU, or Gau...

gradientlore.hashnode.dev · 4 min read

#deep-learning #mathematics #probability-distributions #ai #llm

