Dense LLM Architecture: How Every Parameter Works on Every Token
TLDR: In a dense LLM every single parameter is active for every token in every forward pass — no routing, no selection. A transformer block runs multi-head self-attention (Q, K, V) followed by a feed-forward network (FFN) with roughly 4× the hidden d...
abstractalgorithms.dev · 22 min read
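To make the "every parameter is active for every token" claim in the TLDR concrete, here is a rough parameter count for one dense transformer block. It is a minimal sketch under stated assumptions: the FFN inner width is taken to be 4× the model width (the common convention, and presumably what the truncated TLDR refers to), only the weight matrices are counted, and the names `dense_block_params`, `d_model`, and `ffn_mult` are hypothetical, chosen for illustration.

```python
def dense_block_params(d_model: int, ffn_mult: int = 4) -> dict:
    """Count weight-matrix parameters in one dense transformer block.

    Assumptions: biases, layer norms, and embeddings are ignored, and the
    attention projections W_Q, W_K, W_V, W_O are each d_model x d_model.
    """
    attention = 4 * d_model * d_model          # Q, K, V, and output projections
    ffn = 2 * d_model * (ffn_mult * d_model)   # up-projection plus down-projection
    total = attention + ffn
    # In a dense model there is no routing, so every weight participates
    # in the forward pass of every token: active == total.
    return {
        "attention": attention,
        "ffn": ffn,
        "total": total,
        "active_per_token": total,
    }


counts = dense_block_params(d_model=4096)
print(counts)
```

Under these assumptions the FFN holds about two thirds of the block's weights (8·d² versus 4·d² for attention), and the active count per token equals the total, which is the defining property of a dense model.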