Tucker Attention: GQA, MLA, and MHA Were the Same Thing All Along
For the last two years, the LLM inference community has been playing a game of architectural bingo. Multi-Head Attention (MHA)? Too expensive at scale. Grouped-Query Attention (GQA)? A smaller KV cache, but you lose expressiveness. Multi-Head Latent Attention (MLA)? …
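To get a rough sense of the cache-size gap behind those tradeoffs, here's a back-of-the-envelope sketch of per-token KV-cache cost under each scheme. All dimensions are illustrative assumptions (roughly Llama-2-7B-shaped, with a DeepSeek-V2-style latent width for MLA), not figures from this article:

```python
# Back-of-the-envelope per-token KV-cache size for MHA, GQA, and MLA.
# All dimensions below are illustrative assumptions, not article figures.

n_layers = 32    # transformer layers (assumed)
n_heads = 32     # query heads (assumed)
d_head = 128     # per-head dimension (assumed)
bytes_per = 2    # fp16/bf16 element size

# MHA: every layer caches a full K and V vector for every head.
mha = n_layers * 2 * n_heads * d_head * bytes_per

# GQA: K/V are shared across groups of query heads,
# so only the (fewer) KV heads are cached.
n_kv_heads = 8   # assumed number of KV heads
gqa = n_layers * 2 * n_kv_heads * d_head * bytes_per

# MLA: K and V are compressed into one low-rank latent per layer
# (DeepSeek-V2 caches a ~512-wide latent; the small decoupled
# RoPE key it also caches is ignored here for simplicity).
d_latent = 512   # assumed latent width
mla = n_layers * d_latent * bytes_per

print(f"per-token KV cache: MHA {mha / 1024:.0f} KiB, "
      f"GQA {gqa / 1024:.0f} KiB, MLA {mla / 1024:.0f} KiB")
# -> per-token KV cache: MHA 512 KiB, GQA 128 KiB, MLA 32 KiB
```

Under these assumed dimensions, GQA cuts the cache 4x by sharing KV heads, and MLA cuts it a further 4x by caching a compressed latent instead of full keys and values; the expressiveness question is what the rest of the article is about.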