PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
Pyramidal KV‑Cache Allocation in Transformer Models
Motivation and observed patterns
Pyramidal Information Funneling is the phrase that neatly captures an empirical pattern the authors identify: attention appears to scatter broadly in early layers an...
paperium.hashnode.dev · 5 min read