Dynamic KV Cache Compression Based on the vLLM Framework
Motivation
Drawing on academic papers from the past year on KV sparsity (H2O, SnapKV, PyramidKV), we apply KV sparsity to different layers of the model. Using a pruning strategy, we eliminate KV pairs with lower scores whil...
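To make "eliminating KV pairs with lower scores" concrete, here is a minimal sketch, not this post's implementation and not a vLLM API. It assumes H2O-style scoring, where each cached key's score is the total attention it has received, and it always keeps a small window of the most recent tokens. It also uses a hypothetical linearly decreasing per-layer budget in the spirit of PyramidKV. The function names `score_tokens`, `select_keep_indices`, `prune_kv`, and `pyramid_budgets`, and the parameters `budget` and `recent_window`, are illustrative assumptions.

```python
import numpy as np


def score_tokens(attn: np.ndarray) -> np.ndarray:
    """Accumulate the attention each cached key position has received.

    attn: (num_queries, seq_len) attention weights for one head.
    Returns a (seq_len,) score per key position (H2O-style, assumed).
    """
    return attn.sum(axis=0)


def select_keep_indices(scores: np.ndarray, budget: int, recent_window: int) -> np.ndarray:
    """Keep the newest `recent_window` tokens plus the highest-scoring older
    tokens, up to `budget` positions in total. Returns sorted indices."""
    seq_len = scores.shape[0]
    if budget >= seq_len:
        return np.arange(seq_len)  # cache already fits: nothing to prune
    recent_window = min(recent_window, budget)
    recent = np.arange(seq_len - recent_window, seq_len)
    older = seq_len - recent_window
    n_heavy = budget - recent_window
    # Rank only the older positions by descending score (stable for ties).
    heavy = np.argsort(-scores[:older], kind="stable")[:n_heavy]
    return np.sort(np.concatenate([heavy, recent]))


def prune_kv(keys: np.ndarray, values: np.ndarray, attn: np.ndarray,
             budget: int, recent_window: int):
    """Drop low-scoring KV pairs for one head; keys/values are (seq_len, d)."""
    keep = select_keep_indices(score_tokens(attn), budget, recent_window)
    return keys[keep], values[keep], keep


def pyramid_budgets(num_layers: int, max_budget: int, min_budget: int) -> list:
    """Hypothetical per-layer budgets: larger in lower layers, shrinking
    linearly toward the last layer."""
    if num_layers == 1:
        return [max_budget]
    step = (max_budget - min_budget) / (num_layers - 1)
    return [int(round(max_budget - i * step)) for i in range(num_layers)]


if __name__ == "__main__":
    seq_len, d = 8, 4
    keys = np.arange(seq_len * d, dtype=float).reshape(seq_len, d)
    values = keys.copy()
    # Two query rows that attend heavily to positions 1 and 3.
    row = np.array([0.05, 0.4, 0.05, 0.3, 0.05, 0.05, 0.05, 0.05])
    attn = np.stack([row, row])
    k, v, keep = prune_kv(keys, values, attn, budget=4, recent_window=2)
    print(keep.tolist(), k.shape)      # [1, 3, 6, 7] (4, 4)
    print(pyramid_budgets(4, 64, 16))  # [64, 48, 32, 16]
```

Keeping a recent window means the newest tokens, which have had few chances to accumulate attention, are never evicted just for being new. Giving lower layers larger budgets reflects the PyramidKV observation that attention is spread more broadly in early layers.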
novita.hashnode.dev · 4 min read