
NovitaAI


Dec 14, 2024

Dynamic KV Cache compression based on vLLM framework

Motivation: Drawing on academic papers from the past year on KV sparsity (H2O, SnapKV, PyramidKV), we apply KV sparsity to different layers of the model. By employing a pruning strategy, we eliminate KV pairs with lower scores whil...

novita.hashnode.dev · 4 min read

#llm #framework
