KV Cache and Prompt Caching: How to Leverage Them to Cut Time and Costs
Introduction
A Problem of LLM Inference
In the transformer architecture, the model computes the \(\mathbf{K}\) and \(\mathbf{V}\) matrices using learned weight matrices \(\mathbf{W}\). When an input vector \(\mathbf{x}_0\) enters the model, it is first multiplied by...
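As a minimal sketch of the idea behind a KV cache (using NumPy, with illustrative dimensions and random weights standing in for the learned \(\mathbf{W}\) matrices), each new token's key and value projections are computed once and appended to a cache, so earlier tokens are never re-projected:

```python
import numpy as np

d_model, d_head = 8, 8
rng = np.random.default_rng(0)

# Stand-ins for the learned projection weights W_K and W_V
W_K = rng.standard_normal((d_model, d_head))
W_V = rng.standard_normal((d_model, d_head))

k_cache, v_cache = [], []

def step(x_t):
    """Project only the newest token and append its K/V rows to the cache.

    Without a cache, every decoding step would recompute K and V for
    the entire sequence; with it, each step does one projection.
    """
    k_cache.append(x_t @ W_K)
    v_cache.append(x_t @ W_V)
    # Attention at this step uses the full cached K and V matrices
    return np.stack(k_cache), np.stack(v_cache)

# Feed three tokens one at a time, as in autoregressive decoding
for _ in range(3):
    K, V = step(rng.standard_normal(d_model))

print(K.shape, V.shape)  # each cache grows by one row per token
```

This is a sketch under stated assumptions, not the attention implementation itself: a real model keeps one such cache per layer and per head, and the cached \(\mathbf{K}\), \(\mathbf{V}\) feed directly into the attention score computation.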