Semantic caching
Large language models are getting faster and cheaper. The charts below show progress in OpenAI's GPT family of models over the past year:
[Charts: cost per million tokens ($); tokens per second]
Recent releases like Meta's Llama 3 and Gemini Flash have pushed...
unkey.hashnode.dev · 2 min read