Dec 3, 2025 · 11 min read · 1. Why bother understanding this at all? If you’re a developer or founder, you don’t need to reinvent the math of deep learning. But you do need a solid mental model of: what modern LLMs really are, why they’re trained on GPUs/TPUs, how context wi...
Aug 1, 2025 · 47 min read · Welcome to a deep dive into one of the most critical and fascinating areas of AI Engineering: Inference Optimization. While building powerful models is one part of the equation, making them run efficiently—faster, cheaper, and at scale—is what makes ...
Feb 17, 2025 · 6 min read · In this article, we will discuss the KV Cache (Key-Value Cache), an inference optimization technique. We will explore the problems of inference and the decoder architecture of transformer models. Then we will explore the needs and limitations of ...