May 5 · 4 min read · The below is a continuation of the series on the history of Expanso. Today, we're talking about the three unchangeable laws of data: its exponential growth, the speed of light, and global regulations. Read the whole series starting from Part 1. The...
May 5 · 4 min read · The below is a continuation of the series on the history of Expanso. Today, we're talking about the second of the three unchangeable laws of data - the speed limit. Read the whole series starting from Part 1. The History of Expanso (Part 5): The Unbr...
May 2 · 3 min read · Mercury 2 from Inception Labs made headlines as the fastest large language model in the world, generating 500+ tokens per second: 5x faster than Claude Sonnet and 10x faster than Claude Opus. Speed is impressive. But speed without qu...
Apr 27 · 8 min read · This is how I measure overseas API latency (taking my Chinese LLM conversion as an example): 1.45 seconds to initial response. Do you have a better way? 🎯 **Are you confused too?** 1. Is it fast to c
Apr 14 · 5 min read · When developers scale LLM workloads to production, one question always comes up: which GPUs should I use, how many will I need, and how much is this going to cost me? Not a back-of-the-envelope guess
Apr 13 · 12 min read · Every "AI app optimisation" post tells you to cache. None of them tell you which cache. There are at least three distinct caches that could live in an LLM pipeline, and they win in different places, stack in different orders, and fail in different wa...
Apr 13 · 10 min read · Welcome to Module B5 — Shipping. The module where we stop talking about what to build and start talking about what makes the difference between a demo that works on your laptop and a product that survives ten thousand users on a bad day. Cost. Latenc...
Mar 28 · 7 min read · Originally published at adiyogiarts.com Benchmark vLLM, TensorRT-LLM, and SGLang for LLM serving performance. Compare latency, throughput, and resource use to find optimal deployment strategies for Large Language Models. WHY IT MATTERS The Challeng...