Tag feed

#latency

105 posts · 2 followers


marcuschen · voicelatency.hashnode.dev · 1d ago · 17 min read

Ten days before launch, our voice agent kept cutting users off: an end-of-turn detection war story

TL;DR. Our phone voice agent kept interrupting people. We had shipped end-of-turn detection as a single silence timeout: if the caller went quiet for 700 milliseconds, the agent decided they were fini

0

Aniruddha Banerjee · ruddhani.hashnode.dev · Jul 11 · 7 min read

When Data Becomes the Bottleneck: Unmasking the Real Culprit Behind SLT Misses

We kept blaming the data. Turns out, the data was innocent. 🔍 For weeks, our SLTs were being missed. Query timeouts. Stale dashboards. Frustrated users. Every indicator pointed at the data layer —

0

Omnithium · omnithium.hashnode.dev · Jun 28 · 19 min read

The True Cost of Multi-Agent Coordination: Beyond LLM Tokens

The Token Cost Mirage. LLM token pricing is a mirage. The real cost of multi-agent systems lives in coordination latency, state management, debugging, and reliability engineering. You’ve seen the procu

0

Nikhil Mali · nm-blogs.hashnode.dev · May 10 · 9 min read

Why Node.js is Perfect for Building Fast Web Applications

Let’s address the elephant in the room right away. If you are coming to backend development from a traditional computer science background, the fundamental concept of Node.js sounds like a terrible id

0

David Aronchick · distributedthoughts.org · May 5 · 4 min read

The History of Expanso (Part 3): You Can't Change The Laws of Physics (Much)

The below is a continuation of the series on the history of Expanso. Today, we're talking about the three unchangeable laws of data: its exponential growth, the speed of light, and global regulations. Read the whole series starting from Part 1. The...

0

David Aronchick · distributedthoughts.org · May 5 · 4 min read

The History of Expanso (Part 5): The Unbreakable Speed Limit

The below is a continuation of the series on the history of Expanso. Today, we're talking about the second of the three unchangeable laws of data: the speed limit. Read the whole series starting from Part 1. The History of Expanso (Part 5): The Unbr...

0

Anup Karanjkar · wowhow.hashnode.dev · May 2 · 3 min read

Mercury 2 vs Claude vs GPT: The Speed vs Quality Tradeoff

Mercury 2 from Inception Labs made headlines as the fastest large language model in the world, generating 500+ tokens per second: 5x faster than Claude Sonnet and 10x faster than Claude Opus. Speed is impressive. But speed without qu...

0

xiaoqiangapi · xiaoqiangapi.hashnode.dev · Apr 27 · 8 min read

"Three Chinese LLMs Overseas Latency Tests: DeepSeek at 1.45 Seconds Is the Fastest. Do You Have a Better Testing Method?"

This is how I measure overseas API latency (taking my Chinese LLM conversion as an example): 1.45 seconds to the initial response. Do you have a better way? 🎯 Are you confused too? 1. Is it fast to c

1


Adam King · flexai.hashnode.dev · Apr 14 · 5 min read

LLM Inference GPU Sizing: How to Choose the Right GPU for Your Model and Traffic

When developers scale LLM workloads to production, one question always comes up: which GPUs should I use, how many will I need, and how much is this going to cost me? Not a back-of-the-envelope guess

0

Rahul Sehrawat · ai-zero-to-hero.hashnode.dev · Apr 13 · 12 min read

Three Kinds of Caching: Prompt, Semantic, Result

Every "AI app optimisation" post tells you to cache. None of them tell you which cache. There are at least three distinct caches that could live in an LLM pipeline, and they win in different places, stack in different orders, and fail in different wa...

0
