The KV Cache Dilemma: Why LLM Inference Needs to "Forget" to Scale
Dec 10, 2025 · 3 min read

Have you ever wondered why the 10th turn of a conversation with an LLM feels just as fast as the first? Mathematically, this shouldn't happen. As the context grows (History + New Question), the computation required to generate the next token should increase…
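A minimal sketch of the mechanism the teaser hints at, assuming a single attention head with random weights (the names `Wq`, `Wk`, `Wv`, `KVCache`, and the dimension `d` are illustrative, not from the post): without a cache, every decode step re-projects keys and values for the entire history; with a KV cache, each step projects only the newest token and appends it.

```python
# Sketch only: one attention head, NumPy, no real model weights.
import numpy as np

d = 64  # head dimension (assumed for illustration)
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))

def attend(q, K, V):
    # Standard scaled dot-product attention for a single query vector.
    scores = q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

def decode_no_cache(history):
    # Without a cache: re-project K/V for every past token on every step.
    K = history @ Wk
    V = history @ Wv
    q = history[-1] @ Wq
    return attend(q, K, V)

class KVCache:
    def __init__(self):
        self.K = np.empty((0, d))
        self.V = np.empty((0, d))

    def decode(self, new_token):
        # With a cache: project only the newest token, append, then attend
        # over everything already stored.
        self.K = np.vstack([self.K, new_token @ Wk])
        self.V = np.vstack([self.V, new_token @ Wv])
        q = new_token @ Wq
        return attend(q, self.K, self.V)
```

The cached path does one projection per step instead of one per token of history, which is why late turns still feel fast; the trade-off is that the cache itself keeps growing with the context, which is the dilemma the title refers to.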