The Numbers: Benchmarking My LLM Gateway on a H100
Orhun Küpeli in orhunkupeli.hashnode.dev · 2d ago · 5 min read
A couple of weeks ago I wrote about rewriting my LLM gateway to bring it from MVP to production. The architectural claims were multi-tenancy, hybrid inference, and sub-5ms overhead. So I benchmarked it…
MVP to Mission-Critical: The Idea Behind My LLM Gateway Rewrite
Orhun Küpeli in orhunkupeli.hashnode.dev · May 14 · 5 min read
Several months ago I decided to play around with my first LLM gateway prototype, for which I simply used LiteLLM with some extras on top. Then I did the math to find out how far it was from the productio…