May 6 · 10 min read · TL;DR — I was happily running Qwen3.6 on llama.cpp. Then I saw claims of 2× speed with vLLM + NVFP4 + DFlash. So I installed it, fought through crashes, and measured it myself. Verdict: it's real. 88–…
Apr 29 · 26 min read · I have an RTX 3090 sitting in a Xeon Silver 4314 box at home. I wanted to: Stand up a local inference stack (vLLM nightly with all the bells and whistles: speculative decoding, FlashInfer, prefix caching). Use t…
Apr 13 · 11 min read · Welcome to Module B6 — The Next Layer. Four posts that sit just past the edge of the mainstream stack. Local models, fine-tuning honestly, multimodal in practice, and the frontier worth following. The module is less tactical than the others — fewer "...