Qwen3.6 on DGX Spark: vLLM + NVFP4 + DFlash vs llama.cpp — 2x Faster at 88–104 tok/s
TL;DR — I was happily running Qwen3.6 on llama.cpp. Then I saw claims of 2× speed with vLLM + NVFP4 + DFlash. So I installed it, fought through crashes, and measured it myself. Verdict: it's real. 88–104 tok/s, roughly double what I was getting from llama.cpp.
devsnack.hashnode.dev · 10 min read
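For context on the headline numbers: decode throughput is just generated tokens divided by wall-clock generation time. Here is a minimal sketch of how such a figure is typically computed from a benchmark run; the sample values are hypothetical, chosen only to land inside the 88–104 tok/s range the article reports, not taken from the author's actual logs.

```python
def tokens_per_second(generated_tokens: int, elapsed_seconds: float) -> float:
    """Decode throughput: tokens generated per wall-clock second."""
    return generated_tokens / elapsed_seconds

# Hypothetical run: 2048 generated tokens in 21.5 s of decoding.
throughput = tokens_per_second(2048, 21.5)
print(f"{throughput:.1f} tok/s")  # ~95 tok/s, inside the article's 88-104 range

# The "2x" claim compares this against a llama.cpp baseline, e.g. ~48 tok/s:
speedup = throughput / 48.0
print(f"{speedup:.2f}x vs baseline")
```

In practice the numerator should count only output tokens (not the prompt), and the timer should start after prefill if you want a pure decode figure.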