강문규 · indevsnack.hashnode.dev · May 6 · 10 min read
Qwen3.6 on DGX Spark: vLLM + NVFP4 + DFlash vs llama.cpp — 2x Faster at 88–104 tok/s
TL;DR: I was happily running Qwen3.6 on llama.cpp. Then I saw claims of a 2× speedup with vLLM + NVFP4 + DFlash, so I installed it, fought through the crashes, and measured it myself. Verdict: it's real. 88–…
강문규 · indevsnack.hashnode.dev · May 6 · 8 min read
Gemma 4 MTP Drafter on DGX Spark: 2.89x Speedup for Dense 31B — No Quality Loss
An 870 MB drafter model took Dense 31B from 6.5 to 18.8 tok/s. No model swap, no training, no quality degradation. If you have a DGX Spark, there's no reason not to use this. Key Results: Model Fra…
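As a quick sanity check on the headline number, the 2.89x figure is simply the ratio of the two throughputs quoted in the excerpt. A minimal sketch; the variable and function names are illustrative, not taken from the post:

```python
# Throughput figures quoted in the Gemma 4 MTP drafter excerpt (tokens/second).
baseline_tok_s = 6.5        # Dense 31B decoding on its own
with_drafter_tok_s = 18.8   # same model with the 870 MB MTP drafter attached


def speedup(new_tok_s: float, old_tok_s: float) -> float:
    """Return the throughput speedup as the ratio new / old."""
    return new_tok_s / old_tok_s


ratio = speedup(with_drafter_tok_s, baseline_tok_s)
print(f"{ratio:.2f}x")  # prints 2.89x
```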
강문규 · indevsnack.hashnode.dev · May 6 · 8 min read
Every Optimized Model for NVIDIA DGX Spark GB10 — Benchmarked & Ranked (April 2026)
Got your hands on an NVIDIA DGX Spark but have no idea which models to run on it? I scoured every GB10-optimized model on Hugging Face so you don't have to. Table of Contents: What Makes DGX Spark Spe…