강문규 (@devsnack)

강문규 · devsnack.hashnode.dev · May 6 · 10 min read

Qwen3.6 on DGX Spark: vLLM + NVFP4 + DFlash vs llama.cpp — 2x Faster at 88–104 tok/s

TL;DR — I was happily running Qwen3.6 on llama.cpp. Then I saw claims of 2× speed with vLLM + NVFP4 + DFlash. So I installed it, fought through crashes, and measured it myself. Verdict: it's real. 88–


강문규 · devsnack.hashnode.dev · May 6 · 8 min read

Gemma 4 MTP Drafter on DGX Spark: 2.89x Speedup for Dense 31B — No Quality Loss

An 870 MB drafter model turned Dense 31B from 6.5 → 18.8 tok/s. No model swap, no training, no quality degradation. If you have a DGX Spark, there's no reason not to use this. Key Results Model Fra


강문규 · devsnack.hashnode.dev · May 6 · 8 min read

Every Optimized Model for NVIDIA DGX Spark GB10 — Benchmarked & Ranked (April 2026)

Got your hands on an NVIDIA DGX Spark but have no idea which models to run on it? I scoured every GB10-optimized model on Hugging Face so you don't have to. Table of Contents What Makes DGX Spark Spe


