May 6 · 10 min read · TL;DR — I was happily running Qwen3.6 on llama.cpp. Then I saw claims of 2× speed with vLLM + NVFP4 + DFlash. So I installed it, fought through crashes, and measured it myself. Verdict: it's real. 88–
Jan 26 · 7 min read · A comprehensive guide to running Large Language Models (LLMs) locally on your machine using various tools and platforms. 🎬 Video Demonstration 1. 🦙 Ollama - The Dominant Local LLM Ecosystem Ollama is the dominant ecosystem for running LLMs such a...
Jan 4 · 2 min read · No sooner had I built a GGUF model registry than llama.cpp released functionality to dynamically load and unload models from their new llama-server web UI! I had a play with this and realised that it doesn’t exactly work for my setup, mainly because ...
Jan 3 · 6 min read · I’ve had my Spark for a couple of months now. Since it is my only personal computer I’ve gone through the process of working out how to use local LLMs for general tasks like taking & editing notes, browsing the web and coding. This is what I’ve learn...
Dec 27, 2025 · 5 min read · Prior to acquiring a DGX Spark, my experience running local LLMs was limited to basic experimentation with Ollama. So when I got my new toy I went a bit crazy downloading lots of different models. And then ended up with a page of notes containing a l...
Nov 30, 2025 · 7 min read · TL;DR — The DGX Spark has enough unified RAM to load large LLMs, but using dense models makes everything slow. Before I realised the real bottleneck (MoE vs dense, covered in Part 2), I went deep into inference engines. Here’s how they compare on DGX S...