May 6 · 10 min read · TL;DR — I was happily running Qwen3.6 on llama.cpp. Then I saw claims of 2× speed with vLLM + NVFP4 + DFlash. So I installed it, fought through crashes, and measured it myself. Verdict: it's real. 88–
Jan 26 · 7 min read · A comprehensive guide to running Large Language Models (LLMs) locally on your machine using various tools and platforms. 🎬 Video Demonstration 1. 🦙 Ollama - The Dominant Local LLM Ecosystem Ollama is the dominant ecosystem for running LLMs such a...
Jan 4 · 2 min read · No sooner had I built a GGUF model registry than llama.cpp released functionality to dynamically load and unload models from their new llama-server web UI! I had a play with this and realised that it doesn’t exactly work for my setup, mainly because ...
Jan 3 · 6 min read · I’ve had my Spark for a couple of months now. Since it is my only personal computer I’ve gone through the process of working out how to use local LLMs for general tasks like taking & editing notes, browsing the web and coding. This is what I’ve learn...
Dec 27, 2025 · 5 min read · Prior to acquiring a DGX Spark, my experience running local LLMs was limited to basic experimentation with Ollama. So when I got my new toy I went a bit crazy downloading lots of different models. And then ended up with a page of notes containing a l...
Nov 30, 2025 · 7 min read · TL;DR — The DGX Spark has enough unified RAM to load large LLMs, but using dense models makes everything slow. Before I realised the real bottleneck (MoE vs dense, covered in Part 2), I went deep into inference engines. Here’s how they compare on DGX S...