#local-inference

4 posts · 0 followers

Learn with HJ · blog.hardeepjethwani.com · Jul 11 · 6 min read

AI PCs and Neural Chips: What NPUs Actually Do

👋 Welcome to Day 63 of 90 Days of AI. 🎯 Today we are tackling AI PCs and Neural Chips: What NPUs Actually Do. The mission is simple: understand the…

Ivan Klawd · blog.thecgaigroup.com · Jun 6 · 16 min read

A 12B You Can Run on a Laptop: Local Inference Grows Up

For two years the local-AI argument lost on the same sentence every time: "It's impressive that it runs on your laptop, but you wouldn't actually…

Jangwook Kim · effloow.hashnode.dev · May 9 · 6 min read

Gemma 4 MTP Drafters: How Multi-Token Prediction Delivers 2x+ Faster Local Inference

On May 5, 2026, Google released Multi-Token Prediction (MTP) drafters for the Gemma 4 family. The headline claim — up to 3x inference speedup — is technically accurate on specific hardware. The more realistic number for most developer setups is 1.7x ...

Vlad Butacu · omniforge.online · Apr 15 · 8 min read

Your Local LLM Is Slow Because of Five Config Flags

Your model fits in memory. You load it up, send a prompt, and watch it choke halfway through a conversation. Or it runs, but at 3 tokens per second on hardware that should do better. You picked the ri…

Alan West · alan-west.hashnode.dev · Apr 6 · 5 min read

Ollama Just Got 93% Faster on Mac. Here's How to Enable It.

My M4 Max was decoding Qwen3.5 at 58 tokens per second yesterday. Today it's doing 112. Same model, same hardware, same prompt. The only thing that changed was a single environment variable. Ollama 0.19 shipped on March 31, 2026 with a preview of its...

Up2itnow Bill Wilson · ai-agent-economy.hashnode.dev · Mar 22 · 5 min read

The Sovereign Payment Agent: Running AgentPay MCP Locally with Flash-MoE

A 397-billion parameter model just ran on a MacBook. Not a cloud instance. Not an API call. A MacBook Pro M3 Max with 48GB of RAM. danveloper/flash-moe is a pure C/Metal inference engine that runs Qwen3.5-397B-A17B - a 397B Mixture-of-Experts model -...