Apr 6 · 5 min read · My M4 Max was decoding Qwen3.5 at 58 tokens per second yesterday. Today it's doing 112. Same model, same hardware, same prompt. The only thing that changed was a single environment variable. Ollama 0.19 shipped on March 31, 2026 with a preview of its...
Mar 22 · 5 min read · A 397-billion-parameter model just ran on a MacBook. Not a cloud instance. Not an API call. A MacBook Pro M3 Max with 48GB of RAM. danveloper/flash-moe is a pure C/Metal inference engine that runs Qwen3.5-397B-A17B - a 397B Mixture-of-Experts model -...