© 2026 Hashnode
My M4 Max was decoding Qwen3.5 at 58 tokens per second yesterday. Today it's doing 112. Same model, same hardware, same prompt. The only thing that changed was a single environment variable. Ollama 0.19 shipped on March 31, 2026 with a preview of its...

Why I Wanted to Do This

I wanted to see if I could:

- Run a large language model locally
- Use it outside my Mac, especially from my iPhone
- Keep everything private and under my control
- Avoid exposing any service to the public internet
- Still keep the...
