Apr 6 · 5 min read · My M4 Max was decoding Qwen3.5 at 58 tokens per second yesterday. Today it's doing 112. Same model, same hardware, same prompt. The only thing that changed was a single environment variable. Ollama 0.19 shipped on March 31, 2026 with a preview of its...
Mar 18 · 3 min read · Apple Silicon changed what is possible for local AI. The unified memory architecture means ML models can run on the GPU without copying data between CPU and GPU memory. For a desktop agent that needs to process screen content in real-time, this matte...
Mar 7 · 26 min read · Every tutorial on fine-tuning LLMs starts the same way: "Spin up an A100 on AWS..." Not this one. In this guide, you'll fine-tune Meta's Llama 3.2 1B into a Text-to-SQL assistant — entirely on your Ma...
Feb 9 · 5 min read · If you’ve ever used a Mac with Apple silicon, you’ve likely felt its speed and endurance. Apps launch instantly, video renders smoothly, and your battery seems to last forever—even with dozens of browser tabs open. While it's easy to credit this to “...
Feb 6 · 4 min read · Why I Wanted to Do This: I wanted to see if I could run a large language model locally; use it outside my Mac, especially from my iPhone; keep everything private and under my control; avoid exposing any service to the public internet; and still keep the...
Jan 17 · 8 min read · The Problem: ZK Proofs Are Slow Zero-knowledge proofs are transforming blockchain technology, enabling private transactions, scalable rollups, and trustless computation. But there's a catch: generating ZK proofs is computationally expensive. A typica...
Jan 16 · 7 min read · What if I told you that you could run an election where nobody can see how anyone voted (not the server, not the administrators, not even a hacker who compromises the entire system); the results are mathematically provable to be correct; and it runs on a...
Dec 31, 2025 · 4 min read · I work with Metal on Apple Silicon in my day job, and the experience of Unified Memory has been revealing. No explicit transfers between CPU and GPU. No SetData / GetData choreography. Just... shared memory. This made me think: what if this became th...