Apr 23 · 8 min read · I Run a 40GB AI Model on a MacBook. Three Months of MLX on M1 Max Has Changed How I Think About Apple Silicon. It's Just a Laptop. But It's Running a 40GB Model Right Now. I'm drafting this on a MacBook Pro. Qwen 3.6 35B-A3B MoE Q8 — about 40GB of we...
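The preview cuts off before any code, but a minimal sketch of what running a large quantized model through MLX looks like, using the mlx-lm Python package, is below; the checkpoint name is a placeholder, not the exact model from the article.

from mlx_lm import load, generate

# Placeholder repo name: substitute the MLX-format quantized checkpoint you actually use.
model, tokenizer = load("mlx-community/placeholder-moe-q8")

# Weights sit in unified memory, so a ~40GB model fits as long as the Mac has enough RAM.
text = generate(
    model,
    tokenizer,
    prompt="Explain mixture-of-experts routing in two sentences.",
    max_tokens=200,
)
print(text)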
Apr 23 · 5 min read · ▶ Watch the race on YouTube: https://www.youtube.com/watch?v=2KeTDDodE0A April 22, 2026. Anthropic's Claude Code Max plan jumped to $100 a month. I ran a live three-way AI race on the exact same prompt — Gemma 31B local, Llama 70B local, and Claude...
Apr 6 · 5 min read · My M4 Max was decoding Qwen3.5 at 58 tokens per second yesterday. Today it's doing 112. Same model, same hardware, same prompt. The only thing that changed was a single environment variable. Ollama 0.19 shipped on March 31, 2026 with a preview of its...
Mar 18 · 3 min read · Apple Silicon changed what is possible for local AI. The unified memory architecture means ML models can run on the GPU without copying data between CPU and GPU memory. For a desktop agent that needs to process screen content in real time, this matte...
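A minimal sketch of that point, assuming the MLX Python package (mlx.core): the same arrays are visible to CPU and GPU kernels, and only the compute target changes, so there is no explicit device-to-device copy.

import mlx.core as mx

# Arrays are allocated in unified memory; there is no .to(device) step.
a = mx.random.normal((4096, 4096))
b = mx.random.normal((4096, 4096))

# Run one op on the GPU and another on the CPU against the same buffers.
on_gpu = mx.matmul(a, b, stream=mx.gpu)
on_cpu = mx.add(a, b, stream=mx.cpu)
mx.eval(on_gpu, on_cpu)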
Mar 14 · 8 min read · It started with a tweet. Google Devs posted a demo of FunctionGemma running a game, and I watched this tiny model parse natural language into structured function calls in real time. My immediate thoug...
Oct 30, 2025 · 4 min read · ML Pipeline Tutorial: Fine-tune Models Link to the GitHub project you'll need to follow this tutorial: project Learn to build a complete ML pipeline for fine-tuning models using F1 racing data. This tutorial covers data processing, workflow management, and m...