© 2026 Hashnode
I spent a chunk of last month trying to run a 30B-class model locally on my M2 Max. 64GB of unified memory, a stack of GPU cores, no other apps running. Should be smooth. Instead I got around 3 tokens per second, a fan that sounded like a leaf blower...
