Why local LLM inference stalls on Apple Silicon (and how to fix it)
I spent a chunk of last month trying to run a 30B-class model locally on my M2 Max. 64GB of unified memory, a stack of GPU cores, no other apps running. Should be smooth. Instead I got around 3 tokens per second, a fan that sounded like a leaf blower...