Why GPT-OSS‑120B Feels Slow on a MacBook Pro M4 Max (128GB)
1) Model size vs. unified memory headroom
GPT‑OSS‑120B is enormous. Even with modern low‑precision formats, a 120B‑parameter model routinely occupies tens of gigabytes of VRAM/RAM for its weights alone, and practical deployments reserve additional memory for the KV cache, activations, and other runtime buffers.
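To make "tens of gigabytes" concrete, here is a back‑of‑envelope sketch of weight memory as a function of precision. The 120B parameter count and the bit widths are illustrative assumptions for the arithmetic, not official GPT‑OSS deployment figures, and the estimate deliberately ignores KV cache and activation overhead:

```python
# Rough estimate of model weight storage at a given precision.
# This counts weights only -- KV cache, activations, and framework
# buffers add to the real footprint on top of this number.

def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB: params * bits / 8 bits-per-byte / 1e9."""
    return num_params * bits_per_weight / 8 / 1e9

# Assumed 120B parameters, at 4-bit quantization vs. 16-bit weights:
print(f"4-bit : {weight_memory_gb(120e9, 4):.0f} GB")   # -> 60 GB
print(f"16-bit: {weight_memory_gb(120e9, 16):.0f} GB")  # -> 240 GB
```

Even the aggressively quantized case consumes roughly half of a 128 GB unified memory pool before any cache or OS overhead is counted, which is why headroom matters on this hardware.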
Source: developer.tenten.co · 4 min read