The RAM bottleneck point hit home. I helped a small team set up a local Ollama instance thinking the RTX 3080 would carry it — ended up being 16GB of DDR4 choking everything on longer context windows. Swapped to 64GB DDR5 and the difference was night and day. The "diagnose the choke point first" framing is exactly the right mental model. Too many people treat hardware upgrades like a lottery ticket.