One thing I’d add is that most teams underestimate how hard observability becomes once they move past the “single GPU demo” phase. Logging token counts, caching hits, and model-version drift saves way more money and headaches than people expect. Scaling LLMs isn’t just about bigger GPUs, it’s about tighter feedback loops.