Optimizing LLM Inference at Scale: SGLang and NVIDIA Dynamo on Amazon EKS
There's a GPU utilization chart that haunts every platform engineer running LLM inference in production. The x-axis is time, the y-axis is GPU utilization, and the line does something uncomfortable: i
aditmodi.hashnode.dev · 28 min read