Adam King in flexai.hashnode.dev · Apr 30 · 6 min read
Running AI Workloads Across NVIDIA, AMD, and Multiple Clouds Without Refactoring
The default path for most AI teams today is single-vendor, single-cloud: pick NVIDIA, pick AWS (or GCP, or Azure), and build everything around that stack. It works until it doesn't: hyperscaler credi

Adam King in flexai.hashnode.dev · Apr 21 · 6 min read
Fine-Tuning a 32B Legal LLM That Outperformed a Frontier Model at 4× Lower Serving Cost
Everyone assumes bigger models produce better results. LegML set out to prove otherwise. They fine-tuned a 32B-parameter legal LLM, internally called "Hugo", that outperformed a leading frontier mod

Adam King in flexai.hashnode.dev · Apr 14 · 5 min read
LLM Inference GPU Sizing: How to Choose the Right GPU for Your Model and Traffic
When developers scale LLM workloads to production, one question always comes up: which GPUs should I use, how many will I need, and how much is this going to cost me? Not a back-of-the-envelope guess

Adam King in flexai.hashnode.dev · Apr 7 · 7 min read
Why AI Infrastructure Software Is Harder Than Hardware — Lessons from Building Aurora and FlexAI
Small AI startups are dying, not from lack of innovation, but from infrastructure exhaustion. While the industry focuses on model architecture and training data, a quieter crisis unfolds in the trenc

Adam King in flexai.hashnode.dev · Apr 3 · 8 min read
Heterogeneous AI Compute: Why Mixed NVIDIA, AMD, and TPU Clusters Are Harder Than They Look
As AI workloads expand across cloud, edge, and enterprise environments, the infrastructure underpinning them is shifting toward hardware diversity. The question is no longer whether teams will run on