🦙 Building a Kubernetes-Hosted AI Chatbot with Ollama + FastAPI (From Broken to Working)
Introduction
This project started as a simple idea:
"Let's run a local AI chatbot on Kubernetes without external APIs."
What followed was a full DevOps journey: misconfigurations, WebSocket failures, and more.
hardik0811arora.hashnode.dev · 4 min read
Archit Mittal
I Automate Chaos — AI workflows, n8n, Claude, and open-source automation for businesses. Turning repetitive work into one-click systems.
The "from broken to working" angle is gold — most k8s+LLM posts skip the actual failure modes. One thing I'd add from deploying Ollama on EKS for clients: set resources.limits.memory explicitly to ~1.5x your model size, otherwise the OOMKiller will cut you down mid-stream during long generations. Also readinessProbe should hit /api/tags not /, because Ollama's root returns 200 even before models are loaded. What's your cold-start time looking like when a pod restarts and has to reload a 7B model from volume?
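The memory-limit and readiness-probe advice in the comment above can be sketched as a fragment of an Ollama Deployment spec. This is a minimal illustration, not taken from the post: the container name, image tag, memory value, and probe timings are all assumptions; only the port (11434) and the `/api/tags` endpoint are Ollama's documented defaults.

```yaml
# Hypothetical container spec for an Ollama pod (values are illustrative).
containers:
  - name: ollama
    image: ollama/ollama:latest
    ports:
      - containerPort: 11434        # Ollama's default API port
    resources:
      limits:
        memory: "8Gi"               # ~1.5x the model's in-memory size,
                                    # assuming a ~5 GB 7B model; tune per model
    readinessProbe:
      httpGet:
        path: /api/tags             # lists loaded models; unlike /, it reflects
        port: 11434                 # whether the server is actually usable
      initialDelaySeconds: 10
      periodSeconds: 5
```

Sizing the limit above the model footprint leaves headroom for the KV cache during long generations, which is exactly when the OOMKiller tends to strike if the limit equals the model size.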