🦙 Building a Kubernetes-Hosted AI Chatbot with Ollama + FastAPI (From Broken to Working)
Introduction
This project started as a simple idea:
"Let's run a local AI chatbot on Kubernetes without external APIs."
What followed was a full DevOps journey: misconfigurations, WebSocket failures, and more.
hardik0811arora.hashnode.dev · 4 min read
Archit Mittal
I Automate Chaos — AI workflows, n8n, Claude, and open-source automation for businesses. Turning repetitive work into one-click systems.
The "from broken to working" angle is gold — most k8s+LLM posts skip the actual failure modes. One thing I'd add from deploying Ollama on EKS for clients: set resources.limits.memory explicitly to ~1.5x your model size, otherwise the OOMKiller will cut you down mid-stream during long generations. Also readinessProbe should hit /api/tags not /, because Ollama's root returns 200 even before models are loaded. What's your cold-start time looking like when a pod restarts and has to reload a 7B model from volume?
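The memory-limit and readiness-probe advice in the comment above can be sketched as a fragment of an Ollama Deployment spec. This is a minimal illustration, not taken from the post: the container name, image tag, memory value, and probe timings are all assumptions; only the port (11434) and the `/api/tags` endpoint are Ollama's documented defaults.

```yaml
# Hypothetical container spec for an Ollama pod (values are illustrative).
containers:
  - name: ollama
    image: ollama/ollama:latest
    ports:
      - containerPort: 11434        # Ollama's default API port
    resources:
      limits:
        memory: "8Gi"               # ~1.5x the model's in-memory size,
                                    # assuming a ~5 GB 7B model; tune per model
    readinessProbe:
      httpGet:
        path: /api/tags             # lists loaded models; unlike /, it reflects
        port: 11434                 # whether the server is actually usable
      initialDelaySeconds: 10
      periodSeconds: 5
```

Sizing the limit above the model footprint leaves headroom for the KV cache during long generations, which is exactly when the OOMKiller tends to strike if the limit equals the model size.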