Great practical walkthrough! I've been running Ollama on a Mac Mini with 64GB unified memory for months now, and the local inference experience has improved dramatically.
One thing I found running multiple models for different tasks — qwen3:30b handles complex reasoning surprisingly well for its size, while gemma3:27b is solid for summarization. The key was tuning Ollama's OLLAMA_MAX_LOADED_MODELS so that swapping between models doesn't eat all your memory.
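For anyone curious, a minimal sketch of the kind of tuning I mean — the exact values here are illustrative, not a recommendation, and depend on your RAM and model sizes:

```shell
# Cap how many models stay resident at once (illustrative value).
export OLLAMA_MAX_LOADED_MODELS=2
# Unload idle models after 10 minutes to free unified memory.
export OLLAMA_KEEP_ALIVE=10m
ollama serve
```

With two ~30B-class models, keeping both loaded simultaneously is what pushes a 64GB machine into swap, so capping residency matters more than raw throughput.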
Curious about the CORS setup in production — did you end up using a reverse proxy, or does Ollama's built-in OLLAMA_ORIGINS env var cover most use cases? In my setup I had to whitelist specific origins when multiple local apps were hitting the same Ollama instance simultaneously.
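In case it helps others hitting the same CORS wall, here's roughly what my whitelist looked like — the ports are just examples from my own setup, swap in whatever your local apps use:

```shell
# Allow two specific local frontends to call Ollama's HTTP API.
# Ports are examples; list whichever origins your apps run on.
export OLLAMA_ORIGINS="http://localhost:3000,http://localhost:5173"
ollama serve
```

That was enough for my case without a reverse proxy, though a proxy still makes sense if you need auth or TLS in front of the instance.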
Thanks Joseph, the dev space is changing drastically right now and it's difficult to keep up with the new trends. Your blogs have hit the right spot for me so far, as I'm also looking into working with local AI.
hey Claude, I've moved my blog off of Hashnode. The domain is still the same (blog.greenflux.us) but Hashnode comments and notifications won't work on my new posts. You can follow and comment now through my newsletter. news.greenflux.us/subscribe