Great practical walkthrough! I've been running Ollama on a Mac Mini with 64GB unified memory for months now, and the local inference experience has improved dramatically.
One thing I found while running multiple models for different tasks: qwen3:30b handles complex reasoning surprisingly well for its size, while gemma3:27b is solid for summarization. The key was tuning Ollama's OLLAMA_MAX_LOADED_MODELS so that constant swapping between models didn't kill my memory.
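For anyone else on macOS, here's roughly what my setup looks like (the values are just what worked for me on 64GB, not official recommendations):

```shell
# macOS: launchctl setenv makes these visible to the Ollama app process
# (restart Ollama afterwards for them to take effect).
launchctl setenv OLLAMA_MAX_LOADED_MODELS 2   # cap how many models stay loaded at once
launchctl setenv OLLAMA_KEEP_ALIVE 10m        # unload idle models after 10 minutes
```

With two ~30B models that cap keeps both resident without pushing into swap; with three loaded I started seeing memory pressure.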
Curious about the CORS setup in production — did you end up using a reverse proxy, or does Ollama's built-in OLLAMA_ORIGINS env var cover most use cases? In my setup I had to whitelist specific origins when multiple local apps were hitting the same Ollama instance simultaneously.
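For reference, the whitelisting in my case was just the built-in env var (the ports below are placeholders for my two local apps, swap in your own):

```shell
# Comma-separated list of allowed origins for Ollama's CORS checks.
# Hypothetical example: two local web apps on different dev ports.
launchctl setenv OLLAMA_ORIGINS "http://localhost:3000,http://localhost:5173"
```

That was enough for my browser-based apps, but I'd still be curious whether a reverse proxy is worth it once you need auth or TLS in front of the instance.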