cool idea. running everything locally with Ollama is the right call if you want zero ongoing costs.
few questions though:
- which models are you running? because the quality gap between something like llama 3 8B and a hosted model is still pretty big for content generation. local is free but if the output needs heavy editing every time, the time cost adds up.
- what's the hardware floor? not everyone has a GPU that can run decent sized models at a usable speed.
- how are you handling context? content generation usually needs longer context windows and that's where smaller local models start to struggle.
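on the context point: Ollama does let you raise the context window per request via the `num_ctx` option on `/api/generate`, which helps with longer content prompts (at the cost of more RAM/VRAM). rough sketch of what the payload looks like — the model name and default values here are my assumptions, not anything from your project:

```python
import json

# Default Ollama endpoint when running locally; adjust if you've changed it.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt, model="llama3:8b", num_ctx=8192):
    """Build the JSON payload for Ollama's /api/generate endpoint.

    num_ctx raises the context window above the model default, which
    matters for long content-generation prompts. stream=False returns
    the whole response in one JSON object instead of chunks.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }

# Inspect the payload you'd POST with requests/curl.
payload = build_request("Write a 500-word intro about local LLMs.")
print(json.dumps(payload, indent=2))
```

whether an 8B model actually makes good use of 8k tokens of context is a separate question, which is kind of my point.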
the "no API cost" angle is appealing but i'd be curious to see some actual output samples compared to something like Claude or GPT-4. free doesn't matter much if the content isn't usable without rewriting half of it.
is this Windows only or are you planning cross-platform? would definitely try it out if there's a Linux build.