Had a similar arc with DynamoDB query patterns. Teams assume their access patterns are stable enough to optimize for, then reality hits. With LLMs you're just paying the price earlier and more visibly.
Prompt engineering scales better operationally. You tweak a string in your config, deploy in minutes, roll back instantly. Fine-tuning couples you to training-data quality and to retraining pipelines you now own. The GPU bills pile up while you're still debugging why the model learned your labeling mistakes.
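Concretely, "tweak a string in your config" can be as simple as versioned prompt templates where rollback is flipping one value. A minimal sketch, all names hypothetical:

```python
# Hypothetical sketch: prompts stored as versioned config strings.
# Rollback = change ACTIVE_VERSION and redeploy; no retraining involved.

PROMPTS = {
    "v3": "Summarize the ticket below in two sentences:\n{ticket}",
    "v4": "Summarize the ticket below in two sentences, citing the ticket ID:\n{ticket}",
}

ACTIVE_VERSION = "v4"  # flip back to "v3" to roll back instantly

def render_prompt(ticket: str, version: str = ACTIVE_VERSION) -> str:
    # Fill the template for the chosen version before sending to the model.
    return PROMPTS[version].format(ticket=ticket)
```

In practice you'd load this from a config store or feature-flag system rather than a module-level dict, but the operational shape is the same: the prompt is data, not a trained artifact.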
That said, fine-tuning still wins if you have actual distribution shift that prompts can't handle. But yeah, most teams should start prompt-first, add retrieval, then consider fine-tuning only if you can prove the ROI. The operational tax is real.