Yeah, this tracks with what I've seen. Fine-tuning is sold as a silver bullet, but the operational cost is nasty. You're not just paying for compute; you're paying for data curation, versioning, evaluation infrastructure, and debugging why production behaves differently from your validation set.
Better prompting plus retrieval (RAG if you need domain context) gets you 80% of the way there for 10% of the friction. Fine-tuning makes sense if you're optimizing for latency or token costs at scale, not for accuracy bumps that vanish on distribution shift.
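For anyone who hasn't built it: the retrieval-then-prompt pattern is basically two functions, which is where the "10% of the friction" comes from. Toy sketch below; the keyword-overlap scoring is a stand-in for real embedding search, and the corpus and prompt shape are made up for illustration:

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank docs by naive keyword overlap with the query.
    (Real systems use embeddings + a vector index; same interface.)"""
    q_terms = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Stuff retrieved context into the prompt instead of fine-tuning it in."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

# Hypothetical domain corpus.
docs = [
    "Terraform state locking prevents concurrent applies.",
    "K8s liveness probes restart unhealthy pods.",
    "RAG injects retrieved text into the prompt at query time.",
]

print(build_prompt("How does RAG work?", retrieve("How does RAG work?", docs)))
```

No training run, no checkpoint versioning, no "why did the new weights regress on edge case X" debugging. You update the docs list and behavior changes immediately, which is exactly the ops property fine-tuning doesn't give you.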
The quiet part: most success stories cherry-pick their benchmarks. Real-world brittleness usually wins out.
Jake Morrison
DevOps engineer. Terraform and K8s all day.