Been thinking about this after spending last week untangling a mess of legacy Python scripts at a fintech client. Everyone wants to rewrite, nobody wants to maintain.
Here's what actually matters: PostgreSQL + dbt for the pipeline layer, Python for orchestration (just Airflow, nothing fancy), and we keep the messy ETL scripts in a separate "debt" module that gets maybe 20% refactoring attention each sprint.
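To make the shape concrete, here's a minimal sketch of what "just Airflow, nothing fancy" looks like when the orchestration layer only ever touches the debt module through a wrapper. All names (`fintech_pipeline`, `debt.legacy_etl`, `run_legacy_load`) are hypothetical stand-ins, not the client's actual code:

```python
# Hypothetical sketch: a thin Airflow 2.x DAG whose task calls into the
# quarantined "debt" module, never into the legacy scripts directly.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from debt.legacy_etl import run_legacy_load  # hypothetical wrapper module


def load_and_transform():
    # New orchestration code lives here; the gnarly ETL runs behind
    # its wrapper and can be refactored (or not) without touching the DAG.
    run_legacy_load()


with DAG(
    dag_id="fintech_pipeline",        # illustrative name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="legacy_load", python_callable=load_and_transform)
```

The point isn't the DAG itself, it's that the DAG only imports from the wrapper, so the blast radius of the legacy code stays inside `debt/`.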
The trick is isolation. We didn't rewrite the whole thing. We wrapped the bad parts behind a clean interface, moved new work to better code, and let the gnarly bits sit until they actually block something. Been two years. Haven't touched half of it.
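The "wrap, don't rewrite" move looks roughly like this, sketched with hypothetical names (`_legacy_run`, `monthly_summary`, and the field names are stand-ins for whatever your legacy script actually exposes):

```python
# A minimal sketch of "wrap the bad parts behind a clean interface".
# _legacy_run stands in for the untouched legacy script's entry point.

def _legacy_run(acct, mode=3, legacy_flag=True):
    # Imagine 800 lines of tangled ETL here. We never refactor it;
    # we only ever call it through the facade below.
    return {"ACCT_ID": acct, "TOT_AMT": "1234.50", "MODE": mode}


def monthly_summary(account_id: str) -> dict:
    """Clean, typed entry point. All legacy weirdness is absorbed here."""
    raw = _legacy_run(account_id)
    # Normalize the legacy output once, at the boundary, so new code
    # never sees SCREAMING_CASE keys or stringly-typed amounts.
    return {"account": raw["ACCT_ID"], "total": float(raw["TOT_AMT"])}


print(monthly_summary("A-42"))  # {'account': 'A-42', 'total': 1234.5}
```

New code imports `monthly_summary` and nothing else; if the legacy internals ever do get rewritten, nothing downstream notices.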
The cost of refactoring was maybe 60% of a full rewrite. The cost of carrying that debt is basically zero if it's confined. Forced refactoring just because code looks ugly burns cash and introduces bugs.
Only time I'd go nuclear: schema changes that hit performance, or business logic that's genuinely wrong. Everything else can wait. Ship features instead.
Nina Okafor
ML engineer working on LLMs and RAG pipelines
Honestly this tracks with my RAG pipeline work. We had a nightmare embedding ingestion script that everyone wanted to torch. Instead we containerized it, added observability (logging + metrics to Prometheus), and moved on.
The real win wasn't refactoring. It was making the messy part observable and bounded. Kept it in a separate service, added retry logic, called it done.
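The retry + logging part is maybe twenty lines. Here's a stdlib-only sketch of the idea; `with_retries` and `flaky_ingest` are hypothetical names, and in our real setup the metrics also go to Prometheus via its client library, which I've left out to keep this self-contained:

```python
# Hypothetical sketch: exponential-backoff retries plus logging around a
# flaky ingestion step, without changing the messy step itself.
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingest")


def with_retries(fn, attempts=3, base_delay=0.1):
    """Call fn(), retrying on any exception with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))


# Usage: wrap the nightmare script's entry point, fail twice, then succeed.
calls = {"n": 0}

def flaky_ingest():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient connection error")
    return "ok"


print(with_retries(flaky_ingest))  # "ok" after two logged retries
```

That's the whole "bounded and observable" recipe: the messy function is unchanged, but every failure is logged and retried at the boundary.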
Your dbt point is solid too. Data transformation gets weird fast, but at least you can version it and test it. Way better than scattered Python files.
The rewrite itch is real though. Just resist it unless the thing is actually on fire.