We had a Python CLI (5k LOC, built on the click framework) that processes data files. Works fine locally, handles ~50 concurrent requests without trouble. Shipped it to production as a systemd service.
Day 2: memory climbs to 4GB, tool hangs. Turns out we were doing unbuffered file reads in a loop. Python's GC hides a lot of sins.
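For anyone wondering what that failure mode looks like, here's a minimal sketch (not our actual code, just the shape of the bug as I understand it): calling `read()` with no size limit materializes the whole file per request, while a chunked loop keeps peak memory bounded.

```python
import os
import tempfile

def process_all_at_once(path):
    """The anti-pattern: the entire file lives in memory at once.
    Under concurrency, every in-flight request holds its own full copy."""
    with open(path, "rb") as f:
        data = f.read()  # whole file materialized here
    return len(data)

def process_streaming(path, chunk_size=64 * 1024):
    """The fix: peak memory is O(chunk_size) regardless of file size."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total

# Both produce the same result; only peak memory differs.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1_000_000)
assert process_all_at_once(tmp.name) == process_streaming(tmp.name)
os.unlink(tmp.name)
```

With one request the difference is invisible, which is why it sailed through local testing; at 50 concurrent requests the first version holds 50 full files in memory simultaneously.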
Decided to rewrite in Rust. Took about 3 weeks. Shipped the binary (single 15MB executable, no runtime). Production memory: steady at 40MB even under load. Response times dropped from 2-3s to 200ms.
What I'd do differently:
Profile the Python version first. We actually had a simple leak in how we were holding file handles. Could've fixed it in an afternoon.
Don't rewrite for performance alone. We rewrote because we needed to ship this to 50 different machines and managing Python versions/venvs was a nightmare. The performance gain was a bonus.
Test the async story early. Used tokio for concurrent requests. Tokio's working great but the mental model shift from Python's asyncio took longer than expected.
The Rust version is objectively better to maintain. No version conflicts, no dependency hell. But I didn't need to rewrite for that. I needed to rewrite because distributing a Python tool sucks.
This is the classic "Python's footguns hide until you scale" story. Unbuffered reads in a loop will murder you when concurrency actually matters, and yeah, GC masks a lot of terrible patterns.
That said, the 3-week rewrite is the expensive part of your story. The memory win is real but you paid for it. Before going full Rust next time, I'd profile harder in staging with production load. Python's memory_profiler and tracemalloc catch this stuff quickly if you actually run them.
Rust was the right call here though. Single binary deployment, predictable memory, no runtime surprises. That said, Go would've gotten you similar benefits in maybe a week.
Nina Okafor
ML engineer working on LLMs and RAG pipelines
Memory issues in Python often come down to object lifecycle, not the language itself. Unbuffered reads are nasty because you're materializing everything in memory before the GC can touch it. Your Rust rewrite probably helps mainly by forcing explicit resource management, not inherent speed.
That said, you could've fixed the original with generators and proper streaming. But yeah, shipping a single binary with predictable memory is a real win operationally. No runtime surprises on customer machines.
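For reference, the generator fix I'm describing is only a few lines. A sketch (names are illustrative, not from the OP's codebase): the `with` block closes the handle deterministically, and `yield` keeps peak memory at one chunk per consumer instead of one file per consumer.

```python
def stream_records(path, chunk_size=64 * 1024):
    """Yield fixed-size chunks; peak memory is O(chunk_size), not O(file)."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk

def count_bytes(path):
    """Example consumer: processes the file without ever holding all of it."""
    return sum(len(chunk) for chunk in stream_records(path))
```

Same streaming discipline the Rust rewrite forces on you, just opt-in instead of mandatory.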