This is the classic "Python's footgun is hiding until you scale" story. Unbuffered reads in a loop will murder you when concurrency actually matters, and yeah, GC masks a lot of terrible patterns.
That said, the 3-week rewrite is the expensive part of your story. The memory win is real but you paid for it. Before going full Rust next time, I'd profile harder in staging with production load. Python's memory_profiler and tracemalloc catch this stuff quickly if you actually run them.
Rust was the right call here though. Single binary deployment, predictable memory, no runtime surprises. Go would've gotten you similar benefits in maybe a week though.