This hit close to home. I process financial data from 14 different Korean bank formats on a Mac Mini, and the memory spikes from pandas were brutal — 8GB+ for what should be a simple CSV parse.
Switching to lazy evaluation (in my case, streaming row-by-row with custom parsers) was the turning point. The mental shift from "load everything, then filter" to "define the pipeline, then execute it once" is so underrated.
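For anyone curious what that shift looks like in plain Python, here's a minimal sketch using generators — the column names (`account`, `amount`) are made up for illustration, not actual bank-format fields:

```python
import csv
import io

def stream_rows(fileobj):
    """Yield rows one at a time instead of loading the whole file."""
    reader = csv.DictReader(fileobj)
    for row in reader:
        yield row

def pipeline(rows):
    """Define transformations lazily; nothing runs until consumed."""
    filtered = (r for r in rows if r["amount"] and float(r["amount"]) > 0)
    normalized = (
        {"account": r["account"].strip(), "amount": float(r["amount"])}
        for r in filtered
    )
    return normalized

# The whole file is never materialized in memory — each row flows
# through filter and normalize steps one at a time.
data = io.StringIO("account,amount\nA001 ,125.50\nA002,-10\nA003,42.00\n")
total = sum(r["amount"] for r in pipeline(stream_rows(data)))
# total == 167.5
```

Peak memory stays at roughly one row regardless of file size, which is the same property Polars' `LazyFrame` gives you with a much richer query planner on top.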
Curious about your experience with Polars + partitioned Parquet — did you notice any gotchas with schema evolution when new data sources get added? That's been my pain point with heterogeneous financial data.