Memory-Efficient PyArrow Datasets
After spending months figuring out how to handle datasets in a memory-efficient way, I decided to collect everything in one place. Hopefully, this will save someone else time.
What Is a Dataset?
If you work with Parquet files, you know each file has...