Shreyan Das · shreyandas.hashnode.dev · Jul 19, 2024
Need for speed, but where do I store all of this data?
Dear Diary, The most dreaded day in the life of any data engineer came to me a while ago — our TL sat us down to tell us that we had been billed an enormous amount for our cloud storage in the last quarter, and we needed to find a way to cut down sto...
Confessions of a Data Engineer · data-engineering
Kaustubh · workernode.hashnode.dev · Apr 12, 2024
Understanding Parquet Files
Apache Parquet is an open-source, column-based file format that's great for storing and retrieving data quickly. It has smart compression and encoding methods to handle large amounts of data easily. It's perfect for both regular and interactive tasks....
big data
Vaishnave Subbramanian · vaishnave.page · Apr 4, 2024
Sparks Fly
File Formats: In the realm of data storage and processing, file formats play a pivotal role in defining how information is organized, stored, and accessed. These formats, ranging from simple text files to complex structured formats, serve as the blue...
Dabbling with Apache Spark · spark
David Marquis · davidmarquis.hashnode.dev · Mar 23, 2024
Fastparquet: A Guide for Python Data Engineers
Recently I was hunting down a way to read and test parquet files to help out one of the remote teams. The Apache Parquet file format, known for its high compression ratio and speedy read/write operations, particularly for complex nested d...
fastparquet
Aaron Jevil Nazareth · aarons-space.hashnode.dev · Mar 4, 2024
Delta Lake (.Parquet) vs JSON Formats for Storage
Introduction: Fast storage and retrieval of data are vital for maintaining a competitive edge, enhancing user experience, and facilitating efficient decision-making, especially in a fast-paced digital environment where responsiveness and scalability a...
JavaScript
Aditya Tiwari · dataml.hashnode.dev · Jan 27, 2024
Parquet Batch Ingestion with Python, Postgres and Docker
Introduction: We'll use Python to ingest the NYC TLC green taxi trip records for September 2019 into a Postgres database; they are available in a Parquet file at the following URL: (https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2019-0...
DE Zoomcamp 2024 Blogathon · Python
Maheshwar Ligade for techwasti · techwasti.com · Jan 14, 2024
Parquet File Handling in Go: A Complete Guide!
Parquet, a columnar storage file format, is efficient for large-scale data processing. Handling Parquet files in Go allows efficient data storage and retrieval. This guide covers the essentials of working with Parquet files in Go, including reading, ...
go-language · golang
Raghuveer Sriraman · raghuveer.me · Jul 29, 2023
Parquet format - A deep dive: Part 4
After a lot of theory, it's finally time to talk about the code. Since there is a lot going on in the codebase, this will still be quite high level, but it should serve as a good starting point. parquet-mr uses a Maven multi-module approach; the mod...
The parquet format - a deep dive · data-engineering
Raghuveer Sriraman · raghuveer.me · Jul 23, 2023
Parquet format - A deep dive: Part 3
Previously, we talked about how parquet writes data. In this article, we will talk about how parquet reads data. Once again, parquet borrows from Dremel and uses its record assembly algorithm. We also talk briefly about some clever optimizations parq...
The parquet format - a deep dive · data-engineering
Nandini Tata · nandini-tata.hashnode.dev · Jul 17, 2023
Parquet Files vs. CSV: The Battle of Data Storage Formats
Hey there, fellow data enthusiasts! Have you ever wondered how massive amounts of data can be stored efficiently, making it easily accessible for analysis? Well, let me introduce you to the superheroes of data storage: Parquet files! Choosing the rig...
data