Kaustubhworkernode.hashnode.dev·Apr 12, 2024Understanding Parquet FilesApache Parquet is a open source, column-based file format that's great for storing and retrieving data quickly. It has smart compression and encoding methods to handle large amounts of data easily. It's perfect for both regular and interactive tasks....Discussbig data
Vaishnave Subbramanianvaishnave.page·Apr 4, 2024Sparks FlyFile Formats In the realm of data storage and processing, file formats play a pivotal role in defining how information is organized, stored, and accessed. These formats, ranging from simple text files to complex structured formats, serve as the blue...Discuss·361 readsDabbling with Apache Sparkspark
David MarquisProdavidmarquis.hashnode.dev·Mar 23, 2024Fastparquet: A Guide for Python Data EngineersRecently I was on the path to hunt down a way to read and test parquet files to help one of the remote teams out. The Apache Parquet file format, known for its high compression ratio and speedy read/write operations, particularly for complex nested d...Discuss·87 readsfastparquet
Aaron Jevil Nazarethaarons-space.hashnode.dev·Mar 4, 2024Delta Lake(.Parquet) vs JSON Formats for storageIntroduction Fast storage and retrieval of data are vital for maintaining a competitive edge, enhancing user experience, and facilitating efficient decision-making, especially in a fast-paced digital environment where responsiveness and scalability a...Discuss·11 likes·49 readsJavaScript
Aditya Tiwaridataml.hashnode.dev·Jan 27, 2024Parquet batch ingestion with Python, Postgres and Docker.Introduction We'll use Python to ingest into a Postgres database the NYC TLC green taxi trip records for September 2019, which are available in a Parquet file at the following URL:(https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2019-0...Discuss·10 likes·319 readsDE Zoomcamp 2024 BlogathonPython
Maheshwar Ligadefortechwastitechwasti.com·Jan 14, 2024Parquet File Handling in Go: A Complete Guide!Parquet, a columnar storage file format, is efficient for large-scale data processing. Handling Parquet files in Go allows efficient data storage and retrieval. This guide covers the essentials of working with Parquet files in Go, including reading, ...Discuss·404 readsgo-languagegolang
Raghuveer Sriramanraghuveer.me·Jul 29, 2023Parquet format - A deep dive : Part 4After a lot of theory it's finally to talk about the code. Since there is a lot going on in the codebase, this will still be quite high level but it should serve as a good starting point. parquet-mr consists uses maven multi modules approach, the mod...Discuss·56 readsThe parquet format - a deep divedata-engineering
Raghuveer Sriramanraghuveer.me·Jul 23, 2023Parquet format - A deep dive : Part 3Previously, we talked about how to parquet writes data. In this article, we will talk about how parquet reads data. Once again, parquet borrows from Dremel and uses its record assembly algorithm. We also talk briefly on some clever optimizations parq...Discuss·103 readsThe parquet format - a deep divedata-engineering
Nandini Tatanandini-tata.hashnode.dev·Jul 17, 2023Parquet Files vs. CSV: The Battle of Data Storage FormatsHey there, fellow data enthusiasts! Have you ever wondered how massive amounts of data can be stored efficiently, making it easily accessible for analysis? Well, let me introduce you to the superheroes of data storage: Parquet files! Choosing the rig...Discuss·1 like·28 readsdata
Tony Kipkemboithedataengineerblog.com·May 30, 2023The 4 Common Data Formats in Data EngineeringIntroduction Choosing the right data format is an integral part of data engineering. The decision significantly influences data storage, processing speed, and interoperability. This article dissects four popular data formats: CSV, JSON, Parquet, and ...DiscussData Science