#parquet articles

Abstract Algorithms · abstractalgorithms.hashnode.dev · Apr 19 · 34 min read

Reading and Writing Data in Spark: Parquet, Delta, JSON, and JDBC

TL;DR: Parquet's columnar layout with row-group statistics enables predicate pushdown that can reduce a 500 GB scan to 8 GB. Delta Lake wraps Parquet with a JSON transaction log to add ACID semantics a...


Firman Fakhri · datasquad.hashnode.dev · Feb 3 · 17 min read

Struggling with Large CSV Files? Here's Why DuckDB Changed My Workflow

So last Monday, around 9 AM, my PM mentioned me in a WhatsApp group and asked for some analysis. Sounds familiar, right? Then he dropped a roughly 120 MB Excel file. One hundred twenty megabytes. I should've known better. I tried opening it in Excel. That spinni...


Ghanshyam Digital · blog.ghanshyamdigital.com · Nov 13, 2025 · 5 min read

How I Saved $800+ Daily Using DuckDB & Apache Superset Instead of AWS Redshift for Analytics

The Problem: Skyrocketing AWS Analytics Costs. When managing analytics for our loan management system, we initially turned to the standard AWS stack: Amazon Redshift for data warehousing and AWS Glue for ETL pipelines. The result? A shocking $800 bill...


Jagdish Parihar · blog.jatin510.dev · Jul 24, 2025 · 2 min read

Understanding Parquet: An Efficient Columnar File Format

Introduction: Parquet has quickly become one of the most popular file formats for storing large-scale analytics data, thanks to its efficiency, compression, and seamless integration with big data frameworks. My experience cont...


Francesco “oha” Rivetti · blog.oha.it · Jun 24, 2025 · 2 min read

b-Square: Efficient Geospatial Indexing for Tabular Data

At HUB Ocean, we work with massive volumes of geospatial marine data, with some datasets containing billions of rows. To ensure fast queries, we developed b-Square — a simple but effective mechanism to index geometries in tabular formats like Parquet...


Zimo Li · blog.zimo.li · May 28, 2025 · 2 min read

I wrote a Parquet viewer in Rust to avoid running SQL for the PM

Every data team knows the drill: a PM needs to “just take a quick look” at some Parquet data. That usually means asking an engineer to write SQL or spin up a tool to pull a few rows. It’s a small ask, but one that happens often enough to slow everyon...


Islam Elbanna · practical-software.com · May 24, 2025 · 4 min read

Selecting the Best File Formats for Apache Spark: Parquet, ORC, CSV and more

One of the most important decisions in your Apache Spark pipeline is how you store your data. The data format you choose can dramatically affect performance, storage costs, and query speed. Let’s explore the most common file formats supported by Apac...


Mehul Kansal · mehulkansal.hashnode.dev · May 11, 2025 · 6 min read

22: AWS Athena Setup and Optimization 📊

Hi Data Folks! 👋 In this blog, we’ll walk through setting up AWS Athena for querying data in S3, defining table structures using both manual metadata and Glue Crawlers, and optimizing query performance with techniques like partitioning and columnar ...


Raju Mandal · blog.rajumandal.com.np · Apr 22, 2025 · 9 min read

How Apache Parquet Stores Data: Internals, Compression, and Performance Explained

Data is growing, and fast. Whether you're querying petabytes in a data lake or running analytics in a cloud warehouse, the format you store your data in can make or break performance. Parquet is your saviour. If you've ever used tools like Apache S...


Harshita Chaudhary · harshita.hashnode.dev · Feb 24, 2025 · 2 min read

Create ADLS External Tables in Azure Synapse

Create a Master Key if one is not present. Why? If a database-scoped credential is used, Synapse requires encryption. The Master Key ensures that credentials are stored securely. ✅ Needed only once per database. ✅ Required for setting up authe...
