Reading and Writing Data in Spark: Parquet, Delta, JSON, and JDBC
5d ago · 31 min read · TLDR: Parquet's columnar layout with row-group statistics enables predicate pushdown that can reduce a 500 GB scan to 8 GB. Delta Lake wraps Parquet with a JSON transaction log to add ACID semantics and time travel. JSON and CSV read every byte becau...
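
To make the row-group statistics idea from the TLDR concrete, here is a minimal pure-Python sketch of how min/max statistics let a reader skip whole chunks of data for a filter such as `value > 18`. It is a toy model, not Spark's or Parquet's actual implementation: the `RowGroup` class and `scan_greater_than` function are hypothetical names invented for illustration, and real Parquet files store these statistics in the file footer rather than computing them at read time.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group holding one integer column."""
    values: List[int]

    # Assumption for brevity: statistics are derived on the fly here.
    # A real Parquet writer records them once, in the file footer.
    @property
    def max_value(self) -> int:
        return max(self.values)


def scan_greater_than(row_groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Evaluate `value > threshold`; return (matches, row groups actually read)."""
    matches: List[int] = []
    groups_read = 0
    for rg in row_groups:
        # Statistics check: if the largest value in the group is not above
        # the threshold, no row in it can match, so the group is skipped.
        if rg.max_value <= threshold:
            continue
        groups_read += 1
        matches.extend(v for v in rg.values if v > threshold)
    return matches, groups_read


row_groups = [
    RowGroup([1, 5, 9]),
    RowGroup([10, 14, 19]),
    RowGroup([20, 25, 30]),
]
matches, groups_read = scan_greater_than(row_groups, 18)
print(matches, groups_read)  # [19, 20, 25, 30] 2
```

Only two of the three toy row groups are read, because the first group's maximum (9) rules it out before any of its rows are touched. The same principle, applied to on-disk row groups, is what lets a selective filter shrink the amount of data scanned.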