Discussion

Martin Píš

Experienced data analyst passionate about learning data engineering topics.

Nov 5, 2024

Read performance in PySpark

Optimizing Schema Inference Overhead: In Apache Spark, when reading data from external sources such as CSV, Parquet, or JSON, you can either let Spark infer the schema or define the schema explicitly. The problem with this operation is that it’s ...

mpmartydata.hashnode.dev · 7 min read

Responses

No responses yet.
