Spark DataFrames and Spark SQL: Schema, DDL, and the Catalyst Optimizer
5d ago · 21 min read · TLDR: Catalyst is Spark's query compiler. It takes any DataFrame operation or SQL string, parses it into an abstract syntax tree, resolves column references against the catalog, applies a library of algebraic rewrite rules to produce an optimized log...
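The rule-based rewriting that the TLDR describes can be sketched with a toy model. Everything below is hypothetical. The node classes (`Scan`, `Filter`, `Lit`, `Col`, `Add`, `Gt`, `And`) and the rule functions are illustrative Python, not Spark's API. The two rules are loosely modeled on the kinds of algebraic rewrites Catalyst performs: folding constant arithmetic, and merging adjacent filters into one. The `optimize` loop reapplies every rule until the plan stops changing, which is similar in spirit to Catalyst running its rule batches to a fixed point.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical toy expression nodes (NOT Spark's Catalyst classes).
@dataclass(frozen=True)
class Lit:
    value: int

@dataclass(frozen=True)
class Col:
    name: str

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Gt:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"

Expr = Union[Lit, Col, Add, Gt, And]

# Hypothetical toy logical-plan nodes.
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Filter:
    condition: Expr
    child: "Plan"

Plan = Union[Scan, Filter]

def fold_constants(e: Expr) -> Expr:
    """Bottom-up: replace Add(Lit, Lit) with a single Lit."""
    if isinstance(e, (Add, Gt, And)):
        left, right = fold_constants(e.left), fold_constants(e.right)
        if isinstance(e, Add) and isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return type(e)(left, right)
    return e  # Lit and Col are already leaves

def constant_folding(plan: Plan) -> Plan:
    """Rule: fold constants inside every filter condition."""
    if isinstance(plan, Filter):
        return Filter(fold_constants(plan.condition), constant_folding(plan.child))
    return plan

def combine_filters(plan: Plan) -> Plan:
    """Rule: Filter(c2, Filter(c1, x)) becomes Filter(And(c1, c2), x)."""
    if isinstance(plan, Filter):
        child = combine_filters(plan.child)
        if isinstance(child, Filter):
            return Filter(And(child.condition, plan.condition), child.child)
        return Filter(plan.condition, child)
    return plan

RULES = [constant_folding, combine_filters]

def optimize(plan: Plan, max_iterations: int = 100) -> Plan:
    """Apply all rules repeatedly until the plan no longer changes."""
    for _ in range(max_iterations):
        new_plan = plan
        for rule in RULES:
            new_plan = rule(new_plan)
        if new_plan == plan:  # fixed point reached
            return plan
        plan = new_plan
    return plan

if __name__ == "__main__":
    # WHERE score > 0, then WHERE age > 18 + 3, over a table "people"
    plan = Filter(Gt(Col("age"), Add(Lit(18), Lit(3))),
                  Filter(Gt(Col("score"), Lit(0)), Scan("people")))
    # The two filters merge into one, and 18 + 3 is folded to 21.
    print(optimize(plan))
```

Because the rules are plain functions from plan to plan, adding another rewrite is just a matter of appending to `RULES`; the fixed-point loop then guarantees that one rule's output gets a chance to trigger the others.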