Spark DataFrames and Spark SQL: Schema, DDL, and the Catalyst Optimizer
TLDR: Catalyst is Spark's query compiler. It takes any DataFrame operation or SQL string, parses it into an abstract syntax tree, resolves column references against the catalog, applies a library of algebraic rewrite rules to produce an optimized log...
abstractalgorithms.dev · 21 min read
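To make the "library of algebraic rewrite rules" in the TLDR concrete, here is a toy sketch in plain Python of rule-based rewriting on an expression tree. It is not Spark code and does not use any Spark API: the node classes (`Col`, `Lit`, `Add`, `And`), the two rules, and the `optimize` driver are all hypothetical names invented for illustration. It only mirrors the general idea of applying small rewrite rules repeatedly until the tree stops changing.

```python
from dataclasses import dataclass
from typing import Callable, Union


# Hypothetical expression nodes for illustration; these are not Catalyst classes.
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Lit, Add, And]
Rule = Callable[[Expr], Expr]


def transform_up(expr: Expr, rule: Rule) -> Expr:
    """Rewrite the children first, then offer the rebuilt node to the rule."""
    if isinstance(expr, (Add, And)):
        expr = type(expr)(transform_up(expr.left, rule),
                          transform_up(expr.right, rule))
    return rule(expr)


def fold_constants(expr: Expr) -> Expr:
    """Replace Lit + Lit with a single precomputed Lit."""
    if (isinstance(expr, Add)
            and isinstance(expr.left, Lit)
            and isinstance(expr.right, Lit)):
        return Lit(expr.left.value + expr.right.value)
    return expr


def is_true_literal(expr: Expr) -> bool:
    # Use identity, because Lit(True) == Lit(1) under Python's equality rules.
    return isinstance(expr, Lit) and expr.value is True


def simplify_and(expr: Expr) -> Expr:
    """Drop a literal TRUE operand: (TRUE AND x) and (x AND TRUE) become x."""
    if isinstance(expr, And):
        if is_true_literal(expr.left):
            return expr.right
        if is_true_literal(expr.right):
            return expr.left
    return expr


def optimize(expr: Expr, rules: list, max_iterations: int = 100) -> Expr:
    """Apply every rule in order, repeating until no rule changes the tree."""
    for _ in range(max_iterations):
        rewritten = expr
        for rule in rules:
            rewritten = transform_up(rewritten, rule)
        if rewritten == expr:  # fixed point reached
            break
        expr = rewritten
    return expr


# TRUE AND (price + (1 + 2))
plan = And(Lit(True), Add(Col("price"), Add(Lit(1), Lit(2))))
optimized = optimize(plan, [fold_constants, simplify_and])
print(optimized)
```

Each rule is a small, local pattern match that returns either a rewritten node or the node unchanged, which is what lets rules be composed and rerun safely. In real Spark, the plans that result from this kind of rewriting can be inspected with `DataFrame.explain(extended=True)`.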