I spent 6 hours studying PySpark join strategies. Here's what I learned
match keys between two tables and boom, you get results. That mindset worked fine in SQL databases. Then I started working with Spark on large datasets and my jobs started failing, timing out, or grinding for hours.
The reality: Spark join performanc...
thanh-de.hashnode.dev · 6 min read