Comment by Kolisetty Sasiram on "Spark Illuminated"

Kolisetty Sasiram

BigData Enthusiast

Apr 6, 2024

I think shuffling also happens in a sort-merge join. The shuffle is the first step, and then the sort and merge operations are performed within each executor. Please correct me if I am wrong.

Vaishnave Subbramanian

Software Engineer by day, Musician by night

Apr 7, 2024

Hi Kolisetty Sasiram, shuffling does happen in a sort-merge join as well. I left it out because Spark only performs the shuffle when the data is not already partitioned by the join key; in that case it redistributes the data across the cluster based on the join key. But since I'm listing out the process, it makes more sense to add it. Thank you for your input!
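
To make the shuffle, sort, and merge steps concrete, here is a minimal pure-Python sketch of the idea. It is only an illustration under simplifying assumptions, not Spark's actual implementation: partitions are plain lists, rows are `(key, value)` tuples, and the function names (`shuffle`, `merge_sorted`, `sort_merge_join`) are hypothetical.

```python
from collections import defaultdict


def shuffle(rows, num_partitions):
    """Hash-partition (key, value) rows so equal keys land in the same partition."""
    partitions = defaultdict(list)
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def merge_sorted(left, right):
    """Merge two key-sorted lists, emitting (key, left_value, right_value) matches."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


def sort_merge_join(left_rows, right_rows, num_partitions=4):
    # Step 1: shuffle both sides by the join key.
    left_parts = shuffle(left_rows, num_partitions)
    right_parts = shuffle(right_rows, num_partitions)
    result = []
    for p in range(num_partitions):
        # Step 2: sort each partition by key (done per executor in Spark).
        left_sorted = sorted(left_parts[p], key=lambda r: r[0])
        right_sorted = sorted(right_parts[p], key=lambda r: r[0])
        # Step 3: merge the two sorted partitions.
        result.extend(merge_sorted(left_sorted, right_sorted))
    return result


orders = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
users = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
joined = sort_merge_join(orders, users)
```

If both inputs were already partitioned by the join key, step 1 could be skipped, which is the case I had in mind when I originally left the shuffle out.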

Kolisetty Sasiram

BigData Enthusiast

Thanks for the clarification 😊 Great blog and great work from your side. Please keep it up!

Apr 7, 2024