Sharath Kumar Thungathurthi · sharaththungathurthi.hashnode.dev · Oct 19, 2024
How to Perform Efficient Data Transformations Using PySpark
Here are some common interview questions and answers related to transformations in Spark: 1. What are narrow and wide transformations in Spark? Answer: Narrow transformations are transformations where each partition of the parent RDD is used to produ...
Tags: pyspark, transformations
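The teaser above begins a definition of narrow versus wide transformations. As a minimal PySpark sketch of that distinction (not taken from the linked post; the app name and sample data are illustrative):

    # map/filter are narrow transformations (each output partition depends on
    # one input partition, no shuffle); reduceByKey is wide (requires a shuffle).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("narrow-vs-wide").getOrCreate()
    sc = spark.sparkContext

    words = sc.parallelize(["spark", "rdd", "spark", "shuffle"], numSlices=2)

    # Narrow: tuples stay in the partition where their source element lives.
    pairs = words.map(lambda w: (w, 1))

    # Wide: records with the same key must be co-located, triggering a shuffle.
    counts = pairs.reduceByKey(lambda a, b: a + b)

    print(counts.collect())  # e.g. [('spark', 2), ('rdd', 1), ('shuffle', 1)]
    spark.stop()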
Sharath Kumar Thungathurthi · sharaththungathurthi.hashnode.dev · Oct 19, 2024
Unlock PySpark’s Power: Techniques for Parallelizing
Conceptual Questions: What is parallelize in PySpark? parallelize is a method in PySpark used to convert a Python collection (like a list or a tuple) into an RDD (Resilient Distributed Dataset). This allows you to perform parallel processing on the ...
Tag: PySpark
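Since the teaser describes the parallelize API, here is a short sketch of how it is typically used (a local SparkSession and the numSlices value are assumptions for illustration, not details from the linked post):

    # parallelize turns a local Python list into an RDD spread across partitions.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parallelize-demo").getOrCreate()
    sc = spark.sparkContext

    nums = sc.parallelize([1, 2, 3, 4, 5, 6], numSlices=3)  # request 3 partitions

    print(nums.getNumPartitions())          # 3
    print(nums.map(lambda x: x * x).sum())  # 91, computed in parallel
    spark.stop()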
Vaishnave Subbramanian · vaishnave.page · Mar 21, 2024
Dabbling with Spark Essentials Again
Apache Spark stands out as a pivotal tool for navigating the complexities of Big Data analysis. This article embarks on a comprehensive journey through the core of Spark, from unraveling the intricacies of Big Data and its foundational concepts to ma...
Series: Dabbling with Apache Spark · Tag: spark
Utsav Gohel · goutsav.hashnode.dev · Nov 4, 2023
Apache Spark
Here I am attaching the files from the lab practice work I did in my big data lab session with Spark. You can download those files via the links: Data Processing file (Python file), Data set file. What is Apache Spark? Apache Spark is ...
Tag: #apache-spark