Sharath Kumar Thungathurthi · sharaththungathurthi.hashnode.dev · Oct 19, 2024
How to Perform Efficient Data Transformations Using PySpark
Here are some common interview questions and answers related to transformations in Spark: 1. What are narrow and wide transformations in Spark? Answer: Narrow transformations are transformations where each partition of the parent RDD is used to produ...
36 reads · pyspark transformations
Sharath Kumar Thungathurthi · sharaththungathurthi.hashnode.dev · Oct 19, 2024
Unlock PySpark’s Power: Techniques for Parallelizing
Conceptual Questions: What is parallelize in PySpark? parallelize is a method in PySpark used to convert a Python collection (like a list or a tuple) into an RDD (Resilient Distributed Dataset). This allows you to perform parallel processing on the ...
1 like · PySpark
Shreyash Bante · shreyash27.hashnode.dev · Aug 26, 2024
Understanding the Spark Execution Model
Apache Spark's execution model is one of the reasons it stands out as a powerful tool for big data processing. At its core, Spark's execution model revolves around two main concepts: transformations and actions. To understand how Spark operates, it’s...
#rdd
Renjitha K · renjithak.hashnode.dev · Mar 30, 2023
Demystifying Big Data Analytics with Apache Spark: Part 1
Posted by Renjitha K in Renjitha K's Blog on Mar 25, 2023 2:27:13 PM. As the amount of data generated by individuals and businesses continues to grow exponentially, the need for technologies like Apache Spark that can process and analyze large dataset...
2 likes · 115 reads · spark
Yash Srivastava · blog.yashsrivastava.link · Jan 18, 2023
Basic Spark RDD transformations
RDDs (resilient distributed datasets) are the basic unit of storage in Spark. You can think of an RDD as a collection distributed over multiple machines. Most of the time, higher-level structured APIs are used in Spark applications, which under the hood g...
spark