Vaishnave Subbramanian · vaishnave.page · Mar 21, 2024
Dabbling with Spark Essentials Again
Apache Spark stands out as a pivotal tool for navigating the complexities of Big Data analysis. This article embarks on a comprehensive journey through the core of Spark, from unraveling the intricacies of Big Data and its foundational concepts to ma...
Series: Dabbling with Apache Spark · Tag: spark
Vaishnave Subbramanian · vaishnave.page · Mar 16, 2024
Dabbling with Spark Essentials
Embarking on the journey of understanding Apache Spark marks the beginning of an exciting series designed for both newcomers and myself, as we navigate the complexities of big data processing together. Apache Spark, with its unparalleled capabilities...
Series: Dabbling with Apache Spark · Tag: spark
Pinak Datta · pinakdatta.hashnode.dev · Mar 15, 2024
Scalable Data Processing with Apache Spark and PySpark
Introduction: In the era of big data, processing large volumes of data efficiently has become essential for many organizations. Apache Spark has emerged as a powerful tool for scalable data processing, offering speed, ease of use, and flexibility. In...
Tag: Python
Kinyanjui Karanja · overflow.hashnode.dev · Mar 12, 2024
Loading, Transforming, and Saving GitHub Archive Data with PySpark
Introduction: GitHub Archive provides a wealth of data capturing various activities on the GitHub platform, such as repository creation, issues opened, and pull requests made. In this blog post, we'll explore how to use PySpark, a powerful analytics ...
Tag: PySpark
Kiran Reddy, for Databricks - PySpark · databricks-pyspark-blogs.hashnode.dev · Feb 27, 2024
Reading different files in PySpark
Introduction: In this blog, we'll explore the versatile capabilities of Apache Spark with PySpark for reading, writing, and processing data in Databricks environments. From handling various file formats to seamlessly integrating with external data sou...
Tag: Databricks
KALINGA SWAIN · kalingaswain.hashnode.dev · Feb 11, 2024
EMR with EKS
Hi, welcome to the event! Amazon EMR is like the Rockstar of cloud big data. Picture this: petabyte-scale data parties, interactive analytics shindigs, and even machine learning raves, all happening with cool open-source crews like Apache Spark, Apach...
Tag: #AWSConsole
Malavika · viksmals.hashnode.dev · Feb 11, 2024
Running Vulnerability Scans for Spark Third Party Packages
If you use Spark in your codebase, chances are you also use some popular third-party packages to work with Spark. What does this mean from a security perspective? Your application may have some security vulnerabilities introduced due to these third-p...
Tag: Spark For Data Science
Máté Márk Csörnyei · csornyei.com · Jan 7, 2024
Building a Movie Recommendation Engine: Pilot
And who am I? Welcome! My name is Máté and I'm a 27-year-old Software Developer from Hungary, currently living in the Netherlands. In the last 5 years I have had the chance to work on the frontend of an agricultural webshop, dive deep into th...
Tag: Python
Harshita Chaudhary · harshita.hashnode.dev · Dec 18, 2023
PySpark Job Optimization Techniques (Part II)
1. Broadcast Join: When dealing with the challenge of joining a larger DataFrame with a smaller one in PySpark, the conventional Spark join operation can become resource-intensive in terms of both memory and time. This is particularly evident when the...
Tag: data-engineering
Harshita Chaudhary · harshita.hashnode.dev · Nov 8, 2023
Spark's Execution Plan
Spark's Execution Plan is a series of operations carried out to translate SQL statements into a set of logical and physical operations. In short, it represents a sequence of operations executed from the SQL statement to the Directed Acyclic Graph (DA...
Tag: PySpark