Mehul Kansal (mehulkansal.hashnode.dev) · Jul 22, 2024 · Week 10: Lending Club Project - Part 2 💸 · Hey there! 👋 Welcome to the second and final part of our Lending Club project. In this blog, we focus on calculating loan scores based on loan payment history, financial health, and loan defaulters' history. We'll also create permanent and temporary... · Tag: PySpark
Mehul Kansal (mehulkansal.hashnode.dev) · Jul 15, 2024 · Week 9: Lending Club Project - Part 1 💸 · Hey there, data engineering folks! 👋 In this two-part series on the Lending Club project, we delve into the process of creating, cleaning, and transforming various datasets derived from a large dataset containing over 2 million records. The first par... · Tag: dataengineering
Rahul Rathod (codeok.hashnode.dev) · Jul 14, 2024 · The Revolutionary Journey of Apache Spark: From Academic Roots to Industry Dominance · In the world of big data, speed and efficiency are paramount. Among the many technologies that have emerged to address these needs, Apache Spark stands out as a revolutionary force. Born from academic innovation and nurtured by a growing community, S... · Tag: apache
Mehul Kansal (mehulkansal.hashnode.dev) · Jul 8, 2024 · Week 8: Spark Performance Tuning 🎶 · Hey data enthusiasts! 👋 In this week's blog, we delve into Spark Performance Tuning, focusing on optimizing aggregate operations and understanding the intricacies of Spark's logical and physical plans. We explore how sort and hash aggregations diffe... · Tag: spark
Shahnawaz Khan (shahnawaz.hashnode.dev) · Jul 7, 2024 · Importance of Data Analyst · Data-Driven Decision Making: Data analysts help organizations make informed decisions by analyzing data trends and patterns. This can range from improving operational efficiency to identifying new business opportunities. Business Insights: They extract... · Tag: dataanalytics
Mehul Kansal (mehulkansal.hashnode.dev) · Jul 1, 2024 · Week 7: Spark Optimization Unlocked 🔓 · Hello fellow data engineers! This week, we delve into the intricacies of Apache Spark optimizations, exploring how transformations like groupBy(), join types, partitioning, and adaptive query execution (AQE) enhance the performance and efficiency of data... · 28 reads · Tag: spark
Mehul Kansal (mehulkansal.hashnode.dev) · Jun 24, 2024 · Week 6: Spark Internals Demystified 🔮 · Hey there, fellow data enthusiasts! This week, we will explore the intricacies of Spark Internals, from the DataFrame Writer API and its various write modes to advanced partitioning and bucketing techniques. We will discover how to optimize query performance... · Tag: spark
Mehul Kansal (mehulkansal.hashnode.dev) · Jun 17, 2024 · Week 5: PySpark Playground - Aggregates and Windows 🏀 · Hey there, fellow data engineers! This week's blog aims to delve into the various methods of accessing columns in PySpark, explore the different types of aggregate functions, and understand the utility of window functions. By the end of this post, yo... · Tag: Data Science
Mehul Kansal (mehulkansal.hashnode.dev) · Jun 3, 2024 · Week 3: Spark Transformations - Navigating Schema and Data Types 🧭 · Introduction: Welcome back, data enthusiasts! This week, we'll unravel the intricacies of schema inference and enforcement, data type handling, creating and refining dataframes, and removing duplicates. Let's get started right away! Schema inference, ... · 26 reads · Tag: data-engineering
Harshita Chaudhary (harshita.hashnode.dev) · May 30, 2024 · Slowly Changing Dimensions with PySpark and Delta Lake · Slowly Changing Dimensions (SCDs) are a vital concept in data warehousing, particularly in managing data that changes over time. As entities evolve, it's crucial to track and manage these changes effectively. This is where Slowly Changi... · Tag: data-engineering