akshobya.hashnode.dev
Getting Started with PySpark: A Beginner's Guide
What is PySpark? PySpark is the Python API for Apache Spark, an open-source, distributed computing system used for big data processing and machine learning. It allows you to harness the speed and scalability of Spark while coding in Python. Why Use P...
Jan 28, 2025 · 2 min read

akshobya.hashnode.dev
Introduction to Azure Data Lake Storage Gen2 (ADLS Gen2)
In today’s data-driven world, the volume of unstructured and structured data continues to grow at an exponential rate. Azure Data Lake Storage Gen2 (ADLS Gen2) is a solution designed to handle this surge in data. With the scalability of a data lake a...
Oct 29, 2024 · 2 min read

akshobya.hashnode.dev
Part 1: Introduction to Azure Data Factory (ADF)
As organizations continue to generate massive amounts of data, the need to move, transform, and integrate that data becomes critical. This is where Azure Data Factory (ADF) comes in. ADF acts as the backbone for cloud-based ETL (Extract, Transform, L...
Oct 21, 2024 · 3 min read

akshobya.hashnode.dev
Introduction to Microsoft Azure: A Gateway to the Cloud
When it comes to cloud computing, Microsoft Azure is one of the leading names we encounter. With solutions spanning virtual machines to AI-powered analytics, Azure has grown into a powerhouse, helping organizations of all sizes transform how they ope...
Oct 17, 2024 · 3 min read

akshobya.hashnode.dev
Part 2: Exploring Right Joins, Full Outer Joins, and Self Joins
In the previous post, we got comfortable with inner joins and left joins. Here we will cover the remaining join types: right join, full outer join, and self join, which help us handle more complex data relationships. Let’s get started! 1. Right Join A right...
Oct 16, 2024 · 3 min read