Soyoola Sodunke · soyoolasodunke.hashnode.dev · Feb 12, 2025
PySpark RDD Cheat Sheet
This cheat sheet provides a quick reference to the most commonly used PySpark RDD operations. PySpark RDDs (Resilient Distributed Datasets) are the fundamental data structure in Apache Spark, providing fault-tolerant, distributed data processing capa...
sparksql

Dhruvi Shah · dhruvi-shah.hashnode.dev · Dec 31, 2024
The Secret Life of Protobuf: The Fast, Small, and Mighty Data Format! 🚀
Picture this: you’re packing for a vacation ✈️, and instead of neatly folding your clothes, you’re stuffing them into a suitcase with no organization, just random piles of socks, shirts, and shoes. That’s what JSON and XML do to your data, stuffing it...
1 like · data transformation

Mohan Dubey · and-this-is-how-fired-myself.hashnode.dev · Dec 24, 2024
...And This Is How I Fired Myself
Picture This: Founders, a reporting manager, and a team of 40+ people, all relying on daily reports and data sheets manually prepared by running complex SQL queries. Some queries took over a minute to execute, clogging the database and slowing down d...
SQL

Isuri Balasooriya · themathlab.hashnode.dev · Dec 19, 2024
Beginner's Introduction to Power Query in Excel
In the previous articles in the Excel series, we looked at the most basic features and tools available in Excel for data analysis. We looked at functions, pivot tables and pivot charts, data formatting techniques, and charts. I hope you managed to pra...
excel

Anix Lynch · gozeroshot.dev · Dec 6, 2024
Python Automation #2: 🗳️ Data Transformation w/polars, pyjanitor, pandas, polars
1. Convert Column Names to Snake Case (pyjanitor.clean_names)
import pandas as pd
import janitor
# Sample DataFrame
df = pd.DataFrame({"Column Name 1": [1, 2], "AnotherColumn": [3, 4]})
# Convert column names to snake_case
df = janitor.clean_names(...
pyjanitor

Arpit Tyagi · dataminds.hashnode.dev · Dec 2, 2024
Mastering Slowly Changing Dimensions (SCD) "Type 2" with Azure Data Factory: A Step-by-Step Guide
Introduction to Slowly Changing Dimensions (SCD) Type 2
Slowly Changing Dimensions (SCD) Type 2 is a data warehousing technique used to track historical changes in dimension data over time. Unlike SCD Type 1, which overwrites old data, Type 2 preserv...
Azure Data Factory · Azure

Arpit Tyagi · dataminds.hashnode.dev · Dec 2, 2024
Mastering Slowly Changing Dimensions (SCD) Type 1 with Azure Data Factory: A Step-by-Step Guide
(SCD Type 1 implementation via ADF)
Step 1: Setting Up Your Azure SQL Database for SCD Type 1. Create the emp_scdtype1 table in Azure SQL Database.
Step 2: Populating Your Table: Adding Initial Data Entries.
Step 3: Visualizing Data: Confirming Tab...
8 likes · Azure Data Factory · Azure

Arpit Tyagi · dataminds.hashnode.dev · Dec 2, 2024
Mastering DataFlow Techniques in Azure Data Factory with a Data Transformation example:
Step 1: Exploring the Data Lake: Initial File Inspection
Step 2: Dataflow Blueprint: A Snapshot of the Transformation Process
Step 3: Connecting the Dots: Linking to Your Data Source
Step 4: Filtering the Blues: Excluding Specific Data Entries
St...
5 likes · Azure Data Factory · Azure

Arpit Tyagi · dataminds.hashnode.dev · Dec 2, 2024
Azure Data Factory: "Join" 2 or more CSV Files and Convert to JSON Format
Step 1: Inspecting the CSV Files in Data Lake: Your First Step to Data Optimization
Step 2: Configuring the Data Flow Sources: Point to the Customer.CSV files, then use the Join tool.
Step 3: Use Join on Customer id, as that is the common fi...
5 likes · Azure Data Factory · ADF

Anastasia Zaharieva · whenmathmetdata.hashnode.dev · Nov 27, 2024
Day 10: Data Transformation
Welcome to Day 10! Today, we’re diving into data transformation, an essential step to prepare raw data for analysis and machine learning. Data transformation includes scaling, normalizing, encoding, and reshaping data, ensuring it’s in the optimal fo...
30 Days Data Science Challenge · Python