Karthik Darbha in tech4nirvana.com · 3d ago · 11 min read
AI-Driven Data Quality: From Rules to Reasoning
Every data engineering team has a version of the same story. A critical dashboard starts showing numbers that don't feel right. An analyst flags it Friday afternoon. The on-call engineer traces it bac…

Vishnu D in thedatatrench.hashnode.dev · May 29 · 9 min read
Spark Architecture Simply Explained
You have been using Spark for months: running notebooks, submitting jobs, reading docs. But when someone asks you to explain what actually happens when a job runs, you find yourself stalling. The co…

Varchasv Hoon in varchasvh.hashnode.dev · May 26 · 5 min read
Day 1: Apache Spark Internals
If you want to become a true PySpark SME, you need to go beyond writing transformations; you must understand how Spark actually executes them under the hood. In this post, we'll break down Spark's exec…

Trung Thành in thanh-de.hashnode.dev · May 6 · 6 min read
I spent 6 hours studying PySpark join strategies. Here's what I learned
…match keys between two tables and boom, you get results. That mindset worked fine in SQL databases. Then I started working with Spark on large datasets and my jobs started failing, timing out, or grinding for hours. The reality: Spark join performanc...

Shivankur Sharma in shivankur018.hashnode.dev · Apr 27 · 4 min read
Celebal Internship – Weekly Learning Journal
Week 1: Basics of Data. The first week focused on understanding data fundamentals and how modern systems handle data. Data was introduced as raw facts, while information is processed data with meaning.

Karthik Darbha in tech4nirvana.com · Apr 22 · 18 min read
Migrating SPC Run Rules from SAS to Databricks
A Pharma Supply Chain Engineering Perspective · tech4nirvana.com. Why This Migration Is Non-Trivial: Earlier, I worked as Product Owner and Data Architect on a SAS to Databricks migration for a Pharma…

Sandeep Pawar in fabric.guru · Mar 24 · 9 min read
Cross-referencing Notebooks In The Updated Fabric Notebook Copilot
At FabCon Atlanta last week, the updated notebook Copilot for data engineering and data science was announced. It brings agentic capabilities to the Copilot and is much more intelligent and Fabric-awa…

nobhri in platform-notes.hashnode.dev · Mar 16 · 20 min read
Terraform & Databricks CI/CD Part 3: The Design Decisions Behind the Job Layer
Where We Left Off: Part 1 of this series stood up the Azure infrastructure and Databricks workspace via Terraform. Part 2 added the Unity Catalog layer (catalog, schemas, and grants), managed through…

Shahida R. Khan in modern-data.hashnode.dev · Mar 12 · 4 min read
PySpark + Databricks + Delta Lake: 7 Battle-Tested Patterns to Stop Wasting Hours (And Dollars) – Junior-Friendly Guide
Start with a relatable junior struggle: "Ever stared at a Spark UI wondering why your 'simple' PySpark job is shuffling 100GB for a 1GB dataset? Or why Delta reads take forever? I did – for years. As…

Ömer Oruç ÇELİK in oorucelik.hashnode.dev · Mar 6 · 9 min read
From 3,600 to 400 API Calls: Optimizing PySpark on AWS Glue with the Yield Pattern
Stack: AWS Glue · PySpark · Step Functions · APIs · Power BI. Welcome to my very first technical blog post, I'm Omer and I love many things in life, but for now, you will know only two of them: detail…