Shahida R. Khan in modern-data.hashnode.dev
  PySpark + Databricks + Delta Lake: 7 Battle-Tested Patterns to Stop Wasting Hours (And Dollars) – Junior-Friendly Guide
  Start with a relatable junior struggle: "Ever stared at a Spark UI wondering why your 'simple' PySpark job is shuffling 100GB for a 1GB dataset? Or why Delta reads take forever? I did – for years. As
  1d ago · 4 min read
Ömer Oruç ÇELİK in detail-pipelines.hashnode.dev
  From 3,600 to 400 API Calls: Optimizing PySpark on AWS Glue with the Yield Pattern
  Stack: AWS Glue · PySpark · Step Functions · APIs · Power BI. Welcome to my very first technical blog post. I'm Omer and I love many things in life, but for now, you will know only two of them: detail
  Mar 6 · 9 min read
NevedhaAyyanar in nevedhaayyanar.hashnode.dev
  AI Meets Data Engineering: Building an Intelligent Data Quality Agent on Microsoft Fabric
  The problem: When Migration Breaks What Already Works. For years, daily sales orders flowed from system A into our pipeline without issues. I was on the downstream team; we'd receive the data, run it t
  Mar 5 · 4 min read
Biju Devassy in bijudevassy.hashnode.dev
  Caching vs Persistence in Spark (PySpark)
  Introduction. Apache Spark is built on lazy evaluation. Transformations such as select, filter, join, and groupBy do not execute immediately. Instead, Spark builds a logical plan (DAG) and executes it
  Feb 22 · 5 min read
Sameer Shukla in freecodecamp.org
  How to Optimize PySpark Jobs: Real-World Scenarios for Understanding Logical Plans
  In the world of big data, performance isn't just about bigger clusters – it's about smarter code. Spark is deceptively simple to write but notoriously difficult to optimize, because what you write isn't what Spark executes. Between your transformatio...
  Feb 5 · 70 min read
Sai Dinesh Konda in learningdataengineering.hashnode.dev
  From Software Testing to Data Engineering: My Learning Journey
  Introduction. I’m currently working as a Software Tester, and I’m starting this blog to document my journey into Data Engineering. This is not a blog written by an expert. It’s written by someone who is learning seriously, step by step, and wants to un...
  Jan 24 · 3 min read
Kilian Baccaro Salinas in datagym.es
  How to Recover Deleted Workspaces in Microsoft Fabric
  Have you ever accidentally deleted a workspace in Microsoft Fabric and felt that moment of panic? Don't worry, Microsoft has added an administration API that lets you restore deleted workspaces. In this article I...
  Jan 16 · 4 min read
Kilian Baccaro Salinas in datagym.es
  Sparkwise: Intelligent Optimization for Apache Spark in Microsoft Fabric
  If you work with Apache Spark in Microsoft Fabric, you have probably faced the complexity of tuning configurations, reducing costs, and improving the performance of your workloads. Sparkwise is a Python library designed specifically ...
  Jan 9 · 8 min read
Tuhin Kumar Dutta in techtrail.tuhindutta.com
  Zenalyze: My AI-Assisted Data Analysis Tool (And Why I Built It)
  Most AI “data analysis” tools today fall into two groups: they pretend to analyze your data but don’t actually run code, or they demand you upload your data to some cloud black box. Neither works for real-world analytics. I wanted something differen...
  Nov 17, 2025 · 7 min read
Abhishek Kumar in abhi1213.hashnode.dev
  The Ultimate SQL vs PySpark Syntax Guide
  Sample Dataset (Master Table for Entire Blog). We’ll use two tables because many SQL operations (joins, aggregates, window functions) need multiple datasets. EMPLOYEES table, columns: emp_id, name, age, department, salary, join_date. Rows: (1, John, 30, HR, 50000, 2020-01-15), (2, Smit...
  Nov 16, 2025 · 20 min read