May 6 · 6 min read · match keys between two tables and boom, you get results. That mindset worked fine in SQL databases. Then I started working with Spark on large datasets and my jobs started failing, timing out, or grinding for hours. The reality: Spark join performanc...
Mar 24 · 9 min read · At FabCon Atlanta last week, the updated notebook Copilot for data engineering and data science was announced. It brings agentic capabilities to the Copilot and is much more intelligent and Fabric-awa
Mar 16 · 20 min read · Where We Left Off: Part 1 of this series stood up the Azure infrastructure and Databricks workspace via Terraform. Part 2 added the Unity Catalog layer — catalog, schemas, and grants — managed through
Mar 12 · 4 min read · Start with a relatable junior struggle: "Ever stared at a Spark UI wondering why your 'simple' PySpark job is shuffling 100GB for a 1GB dataset? Or why Delta reads take forever? I did – for years. As
Mar 6 · 9 min read · Stack: AWS Glue · PySpark · Step Functions · APIs · Power BI. Welcome to my very first technical blog post. I'm Omer, and I love many things in life, but for now, you will know only two of them: detail
Mar 5 · 4 min read · The Problem: When Migration Breaks What Already Works. For years, daily sales orders flowed from system A into our pipeline without issues. I was on the downstream team — we'd receive the data, run it t