May 11 · 4 min read · Act 1: The "Then vs. Now" – Why Delta? Back in the day, a Data Lake was just a folder of files. If two people wrote to it at once, the data became a scrambled egg. Feature | The Parquet "Swamp" (Then) | The
Mar 29 · 13 min read · TL;DR: Databricks organizes everything in a layered hierarchy — Account → Metastore → Workspace → Catalog → Schema → Table. Understanding how Workspaces, Bindings, and Permissions interact at each lay
Jan 9 · 8 min read · If you work with Apache Spark in Microsoft Fabric, you have probably run into the complexity of tuning configurations, reducing costs, and improving the performance of your workloads. Sparkwise is a Python library designed specifically ...
Dec 15, 2025 · 6 min read · In the world of data, duplicates are almost guaranteed. From repeated customer records to transactions that appear more than once, duplicate data is a silent problem that can undermine the reliability ...
Dec 11, 2025 · 8 min read · When you work with data in Fabric Lakehouse, you need to understand two fundamental table types: managed tables and external tables. This distinction affects how your data is stored, accessed, and what happens when you delete tables. This article exp...
Oct 29, 2025 · 8 min read · In the era of big data, data lakes became a popular choice for large-scale analytics, thanks to their flexibility, low cost, and separation of storage and compute. But they’ve also struggled with consistency, schema drift, and complex query optimizat...
Sep 25, 2025 · 19 min read · title: The Ultimate Guide to Open Table Formats - Iceberg, Delta Lake, Hudi, Paimon, and DuckLake date: "2025-09-24" description: "Understanding Iceberg, Delta Lake, Hudi, Paimon, and DuckLake" author: "Alex Merced" category: "Data Engineering" banne...
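Sep 19, 2025 · 10 min read · Before we begin: at my previous company I used Databricks and Delta Lake, and at my current company I mainly use Apache Iceberg. At first, as I had often heard, I assumed "isn't it just a storage format with no real differences?" and expected to adapt quickly, but the key points turned out to differ more than I thought. When I study, I like to understand new things by relating them to concepts I already know, so to organize my thoughts, about Iceberg and Delta Lake ...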
Jul 2, 2025 · 4 min read · Last week I was working on a Spark pipeline that was running slowly, and I discovered that a specific task with significant skew was the cause. Googling for the problem didn’t return any meaningful result, so I had to figure it out myself. Here’s wha...