Biju Devassy in bijudevassy.hashnode.dev · Mar 14 · 4 min read
Partitioning vs Z-ORDER vs Liquid Clustering in Delta Lake
Modern data platforms built on Databricks and Delta Lake often store massive datasets in data lakes. Query performance in such environments depends heavily on how the data is physically organized. Thr…

Biju Devassy in bijudevassy.hashnode.dev · Mar 14 · 4 min read
Understanding BASE in NoSQL Databases
In large distributed systems such as NoSQL databases, it is difficult to guarantee immediate consistency across multiple servers located in different regions. To handle this challenge, many NoSQL syst…

Biju Devassy in bijudevassy.hashnode.dev · Mar 13 · 7 min read
Evolution of Microsoft Data Integration Platforms: From SSIS to Microsoft Fabric
Modern data platforms have evolved significantly over the past two decades. Microsoft's ecosystem reflects this journey clearly, from traditional on-premises ETL tools to cloud-native analytics platform…

Biju Devassy in bijudevassy.hashnode.dev · Mar 5 · 5 min read
SCD Types and SCD Type 2 Implementation in Databricks Using PySpark and Delta Merge
In transactional systems, updates overwrite history. In analytical systems, history is often more important than the current state. If a customer changes city or a product changes category, business u…

Biju Devassy in bijudevassy.hashnode.dev · Mar 4 · 6 min read
Auto Loader Implementation in Databricks using PySpark
Building Reliable, Incremental File Ingestion Pipelines
In most data platforms, ingestion begins with files landing in cloud storage. CSV drops from legacy systems, JSON payloads from APIs, Parquet ex…