cr88.hashnode.dev
Getting Started with Databricks Connect on AWS Using Serverless Compute
Databricks Connect lets you write PySpark code locally in VS Code and execute it remotely on Databricks, with no cluster management needed when using serverless compute. This post walks through the exact...
Feb 21 · 2 min read

cr88.hashnode.dev
External Tables vs Managed Tables in Databricks Unity Catalog: A Hands-On Guide
If you've ever wondered whether to let Databricks manage your data or keep control of it yourself, this post walks you through the exact commands.
Step 1: Create the External Location
Before creating...
Feb 19 · 4 min read

cr88.hashnode.dev
Data Vault Foundational Concepts
The Problem with Traditional Data Warehouses
Traditional data warehouses try to create clean truth immediately: merging records, updating dimensions, applying business rules, and overwriting old values. This approach breaks down in real-world scenari...
Feb 7 · 5 min read

cr88.hashnode.dev
Building Event-Driven Data Pipelines with Lakeflow Declarative Pipelines, Lakeflow jobs, Service Principals, Databricks Asset Bundles and File Arrival
What We're Building
Key Features:
Service Principal: Automated identity for CI/CD and secure deployments
File Arrival Trigger: Job runs only when new files land
Managed File Events: SNS/SQS fo...
Jan 28 · 10 min read

cr88.hashnode.dev
Securing Unity Catalog - Storage Credentials, Scoped Service Principal External Location access - serverless autoloader file notification file events
One of the most common challenges in Data Engineering is giving a pipeline access to just enough data without handing over the keys to the entire kingdom (or S3 bucket). In this post, we'll walk throu...
Jan 18 · 4 min read