thedatapipeline.hashnode.dev
Step-by-Step Guide to Setting Up Medallion Architecture on AWS
Modern analytics platforms require structured and reliable data processing pipelines. The Medallion Architecture (Bronze → Silver → Gold) provides a standardized way to achieve this while maintaining quality, lineage, and re-processing capabilities. In...
Dec 7, 2025 · 3 min read
Efficiently Load Millions of Rows Daily from S3 into AWS Redshift
A few months ago I built a pipeline for a logistics analytics team that collects package events: delivery scans, route status updates, warehouse entries, etc. The events come from 11 distributed warehouses across India, aggregating to ~40M records/day...
Dec 6, 2025 · 4 min read
Must-Know AWS Glue Interview Questions and Answers
What is an AWS Glue Crawler? A Glue crawler scans your data source (mostly S3 in data lake setups), automatically infers the schema, and creates tables in the Glue Data Catalog. It can detect new partitions and even up...
Dec 5, 2025 · 6 min read
Top Snowflake Interview Questions for Data Engineers
1. Explain the Snowflake architecture. Snowflake uses a fully decoupled architecture consisting of three independent layers: Storage, Compute, and Cloud Services. The Storage layer holds all data in compressed, columnar structures, internally broken into...
Dec 4, 2025 · 4 min read
How to Implement Stream-Based SCD Type 2 in Snowflake
Slowly Changing Dimensions (SCD) Type 2 is the go-to pattern when you need full historical tracking of your dimensional tables. But running a naïve column-by-column comparison becomes messy fast. A much cleaner way? 👉 Create an MD5 hash of all busin...
Dec 3, 2025 · 2 min read