mehulkansal.hashnode.dev 25: Netflix Data Pipeline with dbt and Snowflake 🎬 Hey Data Engineers! 👋 In this project, we build an end-to-end data pipeline using dbt (Data Build Tool) and Snowflake, simulating a Netflix-style dataset. Starting from raw data in Amazon S3, we load it into Snowflake, apply transformations using db... Sep 2, 2025 · 7 min read
mehulkansal.hashnode.dev 24: AWS Glue Building Blocks 🧱 Hey fellow learners! 👋 This blog demonstrates how to work with AWS Glue through practical, real-world use cases. From setting up crawlers and managing schema changes to building ETL pipelines with transformations and handling sensitive data, each se... May 31, 2025 · 7 min read
mehulkansal.hashnode.dev 23: Amazon EMR and EMR Serverless Guide 📘 Hey readers! 👋 In this blog, I walk through my hands-on journey with Amazon EMR, starting from creating a traditional EMR cluster to leveraging the flexibility of EMR Serverless for both batch and interactive workloads. From provisioning resources an... May 24, 2025 · 6 min read
mehulkansal.hashnode.dev 22: AWS Athena Setup and Optimization 📊 Hi Data Folks! 👋 In this blog, we'll walk through setting up AWS Athena for querying data in S3, defining table structures using both manual metadata and Glue Crawlers, and optimizing query performance with techniques like partitioning and columnar ... May 11, 2025 · 6 min read
mehulkansal.hashnode.dev 21: Real-Time Data Streaming with Apache Kafka 📡 Hey data folks! 👋 Apache Kafka offers a robust ecosystem for handling high-speed data streams and is widely used in various industries for applications such as event-driven architectures, real-time analytics, and data integration. In this blog, we w... Jan 27, 2025 · 4 min read