Anix Lynch · gozeroshot.dev · Jan 20, 2025
Automated podcast processing pipeline w/airflow, vosk, pydub, sqlite 🗣️
This is an automated podcast processing pipeline that: downloads podcast episodes from Marketplace (a business news podcast); stores episode info in a database; converts speech to text (transcription); does all this automatically on a daily schedule...
Python Projects · airflow

Renjitha K · renjithak.hashnode.dev · Jan 18, 2025
Mastering Slowly Changing Dimensions (SCD): A Guide Using Vendor Feeds
What is Slowly Changing Dimensions (SCD)? So, let’s say you’re dealing with a data warehouse, a massive storehouse of information that helps businesses make sense of complex data. Now, Slowly Changing Dimensions (SCD) is a method used in data warehous...
dataengineering

KUNAL · kunaltheengineer.hashnode.dev · Jan 11, 2025
Deploying Apache Airflow with Docker on Ubuntu EC2 Instance
Introduction: Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. Deploying Airflow using Docker simplifies the setup process and ensures a consistent environment. This guide provides a detailed step-...
10 likes · AWS · airflow

Stephen David-Williams · stephendavidwilliams.com · Dec 30, 2024
Use data contracts to automate data workflows - part 2
Preface 📖 In part 1, we explained what a data contract is, why we need them, and what a typical one contains. In this blog, we dive into a demo to explore how they actually work so we can process data more safely, quickly, and effectively. Goal 🎯 The data...
data-engineering

Renjitha K · renjithak.hashnode.dev · Dec 29, 2024
Streamlining Data Workflows: The Essential Guide to Apache Airflow
Imagine you're juggling multiple tasks: downloading data from APIs, cleaning it up, running complex transformations, and finally pushing it into dashboards or machine learning models. Doing this manually can feel like herding cats, where every little ...
53 reads · airflow

Vipin · vipinmp.hashnode.dev · Dec 29, 2024
Setting Up an ETL Workflow with Apache Airflow to Analyze E-Commerce Sales Data
In the data-driven world of e-commerce, analyzing sales data is crucial for making informed business decisions. Apache Airflow, a powerful open-source platform, allows you to automate and orchestrate complex workflows. In this blog, we'll guide you t...
36 reads · E2E Projects · SQL

Vipin · vipinmp.hashnode.dev · Dec 29, 2024
Accessing PostgreSQL Using pgAdmin with a Dockerized Apache Airflow Setup
In this blog, I will guide you through the process of accessing your PostgreSQL database using pgAdmin in a Dockerized Apache Airflow setup. pgAdmin is a powerful and user-friendly tool that allows you to manage and interact with your PostgreSQL data...
Python

Varun Kumawat for DevHub · devhubcommunity.hashnode.dev · Dec 25, 2024
Step-by-Step Guide: Running Apache Airflow in Docker
Apache Airflow is an open-source platform used to programmatically author, schedule, and monitor workflows. Docker simplifies the setup process by providing an isolated and reproducible environment. This guide walks you through setting up Apache Airf...
1 like · 42 reads · airflow

Varun Kumawat for DevHub · devhubcommunity.hashnode.dev · Dec 24, 2024
Step-by-Step Guide: Build Your First Local Airflow DAG in Just 5 Minutes
Apache Airflow is an open-source platform used to schedule and manage workflows. With Airflow, you can automate complex workflows such as data ingestion, ETL processes, and machine learning pipelines. In this blog, we’ll walk through the steps to set...
1 like · 36 reads · airflow

Sai Prasanna Maharana · saimaharana.hashnode.dev · Oct 26, 2024
Airflow: An Introduction
What is Apache Airflow? Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. It allows you to create dynamic, extensible, and scalable workflows as code, ensuring maintainability, versioning, testing,...
MLOPS · airflow