Victor Nduti · datacurious.hashnode.dev · Mar 4, 2024 · Crafting a Basic Data Pipeline with Airflow. From setup to mastery: A Guide to Crafting Your Inaugural DAG · In a previous blog post, we explored the fundamental concepts of Apache Airflow, a versatile workflow management platform that empowers users to orchestrate complex data pipelines with eas... · 118 reads · apache
__thatpyjamagirl · engineereddata.hashnode.dev · Feb 9, 2024 · Freelancing with Data · For the first time in my career, I am freelancing for a small startup, and I am documenting the journey as I go along. It's a small company trying to create a community of gamers and game developers and make a fortune by increasing game engagement. Where do I... · Data Science
bhuvanchand maddi · bhuvanchand.hashnode.dev · Jan 27, 2024 · Mastering Parallelism, Max Active Runs, and DAG Concurrency in Apache Airflow · Apache Airflow is an open-source tool widely used for orchestrating complex workflows. When it comes to managing the execution of multiple tasks and DAGs (Directed Acyclic Graphs), understanding three key parameters: parallelism, max_active_runs_per... · airflow
bhuvanchand maddi · bhuvanchand.hashnode.dev · Jan 27, 2024 · Understanding Start Date, Schedule Interval, and Execution Date in Apache Airflow · Apache Airflow is a powerful platform used for orchestrating complex computational workflows and data processing pipelines. At the heart of Airflow's scheduling system are three critical concepts: the start date (start_date), schedule interval (sched... · airflow
Aryan Garg · blog.aryann.tech · Oct 12, 2023 · Why Postgres should be the last database you'll ever need · Being a sucker for reading unnecessary books in fields I have no experience in got me into flipping through the Google Site Reliability Engineering book, where I had read the most elegant concept that seems obvious at first but isn't applied in the r... · 62 reads · PostgreSQL
Kyle Shelton · chaoskyle.com · Aug 13, 2023 · Data Engineering for DevOps Engineers · Introduction: Have you ever gone camping? If you have, then you know that it's important to have a plan. You need to know where you're going, what you're going to do, and what supplies you need. Data engineering is a lot like camping. You need to have... · 28 reads · Data Science
Victor Chaba · chaba.hashnode.dev · Aug 4, 2023 · A Simple ETL Pipeline Automation Using Airflow on AWS · Airflow is an open-source platform for creating, scheduling, and monitoring workflows. In this tutorial, we'll use Airflow to extract weather data from an API, transform the data, and load it into a CSV file in an S3 bucket. We'll start by creating a... · airflow
Flávio Regis Arruda · xboard.hashnode.dev · Jul 27, 2023 · Tip: set depends_on_past=True in Airflow when creating a forecast model pipeline · 💡 When creating a forecasting model pipeline in Airflow, set depends_on_past=True. Why? Forecasting models help us predict future data based on patterns and trends observed in historical data. The output of these models inherently depends on past observations... · airflow
Derrick Qin · derrickqin.com · Jul 22, 2023 · Cloud Composer Airflow and BigQuery External Table with Google Sheets · BigQuery has a useful feature that lets the user create an external table backed by data in Google Sheets. This is very convenient because BigQuery users can query the data from Google Sheets directly. However, as a data engineer, you may need to build p... · cloud composer
Tanupriya Singh · tanupriya.com · Jul 14, 2023 · My Journey as a Machine Learning Engineer · Introduction: One of the questions I get as a Machine Learning Engineer is whether I am required to read research papers and be aware of the latest algorithms and models. The answer is no. Don't get me wrong, it benefits from knowing the various model... · 2 likes · 677 reads · Machine Learning