
Apache Airflow Workflow Orchestration

Duration: 1h 18m 41s · Language: English · Access: Paid

Apache Airflow is a versatile, platform-independent tool for workflow orchestration, offering extensive capabilities for creating and monitoring batch pipelines (Airflow orchestrates workflows; it is not a streaming engine). With its comprehensive feature set, even the most complex processes can be implemented reliably. Airflow is supported by key platforms and tools in the Data Engineering world, such as AWS and Google Cloud.

Airflow not only provides scheduling and management of processes but also enables real-time tracking of job execution, allowing for swift identification and resolution of errors.

In brief: Airflow is currently one of the most in-demand and "hyped" tools in pipeline orchestration. It is widely adopted by companies globally, and knowledge of Airflow is fast becoming an essential skill for data engineers, particularly for those beginning their careers in this field.

Basic Concepts of Airflow

This section introduces you to the fundamentals of working with Airflow. You will learn how DAGs (Directed Acyclic Graphs) are created, what they consist of (operators, tasks), and how the architecture of Airflow is structured, including the database, scheduler, and web interface. We will also examine examples of event-driven pipelines that can be implemented using Airflow.
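The "acyclic" part of a DAG is what makes scheduling possible at all: tasks can only be put in a valid run order if the dependency graph contains no cycles. A quick stdlib-only illustration of that idea (the task names are invented for this sketch, and this is plain Python, not Airflow code):

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; the set holds its upstream dependencies.
# Task names are illustrative, not from the course project.
dag = {
    "extract":   set(),
    "transform": {"extract"},
    "load":      {"transform"},
}

# A topological order is exactly the order a scheduler may run the tasks in:
# every task appears only after all of its upstream tasks.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['extract', 'transform', 'load']

# With a cycle, no valid run order exists, which is why DAGs must be acyclic.
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
except CycleError:
    print("cycle detected: not a valid DAG")
```

Airflow's scheduler performs the same kind of ordering over the operators and tasks you declare, which is why a cyclic dependency in a DAG file is rejected.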

Installation and Environment Setup

In this practical module, you will work on a project involving weather data processing. The DAG will fetch data from a weather API, transform it, and store it in a Postgres database. You will gain skills in:

  • Configuring the environment using Docker;
  • Verifying the web interface and container operations;
  • Configuring the API and creating the necessary tables in the database.
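For the last step, the weather table's schema might look like the sketch below. The column names are assumptions for illustration, not the course's actual schema, and SQLite from the Python stdlib stands in for Postgres so the sketch runs without a database server:

```python
import sqlite3

# Illustrative schema for the weather readings the DAG will store.
# The course uses Postgres; sqlite3 is used here only to keep the
# sketch self-contained and runnable.
DDL = """
CREATE TABLE IF NOT EXISTS weather (
    city        TEXT NOT NULL,
    recorded_at TEXT NOT NULL,
    temp_c      REAL,
    humidity    REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.execute(
    "INSERT INTO weather (city, recorded_at, temp_c, humidity) VALUES (?, ?, ?, ?)",
    ("Berlin", "2024-01-01T12:00:00", 3.5, 81.0),
)
row = conn.execute("SELECT city, temp_c FROM weather").fetchone()
print(row)
```

In the course itself the equivalent DDL would be run against the Postgres container, typically via an Airflow connection rather than a direct client.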

Practice: Creating DAGs

In this hands-on practice session, you will delve into the Airflow interface and learn to monitor task statuses effectively. You will:

  • Create DAGs based on Airflow 2.0 that retrieve and process data;
  • Master the Taskflow API—a modern approach to building DAGs with more convenient syntax;
  • Implement parallel task execution (fan-out) to run multiple tasks simultaneously.

About the Author: Andreas Kretz


I am a senior data engineer and trainer, a tech enthusiast, and a father. For more than ten years, I have been passionate about Data Engineering. Initially, I became a self-taught data engineer and then led a team of data engineers at a large company. When I realized the great demand for education in this field, I followed my passion and founded my own Data Engineering Academy. Since then, I have helped over 2,000 students achieve their goals.

Watch Online: 21 Lessons

All Course Lessons (21)
 1. Introduction Demo (01:37)
 2. Airflow Usage (03:20)
 3. Fundamental Concepts (02:48)
 4. Airflow Architecture (03:10)
 5. Example Pipelines (04:50)
 6. Spotlight: 3rd Party Operators (02:18)
 7. Airflow XComs (04:33)
 8. Project Setup (01:44)
 9. Docker Setup Explained (02:07)
10. Docker Compose & Starting Containers (04:24)
11. Checking Services (01:49)
12. Setup WeatherAPI (01:34)
13. Setup Postgres DB (01:59)
14. Airflow Web Interface (04:38)
15. Creating a DAG With Airflow 2.0 (09:47)
16. Running Our DAG (04:16)
17. Creating a DAG With the Taskflow API (07:00)
18. Getting Data From the API With SimpleHttpOperator (03:39)
19. Writing Into Postgres (04:13)
20. Parallel Processing (04:16)
21. Recap & Outlook (04:39)