Apache Airflow Workflow Orchestration

1h 18m 41s
English
Paid

Apache Airflow is a versatile, platform-independent tool for workflow orchestration, offering extensive capabilities for authoring, scheduling, and monitoring batch data pipelines. With its comprehensive features, even the most complex processes can be implemented seamlessly. Airflow is also offered as a managed service by the key cloud platforms of the data engineering world, such as AWS (Amazon MWAA) and Google Cloud (Cloud Composer).

Airflow not only provides scheduling and management of processes but also enables real-time tracking of job execution, allowing for swift identification and resolution of errors.

In brief: Airflow is currently one of the most in-demand and "hyped" tools in pipeline orchestration. It is widely adopted by companies globally, and knowledge of Airflow is fast becoming an essential skill for data engineers, particularly for students beginning their career in this field.

Basic Concepts of Airflow

This section introduces you to the fundamentals of working with Airflow. You will learn how DAGs (Directed Acyclic Graphs) are created, what they consist of (operators, tasks), and how the architecture of Airflow is structured, including the database, scheduler, and web interface. We will also examine examples of event-driven pipelines that can be implemented using Airflow.
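
To make these concepts concrete, here is a minimal sketch of a classic operator-style DAG. It assumes Airflow 2.x with the built-in BashOperator; the dag_id, task ids, and bash commands are illustrative stand-ins, not code from the course.

```python
# Minimal DAG sketch (assumes Airflow 2.x; names and commands are illustrative).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_airflow",            # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",        # run once per day
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")

    # The >> operator draws an edge in the directed acyclic graph:
    # "extract" must finish before "load" starts.
    extract >> load
```

Each operator instance becomes a task; the scheduler persists task state in the metadata database, and the web interface visualizes the graph and its run history.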

Installation and Environment Setup

In this practical module, you will work on a project involving weather data processing. The DAG will fetch data from a weather API, transform it, and store it in a Postgres database. You will gain skills in:

  • Configuring the environment using Docker;
  • Verifying the web interface and container operations;
  • Configuring the API and creating the necessary tables in the database (a table-creation sketch follows this list).
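
As a taste of the setup work, the sketch below shows one way to create a table for the weather data in the project's Postgres container. It is an assumption-laden illustration: the connection details mirror a typical Airflow Docker Compose setup (localhost:5432, airflow/airflow credentials), and the column layout is invented for this example, not the course's exact schema.

```python
# Hypothetical table setup for the weather project (not the course's exact schema).
# Assumes the Postgres container from the Docker setup is reachable on
# localhost:5432 with the common default airflow/airflow credentials.
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="airflow",
    user="airflow",
    password="airflow",
)
with conn, conn.cursor() as cur:  # the context manager commits on success
    cur.execute("""
        CREATE TABLE IF NOT EXISTS weather_data (
            city        TEXT        NOT NULL,
            recorded_at TIMESTAMPTZ NOT NULL,
            temp_c      NUMERIC,
            condition   TEXT
        );
    """)
conn.close()
```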

Practice: Creating DAGs

In this hands-on practice session, you will delve into the Airflow interface and learn to monitor task statuses effectively. You will:

  • Create DAGs based on Airflow 2.0 that retrieve and process data;
  • Master the Taskflow API—a modern approach to building DAGs with more convenient syntax;
  • Implement parallel task execution (fanout) to run multiple processes simultaneously; a TaskFlow-style sketch follows this list.
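
The sketch below illustrates what a TaskFlow-style DAG with a simple fanout could look like. It assumes Airflow 2.x; the city list, task bodies, and DAG name are hypothetical stand-ins, not the course's actual code.

```python
# TaskFlow API sketch with a parallel fanout (assumes Airflow 2.x).
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule_interval=None, start_date=datetime(2024, 1, 1), catchup=False)
def weather_fanout():
    @task
    def fetch(city: str) -> dict:
        # In a real pipeline this would call the weather API.
        return {"city": city, "temp_c": 21.0}

    @task
    def store(record: dict) -> None:
        # In a real pipeline this would insert the record into Postgres.
        print(f"storing {record}")

    # Fanout: each city gets its own fetch -> store branch, and the branches
    # can run in parallel if the executor allows it.
    for city in ["Berlin", "London", "Tokyo"]:
        store(fetch(city))

weather_fanout()
```

Compared with the classic operator style, the TaskFlow API passes data between tasks through ordinary return values and function arguments, with XComs handled behind the scenes.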

About the Author: Andreas Kretz

Andreas Kretz is a German data engineer and one of the most widely followed independent voices on data engineering as a career discipline. He runs the Plumbers of Data Science brand and has been publishing tutorial material continuously since the field consolidated around the modern lake-house stack (Spark, Kafka, Snowflake, Databricks, Airflow).

His CourseFlix listing is the largest single-author catalog under this source — over thirty courses spanning data-pipeline construction, streaming architectures, the cloud-native data stack on AWS / Azure / GCP, the Python and Scala tooling that dominates the field, and the soft-skills / career side of breaking into data engineering. Material is paid and aimed at engineers transitioning into data work or already-working data engineers picking up specific tools.

All Course Lessons (21)
  1. Introduction (free demo) · 01:37
  2. Airflow Usage · 03:20
  3. Fundamental Concepts · 02:48
  4. Airflow Architecture · 03:10
  5. Example Pipelines · 04:50
  6. Spotlight 3rd Party Operators · 02:18
  7. Airflow XComs · 04:33
  8. Project Setup · 01:44
  9. Docker Setup Explained · 02:07
  10. Docker Compose & Starting Containers · 04:24
  11. Checking Services · 01:49
  12. Setup WeatherAPI · 01:34
  13. Setup Postgres DB · 01:59
  14. Airflow Webinterface · 04:38
  15. Creating DAG With Airflow 2.0 · 09:47
  16. Running our DAG · 04:16
  17. Creating DAG With TaskflowAPI · 07:00
  18. Getting Data From the API With SimpleHTTPOperator · 03:39
  19. Writing into Postgres · 04:13
  20. Parallel Processing · 04:16
  21. Recap & Outlook · 04:39

Related courses

  • Apache Iceberg Fundamentals (updated 7 months ago)
    By: David Reger
    Unlock the potential of modern data platforms with Apache Iceberg, which masterfully combines the flexibility of data lakes with the reliability of data warehouses.
    33 minutes 32 seconds
  • Platform & Pipeline Security (updated 11 months ago)
    By: Andreas Kretz
    Empower your data engineering skills by understanding the critical importance of security.
    34 minutes 46 seconds
  • Data Engineering on Databricks (updated 11 months ago)
    By: Andreas Kretz
    Learn Databricks for data processing using Apache Spark. This course covers setup on AWS, ETL processes, data visualization, and BI tools integration.
    1 hour 27 minutes 29 seconds · Rated 5/5

Frequently asked questions

What is Apache Airflow Workflow Orchestration about?
Apache Airflow is a versatile, platform-independent tool for workflow orchestration, offering extensive capabilities for authoring, scheduling, and monitoring batch data pipelines. With its comprehensive features, even the most complex…
Who teaches this course?
It is taught by Andreas Kretz. You can find more courses by this instructor on the corresponding source page.
How long is the course?
It contains 21 lessons with a total runtime of 1 hour 18 minutes. Every lesson is available to watch online at your own pace.
Is it free to watch?
It is part of CourseFlix's premium catalog. A subscription unlocks the full video player; the course description, table of contents, and preview information are available to everyone.
Where can I watch it online?
The course is available to watch online on CourseFlix at https://courseflix.net/course/apache-airflow-workflow-orchestration. The page hosts every lesson with the integrated video player; no download is required.