
The Data Engineering Bootcamp: Zero to Mastery

16h 46m 22s
English
Paid

Course description

Learn how to build streaming pipelines with Apache Kafka and Flink, create data lakes on AWS, run ML workflows on Spark, and integrate LLMs into production systems. This course is designed to kickstart your career and make you a sought-after data engineer.


Why has data engineering become a major profession in IT?

Data Engineering is rapidly becoming one of the fastest-growing and most in-demand professions in tech. With the rise of AI products, analytical systems, and real-time applications, companies are investing heavily in their data infrastructure, which drives demand for specialists.

Last year alone, more than 20,000 new data engineer positions were created, and the total number of open positions in North America approached 150,000 - clear evidence of the industry's explosive growth.

Moreover, the salaries are impressive:

  • Entry level - from $80,000 to $110,000 per year
  • Mid and senior level - up to $190,000–$200,000+

Furthermore, data engineers play a strategic role: they build the foundation for machine learning systems, analytics, and AI, without which modern tech products are impossible. With the further growth of AI, the demand for data engineers will only increase, creating excellent opportunities for a long-term career and financial stability.

Why this particular bootcamp?

Our bootcamp is designed to be as comprehensive and practical as possible, without unnecessary theory or outdated tutorials. You will learn step by step and build real projects using the same tools that professionals use.

You will start with Apache Spark, processing real Airbnb data and mastering large-scale computations. Then you will create a modern data lake on AWS using S3, EMR, Glue, and Athena. You will learn pipeline orchestration with Apache Airflow, build streaming systems on Kafka and Flink, and even integrate machine learning and LLMs (Large Language Models) directly into your pipelines.

As a result, you will learn to build end-to-end production-level systems - the exact skills employers are looking for.
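To make "end-to-end" concrete: whatever the tooling, a batch pipeline has the same extract-transform-load shape that Spark, Airflow, and a data lake implement at scale. Here is a stdlib-only toy sketch of that flow (the records, field names, and functions are invented for this illustration and are not taken from the course):

```python
from collections import defaultdict

# Extract: in a real pipeline this would read from S3, Postgres, or Kafka.
def extract():
    # Hypothetical Airbnb-style listing records, invented for this sketch.
    return [
        {"city": "Lisbon", "price": 120.0},
        {"city": "Lisbon", "price": 80.0},
        {"city": "Porto", "price": 60.0},
        {"city": "Porto", "price": None},  # dirty record to filter out
    ]

# Transform: clean the data and compute an average price per city
# (the kind of aggregation a Spark job would run on a cluster).
def transform(rows):
    clean = [r for r in rows if r["price"] is not None]
    totals, counts = defaultdict(float), defaultdict(int)
    for r in clean:
        totals[r["city"]] += r["price"]
        counts[r["city"]] += 1
    return {city: totals[city] / counts[city] for city in totals}

# Load: in production this step would write Parquet back to the data lake.
def load(result, sink):
    sink.update(result)

sink = {}
load(transform(extract()), sink)
print(sink)  # {'Lisbon': 100.0, 'Porto': 60.0}
```

The course builds the same three stages with production tools: Spark for the transform, Airflow to schedule and retry the steps, and S3/Athena as the sink.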

What's inside the course?

  • Introduction to Data Engineering
    • Understand how modern data engineering works and what is needed to start.
  • Big Data Processing with Apache Spark
    • Learn to work with large datasets using DataFrame API, UDF, aggregations, and optimization.
  • Building a Data Lake on AWS
    • Build scalable data storage using S3, EMR, and Athena.
  • Pipelines with Apache Airflow
    • Automate and manage tasks, handle errors, schedule and run Spark jobs.
  • ML with Spark MLlib
    • Embed machine learning in your pipelines - classification, regression, model selection.
  • AI and LLM in Data Engineering
    • Use Hugging Face and other tools to integrate LLMs into data processing.
  • Stream Processing with Apache Kafka and Flink
    • Build real-time systems: consume events and process continuous streams as they arrive.
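To give the stream-processing topic some substance: the core idea behind Kafka/Flink-style pipelines is grouping an unbounded event stream into finite time windows. A stdlib-only toy of a tumbling 60-second window (the event shape and window size are invented for the illustration, not the course's code) looks like this:

```python
from collections import defaultdict

# Each event is (epoch timestamp in seconds, value).
events = [
    (0, 1), (10, 2), (59, 3),   # fall in the first minute
    (61, 4), (119, 5),          # second minute
    (130, 6),                   # third minute
]

WINDOW = 60  # tumbling window size in seconds

# Assign every event to the window containing its timestamp,
# then aggregate per window -- the essence of a tumbling-window sum.
windows = defaultdict(int)
for ts, value in events:
    window_start = (ts // WINDOW) * WINDOW
    windows[window_start] += value

print(dict(windows))  # {0: 6, 60: 9, 120: 6}
```

Flink adds the hard parts this toy ignores - out-of-order events, watermarks, and fault-tolerant state - which the Flink modules of the course cover.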

Outcome

After completing the course, you won't just have watched videos - you'll be a true data engineer, ready to build the systems companies need today.

Thousands of our graduates already work at Google, Tesla, Amazon, Apple, IBM, JP Morgan, Facebook, Shopify, and other top companies.

Many of them started from scratch. So why not become the next one?


All Course Lessons (183)

1. The Data Engineering Bootcamp: Zero to Mastery Demo (01:35)
2. Introduction (11:47)
3. Storing Data (10:57)
4. Processing Data (07:08)
5. Data Sources (10:23)
6. Orchestration (06:24)
7. Stream Processing (07:11)
8. AI and ML with Data Engineering (08:14)
9. Serving Data (06:58)
10. Cloud and Data Engineering (07:25)
11. Source Code for This Bootcamp (01:19)
12. Prerequisites (02:54)
13. What’s Next? (04:30)
14. Introduction (05:00)
15. Jupyter Notebooks (07:38)
16. Python - Lists (06:34)
17. Python - Tuples (03:37)
18. Python - Dictionaries (07:05)
19. Python - Sets (03:21)
20. Python - Range (04:05)
21. Python - Comprehensions (06:00)
22. Python - String Formatting (04:43)
23. Python - Functions (04:00)
24. Python - Decorators (07:55)
25. Python - Exceptions (07:20)
26. Python - Classes - Part 1 (12:14)
27. Python - Classes - Part 2 (08:29)
28. Python - Iterators (07:50)
29. CLI - Basic Commands (06:53)
30. CLI - Combining Commands (05:36)
31. CLI - Environment Variables (03:35)
32. Virtual Environments - What Is a Virtualenv? (06:37)
33. SQL - Introduction (03:30)
34. SQL - Environment Set Up (04:31)
35. SQL - Fetching Data (07:45)
36. SQL - Grouping Rows (06:24)
37. SQL - Joining Data (07:07)
38. SQL - Creating Data (06:04)
39. Introduction (04:08)
40. Apache Spark (03:44)
41. How Spark Works (04:24)
42. Spark Application (07:41)
43. DataFrames (06:43)
44. Installing Spark (05:51)
45. Inside Airbnb Data (07:02)
46. Writing Your First Spark Job (07:05)
47. Lazy Processing (02:16)
48. [Exercise] Basic Functions (01:29)
49. [Exercise] Basic Functions - Solution (06:41)
50. Aggregating Data (04:00)
51. Joining Data (04:40)
52. Aggregations and Joins with Spark (06:10)
53. Complex Data Types (05:09)
54. [Exercise] Aggregate Functions (00:50)
55. [Exercise] Aggregate Functions - Solution (05:54)
56. User Defined Functions (03:25)
57. Data Shuffle (06:14)
58. Data Accumulators (03:42)
59. Optimizing Spark Jobs (07:39)
60. Submitting Spark Jobs (04:29)
61. Other Spark APIs (05:16)
62. Spark SQL (04:33)
63. [Exercise] Advanced Spark (02:10)
64. [Exercise] Advanced Spark - Solution (05:26)
65. Summary (03:08)
66. Introduction (04:26)
67. What Is a Data Lake? (09:08)
68. Amazon Web Services (AWS) (07:47)
69. Simple Storage Service (S3) (05:45)
70. Setting Up an AWS Account (09:29)
71. Data Partitioning (03:24)
72. Using S3 (07:49)
73. EMR Serverless (02:59)
74. IAM Roles (02:52)
75. Running a Spark Job (08:49)
76. Parquet Data Format (07:41)
77. Implementing a Data Catalog (05:32)
78. Data Catalog Demo (06:42)
79. Querying a Data Lake (04:00)
80. Summary (03:39)
81. Introduction (05:53)
82. What Is Apache Airflow? (05:19)
83. Airflow’s Architecture (03:15)
84. Installing Airflow (06:33)
85. Defining an Airflow DAG (08:03)
86. Error Handling (03:38)
87. Idempotent Tasks (04:54)
88. Creating a DAG - Part 1 (04:58)
89. Creating a DAG - Part 2 (04:42)
90. Handling Failed Tasks (04:09)
91. [Exercise] Data Validation (04:31)
92. [Exercise] Data Validation - Solution (03:27)
93. Spark with Airflow (03:02)
94. Using Spark with Airflow - Part 1 (07:39)
95. Using Spark with Airflow - Part 2 (05:52)
96. Sensors in Airflow (04:46)
97. Using File Sensors (04:08)
98. Data Ingestion (05:50)
99. Reading Data from Postgres - Part 1 (06:03)
100. Reading Data from Postgres - Part 2 (05:40)
101. [Exercise] Average Customer Review (03:53)
102. [Exercise] Average Customer Review - Solution (04:33)
103. Advanced DAGs (04:26)
104. Summary (02:27)
105. Introduction (05:28)
106. What Is Machine Learning (06:06)
107. Regression Algorithms (05:38)
108. Building a Regression Model (05:04)
109. Training a Model (09:46)
110. Model Evaluation (07:26)
111. Testing a Regression Model (03:57)
112. Model Lifecycle (02:12)
113. Feature Engineering (08:44)
114. Improving a Regression Model (07:34)
115. Machine Learning Pipelines (03:56)
116. Creating a Pipeline (02:41)
117. [Exercise] House Price Estimation (01:59)
118. [Exercise] House Price Estimation - Solution (03:12)
119. [Exercise] Imposter Syndrome (02:57)
120. Classification (07:37)
121. Classifiers Evaluation (04:27)
122. Training a Classifier (08:31)
123. Hyperparameters (08:06)
124. Optimizing a Model (03:02)
125. [Exercise] Loan Approval (02:34)
126. [Exercise] Loan Approval - Solution (02:33)
127. Deep Learning (06:56)
128. Summary (03:23)
129. Introduction (05:07)
130. Natural Language Processing (NLP) before LLMs (06:11)
131. Transformers (06:21)
132. Types of LLMs (07:40)
133. Hugging Face (02:19)
134. Databricks Set Up (10:38)
135. Using an LLM (07:36)
136. Structured Output (03:42)
137. Producing JSON Output (05:10)
138. LLMs with Apache Spark (05:20)
139. Summary (02:48)
140. Introduction (06:06)
141. What Is Apache Kafka? (07:00)
142. Partitioning Data (08:56)
143. Kafka API (07:42)
144. Kafka Architecture (03:15)
145. Set Up Kafka (05:53)
146. Writing to Kafka (06:07)
147. Reading from Kafka (07:37)
148. Data Durability (06:39)
149. Kafka vs Queues (02:11)
150. [Exercise] Processing Records (03:44)
151. [Exercise] Processing Records - Solution (02:59)
152. Delivery Semantics (05:53)
153. Kafka Transactions (04:34)
154. Log Compaction (03:23)
155. Kafka Connect (06:59)
156. Using Kafka Connect (09:44)
157. Outbox Pattern (04:31)
158. Schema Registry (08:01)
159. Using Schema Registry (08:10)
160. Tiered Storage (03:28)
161. [Exercise] Track Order Status Changes (04:27)
162. [Exercise] Track Order Status Changes - Solution (05:06)
163. Summary (04:41)
164. Introduction (05:40)
165. What Is Apache Flink? (05:24)
166. Flink Applications (08:11)
167. Multiple Streams (03:11)
168. Installing Apache Flink (05:46)
169. Processing Individual Records (07:22)
170. [Exercise] Stream Processing (04:02)
171. [Exercise] Stream Processing - Solution (02:40)
172. Time Windows (06:49)
173. Keyed Windows (02:40)
174. Using Time Windows (05:18)
175. Watermarks (10:06)
176. Advanced Window Operations (06:17)
177. Stateful Stream Processing (07:50)
178. Using Local State (04:42)
179. [Exercise] Anomalies Detection (04:35)
180. [Exercise] Anomalies Detection - Solution (03:34)
181. Joining Streams (05:50)
182. Summary (03:10)
183. Thank You! (01:18)

Unlock unlimited learning

Get instant access to all 183 lessons in this course, plus thousands of other premium courses. One subscription, unlimited knowledge.

Learn more about subscription


Similar courses

Build a Large Language Model (From Scratch)


Sources: Sebastian Raschka
"Creating a Large Language Model from Scratch" is a practical guide that will teach you step by step how to create, train, and fine-tune large language models...
Apache Kafka Fundamentals


Sources: Andreas Kretz
In this course, you will acquire the basic knowledge necessary for confidently starting to work with Apache Kafka. You will learn how to set up a message...
1 hour 4 minutes 52 seconds
Data Structures and Algorithmic Trading: Machine Learning


Sources: udemy
Data Structures and Algorithmic trading is a method of executing orders using automated pre-programmed trading instructions over time. They were developed so that...
2 hours 20 minutes 32 seconds
Data Analysis for Beginners: Excel & Pivot Tables


Sources: zerotomastery.io
This short course on data analysis in Excel is perfect for beginners who want to acquire skills in analyzing structured data using two of Excel's most...
2 hours 10 minutes 21 seconds
Dockerized ETL With AWS, TDengine & Grafana


Sources: Andreas Kretz
Data engineers often need to quickly set up a simple ETL script that just does its job. In this project, you will learn how to easily implement...
29 minutes 12 seconds