
The Data Engineering Bootcamp: Zero to Mastery

16h 46m 22s
English
Paid

Course description

Learn how to build streaming pipelines with Apache Kafka and Flink, create data lakes on AWS, run ML workflows on Spark, and integrate large language models (LLMs) into production systems. This course is designed to kickstart your career and make you a sought-after data engineer.


Why is Data Engineering becoming a major profession in IT?

Data Engineering is rapidly becoming one of the fastest-growing and most in-demand professions in tech. With the rise of AI products, analytical systems, and real-time applications, companies are actively building out their data infrastructure, which drives demand for specialists.

Just last year, more than 20,000 new data engineer positions were created, and the total number of open positions in North America approached 150,000, clearly demonstrating the explosive growth of the industry.

Moreover, the salaries are impressive:

  • Entry level - from $80,000 to $110,000 per year
  • Mid and senior level - up to $190,000–$200,000+

Furthermore, data engineers play a strategic role: they build the foundation for machine learning systems, analytics, and AI, without which modern tech products are impossible. With the further growth of AI, the demand for data engineers will only increase, creating excellent opportunities for a long-term career and financial stability.

Why this particular bootcamp?

Our bootcamp is designed to be as comprehensive and practical as possible, without unnecessary theory or outdated tutorials. You will learn step by step and build real projects using the same tools that professionals use.

You will start with Apache Spark, processing real Airbnb data and mastering large-scale computations. Then you will create a modern data lake on AWS using S3, EMR, Glue, and Athena. You will learn pipeline orchestration with Apache Airflow, build streaming systems on Kafka and Flink, and even integrate machine learning and large language models (LLMs) directly into the pipelines.
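To give a flavor of the Spark work described above, here is a concept-only sketch of the groupBy-then-aggregate pattern, written in plain standard-library Python rather than Spark, and using a few made-up listing records instead of the course's actual Airbnb dataset:

```python
from collections import defaultdict

# Hypothetical listings: (neighbourhood, price) pairs, standing in for the
# kind of Airbnb records the course processes with Spark at scale.
listings = [
    ("Downtown", 120.0),
    ("Downtown", 80.0),
    ("Harbor", 200.0),
    ("Harbor", 150.0),
    ("Harbor", 100.0),
]

# Group prices by neighbourhood, then compute an average per group.
# Conceptually this is what a Spark DataFrame expresses as
# df.groupBy("neighbourhood").avg("price") -- except Spark plans the
# grouping and aggregation as a distributed job across a cluster.
groups = defaultdict(list)
for neighbourhood, price in listings:
    groups[neighbourhood].append(price)

avg_price = {n: sum(prices) / len(prices) for n, prices in groups.items()}
print(avg_price)  # {'Downtown': 100.0, 'Harbor': 150.0}
```

The single-machine version fits in a dozen lines; the course's point is learning how Spark runs the same logical operation over datasets far too large for one machine.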

As a result, you will learn to build end-to-end production-level systems - the exact skills employers are looking for.

What's inside the course?

  • Introduction to Data Engineering
    • Understand how modern data engineering works and what is needed to start.
  • Big Data Processing with Apache Spark
    • Learn to work with large datasets using the DataFrame API, UDFs, aggregations, and optimization.
  • Building a Data Lake on AWS
    • Build scalable data storage using S3, EMR, and Athena.
  • Pipelines with Apache Airflow
    • Automate and manage tasks, handle errors, schedule and run Spark jobs.
  • ML with Spark MLlib
    • Embed machine learning in your pipelines - classification, regression, model selection.
  • AI and LLM in Data Engineering
    • Use Hugging Face and other tools to integrate LLMs into data processing.
  • Stream Processing with Apache Kafka and Flink
    • Build real-time systems: consume events and process streams as they arrive.
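The stream-processing topics in the list above (Kafka, Flink, time windows) all revolve around computing over unbounded event streams. As a concept sketch only, again in plain standard-library Python with invented events rather than Flink's actual API, a tumbling-window count looks like this:

```python
from collections import Counter

# Hypothetical event stream: (timestamp_in_seconds, event_type) tuples,
# as might be consumed from a Kafka topic.
events = [
    (0.5, "click"), (3.2, "view"), (7.9, "click"),
    (10.1, "click"), (14.8, "view"), (21.0, "click"),
]

WINDOW = 10  # tumbling window size in seconds

# Assign each event to a 10-second tumbling window and count per window.
# Conceptually this mirrors what Flink's tumbling event-time windows do,
# minus the parts that make it hard in production: distribution, state
# backends, and watermarks for late data (all covered in the course).
per_window = Counter(int(ts // WINDOW) for ts, _ in events)
print(dict(per_window))  # {0: 3, 1: 2, 2: 1}
```

The batch version can simply iterate over a finished list; a real stream processor must emit each window's result while later events are still arriving, which is exactly the complexity the Flink sections address.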

Outcome

After completing the course, you won’t just have watched videos - you'll become a true data engineer, ready to build systems that companies need today.

Thousands of our graduates already work at Google, Tesla, Amazon, Apple, IBM, JP Morgan, Facebook, Shopify, and other top companies.

Many of them started from scratch. So why not become the next one?


All Course Lessons (183)

1. The Data Engineering Bootcamp: Zero to Mastery Demo (01:35)
2. Introduction (11:47)
3. Storing Data (10:57)
4. Processing Data (07:08)
5. Data Sources (10:23)
6. Orchestration (06:24)
7. Stream Processing (07:11)
8. AI and ML with Data Engineering (08:14)
9. Serving Data (06:58)
10. Cloud and Data Engineering (07:25)
11. Source Code for This Bootcamp (01:19)
12. Prerequisites (02:54)
13. What’s Next? (04:30)
14. Introduction (05:00)
15. Jupyter Notebooks (07:38)
16. Python - Lists (06:34)
17. Python - Tuples (03:37)
18. Python - Dictionaries (07:05)
19. Python - Sets (03:21)
20. Python - Range (04:05)
21. Python - Comprehensions (06:00)
22. Python - String Formatting (04:43)
23. Python - Functions (04:00)
24. Python - Decorators (07:55)
25. Python - Exceptions (07:20)
26. Python - Classes - Part 1 (12:14)
27. Python - Classes - Part 2 (08:29)
28. Python - Iterators (07:50)
29. CLI - Basic Commands (06:53)
30. CLI - Combining Commands (05:36)
31. CLI - Environment Variables (03:35)
32. Virtual Environments - What Is a Virtualenv? (06:37)
33. SQL - Introduction (03:30)
34. SQL - Environment Set Up (04:31)
35. SQL - Fetching Data (07:45)
36. SQL - Grouping Rows (06:24)
37. SQL - Joining Data (07:07)
38. SQL - Creating Data (06:04)
39. Introduction (04:08)
40. Apache Spark (03:44)
41. How Spark Works (04:24)
42. Spark Application (07:41)
43. DataFrames (06:43)
44. Installing Spark (05:51)
45. Inside Airbnb Data (07:02)
46. Writing Your First Spark Job (07:05)
47. Lazy Processing (02:16)
48. [Exercise] Basic Functions (01:29)
49. [Exercise] Basic Functions - Solution (06:41)
50. Aggregating Data (04:00)
51. Joining Data (04:40)
52. Aggregations and Joins with Spark (06:10)
53. Complex Data Types (05:09)
54. [Exercise] Aggregate Functions (00:50)
55. [Exercise] Aggregate Functions - Solution (05:54)
56. User Defined Functions (03:25)
57. Data Shuffle (06:14)
58. Data Accumulators (03:42)
59. Optimizing Spark Jobs (07:39)
60. Submitting Spark Jobs (04:29)
61. Other Spark APIs (05:16)
62. Spark SQL (04:33)
63. [Exercise] Advanced Spark (02:10)
64. [Exercise] Advanced Spark - Solution (05:26)
65. Summary (03:08)
66. Introduction (04:26)
67. What Is a Data Lake? (09:08)
68. Amazon Web Services (AWS) (07:47)
69. Simple Storage Service (S3) (05:45)
70. Setting Up an AWS Account (09:29)
71. Data Partitioning (03:24)
72. Using S3 (07:49)
73. EMR Serverless (02:59)
74. IAM Roles (02:52)
75. Running a Spark Job (08:49)
76. Parquet Data Format (07:41)
77. Implementing a Data Catalog (05:32)
78. Data Catalog Demo (06:42)
79. Querying a Data Lake (04:00)
80. Summary (03:39)
81. Introduction (05:53)
82. What Is Apache Airflow? (05:19)
83. Airflow’s Architecture (03:15)
84. Installing Airflow (06:33)
85. Defining an Airflow DAG (08:03)
86. Error Handling (03:38)
87. Idempotent Tasks (04:54)
88. Creating a DAG - Part 1 (04:58)
89. Creating a DAG - Part 2 (04:42)
90. Handling Failed Tasks (04:09)
91. [Exercise] Data Validation (04:31)
92. [Exercise] Data Validation - Solution (03:27)
93. Spark with Airflow (03:02)
94. Using Spark with Airflow - Part 1 (07:39)
95. Using Spark with Airflow - Part 2 (05:52)
96. Sensors in Airflow (04:46)
97. Using File Sensors (04:08)
98. Data Ingestion (05:50)
99. Reading Data from Postgres - Part 1 (06:03)
100. Reading Data from Postgres - Part 2 (05:40)
101. [Exercise] Average Customer Review (03:53)
102. [Exercise] Average Customer Review - Solution (04:33)
103. Advanced DAGs (04:26)
104. Summary (02:27)
105. Introduction (05:28)
106. What Is Machine Learning (06:06)
107. Regression Algorithms (05:38)
108. Building a Regression Model (05:04)
109. Training a Model (09:46)
110. Model Evaluation (07:26)
111. Testing a Regression Model (03:57)
112. Model Lifecycle (02:12)
113. Feature Engineering (08:44)
114. Improving a Regression Model (07:34)
115. Machine Learning Pipelines (03:56)
116. Creating a Pipeline (02:41)
117. [Exercise] House Price Estimation (01:59)
118. [Exercise] House Price Estimation - Solution (03:12)
119. [Exercise] Imposter Syndrome (02:57)
120. Classification (07:37)
121. Classifiers Evaluation (04:27)
122. Training a Classifier (08:31)
123. Hyperparameters (08:06)
124. Optimizing a Model (03:02)
125. [Exercise] Loan Approval (02:34)
126. [Exercise] Loan Approval - Solution (02:33)
127. Deep Learning (06:56)
128. Summary (03:23)
129. Introduction (05:07)
130. Natural Language Processing (NLP) before LLMs (06:11)
131. Transformers (06:21)
132. Types of LLMs (07:40)
133. Hugging Face (02:19)
134. Databricks Set Up (10:38)
135. Using an LLM (07:36)
136. Structured Output (03:42)
137. Producing JSON Output (05:10)
138. LLMs with Apache Spark (05:20)
139. Summary (02:48)
140. Introduction (06:06)
141. What Is Apache Kafka? (07:00)
142. Partitioning Data (08:56)
143. Kafka API (07:42)
144. Kafka Architecture (03:15)
145. Set Up Kafka (05:53)
146. Writing to Kafka (06:07)
147. Reading from Kafka (07:37)
148. Data Durability (06:39)
149. Kafka vs Queues (02:11)
150. [Exercise] Processing Records (03:44)
151. [Exercise] Processing Records - Solution (02:59)
152. Delivery Semantics (05:53)
153. Kafka Transactions (04:34)
154. Log Compaction (03:23)
155. Kafka Connect (06:59)
156. Using Kafka Connect (09:44)
157. Outbox Pattern (04:31)
158. Schema Registry (08:01)
159. Using Schema Registry (08:10)
160. Tiered Storage (03:28)
161. [Exercise] Track Order Status Changes (04:27)
162. [Exercise] Track Order Status Changes - Solution (05:06)
163. Summary (04:41)
164. Introduction (05:40)
165. What Is Apache Flink? (05:24)
166. Flink Applications (08:11)
167. Multiple Streams (03:11)
168. Installing Apache Flink (05:46)
169. Processing Individual Records (07:22)
170. [Exercise] Stream Processing (04:02)
171. [Exercise] Stream Processing - Solution (02:40)
172. Time Windows (06:49)
173. Keyed Windows (02:40)
174. Using Time Windows (05:18)
175. Watermarks (10:06)
176. Advanced Window Operations (06:17)
177. Stateful Stream Processing (07:50)
178. Using Local State (04:42)
179. [Exercise] Anomaly Detection (04:35)
180. [Exercise] Anomaly Detection - Solution (03:34)
181. Joining Streams (05:50)
182. Summary (03:10)
183. Thank You! (01:18)

Unlock unlimited learning

Get instant access to all 183 lessons in this course, plus thousands of other premium courses. One subscription, unlimited knowledge.

Learn more about subscription


Similar courses

Schema Design Data Stores
Source: Andreas Kretz
During my coaching sessions, one important topic repeatedly comes up - designing diagrams. Therefore, I decided to create a separate course in the academy to…
2 hours 30 minutes 25 seconds

Machine Learning: Natural Language Processing in Python (V2)
Source: udemy
Welcome to Machine Learning: Natural Language Processing in Python (Version 2). NLP: Use Markov Models, NLTK, Artificial Intelligence, Deep Learning, Machine Le…
22 hours 4 minutes 2 seconds

Machine Learning & Containers on AWS
Source: Andreas Kretz
In this practical course, you will learn how to build a complete data pipeline on the AWS platform - from obtaining data from the Twitter API to analysis, stora…
1 hour 33 minutes 34 seconds

2022 Python for Machine Learning & Data Science Masterclass
Source: udemy
Welcome to the most complete course on learning Data Science and Machine Learning on the internet! After teaching over 2 million students I've worked for over a…
44 hours 5 minutes 31 seconds

Time Series Analysis, Forecasting, and Machine Learning
Source: udemy
Let me cut to the chase. This is not your average Time Series Analysis course. This course covers modern developments such as deep learning, time series classif…
22 hours 47 minutes 45 seconds