The Data Engineering Bootcamp: Zero to Mastery

16h 46m 22s
English
Paid

Embark on a journey to become a data engineer with our bootcamp, where you will learn to build streaming pipelines with Apache Kafka and Flink, create data lakes on AWS, run ML workflows on Spark, and integrate LLMs into production systems. This course is designed to kickstart your career and turn you into one of tomorrow's highly sought-after data engineers.

Why Is Data Engineering the New Major Profession in IT?

Data Engineering is rapidly emerging as one of the most in-demand professions in the tech industry. With the surge in AI products, analytical systems, and real-time applications, companies are keen on developing their data infrastructures, thereby fueling the demand for skilled specialists.

Last year alone, over 20,000 new data engineer positions were created, with open positions in North America nearing 150,000, showcasing the industry's explosive growth.

Impressive Salaries

  • Entry-level: $80,000 to $110,000 per year
  • Mid and senior level: up to $190,000–$200,000+

Data engineers play a pivotal role in building the foundation for machine learning systems, analytics, and AI. As AI continues to grow, the demand for data engineers will only increase, offering outstanding long-term career and financial opportunities.

Why Choose This Bootcamp?

Our bootcamp offers a comprehensive, practical approach without unnecessary theory or outdated tutorials. You will learn step-by-step through building real projects using industry-standard tools.

The course begins with Apache Spark, processing real Airbnb data and mastering large-scale computations. You will create a modern data lake on AWS with S3, EMR, Glue, and Athena, and learn pipeline orchestration with Apache Airflow. Additionally, you will build streaming systems on Kafka and Flink and integrate machine learning and large language models (LLMs) into your pipelines.
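To give a flavor of the kind of computation involved: the sketch below uses plain Python and a made-up toy dataset (not the course's actual Airbnb files or Spark's DataFrame API) to show the group-and-aggregate pattern that Spark runs at cluster scale, e.g. the average nightly price per neighbourhood.

```python
from collections import defaultdict

# Toy Airbnb-style rows: (neighbourhood, nightly price). The course performs
# this kind of aggregation over the real dataset with Spark's DataFrame API.
listings = [
    ("Downtown", 120.0),
    ("Downtown", 180.0),
    ("Harbour", 90.0),
    ("Harbour", 110.0),
]

def average_price_by_neighbourhood(rows):
    """Group rows by neighbourhood and average the price column."""
    totals = defaultdict(lambda: [0.0, 0])  # neighbourhood -> [sum, count]
    for neighbourhood, price in rows:
        totals[neighbourhood][0] += price
        totals[neighbourhood][1] += 1
    return {n: total / count for n, (total, count) in totals.items()}

print(average_price_by_neighbourhood(listings))
# {'Downtown': 150.0, 'Harbour': 100.0}
```

In Spark the same idea is a one-liner over a distributed DataFrame (`groupBy` plus an average aggregation); the point of the course's Spark module is doing this when the data no longer fits on one machine.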

This approach equips you to build end-to-end production-level systems, the exact skills employers demand.

Course Content Breakdown

  • Introduction to Data Engineering
    • Understand modern data engineering fundamentals and what's needed to start.
  • Big Data Processing with Apache Spark
    Handle large datasets using the DataFrame API, UDFs, aggregations, and optimization techniques.
  • Building a Data Lake on AWS
    • Create scalable data storage solutions using S3, EMR, and Athena.
  • Pipelines with Apache Airflow
    • Automate and manage tasks, handle errors, and schedule Spark jobs.
  • Machine Learning with Spark MLlib
    • Integrate machine learning into pipelines with classification, regression, and model selection.
  • AI and LLM in Data Engineering
    Leverage tools like Hugging Face to incorporate LLMs into data processing.
  • Stream Processing with Apache Kafka and Flink
    Develop real-time systems, process events, and manage live data streams.
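As a taste of the stream-processing module above: one core idea in Kafka/Flink pipelines is the keyed tumbling window, i.e. slicing an endless event stream into fixed, non-overlapping time buckets per key and aggregating each bucket. The sketch below is plain Python over toy events (not Flink's actual API) just to make the bucketing arithmetic concrete.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # fixed window size; tumbling windows never overlap

def tumbling_window_counts(events):
    """Count events per (key, window) bucket.

    Each event is (key, timestamp_in_seconds); its window is identified by
    the window's start time, computed by rounding the timestamp down to a
    multiple of WINDOW_SECONDS.
    """
    counts = defaultdict(int)
    for key, timestamp in events:
        window_start = timestamp - (timestamp % WINDOW_SECONDS)
        counts[(key, window_start)] += 1
    return dict(counts)

events = [("sensor-a", 5), ("sensor-a", 42), ("sensor-b", 61), ("sensor-a", 65)]
print(tumbling_window_counts(events))
# {('sensor-a', 0): 2, ('sensor-b', 60): 1, ('sensor-a', 60): 1}
```

A real Flink job expresses the same thing declaratively (key the stream, assign tumbling event-time windows, aggregate) and additionally has to handle late and out-of-order events, which is what the watermark lessons cover.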

Course Outcome

Upon course completion, you won't just have watched videos; you'll be a proficient data engineer, ready to build the systems today's companies depend on.

Thousands of graduates now work at leading firms such as Google, Tesla, Amazon, Apple, IBM, JP Morgan, Facebook, Shopify, and more. Many started from scratch. Are you ready to be the next success story?

About the Author: zerotomastery.io

Whether you are just starting to learn to code or want to advance your skills, Zero To Mastery Academy will teach you React, JavaScript, Python, CSS, and more to help you advance your career, get hired, and succeed at some of the top companies in the world.

Watch Online: 183 lessons

All Course Lessons (183)
1. The Data Engineering Bootcamp: Zero to Mastery Demo (01:35)
2. Introduction (11:47)
3. Storing Data (10:57)
4. Processing Data (07:08)
5. Data Sources (10:23)
6. Orchestration (06:24)
7. Stream Processing (07:11)
8. AI and ML with Data Engineering (08:14)
9. Serving Data (06:58)
10. Cloud and Data Engineering (07:25)
11. Source Code for This Bootcamp (01:19)
12. Prerequisites (02:54)
13. What's Next? (04:30)
14. Introduction (05:00)
15. Jupyter Notebooks (07:38)
16. Python - Lists (06:34)
17. Python - Tuples (03:37)
18. Python - Dictionaries (07:05)
19. Python - Sets (03:21)
20. Python - Range (04:05)
21. Python - Comprehensions (06:00)
22. Python - Strings Formatting (04:43)
23. Python - Functions (04:00)
24. Python - Decorators (07:55)
25. Python - Exceptions (07:20)
26. Python - Classes - Part 1 (12:14)
27. Python - Classes - Part 2 (08:29)
28. Python - Iterators (07:50)
29. CLI - Basic Commands (06:53)
30. CLI - Combining Commands (05:36)
31. CLI - Environment Variables (03:35)
32. Virtual Environments - What Is a Virtualenv? (06:37)
33. SQL - Introduction (03:30)
34. SQL - Environment Set Up (04:31)
35. SQL - Fetching Data (07:45)
36. SQL - Grouping Rows (06:24)
37. SQL - Joining Data (07:07)
38. SQL - Creating Data (06:04)
39. Introduction (04:08)
40. Apache Spark (03:44)
41. How Spark Works (04:24)
42. Spark Application (07:41)
43. DataFrames (06:43)
44. Installing Spark (05:51)
45. Inside Airbnb Data (07:02)
46. Writing Your First Spark Job (07:05)
47. Lazy Processing (02:16)
48. [Exercise] Basic Functions (01:29)
49. [Exercise] Basic Functions - Solution (06:41)
50. Aggregating Data (04:00)
51. Joining Data (04:40)
52. Aggregations and Joins with Spark (06:10)
53. Complex Data Types (05:09)
54. [Exercise] Aggregate Functions (00:50)
55. [Exercise] Aggregate Functions - Solution (05:54)
56. User Defined Functions (03:25)
57. Data Shuffle (06:14)
58. Data Accumulators (03:42)
59. Optimizing Spark Jobs (07:39)
60. Submitting Spark Jobs (04:29)
61. Other Spark APIs (05:16)
62. Spark SQL (04:33)
63. [Exercise] Advanced Spark (02:10)
64. [Exercise] Advanced Spark - Solution (05:26)
65. Summary (03:08)
66. Introduction (04:26)
67. What Is a Data Lake? (09:08)
68. Amazon Web Services (AWS) (07:47)
69. Simple Storage Service (S3) (05:45)
70. Setting Up an AWS Account (09:29)
71. Data Partitioning (03:24)
72. Using S3 (07:49)
73. EMR Serverless (02:59)
74. IAM Roles (02:52)
75. Running a Spark Job (08:49)
76. Parquet Data Format (07:41)
77. Implementing a Data Catalog (05:32)
78. Data Catalog Demo (06:42)
79. Querying a Data Lake (04:00)
80. Summary (03:39)
81. Introduction (05:53)
82. What Is Apache Airflow? (05:19)
83. Airflow's Architecture (03:15)
84. Installing Airflow (06:33)
85. Defining an Airflow DAG (08:03)
86. Error Handling (03:38)
87. Idempotent Tasks (04:54)
88. Creating a DAG - Part 1 (04:58)
89. Creating a DAG - Part 2 (04:42)
90. Handling Failed Tasks (04:09)
91. [Exercise] Data Validation (04:31)
92. [Exercise] Data Validation - Solution (03:27)
93. Spark with Airflow (03:02)
94. Using Spark with Airflow - Part 1 (07:39)
95. Using Spark with Airflow - Part 2 (05:52)
96. Sensors in Airflow (04:46)
97. Using File Sensors (04:08)
98. Data Ingestion (05:50)
99. Reading Data from Postgres - Part 1 (06:03)
100. Reading Data from Postgres - Part 2 (05:40)
101. [Exercise] Average Customer Review (03:53)
102. [Exercise] Average Customer Review - Solution (04:33)
103. Advanced DAGs (04:26)
104. Summary (02:27)
105. Introduction (05:28)
106. What Is Machine Learning? (06:06)
107. Regression Algorithms (05:38)
108. Building a Regression Model (05:04)
109. Training a Model (09:46)
110. Model Evaluation (07:26)
111. Testing a Regression Model (03:57)
112. Model Lifecycle (02:12)
113. Feature Engineering (08:44)
114. Improving a Regression Model (07:34)
115. Machine Learning Pipelines (03:56)
116. Creating a Pipeline (02:41)
117. [Exercise] House Price Estimation (01:59)
118. [Exercise] House Price Estimation - Solution (03:12)
119. [Exercise] Imposter Syndrome (02:57)
120. Classification (07:37)
121. Classifiers Evaluation (04:27)
122. Training a Classifier (08:31)
123. Hyperparameters (08:06)
124. Optimizing a Model (03:02)
125. [Exercise] Loan Approval (02:34)
126. [Exercise] Loan Approval - Solution (02:33)
127. Deep Learning (06:56)
128. Summary (03:23)
129. Introduction (05:07)
130. Natural Language Processing (NLP) before LLMs (06:11)
131. Transformers (06:21)
132. Types of LLMs (07:40)
133. Hugging Face (02:19)
134. Databricks Set Up (10:38)
135. Using an LLM (07:36)
136. Structured Output (03:42)
137. Producing JSON Output (05:10)
138. LLMs with Apache Spark (05:20)
139. Summary (02:48)
140. Introduction (06:06)
141. What Is Apache Kafka? (07:00)
142. Partitioning Data (08:56)
143. Kafka API (07:42)
144. Kafka Architecture (03:15)
145. Set Up Kafka (05:53)
146. Writing to Kafka (06:07)
147. Reading from Kafka (07:37)
148. Data Durability (06:39)
149. Kafka vs Queues (02:11)
150. [Exercise] Processing Records (03:44)
151. [Exercise] Processing Records - Solution (02:59)
152. Delivery Semantics (05:53)
153. Kafka Transactions (04:34)
154. Log Compaction (03:23)
155. Kafka Connect (06:59)
156. Using Kafka Connect (09:44)
157. Outbox Pattern (04:31)
158. Schema Registry (08:01)
159. Using Schema Registry (08:10)
160. Tiered Storage (03:28)
161. [Exercise] Track Order Status Changes (04:27)
162. [Exercise] Track Order Status Changes - Solution (05:06)
163. Summary (04:41)
164. Introduction (05:40)
165. What Is Apache Flink? (05:24)
166. Flink Applications (08:11)
167. Multiple Streams (03:11)
168. Installing Apache Flink (05:46)
169. Processing Individual Records (07:22)
170. [Exercise] Stream Processing (04:02)
171. [Exercise] Stream Processing - Solution (02:40)
172. Time Windows (06:49)
173. Keyed Windows (02:40)
174. Using Time Windows (05:18)
175. Watermarks (10:06)
176. Advanced Window Operations (06:17)
177. Stateful Stream Processing (07:50)
178. Using Local State (04:42)
179. [Exercise] Anomalies Detection (04:35)
180. [Exercise] Anomalies Detection - Solution (03:34)
181. Joining Streams (05:50)
182. Summary (03:10)
183. Thank You! (01:18)
Unlock unlimited learning

Get instant access to all 183 lessons in this course, plus thousands of other premium courses. One subscription, unlimited knowledge.

Learn more about subscription