The Data Engineering Bootcamp: Zero to Mastery

13h 23m 15s
English
Paid

Course description

Learn how to build streaming pipelines with Apache Kafka and Flink, create data lakes on AWS, run ML workflows on Spark, and integrate LLMs into production systems. This course is designed to kickstart your career and make you a sought-after data engineer.


Why is Data Engineering the new major profession in IT?

Data Engineering is rapidly becoming one of the fastest-growing and most in-demand professions in tech. With the rise of AI products, analytical systems, and real-time applications, companies are actively building out their data infrastructure, which drives demand for specialists.

Just last year, more than 20,000 new data engineer positions were created, and the total number of open positions in North America approached 150,000, clearly demonstrating the explosive growth of the industry.

Moreover, the salaries are impressive:

  • Entry level: $80,000 to $110,000 per year
  • Mid and senior level: up to $190,000–$200,000+

Furthermore, data engineers play a strategic role: they build the foundation for machine learning systems, analytics, and AI, without which modern tech products are impossible. With the further growth of AI, the demand for data engineers will only increase, creating excellent opportunities for a long-term career and financial stability.

Why this particular bootcamp?

Our bootcamp is designed to be as comprehensive and practical as possible, without unnecessary theory or outdated tutorials. You will learn step by step and build real projects using the same tools that professionals use.

You will start with Apache Spark, processing real Airbnb data and mastering large-scale computations. Then you will create a modern data lake on AWS using S3, EMR, Glue, and Athena. You will learn pipeline orchestration with Apache Airflow, build streaming systems on Kafka and Flink, and even integrate machine learning and large language models (LLMs) directly into your pipelines.
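
To give you a taste of what that looks like in practice, here is a minimal PySpark sketch in the spirit of the first module. The file path and column names are illustrative assumptions, not the exact Inside Airbnb schema used in the course.

```python
# Minimal PySpark sketch: average listing price per neighbourhood.
# NOTE: "listings.csv", "neighbourhood", and "price" are hypothetical
# stand-ins for the Inside Airbnb data explored in the course, and the
# price column is assumed to be numeric.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("airbnb-demo").getOrCreate()

listings = (
    spark.read
    .option("header", True)       # first line holds column names
    .option("inferSchema", True)  # let Spark guess column types
    .csv("listings.csv")
)

avg_price = (
    listings
    .groupBy("neighbourhood")
    .agg(F.avg("price").alias("avg_price"))
    .orderBy(F.desc("avg_price"))
)

avg_price.show(10)  # top 10 most expensive neighbourhoods
spark.stop()
```

A dozen lines like these already express a distributed computation that Spark can scale from your laptop to a cluster, and that is exactly the progression the Spark module follows.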

As a result, you will learn to build end-to-end, production-level systems - exactly the skill set employers are looking for.

What's inside the course?

  • Introduction to Data Engineering
    • Understand how modern data engineering works and what is needed to start.
  • Big Data Processing with Apache Spark
    • Learn to work with large datasets using the DataFrame API, UDFs, aggregations, and job optimization.
  • Building a Data Lake on AWS
    • Build scalable data storage using S3, EMR, and Athena.
  • Pipelines with Apache Airflow
    • Automate and manage tasks, handle errors, and schedule and run Spark jobs (see the DAG sketch after this list).
  • ML with Spark MLlib
    • Embed machine learning in your pipelines - classification, regression, model selection.
  • AI and LLM in Data Engineering
    • Use Hugging Face and other tools to integrate LLMs into data processing.
  • Stream Processing with Apache Kafka and Flink
    • Build real-time systems that ingest and process continuous event streams.
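
As a taste of the orchestration module above, here is a minimal Airflow sketch. The DAG id, schedule, and task bodies are illustrative assumptions, not the exact pipeline built in the course.

```python
# Minimal Airflow DAG sketch: two dependent daily tasks.
# NOTE: the DAG id, schedule, and task bodies are hypothetical
# examples, not the pipeline built in the course.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pretend we pulled raw data from a source system")


def transform():
    print("pretend we cleaned and aggregated the data")


with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ spelling; older versions use schedule_interval
    catchup=False,      # don't backfill runs for past dates
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task  # transform runs only after extract succeeds
```

In the course itself, tasks like these hand off to Spark jobs, file sensors, and data validation steps, which is where the module's error handling and idempotency lessons come in.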

Outcome

After completing the course, you won't just have watched videos - you'll become a true data engineer, ready to build systems that companies need today.

Thousands of our graduates already work at Google, Tesla, Amazon, Apple, IBM, JP Morgan, Facebook, Shopify, and other top companies.

Many of them started from scratch. So why not become the next one?

All Course Lessons (153)

1. The Data Engineering Bootcamp: Zero to Mastery Demo (01:35)
2. Introduction to Data Engineering (04:17)
3. Who Are Data Engineers? (04:43)
4. Prerequisites (03:19)
5. Source Code for This Bootcamp (01:19)
6. Plan for This Bootcamp (04:38)
7. [Optional] What Is a Virtualenv? (06:37)
8. [Optional] What Is Docker? (11:03)
9. Introduction (04:08)
10. Apache Spark (03:44)
11. How Spark Works (04:24)
12. Spark Application (07:41)
13. DataFrames (06:43)
14. Installing Spark (05:51)
15. Inside Airbnb Data (07:02)
16. Writing Your First Spark Job (07:05)
17. Lazy Processing (02:16)
18. [Exercise] Basic Functions (01:29)
19. [Exercise] Basic Functions - Solution (06:41)
20. Aggregating Data (04:00)
21. Joining Data (04:40)
22. Aggregations and Joins with Spark (06:10)
23. Complex Data Types (05:09)
24. [Exercise] Aggregate Functions (00:50)
25. [Exercise] Aggregate Functions - Solution (05:54)
26. User Defined Functions (03:25)
27. Data Shuffle (06:14)
28. Data Accumulators (03:42)
29. Optimizing Spark Jobs (07:39)
30. Submitting Spark Jobs (04:29)
31. Other Spark APIs (05:16)
32. Spark SQL (04:33)
33. [Exercise] Advanced Spark (02:10)
34. [Exercise] Advanced Spark - Solution (05:26)
35. Summary (03:08)
36. Introduction (04:26)
37. What Is a Data Lake? (09:08)
38. Amazon Web Services (AWS) (07:47)
39. Simple Storage Service (S3) (05:45)
40. Setting Up an AWS Account (09:29)
41. Data Partitioning (03:24)
42. Using S3 (07:49)
43. EMR Serverless (02:59)
44. IAM Roles (02:52)
45. Running a Spark Job (08:49)
46. Parquet Data Format (07:41)
47. Implementing a Data Catalog (05:32)
48. Data Catalog Demo (06:42)
49. Querying a Data Lake (04:00)
50. Summary (03:39)
51. Introduction (05:53)
52. What Is Apache Airflow? (05:19)
53. Airflow's Architecture (03:15)
54. Installing Airflow (06:33)
55. Defining an Airflow DAG (08:03)
56. Error Handling (03:38)
57. Idempotent Tasks (04:54)
58. Creating a DAG - Part 1 (04:58)
59. Creating a DAG - Part 2 (04:42)
60. Handling Failed Tasks (04:09)
61. [Exercise] Data Validation (04:31)
62. [Exercise] Data Validation - Solution (03:27)
63. Spark with Airflow (03:02)
64. Using Spark with Airflow - Part 1 (07:39)
65. Using Spark with Airflow - Part 2 (05:52)
66. Sensors in Airflow (04:46)
67. Using File Sensors (04:08)
68. Data Ingestion (05:50)
69. Reading Data from Postgres - Part 1 (06:03)
70. Reading Data from Postgres - Part 2 (05:40)
71. [Exercise] Average Customer Review (03:53)
72. [Exercise] Average Customer Review - Solution (04:33)
73. Advanced DAGs (04:26)
74. Summary (02:27)
75. Introduction (05:28)
76. What Is Machine Learning? (06:06)
77. Regression Algorithms (05:38)
78. Building a Regression Model (05:04)
79. Training a Model (09:46)
80. Model Evaluation (07:26)
81. Testing a Regression Model (03:57)
82. Model Lifecycle (02:12)
83. Feature Engineering (08:44)
84. Improving a Regression Model (07:34)
85. Machine Learning Pipelines (03:56)
86. Creating a Pipeline (02:41)
87. [Exercise] House Price Estimation (01:59)
88. [Exercise] House Price Estimation - Solution (03:12)
89. [Exercise] Imposter Syndrome (02:57)
90. Classification (07:37)
91. Classifier Evaluation (04:27)
92. Training a Classifier (08:31)
93. Hyperparameters (08:06)
94. Optimizing a Model (03:02)
95. [Exercise] Loan Approval (02:34)
96. [Exercise] Loan Approval - Solution (02:33)
97. Deep Learning (06:56)
98. Summary (03:23)
99. Introduction (05:07)
100. Natural Language Processing (NLP) before LLMs (06:11)
101. Transformers (06:21)
102. Types of LLMs (07:40)
103. Hugging Face (02:19)
104. Databricks Setup (10:38)
105. Using an LLM (07:36)
106. Structured Output (03:42)
107. Producing JSON Output (05:10)
108. LLMs with Apache Spark (05:20)
109. Summary (02:48)
110. Introduction (06:06)
111. What Is Apache Kafka? (07:00)
112. Partitioning Data (08:56)
113. Kafka API (07:42)
114. Kafka Architecture (03:15)
115. Set Up Kafka (05:53)
116. Writing to Kafka (06:07)
117. Reading from Kafka (07:37)
118. Data Durability (06:39)
119. Kafka vs Queues (02:11)
120. [Exercise] Processing Records (03:44)
121. [Exercise] Processing Records - Solution (02:59)
122. Delivery Semantics (05:53)
123. Kafka Transactions (04:34)
124. Log Compaction (03:23)
125. Kafka Connect (06:59)
126. Using Kafka Connect (09:44)
127. Outbox Pattern (04:31)
128. Schema Registry (08:01)
129. Using Schema Registry (08:10)
130. Tiered Storage (03:28)
131. [Exercise] Track Order Status Changes (04:27)
132. [Exercise] Track Order Status Changes - Solution (05:06)
133. Summary (04:41)
134. Introduction (05:40)
135. What Is Apache Flink? (05:24)
136. Kafka Application (08:11)
137. Multiple Streams (03:11)
138. Installing Apache Flink (05:46)
139. Processing Individual Records (07:22)
140. [Exercise] Stream Processing (04:02)
141. [Exercise] Stream Processing - Solution (02:40)
142. Time Windows (06:49)
143. Keyed Windows (02:40)
144. Using Time Windows (05:18)
145. Watermarks (10:06)
146. Advanced Window Operations (06:17)
147. Stateful Stream Processing (07:50)
148. Using Local State (04:42)
149. [Exercise] Anomaly Detection (04:35)
150. [Exercise] Anomaly Detection - Solution (03:34)
151. Joining Streams (05:50)
152. Summary (03:10)
153. Thank You! (01:18)

