The Data Engineering Bootcamp: Zero to Mastery

13h 23m 15s
English
Paid

Course description

Learn how to build streaming pipelines with Apache Kafka and Flink, create data lakes on AWS, run ML workflows on Spark, and integrate LLMs into production systems. This course is designed to kickstart your career and make you a sought-after data engineer.


Why is Data Engineering the new major profession in IT?

Data Engineering is rapidly becoming one of the fastest-growing and most in-demand professions in tech. With the rise of AI products, analytical systems, and real-time applications, companies are actively building out their data infrastructure, which drives demand for specialists.

Just last year, more than 20,000 new data engineer positions were created, and the total number of open positions in North America approached 150,000, clearly demonstrating the explosive growth of the industry.

Moreover, the salaries are impressive:

  • Entry level: $80,000 to $110,000 per year
  • Mid and senior level: up to $190,000–$200,000+

Furthermore, data engineers play a strategic role: they build the foundation for machine learning systems, analytics, and AI, without which modern tech products are impossible. With the further growth of AI, the demand for data engineers will only increase, creating excellent opportunities for a long-term career and financial stability.

Why this particular bootcamp?

Our bootcamp is designed to be as comprehensive and practical as possible, without unnecessary theory or outdated tutorials. You will learn step by step and build real projects using the same tools that professionals use.

You will start with Apache Spark, processing real Airbnb data and mastering large-scale computations. Then you will create a modern data lake on AWS using S3, EMR, Glue, and Athena. You will learn pipeline orchestration with Apache Airflow, build streaming systems on Kafka and Flink, and even integrate machine learning and large language models (LLMs) directly into your pipelines.
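
To give you a taste of what that looks like in practice, here is a minimal PySpark sketch in the spirit of the first module. The file path and column names are illustrative assumptions, not the exact Inside Airbnb schema used in the course.

```python
# Minimal PySpark sketch: average listing price per neighbourhood.
# NOTE: "listings.csv", "neighbourhood", and "price" are hypothetical
# stand-ins for the Inside Airbnb data explored in the course, and the
# price column is assumed to be numeric.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("airbnb-demo").getOrCreate()

listings = (
    spark.read
    .option("header", True)       # first line holds column names
    .option("inferSchema", True)  # let Spark guess column types
    .csv("listings.csv")
)

avg_price = (
    listings
    .groupBy("neighbourhood")
    .agg(F.avg("price").alias("avg_price"))
    .orderBy(F.desc("avg_price"))
)

avg_price.show(10)  # top 10 most expensive neighbourhoods
spark.stop()
```

A dozen lines like these already express a distributed computation that Spark can scale from your laptop to a cluster, and that is exactly the progression the Spark module follows.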

As a result, you will learn to build end-to-end, production-level systems - exactly the skill set employers are looking for.

What's inside the course?

  • Introduction to Data Engineering
    • Understand how modern data engineering works and what is needed to start.
  • Big Data Processing with Apache Spark
    • Learn to work with large datasets using the DataFrame API, UDFs, aggregations, and job optimization.
  • Building a Data Lake on AWS
    • Build scalable data storage using S3, EMR, and Athena.
  • Pipelines with Apache Airflow
    • Automate and manage tasks, handle errors, and schedule and run Spark jobs (see the DAG sketch after this list).
  • ML with Spark MLlib
    • Embed machine learning in your pipelines - classification, regression, model selection.
  • AI and LLM in Data Engineering
    • Use Hugging Face and other tools to integrate LLMs into data processing.
  • Stream Processing with Apache Kafka and Flink
    • Build real-time systems that ingest and process continuous event streams.
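
As a taste of the orchestration module above, here is a minimal Airflow sketch. The DAG id, schedule, and task bodies are illustrative assumptions, not the exact pipeline built in the course.

```python
# Minimal Airflow DAG sketch: two dependent daily tasks.
# NOTE: the DAG id, schedule, and task bodies are hypothetical
# examples, not the pipeline built in the course.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pretend we pulled raw data from a source system")


def transform():
    print("pretend we cleaned and aggregated the data")


with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ spelling; older versions use schedule_interval
    catchup=False,      # don't backfill runs for past dates
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task  # transform runs only after extract succeeds
```

In the course itself, tasks like these hand off to Spark jobs, file sensors, and data validation steps, which is where the module's error handling and idempotency lessons come in.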

Outcome

After completing the course, you won't just have watched videos - you'll become a true data engineer, ready to build systems that companies need today.

Thousands of our graduates already work at Google, Tesla, Amazon, Apple, IBM, JP Morgan, Facebook, Shopify, and other top companies.

Many of them started from scratch. So why not become the next one?

All Course Lessons (153)

1. The Data Engineering Bootcamp: Zero to Mastery Demo (01:35)
2. Introduction to Data Engineering (04:17)
3. Who Are Data Engineers? (04:43)
4. Prerequisites (03:19)
5. Source Code for This Bootcamp (01:19)
6. Plan for This Bootcamp (04:38)
7. [Optional] What Is a Virtualenv? (06:37)
8. [Optional] What Is Docker? (11:03)
9. Introduction (04:08)
10. Apache Spark (03:44)
11. How Spark Works (04:24)
12. Spark Application (07:41)
13. DataFrames (06:43)
14. Installing Spark (05:51)
15. Inside Airbnb Data (07:02)
16. Writing Your First Spark Job (07:05)
17. Lazy Processing (02:16)
18. [Exercise] Basic Functions (01:29)
19. [Exercise] Basic Functions - Solution (06:41)
20. Aggregating Data (04:00)
21. Joining Data (04:40)
22. Aggregations and Joins with Spark (06:10)
23. Complex Data Types (05:09)
24. [Exercise] Aggregate Functions (00:50)
25. [Exercise] Aggregate Functions - Solution (05:54)
26. User Defined Functions (03:25)
27. Data Shuffle (06:14)
28. Data Accumulators (03:42)
29. Optimizing Spark Jobs (07:39)
30. Submitting Spark Jobs (04:29)
31. Other Spark APIs (05:16)
32. Spark SQL (04:33)
33. [Exercise] Advanced Spark (02:10)
34. [Exercise] Advanced Spark - Solution (05:26)
35. Summary (03:08)
36. Introduction (04:26)
37. What Is a Data Lake? (09:08)
38. Amazon Web Services (AWS) (07:47)
39. Simple Storage Service (S3) (05:45)
40. Setting Up an AWS Account (09:29)
41. Data Partitioning (03:24)
42. Using S3 (07:49)
43. EMR Serverless (02:59)
44. IAM Roles (02:52)
45. Running a Spark Job (08:49)
46. Parquet Data Format (07:41)
47. Implementing a Data Catalog (05:32)
48. Data Catalog Demo (06:42)
49. Querying a Data Lake (04:00)
50. Summary (03:39)
51. Introduction (05:53)
52. What Is Apache Airflow? (05:19)
53. Airflow's Architecture (03:15)
54. Installing Airflow (06:33)
55. Defining an Airflow DAG (08:03)
56. Error Handling (03:38)
57. Idempotent Tasks (04:54)
58. Creating a DAG - Part 1 (04:58)
59. Creating a DAG - Part 2 (04:42)
60. Handling Failed Tasks (04:09)
61. [Exercise] Data Validation (04:31)
62. [Exercise] Data Validation - Solution (03:27)
63. Spark with Airflow (03:02)
64. Using Spark with Airflow - Part 1 (07:39)
65. Using Spark with Airflow - Part 2 (05:52)
66. Sensors in Airflow (04:46)
67. Using File Sensors (04:08)
68. Data Ingestion (05:50)
69. Reading Data from Postgres - Part 1 (06:03)
70. Reading Data from Postgres - Part 2 (05:40)
71. [Exercise] Average Customer Review (03:53)
72. [Exercise] Average Customer Review - Solution (04:33)
73. Advanced DAGs (04:26)
74. Summary (02:27)
75. Introduction (05:28)
76. What Is Machine Learning? (06:06)
77. Regression Algorithms (05:38)
78. Building a Regression Model (05:04)
79. Training a Model (09:46)
80. Model Evaluation (07:26)
81. Testing a Regression Model (03:57)
82. Model Lifecycle (02:12)
83. Feature Engineering (08:44)
84. Improving a Regression Model (07:34)
85. Machine Learning Pipelines (03:56)
86. Creating a Pipeline (02:41)
87. [Exercise] House Price Estimation (01:59)
88. [Exercise] House Price Estimation - Solution (03:12)
89. [Exercise] Imposter Syndrome (02:57)
90. Classification (07:37)
91. Classifier Evaluation (04:27)
92. Training a Classifier (08:31)
93. Hyperparameters (08:06)
94. Optimizing a Model (03:02)
95. [Exercise] Loan Approval (02:34)
96. [Exercise] Loan Approval - Solution (02:33)
97. Deep Learning (06:56)
98. Summary (03:23)
99. Introduction (05:07)
100. Natural Language Processing (NLP) before LLMs (06:11)
101. Transformers (06:21)
102. Types of LLMs (07:40)
103. Hugging Face (02:19)
104. Databricks Setup (10:38)
105. Using an LLM (07:36)
106. Structured Output (03:42)
107. Producing JSON Output (05:10)
108. LLMs with Apache Spark (05:20)
109. Summary (02:48)
110. Introduction (06:06)
111. What Is Apache Kafka? (07:00)
112. Partitioning Data (08:56)
113. Kafka API (07:42)
114. Kafka Architecture (03:15)
115. Set Up Kafka (05:53)
116. Writing to Kafka (06:07)
117. Reading from Kafka (07:37)
118. Data Durability (06:39)
119. Kafka vs Queues (02:11)
120. [Exercise] Processing Records (03:44)
121. [Exercise] Processing Records - Solution (02:59)
122. Delivery Semantics (05:53)
123. Kafka Transactions (04:34)
124. Log Compaction (03:23)
125. Kafka Connect (06:59)
126. Using Kafka Connect (09:44)
127. Outbox Pattern (04:31)
128. Schema Registry (08:01)
129. Using Schema Registry (08:10)
130. Tiered Storage (03:28)
131. [Exercise] Track Order Status Changes (04:27)
132. [Exercise] Track Order Status Changes - Solution (05:06)
133. Summary (04:41)
134. Introduction (05:40)
135. What Is Apache Flink? (05:24)
136. Kafka Application (08:11)
137. Multiple Streams (03:11)
138. Installing Apache Flink (05:46)
139. Processing Individual Records (07:22)
140. [Exercise] Stream Processing (04:02)
141. [Exercise] Stream Processing - Solution (02:40)
142. Time Windows (06:49)
143. Keyed Windows (02:40)
144. Using Time Windows (05:18)
145. Watermarks (10:06)
146. Advanced Window Operations (06:17)
147. Stateful Stream Processing (07:50)
148. Using Local State (04:42)
149. [Exercise] Anomaly Detection (04:35)
150. [Exercise] Anomaly Detection - Solution (03:34)
151. Joining Streams (05:50)
152. Summary (03:10)
153. Thank You! (01:18)

