Data Engineering on AWS

Duration: 4h 46m 38s
Language: English
Access: Paid

The "Data Engineering on AWS" course is tailored for beginners who want to get started with Amazon Web Services (AWS), one of the leading platforms for data processing. It gives aspiring data engineers a solid foundation for starting a career in the field.

Course Overview

Over the course, you'll build an end-to-end project using data from an online store. Step by step, you will learn how to model data, construct data pipelines, and work with key AWS tools such as Lambda, API Gateway, Kinesis, DynamoDB, Redshift, Glue, and S3.

What to Expect in the Course

Data Work

  • Understand the structure and various types of data you'll handle. Establish clear project goals to ensure successful execution.

Platform and Pipeline Design

  • Gain insights into platform architecture and pipeline design. Learn to load data, store it in S3 (Data Lake), and process it using DynamoDB (NoSQL) and Redshift (Data Warehouse). Build pipelines for interfaces and data streaming.

Basics of AWS

  • Create an AWS account and familiarize yourself with access and security management (IAM). Discover CloudWatch and the Boto3 library for AWS operations using Python.

Data Ingestion Pipeline

  • Create an API using API Gateway, transmit data to Kinesis, configure IAM, and develop an ingestion pipeline with Python.
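
As a rough illustration of the ingestion step above, the sketch below shows a Lambda-style handler that accepts a POST from API Gateway and forwards the body to Kinesis. It assumes the API Gateway proxy integration event shape (`httpMethod`, `body`); the course may wire this differently. The stream name `ingest-stream`, the `InvoiceNo` field used as partition key, and the injected `kinesis_client` parameter are hypothetical and not taken from the course. In a real Lambda the client would typically be `boto3.client("kinesis")`.

```python
import json

STREAM_NAME = "ingest-stream"  # hypothetical stream name, not from the course


def handle_ingest(event, kinesis_client, stream_name=STREAM_NAME):
    """Forward a POSTed JSON object to a Kinesis stream.

    `kinesis_client` is anything exposing
    put_record(StreamName=..., Data=..., PartitionKey=...),
    e.g. boto3.client("kinesis"). It is injected so the logic can run offline.
    """
    if event.get("httpMethod") != "POST":
        return {"statusCode": 405, "body": json.dumps({"error": "POST only"})}

    try:
        payload = json.loads(event.get("body") or "")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
    if not isinstance(payload, dict):
        return {"statusCode": 400, "body": json.dumps({"error": "expected an object"})}

    # The partition key decides which shard receives the record.
    # InvoiceNo is an assumed field name for the online-store data.
    partition_key = str(payload.get("InvoiceNo", "unknown"))
    kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(payload).encode("utf-8"),
        PartitionKey=partition_key,
    )
    return {"statusCode": 200, "body": json.dumps({"status": "queued"})}
```

Passing the client in as a parameter keeps the handler logic testable without AWS credentials; a thin `lambda_handler(event, context)` wrapper could create the real client once and delegate to this function.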

Data Transfer to S3 (Data Lake)

  • Configure a Lambda function to receive data from Kinesis and store it in S3.
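
The Kinesis-to-S3 step above might look roughly like the following sketch. It relies on the fact that Lambda delivers Kinesis record data base64-encoded under `Records[i].kinesis.data`. The function names, the newline-delimited JSON layout, and the injected `s3_client` are illustrative assumptions, not the course's implementation; in a real Lambda the client would typically be `boto3.client("s3")`.

```python
import base64
import json


def decode_kinesis_records(event):
    """Return the JSON payloads carried by a Kinesis-triggered Lambda event."""
    payloads = []
    for record in event.get("Records", []):
        # Lambda hands over the Kinesis record data base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))
    return payloads


def write_batch_to_s3(event, s3_client, bucket, key):
    """Store one batch of records as newline-delimited JSON in S3.

    `s3_client` is anything exposing put_object(Bucket=..., Key=..., Body=...),
    e.g. boto3.client("s3"). Bucket and key are chosen by the caller.
    Returns the number of records written.
    """
    payloads = decode_kinesis_records(event)
    if not payloads:
        return 0  # nothing to write, skip the S3 call
    body = "\n".join(json.dumps(p) for p in payloads).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return len(payloads)
```

Writing one object per batch rather than one per record keeps the number of S3 requests low; the key would normally include a timestamp or sequence number so batches do not overwrite each other.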

Data Transfer to DynamoDB

  • Set up a pipeline for transferring data from Kinesis to DynamoDB, a fast NoSQL database.
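
One detail worth knowing for the Kinesis-to-DynamoDB step: boto3's DynamoDB resource layer rejects Python `float` values and expects `Decimal` instead. The sketch below handles that while decoding; the function names and the injected `table` object are assumptions for illustration, and the course's own Lambda may be structured differently. In a real Lambda, `table` would typically come from `boto3.resource("dynamodb").Table(...)`.

```python
import base64
import json
from decimal import Decimal


def kinesis_record_to_item(record):
    """Decode one Kinesis record from a Lambda event into a DynamoDB-ready dict."""
    raw = base64.b64decode(record["kinesis"]["data"])
    # Parse JSON floats as Decimal, because boto3's DynamoDB resource
    # layer does not accept Python floats.
    return json.loads(raw, parse_float=Decimal)


def write_to_dynamodb(event, table):
    """Put every record of the event into `table` and return the count.

    `table` is anything exposing put_item(Item=...), such as a boto3
    DynamoDB Table resource. The item must contain the table's key attributes.
    """
    count = 0
    for record in event.get("Records", []):
        table.put_item(Item=kinesis_record_to_item(record))
        count += 1
    return count
```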

API for Data Access

  • Create an API to interact with database data. Understand why direct access from visualization to the database is discouraged.

Data Visualization in Redshift

  • Stream data to Redshift using Kinesis Firehose, establish a Redshift cluster, configure security, create tables, and set up Firehose. Integrate Power BI with Redshift for comprehensive data analysis.
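
When Firehose loads JSON into Redshift, the COPY command can use a JSONPaths file to map JSON fields to table columns by position. The sketch below generates such a file for top-level fields. The column names are hypothetical placeholders for the online-store data, and the helper name `build_jsonpaths` is made up for this example.

```python
import json


def build_jsonpaths(field_names):
    """Build a Redshift COPY JSONPaths document for top-level JSON fields.

    The order of `field_names` must match the column order of the target
    table (or the COPY column list), because paths map to columns by position.
    """
    return {"jsonpaths": [f"$.{name}" for name in field_names]}


# Hypothetical columns for an online-store transactions table.
COLUMNS = ["InvoiceNo", "StockCode", "Quantity", "UnitPrice", "CustomerID"]

if __name__ == "__main__":
    # Print the document that would be saved as jsonpaths.json in S3.
    print(json.dumps(build_jsonpaths(COLUMNS), indent=2))
```

The resulting file is uploaded to S3 and referenced from the COPY options in the Firehose configuration.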

Batch Processing: AWS Glue, S3, and Redshift

  • Learn the techniques of batch data processing. Configure and execute Glue to write data from S3 to Redshift, understand Crawler and data catalog functionalities, and develop debugging skills.

This course is designed to give you practical skills in building both streaming and batch pipelines on AWS and in using the tools needed to work with cloud-based data.

About the Author: Andreas Kretz

Andreas Kretz is a German data engineer and one of the most widely followed independent voices on data engineering as a career discipline. He runs the Plumbers of Data Science brand and has been publishing tutorial material continuously since the field consolidated around the modern lake-house stack (Spark, Kafka, Snowflake, Databricks, Airflow).

His CourseFlix listing is the largest single-author catalog under this source — over thirty courses spanning data-pipeline construction, streaming architectures, the cloud-native data stack on AWS / Azure / GCP, the Python and Scala tooling that dominates the field, and the soft-skills / career side of breaking into data engineering. Material is paid and aimed at engineers transitioning into data work or already-working data engineers picking up specific tools.

All Course Lessons (58)
#   Lesson Title                                    Duration  Access
1   Important: Before you start!                    00:31     Demo
2   Introduction                                    02:22
3   Data Engineering                                04:16
4   Data Science Platform                           05:21
5   Data Types You Encounter                        03:04
6   What Is A Good Dataset                          02:55
7   The Dataset We Use                              03:17
8   Defining The Purpose                            06:28
9   Relational Storage Possibilities                03:47
10  NoSQL Storage Possibilities                     06:29
11  Selecting The Tools                             03:50
12  Client                                          03:06
13  Connect                                         01:19
14  Buffer                                          01:30
15  Process                                         02:43
16  Store                                           03:42
17  Visualize                                       03:02
18  Data Ingestion Pipeline                         03:01
19  Stream To Raw Storage Pipeline                  02:20
20  Stream To DynamoDB Pipeline                     03:10
21  Visualization API Pipeline                      02:57
22  Visualization Redshift Data Warehouse Pipeline  05:30
23  Batch Processing Pipeline                       03:20
24  Create An AWS Account                           01:59
25  Things To Keep In Mind                          02:46
26  IAM Identity & Access Management                04:08
27  Logging                                         02:23
28  AWS Python API Boto3                            02:58
29  Development Environment                         04:03
30  Create Lambda for API                           02:34
31  Create API Gateway                              08:31
32  Setup Kinesis                                   01:39
33  Setup IAM for API                               05:01
34  Create Ingestion Pipeline (Code)                06:10
35  Create Script to Send Data                      05:47
36  Test The Pipeline                               04:54
37  Setup S3 Bucket                                 03:43
38  Configure IAM For S3                            03:22
39  Create Lambda For S3 Insert                     07:17
40  Test The Pipeline                               04:02
41  Setup DynamoDB                                  09:01
42  Setup IAM For DynamoDB Stream                   03:37
43  Create DynamoDB Lambda                          09:21
44  Create API & Lambda For Access                  06:11
45  Test The API                                    04:48
46  Setup Redshift Data Warehouse                   08:09
47  Security Group For Firehose                     03:13
48  Create Redshift Tables                          05:52
49  S3 Bucket & jsonpaths.json                      03:03
50  Configure Firehose                              07:59
51  Debug Redshift Streaming                        07:44
52  Bug-fixing                                      05:59
53  Power BI                                        12:17
54  AWS Glue Basics                                 05:15
55  Glue Crawlers                                   13:10
56  Glue Jobs                                       13:44
57  Redshift Insert & Debugging                     07:17
58  What We Achieved & Improvements                 10:41

Frequently asked questions

What prerequisites should I have before starting the course?
This course is ideal for beginners, so no prior experience with AWS is required. However, familiarity with basic programming concepts and Python can be beneficial as the course involves creating scripts using Python, particularly with the Boto3 library for AWS operations.
What project will I build during the course?
Throughout the course, you will work on an end-to-end project involving data from an online store. The project includes modeling data, constructing data pipelines, and using AWS tools such as Lambda, API Gateway, Kinesis, DynamoDB, Redshift, Glue, and S3, to handle data ingestion, transfer, and visualization.
Who is the target audience for this course?
The course is designed for aspiring data engineers who wish to gain foundational knowledge in cloud technologies, specifically with AWS. It's particularly suited for those who are new to AWS and want to start a career in data engineering.
How does the depth and scope of this course compare to other AWS courses?
This course provides a comprehensive introduction to data engineering on AWS, focusing on practical applications such as setting up data pipelines and using AWS tools like S3, DynamoDB, and Redshift. It differs from more advanced courses by emphasizing foundational skills and hands-on project work.
What AWS tools and platforms does the course cover?
The course covers several AWS tools including Lambda, API Gateway, Kinesis, DynamoDB, Redshift, Glue, and S3. It also introduces IAM for security management, and CloudWatch for monitoring operations, providing a well-rounded toolkit for data engineering on AWS.
What topics are not covered in this course?
This course focuses on data engineering with AWS and does not cover topics outside this scope, such as advanced data science methods, machine learning on AWS, or non-AWS cloud platforms like Google Cloud or Azure.
How much time should I expect to commit to this course?
The course consists of 58 lessons with a total runtime of 4h 46m 38s. Students should anticipate spending additional time on practical exercises and project work to fully grasp the concepts and apply the skills learned in real-world scenarios.