
Data Engineering on AWS

4h 46m 38s
English
Paid

Course description

This course is the perfect start for those who want to master cloud technologies and begin working with Amazon Web Services (AWS), one of the most popular platforms for data processing. The course is especially useful for beginner data engineers and those seeking their first job in this field.

Throughout the course, you will build a complete end-to-end project based on data from an online store. Step by step, you will learn to model data, build pipelines, and work with key AWS tools: Lambda, API Gateway, Kinesis, DynamoDB, Redshift, Glue, and S3.


What to expect in the course:

  • Data Work
    • Learn the structure and types of data you will be working with. Define the project goals - an important step for successful implementation.
  • Platform and Pipeline Design
    • Get acquainted with the platform architecture and design the pipelines for data loading, storage in S3 (Data Lake), and processing in DynamoDB (NoSQL) and Redshift (Data Warehouse). Learn to build pipelines for interfaces and streaming data.
  • Basics of AWS
    • Create an AWS account, learn identity and access management (IAM), and get introduced to CloudWatch and to Boto3, the library for working with AWS from Python.
  • Data Ingestion Pipeline
    • Create an API via API Gateway, send data to Kinesis, configure IAM, and develop an ingestion pipeline in Python.
  • Data Transfer to S3 (Data Lake)
    • Set up a Lambda function to receive data from Kinesis and save it to S3.
  • Data Transfer to DynamoDB
    • Implement a pipeline for transferring data from Kinesis to DynamoDB - a fast NoSQL database.
  • API for Data Access
    • Create an API for working with data in the database. Learn why letting a visualization tool access the database directly is bad practice.
  • Data Visualization in Redshift
    • Send streaming data to Redshift via Kinesis Firehose: create a Redshift cluster, configure security, create tables, and set up Firehose. Connect Power BI to Redshift for data analysis.
  • Batch Processing: AWS Glue, S3, and Redshift
    • Master batch data processing: set up and run Glue to write data from S3 to Redshift, understand Glue crawlers and the data catalog, and learn to debug the jobs.
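
To make the "Data Transfer to S3" step above more concrete, here is a minimal sketch of a Lambda function that reads records from a Kinesis trigger and writes each one to S3. It is an illustration under stated assumptions, not the course's own code: the bucket name `example-raw-bucket`, the `raw/` key layout, and the optional `s3_client` parameter are hypothetical. The parts it relies on are standard: Kinesis delivers each record's payload base64-encoded under `Records[].kinesis.data`, and Boto3's S3 client exposes `put_object`.

```python
import base64
import json


def decode_kinesis_records(event):
    """Return the JSON payloads carried by a Kinesis-triggered Lambda event."""
    payloads = []
    for record in event.get("Records", []):
        # Kinesis hands Lambda each record's data as a base64 string.
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))
    return payloads


def lambda_handler(event, context, s3_client=None, bucket="example-raw-bucket"):
    """Write every record in the batch to S3 as its own JSON object.

    The bucket name and key layout are hypothetical. The s3_client
    parameter is an assumption added so the handler can be exercised
    without AWS credentials.
    """
    if s3_client is None:
        import boto3  # imported lazily so the decoding logic runs offline

        s3_client = boto3.client("s3")

    payloads = decode_kinesis_records(event)
    for record, payload in zip(event.get("Records", []), payloads):
        # The sequence number is unique within a shard, so it serves as a simple key.
        key = f"raw/{record['kinesis']['sequenceNumber']}.json"
        s3_client.put_object(
            Bucket=bucket,
            Key=key,
            Body=json.dumps(payload).encode("utf-8"),
        )
    return {"written": len(payloads)}
```

Lambda itself calls the handler with only `event` and `context`, so the two extra parameters fall back to their defaults in production. One object per record keeps the sketch short; batching several records into one object would reduce the number of S3 requests.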

This course will help you gain practical experience in creating streaming and batch pipelines in AWS and master the key tools for working with cloud data.


All Course Lessons (58)

| # | Lesson Title | Duration | Access |
|---|---|---|---|
| 1 | Important: Before you start! | 00:31 | Demo |
| 2 | Introduction | 02:22 | |
| 3 | Data Engineering | 04:16 | |
| 4 | Data Science Platform | 05:21 | |
| 5 | Data Types You Encounter | 03:04 | |
| 6 | What Is A Good Dataset | 02:55 | |
| 7 | The Dataset We Use | 03:17 | |
| 8 | Defining The Purpose | 06:28 | |
| 9 | Relational Storage Possibilities | 03:47 | |
| 10 | NoSQL Storage Possibilities | 06:29 | |
| 11 | Selecting The Tools | 03:50 | |
| 12 | Client | 03:06 | |
| 13 | Connect | 01:19 | |
| 14 | Buffer | 01:30 | |
| 15 | Process | 02:43 | |
| 16 | Store | 03:42 | |
| 17 | Visualize | 03:02 | |
| 18 | Data Ingestion Pipeline | 03:01 | |
| 19 | Stream To Raw Storage Pipeline | 02:20 | |
| 20 | Stream To DynamoDB Pipeline | 03:10 | |
| 21 | Visualization API Pipeline | 02:57 | |
| 22 | Visualization Redshift Data Warehouse Pipeline | 05:30 | |
| 23 | Batch Processing Pipeline | 03:20 | |
| 24 | Create An AWS Account | 01:59 | |
| 25 | Things To Keep In Mind | 02:46 | |
| 26 | IAM Identity & Access Management | 04:08 | |
| 27 | Logging | 02:23 | |
| 28 | AWS Python API Boto3 | 02:58 | |
| 29 | Development Environment | 04:03 | |
| 30 | Create Lambda for API | 02:34 | |
| 31 | Create API Gateway | 08:31 | |
| 32 | Setup Kinesis | 01:39 | |
| 33 | Setup IAM for API | 05:01 | |
| 34 | Create Ingestion Pipeline (Code) | 06:10 | |
| 35 | Create Script to Send Data | 05:47 | |
| 36 | Test The Pipeline | 04:54 | |
| 37 | Setup S3 Bucket | 03:43 | |
| 38 | Configure IAM For S3 | 03:22 | |
| 39 | Create Lambda For S3 Insert | 07:17 | |
| 40 | Test The Pipeline | 04:02 | |
| 41 | Setup DynamoDB | 09:01 | |
| 42 | Setup IAM For DynamoDB Stream | 03:37 | |
| 43 | Create DynamoDB Lambda | 09:21 | |
| 44 | Create API & Lambda For Access | 06:11 | |
| 45 | Test The API | 04:48 | |
| 46 | Setup Redshift Data Warehouse | 08:09 | |
| 47 | Security Group For Firehose | 03:13 | |
| 48 | Create Redshift Tables | 05:52 | |
| 49 | S3 Bucket & jsonpaths.json | 03:03 | |
| 50 | Configure Firehose | 07:59 | |
| 51 | Debug Redshift Streaming | 07:44 | |
| 52 | Bug-fixing | 05:59 | |
| 53 | Power BI | 12:17 | |
| 54 | AWS Glue Basics | 05:15 | |
| 55 | Glue Crawlers | 13:10 | |
| 56 | Glue Jobs | 13:44 | |
| 57 | Redshift Insert & Debugging | 07:17 | |
| 58 | What We Achieved & Improvements | 10:41 | |

