Data Engineering on AWS

4h 46m 38s
English
Paid

Course description

This course is a good starting point for those who want to learn cloud technologies and begin working with Amazon Web Services (AWS), one of the most popular platforms for data processing. It is especially useful for beginner data engineers and those seeking their first job in this field.

Throughout the course, you will create a fully-fledged end-to-end project based on data from an online store. Step by step, you will learn to model data, build pipelines, and work with key AWS tools: Lambda, API Gateway, Kinesis, DynamoDB, Redshift, Glue, and S3.

What to expect in the course:

  • Data Work
    • Learn the structure and types of the data you will work with, and define the project goals, an important step for a successful implementation.
  • Platform and Pipeline Design
    • Get acquainted with the platform architecture and design the pipelines: data ingestion, storage in S3 (data lake), storage in DynamoDB (NoSQL), and loading into Redshift (data warehouse). Learn to design pipelines for APIs and data streaming.
  • Basics of AWS
    • Create an AWS account, understand access and security management (IAM), and get introduced to CloudWatch and Boto3, the library for working with AWS from Python.
  • Data Ingestion Pipeline
    • Create an API via API Gateway, send data to Kinesis, configure IAM, and develop an ingestion pipeline in Python.
  • Data Transfer to S3 (Data Lake)
    • Set up a Lambda function to receive data from Kinesis and save it to S3.
  • Data Transfer to DynamoDB
    • Implement a pipeline for transferring data from Kinesis to DynamoDB - a fast NoSQL database.
  • API for Data Access
    • Create an API for working with data in the database. Learn why letting a visualization tool access the database directly is bad practice.
  • Data Visualization in Redshift
    • Create a Redshift cluster, configure security, create tables, and set up Kinesis Firehose to send streaming data to Redshift. Connect Power BI to Redshift for data analysis.
  • Batch Processing: AWS Glue, S3, and Redshift
    • Master batch data processing: set up and run Glue jobs to write data from S3 to Redshift, understand Glue crawlers and the Data Catalog, and learn to debug the process.
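
As an illustration of the "Data Transfer to S3" step above, here is a minimal Python sketch of a Lambda function that reads records from a Kinesis event and writes them to S3. It is not taken from the course material. The bucket name `example-raw-bucket`, the `raw/` key layout, and the helper names are assumptions made for this example; the `s3_client` parameter is added only so the logic can be tried without an AWS account.

```python
import base64
import json
from typing import Any, Dict, List, Tuple


def decode_kinesis_records(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Decode the base64-encoded JSON payloads of a Kinesis-triggered Lambda event."""
    decoded = []
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded under record["kinesis"]["data"].
        payload = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(payload))
    return decoded


def build_s3_object(records: List[Dict[str, Any]], request_id: str) -> Tuple[str, str]:
    """Return a (key, body) pair holding the batch as newline-delimited JSON."""
    key = f"raw/{request_id}.json"  # hypothetical key layout, one object per invocation
    body = "\n".join(json.dumps(r, sort_keys=True) for r in records)
    return key, body


def handler(event, context, s3_client=None, bucket="example-raw-bucket"):
    """Lambda entry point: write the decoded batch to S3 as a single object."""
    records = decode_kinesis_records(event)
    key, body = build_s3_object(records, context.aws_request_id)
    if s3_client is None:
        # Imported lazily so the helpers above run where boto3 is not installed.
        import boto3
        s3_client = boto3.client("s3")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return {"written": len(records), "key": key}
```

Writing one object per invocation keeps the sketch short; a real pipeline would also need an IAM role that allows the function to read from the stream and write to the bucket, as covered in the IAM lessons.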

This course will help you gain practical experience in building streaming and batch pipelines in AWS and in using the key tools for working with cloud data.

# Title Duration
1 Important: Before you start! 00:31
2 Introduction 02:22
3 Data Engineering 04:16
4 Data Science Platform 05:21
5 Data Types You Encounter 03:04
6 What Is A Good Dataset 02:55
7 The Dataset We Use 03:17
8 Defining The Purpose 06:28
9 Relational Storage Possibilities 03:47
10 NoSQL Storage Possibilities 06:29
11 Selecting The Tools 03:50
12 Client 03:06
13 Connect 01:19
14 Buffer 01:30
15 Process 02:43
16 Store 03:42
17 Visualize 03:02
18 Data Ingestion Pipeline 03:01
19 Stream To Raw Storage Pipeline 02:20
20 Stream To DynamoDB Pipeline 03:10
21 Visualization API Pipeline 02:57
22 Visualization Redshift Data Warehouse Pipeline 05:30
23 Batch Processing Pipeline 03:20
24 Create An AWS Account 01:59
25 Things To Keep In Mind 02:46
26 IAM Identity & Access Management 04:08
27 Logging 02:23
28 AWS Python API Boto3 02:58
29 Development Environment 04:03
30 Create Lambda for API 02:34
31 Create API Gateway 08:31
32 Setup Kinesis 01:39
33 Setup IAM for API 05:01
34 Create Ingestion Pipeline (Code) 06:10
35 Create Script to Send Data 05:47
36 Test The Pipeline 04:54
37 Setup S3 Bucket 03:43
38 Configure IAM For S3 03:22
39 Create Lambda For S3 Insert 07:17
40 Test The Pipeline 04:02
41 Setup DynamoDB 09:01
42 Setup IAM For DynamoDB Stream 03:37
43 Create DynamoDB Lambda 09:21
44 Create API & Lambda For Access 06:11
45 Test The API 04:48
46 Setup Redshift Data Warehouse 08:09
47 Security Group For Firehose 03:13
48 Create Redshift Tables 05:52
49 S3 Bucket & jsonpaths.json 03:03
50 Configure Firehose 07:59
51 Debug Redshift Streaming 07:44
52 Bug-fixing 05:59
53 Power BI 12:17
54 AWS Glue Basics 05:15
55 Glue Crawlers 13:10
56 Glue Jobs 13:44
57 Redshift Insert & Debugging 07:17
58 What We Achieved & Improvements 10:41
