Embark on your journey to mastering cloud technologies with the "Data Engineering on AWS" course. This course is tailored for beginners looking to dive into Amazon Web Services (AWS), one of the leading platforms for data processing. Ideal for aspiring data engineers, it provides a solid foundation for starting a career in this dynamic field.
Course Overview
In this course, you'll build a comprehensive end-to-end project using data from an online store. Step by step, you will learn how to model data, construct data pipelines, and work with key AWS services such as Lambda, API Gateway, Kinesis, DynamoDB, Redshift, Glue, and S3.
What to Expect in the Course
Data Work
- Understand the structure and various types of data you'll handle. Establish clear project goals to ensure successful execution.
Platform and Pipeline Design
- Gain insights into platform architecture and pipeline design. Learn to load data, store it in S3 (Data Lake), and process it using DynamoDB (NoSQL) and Redshift (Data Warehouse). Build pipelines for interfaces and data streaming.
Basics of AWS
- Create an AWS account and familiarize yourself with access and security management (IAM). Discover CloudWatch and the Boto3 library for AWS operations using Python.
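As a first taste of Boto3, the sketch below builds an IAM role ARN (a pure string helper, since the ARN format is fixed by AWS) and lists S3 buckets. The account ID and role name are placeholders, and the S3 call only works once your credentials are configured; Boto3 is imported lazily inside that function so the helper works even without the SDK installed.

```python
def make_role_arn(account_id, role_name):
    # ARN layout is fixed by AWS: arn:aws:iam::<account>:role/<name>
    return f"arn:aws:iam::{account_id}:role/{role_name}"

def list_buckets(region="us-east-1"):
    # Needs configured AWS credentials; boto3 is imported lazily so the
    # pure helper above works even without the SDK installed.
    import boto3
    s3 = boto3.client("s3", region_name=region)
    return [b["Name"] for b in s3.list_buckets()["Buckets"]]
```

You'll see this client pattern throughout the course: create a client for a service, then call its API methods.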
Data Ingestion Pipeline
- Create an API using API Gateway, transmit data to Kinesis, configure IAM, and develop an ingestion pipeline with Python.
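The ingestion step above might be sketched like this, assuming an API Gateway proxy event whose `body` carries a JSON payload with a `user_id` field (both assumed names) and a Kinesis stream called `orders-stream`:

```python
import json

def build_kinesis_record(api_event):
    # API Gateway proxy integration delivers the request body as a JSON string.
    payload = json.loads(api_event["body"])
    # The partition key spreads records across shards; 'user_id' is an
    # assumed field in the payload.
    return {
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": str(payload.get("user_id", "default")),
    }

def ingest(api_event, stream_name="orders-stream"):
    import boto3  # needs AWS credentials at call time
    kinesis = boto3.client("kinesis")
    return kinesis.put_record(StreamName=stream_name,
                              **build_kinesis_record(api_event))
```

Keeping the record-building logic separate from the AWS call makes the pipeline easy to unit-test without touching the cloud.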
Data Transfer to S3 (Data Lake)
- Configure a Lambda function to receive data from Kinesis and store it in S3.
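A minimal sketch of such a Lambda handler: Kinesis delivers record data base64-encoded inside the event, so the handler decodes it before writing to S3. The bucket name and the date-partitioned key layout are assumptions, not requirements.

```python
import base64
import json
from datetime import datetime, timezone

def decode_records(event):
    # Kinesis record data arrives base64-encoded in the Lambda event.
    return [
        json.loads(base64.b64decode(r["kinesis"]["data"]))
        for r in event["Records"]
    ]

def object_key(now=None):
    # Date-partitioned key layout (an assumed convention, not required by S3).
    now = now or datetime.now(timezone.utc)
    return now.strftime("raw/year=%Y/month=%m/day=%d/%H%M%S.json")

def handler(event, context):
    import boto3  # available in the Lambda runtime
    s3 = boto3.client("s3")
    body = json.dumps(decode_records(event))
    s3.put_object(Bucket="my-datalake-bucket",  # assumed bucket name
                  Key=object_key(), Body=body.encode("utf-8"))
```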
Data Transfer to DynamoDB
- Set up a pipeline for transferring data from Kinesis to DynamoDB, a fast NoSQL database.
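One detail worth previewing: DynamoDB rejects Python floats, so numeric values must be converted to `Decimal` before writing. A sketch, assuming a table named `orders` (the table name and schema are placeholders):

```python
import json
from decimal import Decimal

def to_dynamo_item(order):
    # DynamoDB rejects Python floats; a JSON round-trip with
    # parse_float=Decimal converts all numbers safely.
    return json.loads(json.dumps(order), parse_float=Decimal)

def write_item(order, table_name="orders"):  # assumed table name
    import boto3
    table = boto3.resource("dynamodb").Table(table_name)
    table.put_item(Item=to_dynamo_item(order))
```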
API for Data Access
- Create an API to interact with data in the database, and understand why letting visualization tools query the database directly is discouraged.
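Such an API might be a Lambda function behind API Gateway that fetches an item from DynamoDB and wraps it in the proxy-integration response shape. The `orders` table and `order_id` path parameter are assumed names:

```python
import json

def proxy_response(payload, status=200):
    # API Gateway's Lambda proxy integration expects exactly this shape.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload, default=str),
    }

def handler(event, context):
    import boto3
    table = boto3.resource("dynamodb").Table("orders")  # assumed table name
    key = event["pathParameters"]["order_id"]           # assumed path parameter
    item = table.get_item(Key={"order_id": key}).get("Item")
    return proxy_response(item or {}, status=200 if item else 404)
```

Routing reads through an API like this keeps credentials, throttling, and caching in one place instead of in every dashboard.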
Data Visualization in Redshift
- Stream data to Redshift using Kinesis Firehose, establish a Redshift cluster, configure security, create tables, and set up Firehose. Integrate Power BI with Redshift for comprehensive data analysis.
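Sending rows into a Firehose delivery stream might look like the sketch below. One subtlety: Firehose concatenates records as-is, so each record must end with a newline for Redshift's COPY to split them correctly. The delivery stream name is an assumption.

```python
import json

def firehose_batch(rows):
    # Firehose concatenates record payloads, so each record needs a
    # trailing newline for Redshift's COPY to parse them as separate rows.
    return [{"Data": (json.dumps(r) + "\n").encode("utf-8")} for r in rows]

def send(rows, stream="orders-to-redshift"):  # assumed delivery stream name
    import boto3
    firehose = boto3.client("firehose")
    return firehose.put_record_batch(DeliveryStreamName=stream,
                                     Records=firehose_batch(rows))
```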
Batch Processing: AWS Glue, S3, and Redshift
- Learn the techniques of batch data processing. Configure and execute Glue to write data from S3 to Redshift, understand Crawler and data catalog functionalities, and develop debugging skills.
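A Glue job of this kind might be sketched as below. The database, table, and connection names are placeholders; the `awsglue` and PySpark imports only resolve inside the Glue runtime, so they live in the job function, while the small transform above it is plain Python.

```python
def apply_renames(row, renames):
    # Pure column-rename transform, kept separate so it can be tested locally.
    return {renames.get(k, k): v for k, v in row.items()}

def run_job():
    # Runs only inside an AWS Glue job; awsglue and pyspark ship with the runtime.
    import sys
    from awsglue.context import GlueContext
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue = GlueContext(SparkContext.getOrCreate())
    # Read from the Data Catalog table that a Crawler built over S3
    # ('shop_db' and 'orders' are assumed names).
    dyf = glue.create_dynamic_frame.from_catalog(database="shop_db",
                                                 table_name="orders")
    dyf = dyf.apply_mapping([("order_id", "string", "order_id", "string"),
                             ("price", "double", "amount", "double")])
    # Write into Redshift through a catalog connection (assumed names).
    glue.write_dynamic_frame.from_jdbc_conf(
        frame=dyf, catalog_connection="redshift-conn",
        connection_options={"dbtable": "public.orders", "database": "shop"},
        redshift_tmp_dir="s3://my-temp-bucket/glue/")  # staging area for COPY
```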
This course is designed to equip you with the practical skills to build both streaming and batch pipelines on AWS and to master the essential tools for working with cloud-based data.