As a data engineer, you need to be adept at working with analytical platforms. This course focuses on Data Lakes and Data Warehouses, which are the foundation for building visualizations and machine learning models.
Course Overview
Modern cloud data warehouses, such as AWS Redshift, Google BigQuery, and Snowflake, have changed how analytical data is handled. They can load data directly from files in a Data Lake or query those files in place, which makes them flexible and convenient for analytical tasks.
What You Will Learn
- Using Data Lakes, Data Warehouses, and BI tools together as a single system
- Loading data into Data Lakes and visualizing it in reports
- Building integrations in Google Cloud Platform and AWS
- Understanding and applying ETL/ELT architecture in modern data warehouses
Course Modules
Basics of Data Warehouses and Data Lakes
- The role of data warehouses in analytical platforms
- Loading data into a Data Warehouse via ETL/ELT
- Understanding Data Lakes and how they are used
- Working with files directly within a Data Lake
Practice on GCP: Cloud Storage, BigQuery, and Data Studio
- Setting up Cloud Storage and creating a table in BigQuery
- Data visualization in Data Studio
- Understanding the general principles of cloud platforms
Practice on AWS: S3, Athena, Glue, and QuickSight
- Building a data integration with S3, Athena, and QuickSight
- Setting up the Glue Data Catalog to manage table metadata
- Configuring Glue in detail and integrating it with the other services
Summary and Bonus Lesson: AWS Redshift Spectrum
- Course summary
- A bonus module on working with Redshift Spectrum, using the Data Catalog prepared in the AWS project
Prerequisites
To make the most of this course, you should have:
- Basic experience with Data Warehouses (completing the "Data Warehouses" course from our academy is recommended)
- Basic knowledge of AWS Athena and Redshift (the Redshift Spectrum module uses the Data Catalog prepared in the AWS project)
This course builds your proficiency with modern data storage and processing systems and teaches you how to use Data Lakes and Data Warehouses effectively for analytics.