Modern Data Warehouses & Data Lakes

Duration: 58m 9s | Language: English | Access: Paid

As a data engineer, you need to be adept at working with analytical platforms. This course focuses on Data Lakes and Data Warehouses, which provide the data behind visualizations and machine learning models.

Course Overview

Modern data warehouses such as AWS Redshift, Google BigQuery, and Snowflake have changed how analytical data is handled: they can load data directly from files in a Data Lake, which makes analytical tasks more flexible and convenient.

What You Will Learn

  • Using Data Lakes, Data Warehouses, and BI tools within a unified system
  • Loading data into Data Lakes and visualizing it in reports
  • Building integrations in Google Cloud Platform and AWS
  • Understanding and applying ETL/ELT architecture in modern data warehouses

Course Modules

Basics of Data Warehouses and Data Lakes

  • The role of data warehouses in analytical platforms
  • Loading data into a Data Warehouse via ETL/ELT
  • Understanding Data Lakes and their utilization
  • Working with files directly within a Data Lake
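
The difference between ETL and ELT mentioned in this module can be sketched in a few lines. This is a minimal illustration, not the course's own code: it uses Python's built-in sqlite3 as a stand-in for a warehouse, and the CSV contents, table names, and function names are all hypothetical. In the ETL path the data is aggregated in application code before loading; in the ELT path the raw rows are loaded unchanged and transformed with SQL inside the database.

```python
import csv
import io
import sqlite3

# Hypothetical raw file contents, standing in for a CSV object in a data lake.
RAW_CSV = """order_id,country,amount
1,DE,19.99
2,US,5.00
3,DE,10.01
"""


def read_rows(raw):
    """Parse the CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(raw)))


def etl(conn, raw):
    """ETL: transform in application code, then load only the result."""
    totals = {}
    for row in read_rows(raw):
        totals[row["country"]] = totals.get(row["country"], 0.0) + float(row["amount"])
    conn.execute("CREATE TABLE revenue_etl (country TEXT, total REAL)")
    conn.executemany("INSERT INTO revenue_etl VALUES (?, ?)", sorted(totals.items()))


def elt(conn, raw):
    """ELT: load raw rows unchanged, then transform with SQL in the database."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, country TEXT, amount TEXT)")
    conn.executemany(
        "INSERT INTO orders_raw VALUES (:order_id, :country, :amount)",
        read_rows(raw),
    )
    conn.execute(
        "CREATE TABLE revenue_elt AS "
        "SELECT country, SUM(CAST(amount AS REAL)) AS total "
        "FROM orders_raw GROUP BY country"
    )


def revenue(conn, table):
    """Return (country, total) pairs from one of the two result tables."""
    rows = conn.execute(f"SELECT country, total FROM {table} ORDER BY country")
    return [(country, round(total, 2)) for country, total in rows]


conn = sqlite3.connect(":memory:")
etl(conn, RAW_CSV)
elt(conn, RAW_CSV)
print(revenue(conn, "revenue_etl"))
print(revenue(conn, "revenue_elt"))
```

Both paths produce the same totals. The practical difference is that the ELT path keeps the raw rows in the database, so new transformations can be written later without re-reading the source file.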

Practice on GCP: Cloud Storage, BigQuery, and Data Studio

  • Setting up Cloud Storage and creating a table in BigQuery
  • Data visualization in Data Studio
  • Grasping the general principles of cloud platforms

Practice on AWS: S3, Athena, Glue, and Quicksight

  • Creating data integration through S3, Athena, and Quicksight
  • Setting up Glue Data Catalog for data management
  • Detailed setup and integration of Glue
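
To give a rough idea of what defining an Athena table over files in S3 involves, the sketch below builds a `CREATE EXTERNAL TABLE` statement for comma-separated files with a header row. It is an assumption-laden illustration rather than the course's actual configuration: the helper name, database, table, columns, and bucket path are hypothetical, and the course may use different file formats or table properties. The function only produces the SQL text; it does not call AWS.

```python
def athena_csv_table_ddl(database, table, columns, s3_location):
    """Build an Athena DDL statement for CSV files stored under an S3 prefix.

    `columns` is a list of (name, athena_type) pairs. All names used in the
    example call below are hypothetical placeholders.
    """
    # Athena expects LOCATION to point at a folder (prefix), not a single file.
    if not (s3_location.startswith("s3://") and s3_location.endswith("/")):
        raise ValueError("s3_location must look like 's3://bucket/prefix/'")
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {cols}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}'\n"
        # Skip the header line of each CSV file when reading.
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


ddl = athena_csv_table_ddl(
    "demo_db",
    "orders",
    [("order_id", "int"), ("country", "string"), ("amount", "double")],
    "s3://example-bucket/orders/",
)
print(ddl)
```

The table is "external" because Athena only stores the schema; the data stays in the S3 files. A Glue Data Catalog crawler can infer a comparable schema automatically instead of it being written by hand.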

Summary and Bonus Lesson: AWS Redshift Spectrum

  • Course summary
  • Additional module on working with Redshift Spectrum using the prepared Data Catalog from the AWS project

Prerequisites

To make the most of this course, you should have:

  • Basic experience with Data Warehouses (Completing the "Data Warehouses" course from our academy is recommended)
  • Basic knowledge of AWS Athena and Redshift (for the Redshift Spectrum module, a prepared Data Catalog from the AWS project will be utilized)

This course will enhance your proficiency in modern data storage and processing systems, teaching you how to effectively leverage Data Lakes and Data Warehouses for analytics.

About the Author: Andreas Kretz

Andreas Kretz is a German data engineer and one of the most widely followed independent voices on data engineering as a career discipline. He runs the Plumbers of Data Science brand and has been publishing tutorial material continuously since the field consolidated around the modern lake-house stack (Spark, Kafka, Snowflake, Databricks, Airflow).

His CourseFlix listing is the largest single-author catalog under this source — over thirty courses spanning data-pipeline construction, streaming architectures, the cloud-native data stack on AWS / Azure / GCP, the Python and Scala tooling that dominates the field, and the soft-skills / career side of breaking into data engineering. Material is paid and aimed at engineers transitioning into data work or already-working data engineers picking up specific tools.

All Course Lessons (14)
| #  | Lesson Title                                     | Duration | Access |
|----|--------------------------------------------------|----------|--------|
| 1  | Introduction                                     | 02:14    | Demo   |
| 2  | Data Science Platform                            | 04:11    |        |
| 3  | ETL & ELT Data Warehouse                         | 06:23    |        |
| 4  | Data Lake & Data Warehouse integration           | 03:30    |        |
| 5  | GCP & AWS Pipelines we build                     | 03:15    |        |
| 6  | GCP hands on Cloud Storage & BigQuery            | 08:36    |        |
| 7  | GCP hands on create Data Studio dashboard        | 07:34    |        |
| 8  | GCP Recap & AWS goals                            | 02:13    |        |
| 9  | AWS Setup & upload data to S3                    | 02:13    |        |
| 10 | Athena Data Lake manual table configuration      | 03:49    |        |
| 11 | Creating a Quicksight dashboard                  | 05:06    |        |
| 12 | Athena configuration using AWS Glue data catalog | 03:30    |        |
| 13 | Course recap                                     | 02:37    |        |
| 14 | BONUS Configure Redshift Spectrum table with S3  | 02:58    |        |

Frequently asked questions

What are the prerequisites for enrolling in this course?
Prospective students should have a basic understanding of data engineering concepts and familiarity with cloud platforms. Prior experience with AWS or Google Cloud Platform (GCP) will be beneficial, as the course includes hands-on exercises with these technologies. Understanding ETL/ELT processes and data management principles will also help in grasping the course material more effectively.
What projects will I work on during the course?
The course includes practical exercises such as setting up Cloud Storage and creating tables in BigQuery, configuring a Quicksight dashboard, and managing data with AWS Glue. Students will also integrate data through S3 and Athena and visualize it in Data Studio. These projects are designed to provide hands-on experience with modern data warehouse and data lake technologies.
Who is the target audience for this course?
This course is aimed at data engineers and professionals who want to enhance their skills in modern data storage solutions. It is also suitable for those looking to understand the integration of Data Lakes and Warehouses with BI tools. Individuals interested in cloud computing and data visualization will find the course particularly beneficial.
How does the course content compare to other data engineering courses?
The course uniquely focuses on integrating Data Lakes and Data Warehouses using both AWS and GCP platforms. It provides practical knowledge in setting up data pipelines and visualization tools, which may not be covered in detail in other courses. Additionally, the course includes modules on using ETL/ELT architecture, offering a comprehensive understanding of modern data platforms.
What specific tools and platforms are covered in this course?
The course covers a range of tools and platforms including AWS Redshift, Google BigQuery, and Snowflake for modern data warehouses. It also includes practical exercises on AWS services like S3, Athena, Glue, and Quicksight, as well as Google Cloud services like Cloud Storage and Data Studio. These tools are essential for managing and visualizing data effectively.
What topics are not covered in this course?
The course does not cover advanced data science techniques such as machine learning model development or deep learning. It also does not delve into database management systems beyond the scope of setting up and integrating data warehouses and lakes. Students seeking in-depth knowledge of specific BI tools or programming languages may need to explore additional resources.
How can the skills gained in this course be applied to careers or further studies?
The skills acquired in this course are directly applicable to careers in data engineering and analytics. Understanding the integration of Data Lakes and Warehouses with BI tools is valuable for roles involving data infrastructure and analysis. Additionally, the knowledge of cloud services like AWS and GCP can be beneficial for professionals pursuing certifications or further studies in cloud computing and data management.