dbt (data build tool) is a SQL-first data transformation tool. It lets teams transform, test, and document data directly inside the warehouse, in a simple and transparent way. With dbt, teams can build reliable datasets for analytics, machine learning, and business processes without moving data out of the warehouse. This is why dbt is becoming a key tool for data engineers, and this course is a solid starting point for learning it.
Introduction to dbt
Before the hands-on work begins, you will learn about:
- The difference between ETL and ELT,
- The challenges faced by modern data pipelines,
- The distinctions and key advantages of dbt Core and dbt Cloud.
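The ELT pattern mentioned above can be sketched in a few lines. This is only an illustration, not dbt itself: an in-memory SQLite database stands in for the warehouse, and the table names `raw_orders` and `completed_orders` are hypothetical. Raw data is loaded first, and the transformation then runs as SQL inside the database.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the warehouse (Snowflake in the course).
conn = sqlite3.connect(":memory:")

# Extract + Load: raw rows land in the warehouse untransformed.
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, amount_cents INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1250, "completed"), (2, 800, "cancelled"), (3, 4300, "completed")],
)

# Transform: a SELECT executed inside the warehouse.
# A dbt model file holds this kind of SELECT statement.
conn.execute(
    """
    CREATE TABLE completed_orders AS
    SELECT order_id, amount_cents / 100.0 AS amount
    FROM raw_orders
    WHERE status = 'completed'
    """
)

completed = conn.execute(
    "SELECT order_id, amount FROM completed_orders ORDER BY order_id"
).fetchall()
print(completed)
```

In an ETL flow, by contrast, the filtering and unit conversion would happen in application code before the rows were inserted.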
Setup: Snowflake, dbt Core, and GitHub
As part of the practical setup, you will:
- Create a repository on GitHub,
- Create an account in dbt Cloud and set up a data warehouse in Snowflake,
- Perform basic project configuration in dbt and define the model structure using SQL or Python files.
Building Data Pipelines in dbt
You will build a chain of models (a pipeline) on an e-commerce dataset, using dbt Core, dbt Cloud, and Snowflake to transform the data step by step.
Materializations in dbt
After building the models, you will learn how to materialize transformation results in different forms:
- Tables,
- Views,
- Incremental or ephemeral models.
You will also learn how dbt works with external sources and internal models, and how the dependencies between them are tracked.
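To make the difference between the table and view materializations listed above concrete, here is a minimal sketch. It assumes SQLite as a stand-in for the warehouse and uses hypothetical names (`raw_orders`, `revenue_view`, `revenue_table`). The same SELECT is stored once as a view and once as a table, then a new upstream row arrives.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, amount REAL)")
conn.execute("INSERT INTO raw_orders VALUES (1, 10.0)")

# One model query, materialized two ways.
model_sql = "SELECT COUNT(*) AS order_count, SUM(amount) AS revenue FROM raw_orders"
conn.execute(f"CREATE VIEW revenue_view AS {model_sql}")    # view: query stored, no data
conn.execute(f"CREATE TABLE revenue_table AS {model_sql}")  # table: result stored

# New upstream data arrives after both objects were built.
conn.execute("INSERT INTO raw_orders VALUES (2, 5.0)")

view_row = conn.execute("SELECT order_count, revenue FROM revenue_view").fetchone()
table_row = conn.execute("SELECT order_count, revenue FROM revenue_table").fetchone()
print("view :", view_row)   # recomputed at query time, sees the new row
print("table:", table_row)  # stays as built until the model is run again
```

The view is always current but pays the query cost on every read; the table is fast to read but only as fresh as its last build. Incremental and ephemeral models are further points on the same trade-off and are not shown here.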
Testing dbt Models
Testing models is essential for ensuring data reliability. You will learn to run:
- Generic and custom (singular) tests,
- Quality and consistency checks of data at all pipeline stages.
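The idea behind the tests above can be sketched as follows: a data test is a query that selects the rows violating an expectation, and it passes when the query returns nothing. The sketch below is an assumption-laden illustration, not dbt's own implementation. It uses SQLite, a hypothetical `orders` table, and a hypothetical helper `failing_rows`, and it mimics a not-null check and a uniqueness check.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100), (2, None), (2, 101)],  # one missing customer, one duplicated id
)

def failing_rows(sql):
    """Hypothetical helper: run a test query and return the rows that violate it."""
    return conn.execute(sql).fetchall()

# Not-null check: any row with a missing customer_id is a failure.
not_null_failures = failing_rows(
    "SELECT order_id FROM orders WHERE customer_id IS NULL"
)

# Uniqueness check: any order_id that appears more than once is a failure.
unique_failures = failing_rows(
    "SELECT order_id, COUNT(*) FROM orders GROUP BY order_id HAVING COUNT(*) > 1"
)

for name, rows in [("not_null", not_null_failures), ("unique", unique_failures)]:
    print(name, "PASS" if not rows else f"FAIL {rows}")
```

A custom test follows the same shape: write any SELECT that returns the offending rows for a rule specific to your data.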
Deployment and Scheduling Models
Once the models run locally, you will learn how to:
- Share them with your team,
- Execute them on a schedule,
- Update models automatically.
You will work through deployment and scheduling practices in dbt Cloud.
Advanced dbt Features
By the end of the course, you will:
- Set up CI/CD processes directly in dbt Cloud,
- Generate comprehensive project documentation and learn how to use it within a team,
- Learn best practices for running dbt in a production environment.
What the Course Includes
- Source code repository (GitHub)
- E-commerce dataset
- Step-by-step video tutorials
- A selection of useful links and additional materials
Requirements
- Basic knowledge of relational databases
- Proficiency in working with SQL
- Recommended: basic experience with Git and cloud platforms (Snowflake, dbt Cloud)