dbt for Data Engineers

Duration: 1h 52m 55s · Language: English · Access: Paid

dbt (data build tool) is a SQL-first data transformation tool. It lets teams transform, test, and document data directly inside the warehouse in a simple and transparent way. With dbt, teams can build reliable datasets for analytics, machine learning, and business processes without moving data out of the warehouse. This is why dbt is becoming a key tool for data engineers, and this course is a practical starting point for learning it.

Introduction to dbt

Before diving into the practice, you will gain an understanding of:

  • The difference between ETL and ELT,
  • The challenges faced by modern data pipelines,
  • The distinctions and key advantages of dbt Core and dbt Cloud.

Setup: Snowflake, dbt Core, and GitHub

As part of the practical setup, you will undertake the following:

  • Create a repository on GitHub,
  • Create an account in dbt Cloud and set up a data warehouse in Snowflake,
  • Perform basic project configuration in dbt and define the model structure using SQL or Python files.
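As a rough illustration of what "define the model structure using SQL or Python files" can look like, here is a minimal sketch of a dbt Python model. The file name and the upstream model name `stg_orders` are hypothetical and not taken from the course materials. It assumes the documented dbt convention that a Python model is a function named `model` which receives `dbt` and `session` and returns a single DataFrame.

```python
# models/orders_passthrough.py  (hypothetical file name, for illustration only)

def model(dbt, session):
    """A minimal dbt Python model that passes an upstream model through unchanged."""
    # Ask dbt to persist the result as a table in the warehouse.
    dbt.config(materialized="table")

    # dbt.ref() resolves an upstream model and records the dependency.
    # "stg_orders" is an assumed name; on Snowflake the returned object
    # is typically a Snowpark DataFrame.
    orders = dbt.ref("stg_orders")

    # A Python model must return exactly one DataFrame.
    return orders
```

In a real project dbt supplies the `dbt` and `session` arguments when it runs the model, so the function is never called by hand.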

Building Data Pipelines in dbt

You will develop a chain of models (pipelines) using an e-commerce dataset, employing dbt Core, dbt Cloud, and Snowflake to execute step-by-step data transformation.

Materializations in dbt

Following the model building, you will learn how to store transformation results in various forms:

  • Tables,
  • Views,
  • Incremental or ephemeral models.

Furthermore, you will discover how external and internal dbt sources function and the dependencies between them.
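Of the materializations listed above, the incremental one is the least self-explanatory. The idea can be sketched in plain Python, under the assumption of a simple append-only strategy keyed on a timestamp column. The function name, the `updated_at` column, and the sample rows are all hypothetical. In dbt the equivalent filter is written in the model itself and runs inside the warehouse, and other incremental strategies exist.

```python
from datetime import date


def incremental_run(existing, source, ts_key="updated_at"):
    """Return the target rows after one append-only incremental run."""
    if not existing:
        # First run: the target does not exist yet, so build it from all rows.
        return list(source)
    # Highest timestamp already present in the target table.
    high_water_mark = max(row[ts_key] for row in existing)
    # Only rows newer than the high-water mark are processed and appended.
    new_rows = [row for row in source if row[ts_key] > high_water_mark]
    return existing + new_rows


target = [{"id": 1, "updated_at": date(2024, 1, 1)}]
source = [
    {"id": 1, "updated_at": date(2024, 1, 1)},
    {"id": 2, "updated_at": date(2024, 1, 2)},
]
target = incremental_run(target, source)  # only the row with id 2 is appended
```

A table materialization would instead rebuild the full result on every run, and a view would store only the query.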

Testing dbt Models

Testing models is essential for ensuring data reliability. You will learn to conduct:

  • Generic and custom tests,
  • Quality and consistency checks of data at all pipeline stages.
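To make the idea of a generic test concrete: a dbt test is a query that selects failing rows, and the test passes when no rows come back. The sketch below mimics two common checks, `not_null` and `unique`, in plain Python. The function names and sample rows are hypothetical, and this is only an illustration of the logic, not how dbt runs tests (dbt compiles them to SQL and runs them in the warehouse).

```python
from collections import Counter


def not_null_failures(rows, column):
    """Return the rows where `column` is missing or None."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows, column):
    """Return the non-null values of `column` that occur more than once."""
    counts = Counter(
        row[column] for row in rows if row.get(column) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


orders = [
    {"order_id": 1},
    {"order_id": 1},     # duplicate key
    {"order_id": None},  # missing key
    {"order_id": 2},
]

# A check "passes" when its list of failures is empty.
null_rows = not_null_failures(orders, "order_id")
duplicate_ids = unique_failures(orders, "order_id")
```

Here both checks would fail: one row has no `order_id`, and the value 1 appears twice.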

Deployment and Scheduling Models

Once models are operational locally, you will learn how to:

  • Share them with your team,
  • Execute them on a schedule,
  • Update models automatically.

You will delve into practices for deployment and scheduling in dbt Cloud.

Advanced dbt Features

At the course's conclusion, you will:

  • Establish CI/CD processes directly in dbt Cloud,
  • Generate comprehensive project documentation and learn how to utilize it within a team,
  • Learn best practices for managing dbt in a production environment.

What the Course Includes

  • Source code repository (GitHub)
  • E-commerce dataset
  • Step-by-step video tutorials
  • A selection of useful links and additional materials

Requirements

  • Basic knowledge of relational databases
  • Proficiency in working with SQL
  • Recommended: basic experience with Git and cloud platforms (Snowflake, dbt Cloud)

About the Author: Andreas Kretz

Andreas Kretz is a German data engineer and one of the most widely followed independent voices on data engineering as a career discipline. He runs the Plumbers of Data Science brand and has been publishing tutorial material continuously since the field consolidated around the modern lake-house stack (Spark, Kafka, Snowflake, Databricks, Airflow).

His CourseFlix listing is the largest single-author catalog on the platform: over thirty courses spanning data-pipeline construction, streaming architectures, the cloud-native data stack on AWS / Azure / GCP, the Python and Scala tooling that dominates the field, and the soft-skills / career side of breaking into data engineering. The material is paid and aimed at engineers transitioning into data work and at working data engineers picking up specific tools.

All Course Lessons (23)
#   Lesson Title                                  Duration  Access
1   Introduction                                  02:24     Demo
2   Modern data experience                        05:43
3   Introduction to dbt                           04:39
4   Goals of this course                          04:51
5   Snowflake preparation                         07:30
6   Loading data into Snowflake                   09:36
7   Setup dbt Core                                03:33
8   Preparing the GitHub repository               06:17
9   dbt models & materialization explained        05:49
10  Creating your first SQL model                 05:29
11  Working with custom schemas                   04:36
12  Creating your first Python model              01:56
13  dbt sources                                   04:04
14  Configuring sources                           04:21
15  Working with seed files                       03:20
16  Generic tests                                 03:26
17  Tests with Great Expectations                 02:50
18  Writing custom generic tests                  07:26
19  dbt Cloud setup                               05:15
20  Creating dbt jobs                             10:53
21  CI/CD automation with dbt Cloud and GitHub    07:39
22  Documentation in dbt                          01:18
23  Conclusion                                    00:00

Frequently asked questions

What are the prerequisites for this course?
The course requires basic knowledge of relational databases and proficiency in SQL. Basic experience with Git and with cloud platforms (Snowflake, dbt Cloud) is recommended, as these tools are used throughout the course. Familiarity with data transformation concepts and data warehousing is also helpful. The difference between ETL and ELT is covered in the introductory lessons.
What kind of projects will I build during the course?
Throughout the course, students will build data transformation pipelines using dbt Core, dbt Cloud, and Snowflake. The practical exercises include creating models for an e-commerce dataset, configuring dbt projects, and setting up a data warehouse in Snowflake. Students will also work on materializing data transformations into tables, views, and incremental models.
Who is the target audience for this course?
The course is designed for data engineers looking to master dbt as a tool for data transformation and management. It is also suitable for data professionals interested in learning how to build reliable datasets within data warehouses for analytics, machine learning, and business processes.
How does this course compare to other data engineering courses?
This course specifically focuses on dbt as a tool for data transformation, offering a detailed look at its implementation within data pipelines using Snowflake and GitHub. Unlike broader data engineering courses, it provides in-depth coverage of dbt’s role in modern data pipelines, including model materialization, testing, and documentation directly in the data warehouse.
What specific tools and platforms will I use in this course?
Students will use dbt Core and dbt Cloud for data transformation, Snowflake as the data warehouse, and GitHub for version control. These tools are integral to the course and are used to create data models, manage data transformations, and automate CI/CD pipelines.
What topics are not covered in this course?
The course does not cover data extraction or loading processes in depth, focusing instead on the transformation aspect using dbt. It also does not cover data warehousing platforms other than Snowflake, advanced machine learning models, or data visualization techniques.
How much time should I expect to dedicate to this course?
The course consists of 23 lessons with a total video runtime of 1h 52m 55s. Beyond watching the lessons, learners should allocate time for the practical exercises, including setting up tools like dbt Core, Snowflake, and GitHub. The overall time commitment will vary based on familiarity with the tools and the complexity of the exercises.