Data Engineering on Databricks

1h 27m 29s
English
Paid

Course description

Databricks is one of the most popular platforms for processing data with Apache Spark and for building a modern data warehouse on the Lakehouse model. In this course, you will learn everything you need for a confident start with Databricks, from the basics of the platform to creating your own pipelines and connecting BI tools.

You will learn how Databricks works and why to use it, create your own notebooks, set up a compute cluster, and get acquainted with Databricks SQL Warehouse.

1. Installation and Data Preparation

Before getting started with practical work, you will set up Databricks on AWS, create an S3 bucket for data storage, and set up a workspace. You will also examine the AWS CloudFormation template that Databricks uses to understand how the infrastructure is automatically deployed.

You will review the created cluster and become familiar with the dataset on which you will build your ETL process.

2. Practice: Data Processing

You will learn two ways to load data into Databricks: importing it directly, or placing it in S3 and then integrating that storage with Databricks. You will also learn how to create code repositories, either by connecting a GitHub repository or by creating one manually in Databricks.

During the project, you will complete two key tasks:

  • ETL Data Processing: run the pipeline, perform transformations, create tables, and save them in Databricks.
  • Data Visualization: perform analysis with Spark SQL in a separate notebook and create visualizations.

You will also learn how data is stored within Databricks.
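
To make the shape of the two project tasks concrete, here is a minimal local sketch of an extract-transform-load step followed by a SQL analysis query. It is not the course's pipeline: plain Python with sqlite3 stands in for Spark and Databricks tables, and the sample data, table name, and function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; the course's real dataset is not described here.
RAW_CSV = """order_id,country,amount
1,DE,10.50
2,US,20.00
3,DE,
4,US,5.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and cast column types."""
    return [
        (int(r["order_id"]), r["country"], float(r["amount"]))
        for r in rows
        if r["amount"]
    ]


def load(conn, rows):
    """Load: create the target table and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def revenue_by_country(conn):
    """Analysis: aggregate the loaded table with SQL."""
    cur = conn.execute(
        "SELECT country, SUM(amount) FROM orders "
        "GROUP BY country ORDER BY country"
    )
    return cur.fetchall()


# sqlite3 in-memory database is a local stand-in for a Databricks table store.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
result = revenue_by_country(conn)
print(result)
```

In Databricks the same three stages would typically run on a Spark cluster and write to managed tables, with the analysis query issued from a separate notebook.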

3. Data Warehouse and External Integrations

Finally, you will connect Power BI to Databricks and try both integration methods: through a compute cluster and through SQL Warehouse. This way, you will learn how to integrate Databricks with external analytics tools.

Recommendations Before Starting

Before starting this course, it is recommended to complete the "Apache Spark Basics" course. With these foundational skills, you will be able to work effectively in Databricks.

Requirements:

  • AWS account
  • Databricks account
  • Basic knowledge of Spark (at the level of the "Spark Fundamentals" course)
  • Willingness to incur minimal AWS costs (lower still if you stay within the free tier)

All Course Lessons (19)

#   Lesson Title                                        Duration  Access
1   Introduction                                        02:56     Demo
2   Why Databricks                                      04:05
3   Pricing explained                                   06:51
4   Create Databricks Account & Workspace               07:09
5   AWS Resources created by Databricks                 04:03
6   Intro Databricks UI & Compute Cluster               06:05
7   The Dataset                                         02:46
8   Goals ETL & Visualization pipeline explained        02:16
9   Import Data in Databricks UI                        04:44
10  Databricks Data in S3                               02:09
11  Creating code Repos                                 04:35
12  Running our ETL job                                 09:26
13  Explore Data Tables in AWS folders                  02:16
14  Explore data with Databricks notebook 1             05:55
15  Explore data with Databricks notebook 2             06:45
16  Compute Cluster vs Databricks SQL Warehouse         04:11
17  Power BI queries through compute cluster            04:21
18  Power BI queries through Databricks SQL Warehouse   04:44
19  Conclusion                                          02:12
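
As a quick sanity check, the 19 lesson durations listed above can be summed and compared with the stated course length of 1h 27m 29s. The snippet below is a small sketch of that arithmetic; the helper names `to_seconds` and `format_hms` are illustrative only.

```python
# Lesson durations (mm:ss) in the order they appear in the lesson list.
DURATIONS = [
    "02:56", "04:05", "06:51", "07:09", "04:03", "06:05", "02:46",
    "02:16", "04:44", "02:09", "04:35", "09:26", "02:16", "05:55",
    "06:45", "04:11", "04:21", "04:44", "02:12",
]


def to_seconds(mmss):
    """Convert an 'mm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def format_hms(total):
    """Format a number of seconds as 'Hh Mm Ss'."""
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h {minutes}m {seconds}s"


total_seconds = sum(to_seconds(d) for d in DURATIONS)
print(len(DURATIONS), "lessons,", format_hms(total_seconds))
```

The total comes to 5,249 seconds, which matches the course length given at the top of the page.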
