Data Engineering on Databricks

1h 27m 29s
English
Paid

Databricks is one of the most popular platforms for data processing using Apache Spark and for creating modern data warehouses (Lakehouse). In this course, you will learn everything you need for a confident start with Databricks, from the basics of the platform to creating your own pipelines and connecting BI tools.

You will learn how Databricks works and why to use it, create your own notebooks, set up a compute cluster, and get acquainted with Databricks SQL Warehouse.

1. Installation and Data Preparation

Before getting started with practical work, you will set up Databricks on AWS, create an S3 bucket for data storage, and set up a workspace. You will also examine the AWS CloudFormation template that Databricks uses to understand how the infrastructure is automatically deployed.

You will review the created cluster and become familiar with the dataset on which you will build your ETL process.

2. Practice: Data Processing

You will learn two ways to load data into Databricks: uploading it directly, or storing it in S3 and then integrating it with Databricks. You will also learn two ways to create code repositories: connecting a GitHub repository or creating a repository manually right in Databricks.

During the project, you will complete two key tasks:

  • ETL Data Processing: run the pipeline, perform transformation, create tables, and save them in Databricks.
  • Data Visualization: perform analysis with Spark SQL in a separate notebook and create visualizations.

You will also learn how data is stored within Databricks.

3. Data Warehouse and External Integrations

Finally, you will connect Power BI to Databricks and try both integration methods: through a compute cluster and through SQL Warehouse. This way, you will learn how to integrate Databricks with external analytics tools.

Recommendations Before Starting

Before starting this course, it is recommended to complete the "Apache Spark Basics" course. With these foundational skills, you will be able to work effectively in Databricks.

Requirements:

  • AWS account
  • Databricks account
  • Knowledge of basic Spark (at the level of the "Spark Fundamentals" course)
  • A small budget for AWS costs (minimal, especially if you stay within the free tier)

Watch Online Data Engineering on Databricks

# Title Duration
1 Introduction 02:56
2 Why Databricks 04:05
3 Pricing explained 06:51
4 Create Databricks Account & Workspace 07:09
5 AWS Resources created by Databricks 04:03
6 Intro Databricks UI & Compute Cluster 06:05
7 The Dataset 02:46
8 Goals ETL & Visualization pipeline explained 02:16
9 Import Data in Databricks UI 04:44
10 Databricks Data in S3 02:09
11 Creating code Repos 04:35
12 Running our ETL job 09:26
13 Explore Data Tables in AWS folders 02:16
14 Explore data with Databricks notebook 1 05:55
15 Explore data with Databricks notebook 2 06:45
16 Compute Cluster vs Databricks SQL Warehouse 04:11
17 Power BI queries through compute cluster 04:21
18 Power BI queries through Databricks SQL Warehouse 04:44
19 Conclusion 02:12

Similar courses to Data Engineering on Databricks

Statistics for Data Science and Business Analysis (udemy)
Category: Data processing and analysis
Duration: 4 hours 49 minutes 30 seconds

Python for Data Science and Machine Learning Bootcamp (udemy)
Category: Python, Data processing and analysis
Duration: 24 hours 49 minutes 42 seconds

Deep Learning: Advanced Computer Vision (udemy)
Category: Data processing and analysis
Duration: 15 hours 10 minutes 54 seconds

Getting Started with Embedded AI | Edge AI (udemy)
Category: Data processing and analysis
Duration: 3 hours 33 minutes 42 seconds

SQL & Database Design A-Z™: Learn MS SQL Server + PostgreSQL (udemy)
Category: SQL, Data processing and analysis
Duration: 12 hours 32 minutes 7 seconds

Learning Apache Spark (Andreas Kretz)
Category: Data processing and analysis
Duration: 1 hour 44 minutes 4 seconds

Data Engineering with Hadoop (Suyog Nagaokar)
Category: Data processing and analysis
Duration: 7 hours 3 minutes

MongoDB Fundamentals (Andreas Kretz)
Category: MongoDB, Data processing and analysis
Duration: 1 hour 23 minutes 19 seconds

Apache Airflow Workflow Orchestration (Andreas Kretz)
Category: Other (Tools), Data processing and analysis
Duration: 1 hour 18 minutes 41 seconds

Becoming a Better Data Engineer (Andreas Kretz)
Category: Data processing and analysis
Duration: 1 hour 46 minutes 10 seconds