Fundamentals of Apache Spark and PySpark

Duration: 2h 20m 54s
Language: English
Access: Paid

Course description

Apache Spark is one of the most important tools for any aspiring data engineer or data scientist, and PySpark lets you use the full power of Spark from familiar Python.

This course is designed for those who want to confidently enter the world of big data. We will explore Spark's architecture, show you how to write clear and efficient PySpark code, and walk through building scalable data processing pipelines.

The training is practice-based: you will work with real datasets, solve practical tasks, and acquire skills that are truly in demand by employers.

If you want to learn how to analyze massive amounts of data, quickly clean and transform information, and use the tools relied on by Netflix, Amazon, and other industry leaders, this course is for you.

All Course Lessons (29)

#   Lesson Title                                 Duration  Access
1   Introduction                                 07:30     Demo
2   [Optional] What Is a Virtualenv?             06:37
3   Apache Spark                                 03:44
4   How Spark Works                              04:24
5   Spark Application                            07:41
6   DataFrames                                   06:43
7   Installing Spark                             05:51
8   Inside Airbnb Data                           07:02
9   Writing Your First Spark Job                 07:05
10  Lazy Processing                              02:16
11  [Exercise] Basic Functions                   01:29
12  [Exercise] Basic Functions - Solution        06:41
13  Aggregating Data                             04:00
14  Joining Data                                 04:40
15  Aggregations and Joins with Spark            06:10
16  Complex Data Types                           05:09
17  [Exercise] Aggregate Functions               00:50
18  [Exercise] Aggregate Functions - Solution    05:54
19  User Defined Functions                       03:25
20  Data Shuffle                                 06:14
21  Data Accumulators                            03:42
22  Optimizing Spark Jobs                        07:39
23  Submitting Spark Jobs                        04:29
24  Other Spark APIs                             05:16
25  Spark SQL                                    04:33
26  [Exercise] Advanced Spark                    02:10
27  [Exercise] Advanced Spark - Solution         05:26
28  Summary                                      03:08
29  Let's Keep Learning Together!                01:06
