Apache Spark Certification Training

15h 13m 1s
English
Paid

Course description

Apache Spark is a core data skill – here is how to show you've got it!

Learn Apache Spark from the ground up and show off your knowledge with the Databricks Certified Associate Developer for Apache Spark certification. This course will turn you into a PySpark professional and get you ready to pass this popular Databricks certification.

Join me for an easy-to-understand and engaging look into Spark and take your big data career to the next level!

What will you learn?

The goal of this course is to teach you fundamental PySpark skills and prepare you for the Databricks Certified Associate Developer for Apache Spark certification.

The course includes 18 modules to help you understand how Apache Spark works internally and how to use it in practice. You can find all topics covered below, but here is an overview:

  • Become a seasoned expert at coding with Spark DataFrames
  • Get confident with the Databricks certification exam content
  • Discover Spark's distributed, fault-tolerant data processing
  • Master how to work with Spark in Databricks
  • Understand the Spark cluster architecture
  • Learn when and how Spark evaluates code
  • Grasp Spark's efficient memory management mechanisms
  • Analyze typical Spark problems like out-of-memory errors
  • See how Spark executes complex operations like joins
  • Become proficient in navigating through the Spark UI
  • ...and many more topics – check out the full list below!

Who is this for?

Anyone with basic Python skills who wants to develop their big data processing skills, and anyone who would like to pass the popular Databricks Certified Associate Developer for Apache Spark certification using PySpark.

  • If you want to learn how to use Apache Spark with the Scala programming language, this course isn't a fit. We focus on Python and PySpark exclusively, but the fundamental Spark concepts taught are applicable to both languages.
  • Data analysts and developers who want to add verified big data skills and Databricks experience to their portfolio
  • Data engineers who want or need proof of their Apache Spark skills via a certification to boost their career
  • Data scientists wanting to work efficiently and frustration-free with large data sets in Apache Spark
  • Companies that want to enable their data staff to use Apache Spark in a professional, time- and cost-efficient way
  • Anyone wanting to brush up their Apache Spark skills with a solid understanding of how it works under the hood


All Course Lessons (99)

# | Lesson Title | Duration | Access
1 | Introduction | 09:48 | Demo
2 | Certification Exam Overview | 05:03 |
3 | Signing up for Databricks Community Edition | 01:44 |
4 | Loading Data Into Databricks | 02:43 |
5 | Overview of the Spark Cluster Architecture and its Components | 08:24 |
6 | Getting to Know the Spark Driver | 11:55 |
7 | Getting to Know Executors | 07:37 |
8 | Discovering Execution Modes | 17:33 |
9 | Overview | 05:38 |
10 | Internal Types, DataFrames, Datasets, RDDs, and the Spark SQL API | 19:10 |
11 | Hands-on Session: Exploring Data APIs on Databricks Community Edition | 08:26 |
12 | Intro to Labs | 01:12 |
13 | Intro & Creating DataFrames | 06:58 |
14 | Exercise: Creating a DataFrame | 01:07 |
15 | Exercise: Creating a DataFrame - Solution | 01:59 |
16 | Working with Schemas | 26:10 |
17 | Exercise: Building a Simple Schema | 01:46 |
18 | Exercise: Building a Simple Schema - Solution | 05:13 |
19 | Exercise: Building a Complex Schema | 02:28 |
20 | Exercise: Building a Complex Schema - Solution | 05:53 |
21 | Type Conversion of DataFrame Columns | 07:20 |
22 | Exercise: Changing the Type of a Column | 01:50 |
23 | Exercise: Changing the Type of a Column - Solution | 04:20 |
24 | Overview | 09:18 |
25 | Shuffles | 07:52 |
26 | Data Skew | 13:15 |
27 | Spark Configurations for Partitions | 03:47 |
28 | Hands-on Session: The Power of Partitions | 30:18 |
29 | Storage Layout | 17:39 |
30 | Caching and Storage Levels | 10:29 |
31 | Memory in Action | 30:59 |
32 | Hands-on Session: Executor Memory Management - Part 1 | 10:27 |
33 | Hands-on Session: Executor Memory Management - Part 2 | 13:08 |
34 | Intro & How to Get Help in PySpark | 04:01 |
35 | Partitioning Recap | 09:45 |
36 | Exercise: Repartitioning | 01:31 |
37 | Exercise: Repartitioning - Solution | 06:08 |
38 | Caching Recap | 03:27 |
39 | Exercise: Caching | 01:13 |
40 | Exercise: Caching - Solution | 03:20 |
41 | Overview | 07:40 |
42 | Hands-On Session: Actions vs. Transformations | 06:47 |
43 | Intro & Reading Data | 18:36 |
44 | Exercise: Reading Parquet Files | 02:20 |
45 | Exercise: Reading Parquet Files - Solution | 03:44 |
46 | Reading from CSV Files | 17:18 |
47 | Exercise: Reading CSV Files | 02:29 |
48 | Exercise: Reading CSV Files - Solution | 03:55 |
49 | Reading from JSON Files | 05:16 |
50 | Writing Data | 10:57 |
51 | Exercise: Writing to Parquet Files | 02:08 |
52 | Exercise: Writing to Parquet Files - Solution | 04:27 |
53 | Writing to CSV Files | 02:53 |
54 | Exercise: Writing to CSV Files | 02:16 |
55 | Exercise: Writing to CSV Files - Solution | 03:12 |
56 | Writing to JSON Files | 01:58 |
57 | Using PySpark with SQL | 05:01 |
58 | Exercise: SQL in PySpark | 00:46 |
59 | Exercise: SQL in PySpark - Solution | 02:16 |
60 | Overview | 16:33 |
61 | Hands-on Session: Discovering the Spark UI | 12:27 |
62 | Intro & Removing Data | 16:58 |
63 | Exercise: Removing Data | 00:59 |
64 | Exercise: Removing Data - Solution | 03:16 |
65 | Modifying Data | 30:49 |
66 | Exercise: Modifying Data | 02:08 |
67 | Exercise: Modifying Data - Solution | 07:22 |
68 | Analyzing Data | 18:14 |
69 | Exercise: Analyzing Data | 01:39 |
70 | Exercise: Analyzing Data - Solution | 06:30 |
71 | The Catalyst Optimizer | 18:32 |
72 | Adaptive Query Execution | 15:32 |
73 | Dynamic Partition Pruning | 10:08 |
74 | The DAG: Achieving Fault Tolerance | 12:25 |
75 | Intro & Working With Dates and Times | 33:30 |
76 | Exercise: Working With Dates and Times | 02:10 |
77 | Exercise: Working With Dates and Times - Solution | 08:00 |
78 | Working With Strings | 15:30 |
79 | Exercise: Working With Strings | 03:20 |
80 | Exercise: Working With Strings - Solution | 07:47 |
81 | Working with Arrays | 14:38 |
82 | Exercise: Working With Arrays | 05:17 |
83 | Exercise: Working With Arrays - Solution | 13:19 |
84 | Accumulator and Broadcast Variables | 11:14 |
85 | Joins | 34:02 |
86 | Hands-on Session: Cross-Cluster Communication | 42:39 |
87 | Intro & Grouping and Aggregating | 19:16 |
88 | Exercise: Grouping and Aggregating | 01:43 |
89 | Exercise: Grouping and Aggregating - Solution | 07:19 |
90 | Joining | 15:06 |
91 | Exercise: Joining | 03:58 |
92 | Exercise: Joining - Solution | 03:58 |
93 | User-Defined Functions (UDFs) | 20:29 |
94 | Exercise: UDFs | 04:06 |
95 | Exercise: UDFs - Solution | 17:51 |
96 | Signing up for the Exam | 02:24 |
97 | Last Minute Preparations | 01:34 |
98 | Introduction | 04:36 |
99 | Congratulations! | 00:50 |

Books

Read Book Apache Spark Certification Training

# | Title
1 | 1-Proposed Timeline
2 | Apache Spark Certification Exam Guide
3 | Mastery Map 1 - Cluster Components
4 | Mastery Map 2 - Spark Execution Modes
5 | Mastery Map 3 - Spark Data APIs
6 | Mastery Map 4 - Executor Memory Layout
7 | Mastery Map 5 - PySpark Storage Levels
8 | Mastery Map 6 - Executor Out-of-Memory Errors
9 | Mastery Map 7 - Actions Vs. Transformations
10 | Mastery Map 8 - Execution Hierarchy
11 | Mastery Map 9 - A Query, From Plan to Execution
12 | Mastery Map 10 - Adaptive Query Execution Strategies
13 | Mastery Map 11 - Dynamic Partition Pruning
14 | Mastery Map 12 - Joins
