Apache Spark Certification Training

Duration: 15h 13m 1s
Language: English
Access: Paid

Master Apache Spark and showcase your skills with the Databricks Associate Developer for Apache Spark certification. This course is designed to transform you into a PySpark professional and prepare you to ace the Databricks Spark certification exam.

Join us for an engaging and easy-to-understand journey into Apache Spark and elevate your big data career to new heights!

What Will You Learn?

This course teaches you fundamental PySpark skills and prepares you for certification as a Databricks Certified Associate Developer for Apache Spark. It consists of 18 comprehensive modules that guide you through Apache Spark's internal workings and practical usage.

Course Highlights:

  • Develop expertise in coding with Spark DataFrames
  • Gain confidence with the Databricks certification exam content
  • Understand Spark's distributed and fault-tolerant data processing
  • Master the use of Spark in Databricks
  • Learn about the Spark cluster architecture
  • Discover when and how Spark evaluates code
  • Explore Spark's efficient memory management mechanisms
  • Resolve common Spark issues like out-of-memory errors
  • Understand how Spark handles complex operations such as joins
  • Become proficient in navigating the Spark UI
  • ...and much more – check out the full list below!

Who Is This Course For?

This course is designed for individuals with basic Python skills who are eager to advance their big data processing abilities with PySpark. It is also for those aiming to pass the Databricks Certified Associate Developer for Apache Spark certification exam.

Ideal Participants:

  • Those interested in using Apache Spark with Python and PySpark, rather than Scala
  • Data analysts and developers seeking to enhance their portfolio with verified big data skills and Databricks experience
  • Data engineers desiring certification to verify their Apache Spark skills and advance their careers
  • Data scientists aiming to work efficiently with large data sets in Apache Spark
  • Organizations seeking to empower their data professionals with effective Apache Spark skills
  • Anyone looking to strengthen their understanding of Apache Spark's inner workings

About the Author: Florian Roscheck

As a Sr. Data Scientist at a major consumer goods company in Germany, I currently apply big data models with my data team in a business context. Sustainability is an important topic for me, and not only since I worked in California as a data scientist at a renewable energy company. I love that Apache Spark is open-source, and I volunteer at NumFOCUS to promote open practices in research, data, and scientific computing.

All Course Lessons (99)
# | Lesson Title | Duration | Access
1 | Introduction | 09:48 | Demo
2 | Certification Exam Overview | 05:03 |
3 | Signing up for Databricks Community Edition | 01:44 |
4 | Loading Data Into Databricks | 02:43 |
5 | Overview of the Spark Cluster Architecture and its Components | 08:24 |
6 | Getting to Know the Spark Driver | 11:55 |
7 | Getting to Know Executors | 07:37 |
8 | Discovering Execution Modes | 17:33 |
9 | Overview | 05:38 |
10 | Internal Types, DataFrames, Datasets, RDDs, and the Spark SQL API | 19:10 |
11 | Hands-on Session: Exploring Data APIs on Databricks Community Edition | 08:26 |
12 | Intro to Labs | 01:12 |
13 | Intro & Creating DataFrames | 06:58 |
14 | Exercise: Creating a DataFrame | 01:07 |
15 | Exercise: Creating a DataFrame - Solution | 01:59 |
16 | Working with Schemas | 26:10 |
17 | Exercise: Building a Simple Schema | 01:46 |
18 | Exercise: Building a Simple Schema - Solution | 05:13 |
19 | Exercise: Building a Complex Schema | 02:28 |
20 | Exercise: Building a Complex Schema - Solution | 05:53 |
21 | Type Conversion of DataFrame Columns | 07:20 |
22 | Exercise: Changing the Type of a Column | 01:50 |
23 | Exercise: Changing the Type of a Column - Solution | 04:20 |
24 | Overview | 09:18 |
25 | Shuffles | 07:52 |
26 | Data Skew | 13:15 |
27 | Spark Configurations for Partitions | 03:47 |
28 | Hands-on Session: The Power of Partitions | 30:18 |
29 | Storage Layout | 17:39 |
30 | Caching and Storage Levels | 10:29 |
31 | Memory in Action | 30:59 |
32 | Hands-on Session: Executor Memory Management - Part 1 | 10:27 |
33 | Hands-on Session: Executor Memory Management - Part 2 | 13:08 |
34 | Intro & How to Get Help in PySpark | 04:01 |
35 | Partitioning Recap | 09:45 |
36 | Exercise: Repartitioning | 01:31 |
37 | Exercise: Repartitioning - Solution | 06:08 |
38 | Caching Recap | 03:27 |
39 | Exercise: Caching | 01:13 |
40 | Exercise: Caching - Solution | 03:20 |
41 | Overview | 07:40 |
42 | Hands-on Session: Actions vs. Transformations | 06:47 |
43 | Intro & Reading Data | 18:36 |
44 | Exercise: Reading Parquet Files | 02:20 |
45 | Exercise: Reading Parquet Files - Solution | 03:44 |
46 | Reading from CSV Files | 17:18 |
47 | Exercise: Reading CSV Files | 02:29 |
48 | Exercise: Reading CSV Files - Solution | 03:55 |
49 | Reading from JSON Files | 05:16 |
50 | Writing Data | 10:57 |
51 | Exercise: Writing to Parquet Files | 02:08 |
52 | Exercise: Writing to Parquet Files - Solution | 04:27 |
53 | Writing to CSV Files | 02:53 |
54 | Exercise: Writing to CSV Files | 02:16 |
55 | Exercise: Writing to CSV Files - Solution | 03:12 |
56 | Writing to JSON Files | 01:58 |
57 | Using PySpark with SQL | 05:01 |
58 | Exercise: SQL in PySpark | 00:46 |
59 | Exercise: SQL in PySpark - Solution | 02:16 |
60 | Overview | 16:33 |
61 | Hands-on Session: Discovering the Spark UI | 12:27 |
62 | Intro & Removing Data | 16:58 |
63 | Exercise: Removing Data | 00:59 |
64 | Exercise: Removing Data - Solution | 03:16 |
65 | Modifying Data | 30:49 |
66 | Exercise: Modifying Data | 02:08 |
67 | Exercise: Modifying Data - Solution | 07:22 |
68 | Analyzing Data | 18:14 |
69 | Exercise: Analyzing Data | 01:39 |
70 | Exercise: Analyzing Data - Solution | 06:30 |
71 | The Catalyst Optimizer | 18:32 |
72 | Adaptive Query Execution | 15:32 |
73 | Dynamic Partition Pruning | 10:08 |
74 | The DAG: Achieving Fault Tolerance | 12:25 |
75 | Intro & Working With Dates and Times | 33:30 |
76 | Exercise: Working With Dates and Times | 02:10 |
77 | Exercise: Working With Dates and Times - Solution | 08:00 |
78 | Working With Strings | 15:30 |
79 | Exercise: Working With Strings | 03:20 |
80 | Exercise: Working With Strings - Solution | 07:47 |
81 | Working with Arrays | 14:38 |
82 | Exercise: Working With Arrays | 05:17 |
83 | Exercise: Working With Arrays - Solution | 13:19 |
84 | Accumulator and Broadcast Variables | 11:14 |
85 | Joins | 34:02 |
86 | Hands-on Session: Cross-Cluster Communication | 42:39 |
87 | Intro & Grouping and Aggregating | 19:16 |
88 | Exercise: Grouping and Aggregating | 01:43 |
89 | Exercise: Grouping and Aggregating - Solution | 07:19 |
90 | Joining | 15:06 |
91 | Exercise: Joining | 03:58 |
92 | Exercise: Joining - Solution | 03:58 |
93 | User-Defined Functions (UDFs) | 20:29 |
94 | Exercise: UDFs | 04:06 |
95 | Exercise: UDFs - Solution | 17:51 |
96 | Signing up for the Exam | 02:24 |
97 | Last Minute Preparations | 01:34 |
98 | Introduction | 04:36 |
99 | Congratulations! | 00:50 |

Books

Read Book Apache Spark Certification Training

# | Title
1 | 1-Proposed Timeline
2 | Apache Spark Certification Exam Guide
3 | Mastery Map 1 - Cluster Components
4 | Mastery Map 2 - Spark Execution Modes
5 | Mastery Map 3 - Spark Data APIs
6 | Mastery Map 4 - Executor Memory Layout
7 | Mastery Map 5 - PySpark Storage Levels
8 | Mastery Map 6 - Executor Out-of-Memory Errors
9 | Mastery Map 7 - Actions Vs. Transformations
10 | Mastery Map 8 - Execution Hierarchy
11 | Mastery Map 9 - A Query, From Plan to Execution
12 | Mastery Map 10 - Adaptive Query Execution Strategies
13 | Mastery Map 11 - Dynamic Partition Pruning
14 | Mastery Map 12 - Joins