Apache Spark Certification Training
Apache Spark is a core data skill – here is how to show you've got it!
Learn Apache Spark from the ground up and show off your knowledge with the Databricks Certified Associate Developer for Apache Spark certification. This course will transform you into a PySpark professional and get you ready to pass this popular Databricks certification.
Join me for an easy-to-understand and engaging look into Spark and take your big data career to the next level!
What will you learn?
The goal of this course is to teach you fundamental PySpark skills and prepare you for the Databricks Certified Associate Developer for Apache Spark certification exam.
The course includes 18 modules to help you understand how Apache Spark works internally and how to use it in practice. You can find all topics covered below, but here is an overview:
- Become a seasoned expert at coding with Spark DataFrames
- Get confident with the Databricks certification exam content
- Discover Spark's distributed, fault-tolerant data processing
- Master how to work with Spark in Databricks
- Understand the Spark cluster architecture
- Learn when and how Spark evaluates code
- Grasp Spark's efficient memory management mechanisms
- Analyze typical Spark problems like out-of-memory errors
- See how Spark executes complex operations like joins
- Become proficient in navigating through the Spark UI
- ...and many more topics – check out the full list below!
Who is this for?
This course is for anyone with basic Python skills who wants to develop their big data processing skills, and for anyone who would like to pass the popular Databricks Certified Associate Developer for Apache Spark certification using PySpark:
- Data analysts and developers who want to add verified big data skills and Databricks experience to their portfolio
- Data engineers who want or need proof of their Apache Spark skills via a certification to boost their career
- Data scientists who want to work efficiently and frustration-free with large data sets in Apache Spark
- Companies that want to enable their data staff to use Apache Spark in a professional, time- and cost-efficient way
- Anyone who wants to brush up their Apache Spark skills with a solid understanding of how it works under the hood

If you want to learn how to use Apache Spark with the Scala programming language, this course isn't a fit. We focus on Python and PySpark exclusively, but the fundamental Spark concepts taught are applicable to both languages.
Watch Online Apache Spark Certification Training
# | Title | Duration |
---|---|---|
1 | 01. Introduction | 09:48 |
2 | 02. Certification Exam Overview | 05:03 |
3 | 03. Signing up for Databricks Community Edition | 01:44 |
4 | 04. Loading Data Into Databricks | 02:43 |
5 | 05. Overview of the Spark Cluster Architecture and its Components | 08:24 |
6 | 06. Getting to Know the Spark Driver | 11:55 |
7 | 07. Getting to Know Executors | 07:37 |
8 | 08. Discovering Execution Modes | 17:33 |
9 | 09. Overview | 05:38 |
10 | 10. Internal Types, DataFrames, Datasets, RDDs, and the Spark SQL API | 19:10 |
11 | 11. Hands-on Session: Exploring Data APIs on Databricks Community Edition | 08:26 |
12 | 12. Intro to Labs | 01:12 |
13 | 13. Intro & Creating DataFrames | 06:58 |
14 | 14. Exercise: Creating a DataFrame | 01:07 |
15 | 15. Exercise: Creating a DataFrame - Solution | 01:59 |
16 | 16. Working with Schemas | 26:10 |
17 | 17. Exercise: Building a Simple Schema | 01:46 |
18 | 18. Exercise: Building a Simple Schema - Solution | 05:13 |
19 | 19. Exercise: Building a Complex Schema | 02:28 |
20 | 20. Exercise: Building a Complex Schema - Solution | 05:53 |
21 | 21. Type Conversion of DataFrame Columns | 07:20 |
22 | 22. Exercise: Changing the Type of a Column | 01:50 |
23 | 23. Exercise: Changing the Type of a Column - Solution | 04:20 |
24 | 24. Overview | 09:18 |
25 | 25. Shuffles | 07:52 |
26 | 26. Data Skew | 13:15 |
27 | 27. Spark Configurations for Partitions | 03:47 |
28 | 28. Hands-on Session: The Power of Partitions | 30:18 |
29 | 29. Storage Layout | 17:39 |
30 | 30. Caching and Storage Levels | 10:29 |
31 | 31. Memory in Action | 30:59 |
32 | 32. Hands-on Session: Executor Memory Management - Part 1 | 10:27 |
33 | 33. Hands-on Session: Executor Memory Management - Part 2 | 13:08 |
34 | 34. Intro & How to Get Help in PySpark | 04:01 |
35 | 35. Partitioning Recap | 09:45 |
36 | 36. Exercise: Repartitioning | 01:31 |
37 | 37. Exercise: Repartitioning - Solution | 06:08 |
38 | 38. Caching Recap | 03:27 |
39 | 39. Exercise: Caching | 01:13 |
40 | 40. Exercise: Caching - Solution | 03:20 |
41 | 41. Overview | 07:40 |
42 | 42. Hands-on Session: Actions vs. Transformations | 06:47 |
43 | 43. Intro & Reading Data | 18:36 |
44 | 44. Exercise: Reading Parquet Files | 02:20 |
45 | 45. Exercise: Reading Parquet Files - Solution | 03:44 |
46 | 46. Reading from CSV Files | 17:18 |
47 | 47. Exercise: Reading CSV Files | 02:29 |
48 | 48. Exercise: Reading CSV Files - Solution | 03:55 |
49 | 49. Reading from JSON Files | 05:16 |
50 | 50. Writing Data | 10:57 |
51 | 51. Exercise: Writing to Parquet Files | 02:08 |
52 | 52. Exercise: Writing to Parquet Files - Solution | 04:27 |
53 | 53. Writing to CSV Files | 02:53 |
54 | 54. Exercise: Writing to CSV Files | 02:16 |
55 | 55. Exercise: Writing to CSV Files - Solution | 03:12 |
56 | 56. Writing to JSON Files | 01:58 |
57 | 57. Using PySpark with SQL | 05:01 |
58 | 58. Exercise: SQL in PySpark | 00:46 |
59 | 59. Exercise: SQL in PySpark - Solution | 02:16 |
60 | 60. Overview | 16:33 |
61 | 61. Hands-on Session: Discovering the Spark UI | 12:27 |
62 | 62. Intro & Removing Data | 16:58 |
63 | 63. Exercise: Removing Data | 00:59 |
64 | 64. Exercise: Removing Data - Solution | 03:16 |
65 | 65. Modifying Data | 30:49 |
66 | 66. Exercise: Modifying Data | 02:08 |
67 | 67. Exercise: Modifying Data - Solution | 07:22 |
68 | 68. Analyzing Data | 18:14 |
69 | 69. Exercise: Analyzing Data | 01:39 |
70 | 70. Exercise: Analyzing Data - Solution | 06:30 |
71 | 71. The Catalyst Optimizer | 18:32 |
72 | 72. Adaptive Query Execution | 15:32 |
73 | 73. Dynamic Partition Pruning | 10:08 |
74 | 74. The DAG: Achieving Fault Tolerance | 12:25 |
75 | 75. Intro & Working With Dates and Times | 33:30 |
76 | 76. Exercise: Working With Dates and Times | 02:10 |
77 | 77. Exercise: Working With Dates and Times - Solution | 08:00 |
78 | 78. Working With Strings | 15:30 |
79 | 79. Exercise: Working With Strings | 03:20 |
80 | 80. Exercise: Working With Strings - Solution | 07:47 |
81 | 81. Working with Arrays | 14:38 |
82 | 82. Exercise: Working With Arrays | 05:17 |
83 | 83. Exercise: Working With Arrays - Solution | 13:19 |
84 | 84. Accumulator and Broadcast Variables | 11:14 |
85 | 85. Joins | 34:02 |
86 | 86. Hands-on Session: Cross-Cluster Communication | 42:39 |
87 | 87. Intro & Grouping and Aggregating | 19:16 |
88 | 88. Exercise: Grouping and Aggregating | 01:43 |
89 | 89. Exercise: Grouping and Aggregating - Solution | 07:19 |
90 | 90. Joining | 15:06 |
91 | 91. Exercise: Joining | 03:58 |
92 | 92. Exercise: Joining - Solution | 03:58 |
93 | 93. User-Defined Functions (UDFs) | 20:29 |
94 | 94. Exercise: UDFs | 04:06 |
95 | 95. Exercise: UDFs - Solution | 17:51 |
96 | 96. Signing up for the Exam | 02:24 |
97 | 97. Last Minute Preparations | 01:34 |
98 | 98. Introduction | 04:36 |
99 | 99. Congratulations! | 00:50 |
Read Book Apache Spark Certification Training
# | Title |
---|---|
1 | 1-Proposed Timeline |
2 | Apache Spark Certification Exam Guide |
3 | Mastery Map 1 - Cluster Components |
4 | Mastery Map 2 - Spark Execution Modes |
5 | Mastery Map 3 - Spark Data APIs |
6 | Mastery Map 4 - Executor Memory Layout |
7 | Mastery Map 5 - PySpark Storage Levels |
8 | Mastery Map 6 - Executor Out-of-Memory Errors |
9 | Mastery Map 7 - Actions vs. Transformations |
10 | Mastery Map 8 - Execution Hierarchy |
11 | Mastery Map 9 - A Query, From Plan to Execution |
12 | Mastery Map 10 - Adaptive Query Execution Strategies |
13 | Mastery Map 11 - Dynamic Partition Pruning |
14 | Mastery Map 12 - Joins |