Spark and Python for Big Data with PySpark

Duration: 10h 35m 43s
Language: English
Price: Paid
Date: September 12, 2024

Learn the latest Big Data technology, Spark, and learn to use it with one of the most popular programming languages, Python! One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark! Top technology companies and organizations such as Google, Facebook, Netflix, Airbnb, Amazon, NASA, and many more use Spark to solve their big data problems!


Spark can run workloads up to 100x faster than Hadoop MapReduce for in-memory processing, which has caused an explosion in demand for this skill! Because the Spark 2.0 DataFrame framework is so new, you have the opportunity to quickly become one of the most knowledgeable people in the job market!

This course starts with the basics in a Python crash course, then moves on to using Spark DataFrames with the latest Spark 2.0 syntax! Once we've done that, we'll go through how to use MLlib, Spark's Machine Learning Library, with the DataFrame syntax. All along the way you'll have exercises and Mock Consulting Projects that put you right into a real-world situation where you need to use your new skills to solve a real problem!

We also cover the latest Spark technologies, such as Spark SQL and Spark Streaming, and advanced models like Gradient Boosted Trees! After you complete this course, you will feel comfortable putting Spark and PySpark on your resume!

If you're ready to jump into the world of Python, Spark, and Big Data, this is the course for you!

Requirements:
  • General Programming Skills in any Language (Preferably Python)
  • 20 GB of free space on your local computer (or alternatively a strong internet connection for AWS)

Who this course is for:
  • Someone who knows Python and would like to learn how to use it for Big Data
  • Someone who is very familiar with another programming language and needs to learn Spark

What you'll learn:
  • Use Python and Spark together to analyze Big Data
  • Learn how to use the new Spark 2.0 DataFrame Syntax
  • Work on Consulting Projects that mimic real world situations!
  • Classify Customer Churn with Logistic Regression
  • Use Spark with Random Forests for Classification
  • Learn how to use Spark's Gradient Boosted Trees
  • Use Spark's MLlib to create Powerful Machine Learning Models
  • Learn about the Databricks Platform!
  • Get set up on Amazon Web Services EC2 for Big Data Analysis
  • Learn how to use AWS Elastic MapReduce Service!
  • Learn how to leverage the power of Linux with a Spark Environment!
  • Create a Spam filter using Spark and Natural Language Processing!
  • Use Spark Streaming to Analyze Tweets in Real Time!

Course contents:

# Title Duration (mm:ss)
1 Introduction 03:10
2 Course Overview 07:56
3 What is Spark? Why Python? 18:58
4 Set-up Overview 05:59
5 Local Installation VirtualBox Part 1 11:26
6 Local Installation VirtualBox Part 2 14:00
7 Setting up PySpark 05:46
8 AWS EC2 Set-up Guide 02:47
9 Creating the EC2 Instance 16:19
10 SSH with Mac or Linux 04:50
11 Installations on EC2 15:06
12 Databricks Setup 11:42
13 AWS EMR Setup 17:17
14 Introduction to Python Crash Course 01:34
15 Jupyter Notebook Overview 06:50
16 Python Crash Course Part One 16:09
17 Python Crash Course Part Two 12:08
18 Python Crash Course Part Three 11:20
19 Python Crash Course Exercises 01:30
20 Python Crash Course Exercise Solutions 09:27
21 Introduction to Spark DataFrames 02:27
22 Spark DataFrame Basics 10:52
23 Spark DataFrame Basics Part Two 09:56
24 Spark DataFrame Basic Operations 10:16
25 Groupby and Aggregate Operations 12:28
26 Missing Data 08:57
27 Dates and Timestamps 10:05
28 DataFrame Project Exercise 03:14
29 DataFrame Project Exercise Solutions 16:54
30 Introduction to Machine Learning and ISLR 10:22
31 Machine Learning with Spark and Python with MLlib 09:05
32 Linear Regression Theory and Reading 05:04
33 Linear Regression Documentation Example 14:20
34 Regression Evaluation 06:47
35 Linear Regression Example Code Along 15:14
36 Linear Regression Consulting Project 03:12
37 Linear Regression Consulting Project Solutions 15:33
38 Logistic Regression Theory and Reading 11:23
39 Logistic Regression Example Code Along 15:40
40 Logistic Regression Code Along 18:37
41 Logistic Regression Consulting Project 03:14
42 Logistic Regression Consulting Project Solutions 11:14
43 Tree Methods Theory and Reading 08:01
44 Tree Methods Documentation Examples 13:19
45 Decision Trees and Random Forest Code Along Examples 20:38
46 Random Forest - Classification Consulting Project 02:34
47 Random Forest Classification Consulting Project Solutions 08:01
48 K-means Clustering Theory and Reading 06:55
49 KMeans Clustering Documentation Example 09:52
50 Clustering Example Code Along 12:46
51 Clustering Consulting Project 03:10
52 Clustering Consulting Project Solutions 08:43
53 Introduction to Recommender Systems 06:33
54 Recommender System - Code Along Project 12:09
55 Introduction to Natural Language Processing 08:03
56 NLP Tools Part One 16:13
57 NLP Tools Part Two 08:06
58 Natural Language Processing Code Along Project 14:09
59 Introduction to Streaming with Spark! 10:20
60 Spark Streaming Documentation Example 11:48
61 Spark Streaming Twitter Project - Part One 04:30
62 Spark Streaming Twitter Project - Part Two 13:09
63 Spark Streaming Twitter Project - Part Three 17:36