Spark and Python for Big Data with PySpark

10h 35m 43s
English
Paid

Course description

Learn the latest Big Data technology, Spark, and learn to use it with one of the most popular programming languages, Python! One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark! Organizations such as Google, Facebook, Netflix, Airbnb, Amazon, NASA, and more all use Spark to solve their big data problems!


Spark can run up to 100x faster than Hadoop MapReduce on in-memory workloads, which has caused an explosion in demand for this skill! Because the Spark 2.0 DataFrame framework is so new, you can quickly become one of the most knowledgeable people in the job market!

This course teaches the basics with a crash course in Python, then moves on to using Spark DataFrames with the latest Spark 2.0 syntax! Once we've done that, we'll go through how to use the MLlib Machine Learning Library with the DataFrame syntax and Spark. All along the way you'll have exercises and Mock Consulting Projects that put you right into a real-world situation where you need to use your new skills to solve a real problem!

We also cover the latest Spark Technologies, like Spark SQL, Spark Streaming, and advanced models like Gradient Boosted Trees! After you complete this course you will feel comfortable putting Spark and PySpark on your resume! 

If you're ready to jump into the world of Python, Spark, and Big Data, this is the course for you!

Requirements:
  • General Programming Skills in any Language (Preferably Python)
  • 20 GB of free space on your local computer (or alternatively a strong internet connection for AWS)

Who this course is for:
  • Someone who knows Python and would like to learn how to use it for Big Data
  • Someone who is very familiar with another programming language and needs to learn Spark

What you'll learn:

  • Use Python and Spark together to analyze Big Data
  • Learn how to use the new Spark 2.0 DataFrame Syntax
  • Work on Consulting Projects that mimic real-world situations!
  • Classify Customer Churn with Logistic Regression
  • Use Spark with Random Forests for Classification
  • Learn how to use Spark's Gradient Boosted Trees
  • Use Spark's MLlib to create Powerful Machine Learning Models
  • Learn about the Databricks Platform!
  • Get set up on Amazon Web Services EC2 for Big Data Analysis
  • Learn how to use AWS Elastic MapReduce Service!
  • Learn how to leverage the power of Linux with a Spark Environment!
  • Create a Spam filter using Spark and Natural Language Processing!
  • Use Spark Streaming to Analyze Tweets in Real Time!

All Course Lessons (63)

# | Lesson Title | Duration | Access
1 | Introduction | 03:10 | Demo
2 | Course Overview | 07:56 |
3 | What is Spark? Why Python? | 18:58 |
4 | Set-up Overview | 05:59 |
5 | Local Installation VirtualBox Part 1 | 11:26 |
6 | Local Installation VirtualBox Part 2 | 14:00 |
7 | Setting up PySpark | 05:46 |
8 | AWS EC2 Set-up Guide | 02:47 |
9 | Creating the EC2 Instance | 16:19 |
10 | SSH with Mac or Linux | 04:50 |
11 | Installations on EC2 | 15:06 |
12 | Databricks Setup | 11:42 |
13 | AWS EMR Setup | 17:17 |
14 | Introduction to Python Crash Course | 01:34 |
15 | Jupyter Notebook Overview | 06:50 |
16 | Python Crash Course Part One | 16:09 |
17 | Python Crash Course Part Two | 12:08 |
18 | Python Crash Course Part Three | 11:20 |
19 | Python Crash Course Exercises | 01:30 |
20 | Python Crash Course Exercise Solutions | 09:27 |
21 | Introduction to Spark DataFrames | 02:27 |
22 | Spark DataFrame Basics | 10:52 |
23 | Spark DataFrame Basics Part Two | 09:56 |
24 | Spark DataFrame Basic Operations | 10:16 |
25 | Groupby and Aggregate Operations | 12:28 |
26 | Missing Data | 08:57 |
27 | Dates and Timestamps | 10:05 |
28 | DataFrame Project Exercise | 03:14 |
29 | DataFrame Project Exercise Solutions | 16:54 |
30 | Introduction to Machine Learning and ISLR | 10:22 |
31 | Machine Learning with Spark and Python with MLlib | 09:05 |
32 | Linear Regression Theory and Reading | 05:04 |
33 | Linear Regression Documentation Example | 14:20 |
34 | Regression Evaluation | 06:47 |
35 | Linear Regression Example Code Along | 15:14 |
36 | Linear Regression Consulting Project | 03:12 |
37 | Linear Regression Consulting Project Solutions | 15:33 |
38 | Logistic Regression Theory and Reading | 11:23 |
39 | Logistic Regression Example Code Along | 15:40 |
40 | Logistic Regression Code Along | 18:37 |
41 | Logistic Regression Consulting Project | 03:14 |
42 | Logistic Regression Consulting Project Solutions | 11:14 |
43 | Tree Methods Theory and Reading | 08:01 |
44 | Tree Methods Documentation Examples | 13:19 |
45 | Decision Trees and Random Forest Code Along Examples | 20:38 |
46 | Random Forest - Classification Consulting Project | 02:34 |
47 | Random Forest Classification Consulting Project Solutions | 08:01 |
48 | K-means Clustering Theory and Reading | 06:55 |
49 | KMeans Clustering Documentation Example | 09:52 |
50 | Clustering Example Code Along | 12:46 |
51 | Clustering Consulting Project | 03:10 |
52 | Clustering Consulting Project Solutions | 08:43 |
53 | Introduction to Recommender Systems | 06:33 |
54 | Recommender System - Code Along Project | 12:09 |
55 | Introduction to Natural Language Processing | 08:03 |
56 | NLP Tools Part One | 16:13 |
57 | NLP Tools Part Two | 08:06 |
58 | Natural Language Processing Code Along Project | 14:09 |
59 | Introduction to Streaming with Spark! | 10:20 |
60 | Spark Streaming Documentation Example | 11:48 |
61 | Spark Streaming Twitter Project - Part One | 04:30 |
62 | Spark Streaming Twitter Project - Part Two | 13:09 |
63 | Spark Streaming Twitter Project - Part Three | 17:36 |

Similar courses

Python for Financial Analysis and Algorithmic Trading
Sources: udemy
Welcome to Python for Financial Analysis and Algorithmic Trading! Are you interested in how people use Python to conduct rigorous financial analysis and pursue algorithmic tradi...
16 hours 54 minutes 20 seconds

Python Jumpstart by Building 10 Apps
Sources: Talkpython
Programming is fun and profitable. Learning to become a software developer should be equally fun! This course will teach you everything you need to know about the Python languag...
7 hours 8 minutes 59 seconds

The Software Designer Mindset (COMPLETE)
Sources: ArjanCodes
"The Software Designer Mindset" is a course that teaches all aspects of software architecture and offers practical advice on creating scalable software...
14 hours 32 minutes 58 seconds

Fundamentals of Apache Airflow
Sources: zerotomastery.io
This practical course starts with the basics and step by step guides you to building real orchestration scenarios - from task retry executions to...
2 hours 21 minutes 18 seconds

Build a Python REST API with the Django Rest Framework
Sources: udemy
How does Apple Maps have Yelp listings? How does Tinder get Facebook user profiles? How does Amazon Alexa know the latest news? These questions get to the core of how powerful R...
10 hours 8 minutes 56 seconds