Spark Streaming with Scala

11h 17m 52s
English
Paid

Stream big data in real time with Spark and integrate any data source, from Kafka to Twitter.

Nothing static, all in motion.

You probably know this by now: Spark is the most popular computing engine for big data, one of the most actively maintained, and it has a proven track record of performance. For in-memory workloads it can run up to 100 times faster than the old Hadoop MapReduce paradigm, and it can easily be extended with machine learning, streaming capabilities, and much more.

In this course, we'll take the natural step forward: process big data as it arrives.
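For a rough intuition of what processing data "as it arrives" means, here is a conceptual sketch of Spark's micro-batch idea in plain Scala. It uses only the standard library and none of Spark's APIs. Input arrives as a sequence of small batches. Each batch is counted, and the counts are merged into state that carries over to the next batch, much like a running word count emitted in Structured Streaming's complete output mode. The object and method names (`MicroBatchWordCount`, `countBatch`, `merge`, `run`) are hypothetical and chosen for illustration only.

```scala
object MicroBatchWordCount {
  // One micro-batch: the lines that arrived during a single batch interval.
  type Batch = Seq[String]

  // Count the words in a single batch, ignoring empty tokens.
  def countBatch(batch: Batch): Map[String, Long] =
    batch
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size.toLong }

  // Fold one batch's counts into the running state carried across batches.
  def merge(state: Map[String, Long], batchCounts: Map[String, Long]): Map[String, Long] =
    batchCounts.foldLeft(state) { case (acc, (word, n)) =>
      acc.updated(word, acc.getOrElse(word, 0L) + n)
    }

  // Process batches in arrival order and return the full result table after
  // each batch (similar in spirit to Structured Streaming's "complete" output mode).
  def run(batches: Seq[Batch]): Seq[Map[String, Long]] =
    batches
      .scanLeft(Map.empty[String, Long])((state, batch) => merge(state, countBatch(batch)))
      .tail
}
```

Calling `MicroBatchWordCount.run(Seq(Seq("spark streaming", "spark"), Seq("streaming rocks")))` returns one result table per batch, and the second table reflects both batches. In real Spark the engine manages batching, state, and fault tolerance for you. This model only shows the shape of the computation.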

Additional resources

https://github.com/rockthejvm/spark-streaming

About the Author: Rock the JVM

Rock the JVM (rockthejvm.com) is a Romania-based training platform run by Daniel Ciocîrlan. It focuses on Scala, the broader JVM ecosystem, and the data-engineering and streaming stack built on top of it (Spark, Flink, Kafka). The platform is one of the most authoritative independent sources on Scala and functional programming on the JVM.

The CourseFlix listing carries four Rock the JVM courses: Scala & Functional Programming for Beginners, Cats (the Scala functional-programming library), Spark Streaming with Scala, and Apache Flink. The teaching style is unusually rigorous about the functional-programming fundamentals underneath the framework material.

Material is paid and aimed at engineers picking up Scala or building data-streaming systems on the JVM. For broader content, see CourseFlix's Scala, Java, and Messaging & Streaming category pages.

All Course Lessons (31)

| # | Lesson Title | Duration |
|---|---|---|
| 1 | Welcome (demo) | 18:05 |
| 2 | Scala Recap | 25:06 |
| 3 | Spark Recap | 27:11 |
| 4 | Spark Streaming First Principles | 08:29 |
| 5 | Streaming DataFrames | 27:17 |
| 6 | Streaming Aggregations | 15:58 |
| 7 | Streaming Joins | 25:24 |
| 8 | Streaming Datasets | 26:46 |
| 9 | Discretized Streams (DStreams) | 31:57 |
| 10 | DStreams Transformations | 28:40 |
| 11 | DStreams Window Functions | 32:11 |
| 12 | Kafka & Structured Streaming | 25:08 |
| 13 | Kafka & DStreams | 29:57 |
| 14 | JDBC with Postgres | 11:51 |
| 15 | Akka and Akka Streams | 28:58 |
| 16 | Cassandra | 21:55 |
| 17 | Setting up a Twitter App | 08:17 |
| 18 | Our First Custom Receiver | 15:45 |
| 19 | Reading Tweets | 20:05 |
| 20 | Reading Tweets: Exercises | 15:22 |
| 21 | Sentiment Analysis on Tweets using NLP | 20:50 |
| 22 | Event Time Windows | 31:20 |
| 23 | Event Time Windows: Exercises | 14:54 |
| 24 | Processing Time Windows | 11:16 |
| 25 | Watermarks | 25:16 |
| 26 | Watermarks, Part 2 | 25:22 |
| 27 | Arbitrary Stateful Computation | 25:40 |
| 28 | Arbitrary Stateful Computation | 16:36 |
| 29 | Setting up the REST Server and the Kafka Broker | 27:51 |
| 30 | Integrating Spark Structured Streaming, Test, Run! | 33:54 |
| 31 | You Rock! | 00:31 |

Frequently asked questions

What are the prerequisites for this course?
A solid understanding of Scala is strongly recommended: the course includes a Scala Recap but assumes you are already comfortable with the language. Prior exposure to Apache Spark also helps. A Spark Recap lesson is included, after which the course moves quickly from Spark Streaming First Principles to Streaming DataFrames, DStreams, and more advanced topics.
What kind of projects will I build in this course?
The hands-on work includes setting up a Twitter app as a live data source, writing a custom receiver to ingest tweets, and running sentiment analysis on them using NLP. Students also integrate Kafka with both Structured Streaming and DStreams, and the course closes with a project that connects a REST server and a Kafka broker to Spark Structured Streaming. Together, these projects show how to integrate real-time data sources and perform advanced processing on them.
Who is the target audience for this course?
This course is designed for software developers and data engineers interested in real-time data processing using Spark Streaming. It suits those who have a background in Scala and Apache Spark, and are looking to expand their skills to include streaming and integrating various data sources like Kafka and Twitter.
How does the depth of this course compare to other courses on Spark Streaming?
The course provides a comprehensive exploration of Spark Streaming, covering both Structured Streaming and DStreams. It delves into advanced topics such as Event Time Windows, Watermarks, and Arbitrary Stateful Computation. It also includes integration with external systems like Kafka, Postgres, and Cassandra, offering a broad and practical understanding of the subject.
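Event-time windows and watermarks are easier to reason about with a small example. The following is a simplified, plain-Scala model that uses no Spark APIs; `EventTimeWatermark`, `Event`, `State`, and `process` are hypothetical names. Each event carries its own timestamp and is grouped into a fixed-size (tumbling) window by that timestamp. An event is dropped as too late if it is older than the watermark, which is the maximum event time seen so far minus an allowed delay. Real Spark differs in the details. It advances the watermark between micro-batches rather than after every event, and it only guarantees to keep data that arrives within the delay threshold. Older data may or may not be dropped.

```scala
object EventTimeWatermark {
  // An event stamped with the time it happened (its event time), in seconds.
  final case class Event(eventTime: Long, value: Int)

  // Running state: highest event time seen, per-window sums, and dropped events.
  final case class State(maxEventTime: Option[Long], sums: Map[Long, Int], dropped: Vector[Event])

  // Start of the tumbling window [start, start + windowSize) containing eventTime.
  def windowStart(eventTime: Long, windowSize: Long): Long =
    eventTime - Math.floorMod(eventTime, windowSize)

  // Process events in arrival order. The watermark trails the maximum event
  // time seen so far by `delay`. Events older than the watermark are dropped,
  // and every other event is added to the sum of its event-time window.
  def process(events: Seq[Event], windowSize: Long, delay: Long): State =
    events.foldLeft(State(None, Map.empty, Vector.empty)) { (st, e) =>
      val watermark = st.maxEventTime.map(_ - delay)
      val newMax = Some(st.maxEventTime.fold(e.eventTime)(m => math.max(m, e.eventTime)))
      if (watermark.exists(e.eventTime < _))
        st.copy(maxEventTime = newMax, dropped = st.dropped :+ e)
      else {
        val w = windowStart(e.eventTime, windowSize)
        st.copy(
          maxEventTime = newMax,
          sums = st.sums.updated(w, st.sums.getOrElse(w, 0) + e.value)
        )
      }
    }
}
```

For example, feed in events with times 10, 14, 22, 18 and 3, using `windowSize = 10` and `delay = 5`. The event at time 18 arrives after the one at time 22, but it is still accepted because the watermark is 22 - 5 = 17. The event at time 3 is dropped.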
What specific tools or platforms are used in this course?
Students use Apache Spark (both Structured Streaming and DStreams) with Scala as the programming language, and integrate it with Kafka and Twitter as streaming sources. The course also covers Postgres via JDBC, Akka and Akka Streams, and Cassandra.
What topics are not covered in this course?
The course does not cover basic programming or introductory concepts of Scala or Apache Spark, as it assumes prior knowledge. It also does not include detailed discussions on general machine learning or data science topics outside of the specific context of streaming and real-time data processing.
What is the expected time commitment to complete this course?
The course contains about 11 hours 18 minutes of video across 31 lessons (11h 17m 52s in total), but students should also budget time for the practical exercises and projects. Given the course's depth, a few hours per week over several weeks is a reasonable estimate for absorbing the material and completing the hands-on components.