Streaming with Kafka & Spark

2h 46m 25s
English
Paid

Streaming with Kafka & Spark is a self-paced course by Andreas Kretz, with 40 lessons and a total runtime of 2 hours 46 minutes.

Course facts

Lessons: 40
Duration: 2 hours 46 minutes
Level: All levels
Language: English
Updated:
Instructor: Andreas Kretz
Price: Premium

Dive into a comprehensive project on real-time data processing with our "Streaming with Kafka & Spark" course. Learn to handle data from an online store, including client invoices, and visualize this information through a convenient interface. The project combines FastAPI, Apache Kafka, Apache Spark, MongoDB, and Streamlit. Familiarity with these tools is essential, and we recommend taking a foundational Docker course first, because Docker plays a significant role in the project environment.

Course Highlights

  • Introduction to the Project
    • Grasp the architecture of an end-to-end data pipeline and understand the steps to build a project, leveraging appropriate technologies at each stage.
  • Data Preparation
    • Load and transform a dataset sourced from Kaggle, initially saving it as CSV and then converting the data to JSON for further processing.
  • API Development with FastAPI
    • Learn to design a basic API, build it with FastAPI, configure it for data reception, and test functionality using Postman.
  • Implementing Apache Kafka and API as Docker Services
    • Install Apache Kafka using Docker, configure topics, create an API to send data to Kafka, and deploy it within a Docker container.
  • Streaming Data via Spark into Kafka
    • Set up an Apache Spark container, connect it to Kafka and the API, utilize Spark Structured Streaming for data processing, and validate the pipeline.
  • Data Storage with MongoDB
    • Configure MongoDB and Mongo-Express with Docker, prepare a data storage structure, and integrate Spark with MongoDB for seamless data flow.
  • Transferring Data from Kafka to MongoDB
    • Master Spark Structured Streaming to efficiently write streaming data from Kafka into MongoDB using nested JSON documents.
  • Building an API Client in Python
    • Create a Python script to send data to the API in JSON format, ensuring successful recording in MongoDB.
  • Creating a Visualization Interface with Streamlit
    • Develop an interactive dashboard using Streamlit to review customer invoices and product details.
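
The data-preparation step in the outline above (CSV in, JSON out) can be sketched with the Python standard library alone. This is a minimal illustration, not the course's own script: the column names and sample rows are assumptions made up for the example, and the real Kaggle dataset may use different fields.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str) -> list[str]:
    """Convert CSV text into one JSON string per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # DictReader yields every value as a string; no type conversion happens here.
    return [json.dumps(row) for row in reader]


# Hypothetical sample rows; the real dataset's columns may differ.
SAMPLE_CSV = (
    "InvoiceNo,StockCode,Description,Quantity,UnitPrice,CustomerID\n"
    "536365,85123A,WHITE HANGING HEART,6,2.55,17850\n"
    "536365,71053,WHITE METAL LANTERN,6,3.39,17850\n"
)

json_lines = csv_to_json_lines(SAMPLE_CSV)
for line in json_lines:
    print(line)
```

Emitting one JSON string per row keeps each record independent, which suits sending records one at a time to an API or a message queue.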

This course offers a valuable opportunity to consolidate your understanding of APIs, data streaming, Docker, and database management into a unified project. Experience the creation of streaming applications in realistic situations and enhance your practical skills.

Additional

https://github.com/team-data-science/document-streaming

Who teaches Streaming with Kafka & Spark? Andreas Kretz

Andreas Kretz is a German data engineer and one of the most widely followed independent voices on data engineering as a career discipline. He runs the Plumbers of Data Science brand and has been publishing tutorial material continuously since the field consolidated around the modern lake-house stack (Spark, Kafka, Snowflake, Databricks, Airflow).

His CourseFlix listing is the largest single-author catalog on the site: over thirty courses spanning data-pipeline construction, streaming architectures, the cloud-native data stack on AWS / Azure / GCP, the Python and Scala tooling that dominates the field, and the soft-skills / career side of breaking into data engineering. The material is paid and aimed at engineers transitioning into data work and at working data engineers picking up specific tools.

What lessons are included in Streaming with Kafka & Spark?

All Course Lessons (40)
#. Lesson Title (Duration, Access)
1. Introduction (01:13, Demo)
2. Project overview (05:34)
3. Docker Fundamentals (01:44)
4. The Dataset we use (02:49)
5. Transform CSV to JSONs (10:52)
6. API Schema (03:43)
7. Creating the API with FastAPI (09:42)
8. Testing the API with Postman (06:11)
9. Apache Kafka Goals (02:34)
10. Kafka Docker Compose Explained (03:36)
11. Startup Kafka Compose File (02:47)
12. Kafka Topics Setup (07:12)
13. Preparing the API Docker build (04:14)
14. Build the API (03:26)
15. Deploy the API (02:49)
16. Test the API Container with Kafka (02:07)
17. Recap API & Kafka (01:38)
18. Apache Spark Compose Config (04:39)
19. Startup Spark with Kafka & API (02:27)
20. Spark Ingest Kafka & Produce Kafka (06:35)
21. Setup Test configuration (03:02)
22. Test Spark Streaming Kafka (05:43)
23. Spark UI Monitoring (02:31)
24. MongoDB Goals (04:23)
25. MongoDB Docker Compose Config (03:59)
26. MongoDB Startup (02:45)
27. Prepare MongoDB Database & Collection (01:46)
28. Spark Code Streaming To MongoDB (06:32)
29. Transformations 1: Writing Kafka Message as String to MongoDB (03:26)
30. Transformations 2: Writing complete Kafka message to MongoDB (02:35)
31. Transformations 3: Writing Nested Document to MongoDB (04:29)
32. Transformations 4: Writing Messages as Document (02:14)
33. Spark Streaming Conclusion (02:53)
34. Writing the API Client (04:05)
35. Create Test Data & Run Client (05:37)
36. Streamlit Intro & Goals (06:19)
37. Query Customer Invoices (04:08)
38. Query Invoice Documents (04:16)
39. Project Summary (03:26)
40. Outlook (06:24)

Frequently asked questions

What prior knowledge do I need before taking this course?
Before enrolling in this course, you should have a foundational understanding of Docker, as it is integral to the project environment. Additionally, familiarity with FastAPI, Apache Kafka, Apache Spark, and MongoDB is essential, as these technologies are central to the course content and project implementation.
What project will I work on during this course?
The course involves building an end-to-end data processing project for an online store. You will handle real-time data processing, including client invoices, and visualize the data using a user-friendly interface. Key technologies such as FastAPI for API development, Apache Kafka for data streaming, Apache Spark for processing, and MongoDB for storage are used throughout the project.
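
To make the invoice handling described in this answer concrete, here is a rough sketch in plain Python of grouping flat invoice lines into one nested document per invoice, the kind of shape a document store such as MongoDB holds naturally. It is not the course's Spark code: the field names (`InvoiceNo`, `CustomerID`, `StockCode`, `Quantity`, `UnitPrice`), the `items` key, and the sample rows are all assumptions chosen for illustration.

```python
from collections import defaultdict


def nest_invoices(rows):
    """Group flat invoice-line dicts into one nested document per invoice."""
    grouped = defaultdict(list)
    customers = {}
    for row in rows:
        invoice_no = row["InvoiceNo"]
        customers[invoice_no] = row["CustomerID"]
        # Convert numeric fields, which arrive as strings when read from CSV.
        grouped[invoice_no].append(
            {
                "StockCode": row["StockCode"],
                "Quantity": int(row["Quantity"]),
                "UnitPrice": float(row["UnitPrice"]),
            }
        )
    return [
        {"InvoiceNo": no, "CustomerID": customers[no], "items": items}
        for no, items in grouped.items()
    ]


# Hypothetical flat rows, one per invoice line.
rows = [
    {"InvoiceNo": "536365", "CustomerID": "17850", "StockCode": "85123A",
     "Quantity": "6", "UnitPrice": "2.55"},
    {"InvoiceNo": "536365", "CustomerID": "17850", "StockCode": "71053",
     "Quantity": "6", "UnitPrice": "3.39"},
    {"InvoiceNo": "536366", "CustomerID": "17850", "StockCode": "22633",
     "Quantity": "2", "UnitPrice": "1.85"},
]

documents = nest_invoices(rows)
print(documents)
```

A nested layout like this lets a dashboard fetch a whole invoice with a single lookup instead of reassembling it from separate line records.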
Who is the target audience for this course?
This course is designed for individuals interested in real-time data processing and visualization, particularly those who have a foundational understanding of the involved technologies. It's suitable for data engineers, software developers, and IT professionals looking to enhance their skills in building and deploying data pipelines using modern tools.
How does this course compare in depth and scope to similar courses?
Unlike some introductory courses, this course takes a focused, practical approach to real-time data processing through one complete project built on Apache Kafka and Spark. It covers practical implementation details, such as Dockerizing services, setting up Apache Kafka and Spark, and streaming data into a MongoDB database, so you see how every stage of the data pipeline fits together.
What specific tools and platforms will I learn to use?
You'll learn to use several key tools and platforms: FastAPI for API development, Apache Kafka for real-time data streaming, Apache Spark for data processing, MongoDB for data storage, and Streamlit for data visualization. Additionally, Docker is used extensively to manage the deployment of these services.
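
As a small illustration of how a Python client might hand a record to an API like the one in this tool chain, the sketch below builds a JSON POST request using only the standard library. The URL, the `/invoice` path, and the record fields are hypothetical placeholders, not values taken from the course, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; the course's actual host, port, and path are not given here.
API_URL = "http://localhost:8000/invoice"


def build_invoice_request(url: str, record: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the record as JSON."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invoice_request(API_URL, {"InvoiceNo": "536365", "Quantity": 6})

# To actually send it, the API must be running at API_URL:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Separating request construction from sending makes the payload easy to inspect before any service is up.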
What topics are not covered in this course?
The course covers real-time data processing and the technologies around it, but it does not teach the basics of each one; introductory material on Docker, FastAPI, Apache Kafka, or Spark is left to separate courses. It also does not cover data science methodologies or detailed data analytics beyond the scope of streaming and visualization.
How much time should I expect to commit to this course?
The course comprises 40 lessons with a total video runtime of 2 hours 46 minutes. Because the lessons build a hands-on project, expect to spend additional time beyond the videos setting up the environment and applying the real-time data processing techniques yourself.