
Streaming with Kafka & Spark

2h 46m 25s
English
Paid

Dive into a comprehensive project on real-time data processing with our "Streaming with Kafka & Spark" course. Learn to handle data from an online store, including client invoices, and visualize this information through a convenient interface. This course integrates technologies such as FastAPI, Apache Kafka, Apache Spark, MongoDB, and Streamlit. Prior familiarity with these tools is expected; in particular, we recommend a foundational Docker course, as Docker plays a significant role in the project environment.
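To give a flavor of that Docker-based environment, the stack named above could be wired together in a Docker Compose file along these lines. This is an illustrative sketch only: the service names, images, ports, and build paths are assumptions, not the course's actual configuration.

```yaml
# Sketch of the project stack; all names and ports are illustrative.
services:
  kafka:
    image: bitnami/kafka:latest
    ports:
      - "9092:9092"
  mongodb:
    image: mongo:latest
    ports:
      - "27017:27017"
  api:
    build: ./api          # FastAPI app that receives invoices and produces into Kafka
    ports:
      - "8000:8000"
    depends_on:
      - kafka
  spark:
    image: bitnami/spark:latest
    depends_on:
      - kafka
      - mongodb
  streamlit:
    build: ./dashboard    # Streamlit UI reading invoice data from MongoDB
    ports:
      - "8501:8501"
    depends_on:
      - mongodb
```

The course builds these services up one by one rather than all at once, so treat this as a preview of where the project ends up.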

Course Highlights

  • Introduction to the Project
    • Grasp the architecture of an end-to-end data pipeline and understand the steps to build a project, leveraging appropriate technologies at each stage.
  • Data Preparation
    • Load and transform a dataset sourced from Kaggle, initially saving it as CSV and then converting the data to JSON for further processing.
  • API Development with FastAPI
    • Learn to design a basic API, build it with FastAPI, configure it for data reception, and test functionality using Postman.
  • Implementing Apache Kafka and API as Docker Services
    • Install Apache Kafka using Docker, configure topics, create an API to send data to Kafka, and deploy it within a Docker container.
  • Streaming Data via Spark into Kafka
    • Set up an Apache Spark container, connect it to Kafka and the API, utilize Spark Structured Streaming for data processing, and validate the pipeline.
  • Data Storage with MongoDB
    • Configure MongoDB and Mongo-Express with Docker, prepare a data storage structure, and integrate Spark with MongoDB for seamless data flow.
  • Transferring Data from Kafka to MongoDB
    • Master Spark Structured Streaming to efficiently write streaming data from Kafka into MongoDB using nested JSON documents.
  • Building an API Client in Python
    • Create a Python script to send data to the API in JSON format, ensuring successful recording in MongoDB.
  • Creating a Visualization Interface with Streamlit
    • Develop an interactive dashboard using Streamlit to review customer invoices and product details.
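The Data Preparation step above (load a CSV, convert it to JSON) can be sketched with nothing but the Python standard library. The column names below are hypothetical stand-ins; the actual Kaggle dataset used in the course has its own schema.

```python
import csv
import io
import json

def csv_to_json_records(csv_text: str) -> list[dict]:
    """Parse CSV text and return one JSON-ready dict per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]

# Hypothetical invoice rows; the real dataset's columns will differ.
sample = "InvoiceNo,StockCode,Quantity\n536365,85123A,6\n536365,71053,6\n"
records = csv_to_json_records(sample)
print(json.dumps(records[0]))
```

Each row becomes a flat dict of strings; the course goes further and shapes these rows into nested JSON documents before they reach Kafka.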

This course offers a valuable opportunity to consolidate your understanding of APIs, data streaming, Docker, and database management into a unified project. Experience the creation of streaming applications in realistic situations and enhance your practical skills.
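As a taste of the API-client step described above, a minimal sender could look like the following. The endpoint URL and the invoice fields are assumptions for illustration, not the course's actual API schema.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/invoiceitem"  # hypothetical endpoint

def build_request(invoice: dict) -> urllib.request.Request:
    """Wrap an invoice dict as a JSON POST request for the API."""
    body = json.dumps(invoice).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    req = build_request({"InvoiceNo": "536365", "Quantity": 6})
    # Sending only works once the API container is up:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

From here the data would flow API → Kafka → Spark → MongoDB, ending up in the Streamlit dashboard.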

About the Author: Andreas Kretz


I am a senior data engineer and trainer, a tech enthusiast, and a father. For more than ten years, I have been passionate about Data Engineering. Initially, I became a self-taught data engineer and then led a team of data engineers at a large company. When I realized the great demand for education in this field, I followed my passion and founded my own Data Engineering Academy. Since then, I have helped over 2,000 students achieve their goals.

Watch Online: 40 lessons

All Course Lessons (40)
1. Introduction Demo (01:13)
2. Project overview (05:34)
3. Docker Fundamentals (01:44)
4. The Dataset we use (02:49)
5. Transform CSV to JSONs (10:52)
6. API Schema (03:43)
7. Creating the API with FastAPI (09:42)
8. Testing the API with Postman (06:11)
9. Apache Kafka Goals (02:34)
10. Kafka Docker Compose Explained (03:36)
11. Startup Kafka Compose File (02:47)
12. Kafka Topics Setup (07:12)
13. Preparing the API Docker build (04:14)
14. Build the API (03:26)
15. Deploy the API (02:49)
16. Test the API Container with Kafka (02:07)
17. Recap API & Kafka (01:38)
18. Apache Spark Compose Config (04:39)
19. Startup Spark with Kafka & API (02:27)
20. Spark Ingest Kafka & Produce Kafka (06:35)
21. Setup Test configuration (03:02)
22. Test Spark Streaming Kafka (05:43)
23. Spark UI Monitoring (02:31)
24. MongoDB Goals (04:23)
25. MongoDB Docker Compose Config (03:59)
26. MongoDB Startup (02:45)
27. Prepare MongoDB Database & Collection (01:46)
28. Spark Code Streaming To MongoDB (06:32)
29. Transformations 1: Writing Kafka Message as String to MongoDB (03:26)
30. Transformations 2: Writing complete Kafka message to MongoDB (02:35)
31. Transformations 3: Writing Nested Document to MongoDB (04:29)
32. Transformations 4: Writing Messages as Document (02:14)
33. Spark Streaming Conclusion (02:53)
34. Writing the API Client (04:05)
35. Create Test Data & Run Client (05:37)
36. Streamlit Intro & Goals (06:19)
37. Query Customer Invoices (04:08)
38. Query Invoice Documents (04:16)
39. Project Summary (03:26)
40. Outlook (06:24)