Streaming with Kafka & Spark
This course is a full-fledged, hands-on project covering the complete cycle of real-time data processing. You will work with data from an online store: customer invoices and the goods on those invoices. The aim of the course is to set up streaming processing of invoices as they arrive and to visualize them in a convenient interface.
You will use technologies such as FastAPI, Apache Kafka, Apache Spark, MongoDB, and Streamlit - tools you are already familiar with from other courses. We strongly recommend completing basic courses on these technologies, as well as a course on Docker fundamentals, since the project relies heavily on a Docker environment.
What you can expect in the course:
- Introduction to the Project: You will understand the architecture of an end-to-end pipeline and see how the data visualization is built. Step by step, you will learn how the project comes together and at which stage each technology is used.
- Data Preparation: You will load and transform a dataset from Kaggle, first saving it in CSV format and then converting the data into JSON for further work.
- API with FastAPI: You will get acquainted with the general API schema, create an API with FastAPI, set it up to receive data, and test it through Postman.
- Apache Kafka and the API as Docker Services: You will install Apache Kafka through Docker, set up topics, write an API that sends data to Kafka, and deploy it in a Docker container.
- Data Streaming through Spark into Kafka: You will prepare a container with Apache Spark, connect it to Kafka and the API, set up data processing with Spark Structured Streaming, and test the pipeline.
- Data Storage in MongoDB: You will set up MongoDB and Mongo-Express through Docker, prepare a database and collection for storing the data, and link Spark with MongoDB.
- Data Streaming from Kafka to MongoDB: You will learn how to use Spark Structured Streaming to write streaming data from Kafka into MongoDB as nested JSON documents.
- API Client in Python: You will write a Python client script that sends data to the API in JSON format and verify that the data is successfully written to MongoDB.
- Visualization Interface in Streamlit: You will build an interactive Streamlit dashboard for viewing customer invoices and their products.
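As a preview of the Data Preparation step, converting CSV rows into standalone JSON documents can be sketched with the Python standard library alone. The column names below are hypothetical examples in the style of a retail dataset, not necessarily the exact schema used in the course:

```python
import csv
import io
import json

def csv_rows_to_json(csv_text: str) -> list[str]:
    """Convert each CSV row into a standalone JSON document string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row) for row in reader]

# Hypothetical invoice columns; the Kaggle dataset may differ.
sample = "InvoiceNo,StockCode,Quantity\n536365,85123A,6\n536365,71053,6\n"
docs = csv_rows_to_json(sample)
```

Emitting one JSON document per row is convenient here because each document can later be sent to the API (and on to Kafka) as an individual message.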
This project is a great opportunity to combine knowledge of API, data streaming processing, working with Docker, and databases into a cohesive whole, gaining practical experience in creating streaming applications in real-world conditions.
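The "nested JSON documents" mentioned in the Kafka-to-MongoDB module can be illustrated in plain Python: flat invoice-line records are grouped into one document per invoice before being written to the database. In the course this reshaping happens inside Spark; the field names below (`InvoiceNo`, `Items`) are assumptions for illustration, not the course's exact schema:

```python
from collections import defaultdict

def nest_invoice_lines(lines: list[dict]) -> list[dict]:
    """Group flat invoice-line records into one nested document per invoice."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for line in lines:
        invoice_no = line["InvoiceNo"]
        # Keep everything except the grouping key as a line item.
        item = {k: v for k, v in line.items() if k != "InvoiceNo"}
        grouped[invoice_no].append(item)
    return [{"InvoiceNo": no, "Items": items} for no, items in grouped.items()]

flat = [
    {"InvoiceNo": "536365", "StockCode": "85123A", "Quantity": 6},
    {"InvoiceNo": "536365", "StockCode": "71053", "Quantity": 6},
]
nested = nest_invoice_lines(flat)
```

Storing invoices as nested documents plays to MongoDB's strengths: one query by invoice number returns the invoice together with all of its line items.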
Watch Online: Streaming with Kafka & Spark
# | Title | Duration |
---|---|---|
1 | Introduction | 01:13 |
2 | Project overview | 05:34 |
3 | Docker Fundamentals | 01:44 |
4 | The Dataset we use | 02:49 |
5 | Transform CSV to JSONs | 10:52 |
6 | API Schema | 03:43 |
7 | Creating the API with FastAPI | 09:42 |
8 | Testing the API with Postman | 06:11 |
9 | Apache Kafka Goals | 02:34 |
10 | Kafka Docker Compose Explained | 03:36 |
11 | Startup Kafka Compose File | 02:47 |
12 | Kafka Topics Setup | 07:12 |
13 | Preparing the API Docker build | 04:14 |
14 | Build the API | 03:26 |
15 | Deploy the API | 02:49 |
16 | Test the API Container with Kafka | 02:07 |
17 | Recap API & Kafka | 01:38 |
18 | Apache Spark Compose Config | 04:39 |
19 | Startup Spark with Kafka & API | 02:27 |
20 | Spark Ingest Kafka & Produce Kafka | 06:35 |
21 | Setup Test configuration | 03:02 |
22 | Test Spark Streaming Kafka | 05:43 |
23 | Spark UI Monitoring | 02:31 |
24 | MongoDB Goals | 04:23 |
25 | MongoDB Docker Compose Config | 03:59 |
26 | MongoDB Startup | 02:45 |
27 | Prepare MongoDB Database & Collection | 01:46 |
28 | Spark Code Streaming To MongoDB | 06:32 |
29 | Transformations 1: Writing Kafka Message as String to MongoDB | 03:26 |
30 | Transformations 2: Writing complete Kafka message to MongoDB | 02:35 |
31 | Transformations 3: Writing Nested Document to MongoDB | 04:29 |
32 | Transformations 4: Writing Messages as Document | 02:14 |
33 | Spark Streaming Conclusion | 02:53 |
34 | Writing the API Client | 04:05 |
35 | Create Test Data & Run Client | 05:37 |
36 | Streamlit Intro & Goals | 06:19 |
37 | Query Customer Invoices | 04:08 |
38 | Query Invoice Documents | 04:16 |
39 | Project Summary | 03:26 |
40 | Outlook | 06:24 |
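Lessons 34-35 cover writing an API client that posts invoice data as JSON. A minimal stdlib-only sketch might look like the following; the endpoint URL and the record fields are assumptions, since the course defines its own API schema:

```python
import json
import urllib.request

# Assumed local endpoint; the course's FastAPI service may use a different path.
API_URL = "http://localhost:8000/invoiceitem"

def build_request(record: dict) -> urllib.request.Request:
    """Wrap one invoice record as a JSON POST request for the API."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(record: dict) -> int:
    """Send one record to the running API and return the HTTP status code."""
    with urllib.request.urlopen(build_request(record)) as resp:
        return resp.status

# Usage (requires the API container from the course to be running):
#   send({"InvoiceNo": "536365", "StockCode": "85123A", "Quantity": 6})
```

Keeping request construction (`build_request`) separate from sending makes the client easy to test without a live server.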
Similar courses to Streaming with Kafka & Spark

- Time Series Analysis, Forecasting, and Machine Learning (udemy)
- Complete Machine Learning and Data Science: Zero to Mastery (udemy, zerotomastery.io)
- Machine Learning Design Questions (algoexpert)
- Introduction to Data Engineering (zerotomastery.io)
- Relational Data Modeling (Eka Ponkratova)
- Machine Learning in JavaScript with TensorFlow.js (udemy)
- Data Engineering on Databricks (Andreas Kretz)
- DS4B 101-P: Python for Data Science Automation (Business Science University)
- Introduction to Data Engineering 2025 (Andreas Kretz)