Streaming with Kafka & Spark

2h 46m 25s
English
Paid

Course description

This course is a full-fledged project covering a complete real-time data processing cycle. You will work with data from an online store: client invoices and the goods on those invoices. The aim of the course is to process the streaming invoice data as it arrives and visualize it in a convenient interface.

You will use technologies such as FastAPI, Apache Kafka, Apache Spark, MongoDB, and Streamlit - tools you are already familiar with from other courses. We strongly recommend completing basic courses on these technologies, as well as a course on Docker fundamentals, since the project heavily relies on a Docker environment.

What you can expect in the course:

  • Introduction to the Project
    • You will understand the architecture of the end-to-end pipeline and see how the data visualization is built. Step by step, you will learn how the project is assembled and which technology is used at each stage.
  • Data Preparation
    • You will load and transform a dataset from Kaggle: first saving it in CSV format, then converting the data into JSON for further work.
  • API with FastAPI
    • You will get acquainted with the general API scheme, create an API with FastAPI, set it up to receive data, and test its operation through Postman.
  • Apache Kafka and API as Docker Services
    • You will install Apache Kafka through Docker, set up topics, write an API that will send data to Kafka, and deploy it in a Docker container.
  • Data Streaming through Spark into Kafka
    • You will prepare a container with Apache Spark, connect it to Kafka and the API, set up data processing through Spark Structured Streaming, and test the pipeline.
  • Data Storage in MongoDB
    • You will set up MongoDB and Mongo-Express through Docker, prepare a database and collection for storing data, and link Spark with MongoDB.
  • Data Streaming from Kafka to MongoDB
    • You will learn how to work with Spark Structured Streaming to write streaming data from Kafka into MongoDB in the format of nested JSON documents.
  • API Client in Python
    • You will write a Python client script to send data to the API in JSON format and ensure that the data is successfully recorded in MongoDB.
  • Visualization Interface in Streamlit
    • You will build an interactive dashboard to view customer invoices and products using Streamlit.
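The Kafka and MongoDB services described above could be declared in a compose file along these lines; image tags, ports, and environment variables here are assumptions and will differ from the course's exact configuration:

```yaml
# Minimal sketch of a compose file; images and settings are assumptions,
# not the course's exact setup.
services:
  kafka:
    image: bitnami/kafka:latest
    ports:
      - "9092:9092"
    environment:
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
  mongo:
    image: mongo:6
    ports:
      - "27017:27017"
  mongo-express:
    image: mongo-express
    ports:
      - "8081:8081"
    environment:
      - ME_CONFIG_MONGODB_SERVER=mongo
```

Running every service on one compose network is what lets the API, Spark, and MongoDB address each other by service name (`kafka:9092`, `mongo:27017`).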
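The CSV-to-JSON preparation step can be sketched with the standard library alone. The column names below are assumptions modeled on the classic online-retail dataset, not necessarily the course's exact schema:

```python
import csv
import io
import json

def csv_rows_to_json(csv_text):
    """Convert CSV invoice rows into a list of JSON strings, one per row.

    The column names (InvoiceNo, StockCode, Quantity) are illustrative
    assumptions; the actual Kaggle dataset may use different headers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row) for row in reader]

sample = "InvoiceNo,StockCode,Quantity\n536365,85123A,6\n536365,71053,8\n"
messages = csv_rows_to_json(sample)
print(messages[0])  # each CSV row becomes one JSON document
```

Each resulting JSON string can later be posted to the API one message at a time, which is what makes the data suitable for streaming.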
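A minimal FastAPI receiver in the spirit of the API step might look like this; the route path and field names are illustrative assumptions, not the course's exact code:

```python
from fastapi import FastAPI
from pydantic import BaseModel

# Field names are assumptions for illustration, not the course's schema.
class Invoice(BaseModel):
    InvoiceNo: str
    StockCode: str
    Quantity: int

app = FastAPI()

@app.post("/invoiceitem")
async def post_invoice_item(item: Invoice):
    # In the full project, this is where the message would be produced
    # to Kafka before acknowledging the client.
    return {"status": "received", "InvoiceNo": item.InvoiceNo}
```

Pydantic validates the incoming JSON automatically, so malformed invoices are rejected with a 422 before they ever reach Kafka.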
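The Spark leg of the pipeline (reading from Kafka, writing nested documents to MongoDB) can be sketched as below. The topic, database, and collection names are assumptions, and the write format assumes the MongoDB Spark connector v10+; `main()` is only a sketch to be submitted to the Spark cluster:

```python
def kafka_options(bootstrap_servers, topic):
    """Options for Spark's Kafka source (pure helper, easy to test)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",
    }

def main():
    # Requires pyspark plus the Kafka and MongoDB connector packages on
    # the Spark classpath; names below are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    spark = SparkSession.builder.appName("kafka-to-mongo").getOrCreate()

    # Assumed message schema; the course's invoice schema is richer.
    schema = StructType([
        StructField("InvoiceNo", StringType()),
        StructField("StockCode", StringType()),
    ])

    raw = (spark.readStream.format("kafka")
           .options(**kafka_options("kafka:9092", "invoices"))
           .load())

    # Kafka values arrive as bytes; cast to string, then parse the JSON.
    parsed = (raw.select(from_json(col("value").cast("string"), schema)
                         .alias("doc"))
              .select("doc.*"))

    (parsed.writeStream
        .format("mongodb")
        .option("spark.mongodb.connection.uri", "mongodb://mongo:27017")
        .option("spark.mongodb.database", "shop")
        .option("spark.mongodb.collection", "invoices")
        .option("checkpointLocation", "/tmp/checkpoint")
        .outputMode("append")
        .start()
        .awaitTermination())

# Call main() when submitting the job to the Spark container.
```

The checkpoint location is what gives the stream exactly-once-style recovery: after a restart, Spark resumes from the recorded Kafka offsets instead of reprocessing the topic.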
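The API client step can be sketched with only the standard library; the endpoint path and payload fields are assumptions, and `main()` is left uninvoked since it needs the API container running:

```python
import json
import urllib.request

def build_request(url, invoice):
    """Build a POST request carrying one invoice as JSON (pure, testable)."""
    data = json.dumps(invoice).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def main():
    # Endpoint path and fields are illustrative assumptions.
    req = build_request(
        "http://localhost:8000/invoiceitem",
        {"InvoiceNo": "536365", "StockCode": "85123A", "Quantity": 6},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.read().decode())

# Call main() once the FastAPI service is up.
```

In the course's variant the client loops over the prepared JSON messages, which is how the whole pipeline gets exercised end to end.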
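The Streamlit step could be sketched as a small script launched with `streamlit run`; the database, collection, and field names are assumptions, and `main()` is left uninvoked since it needs MongoDB running:

```python
def invoice_filter(invoice_no):
    """MongoDB filter for one invoice; the field name is an assumption."""
    return {"InvoiceNo": invoice_no}

def main():
    # Requires streamlit and pymongo; names below are assumptions.
    import streamlit as st
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    collection = client["shop"]["invoices"]

    st.title("Customer Invoices")
    invoice_no = st.text_input("Invoice number")
    if invoice_no:
        # Drop the internal _id so the documents render cleanly.
        docs = list(collection.find(invoice_filter(invoice_no), {"_id": 0}))
        st.write(docs)

# Launch with `streamlit run app.py` and call main() from the script.
```

Streamlit re-runs the script on every widget interaction, so typing a new invoice number immediately re-queries MongoDB and refreshes the view.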

This project is a great opportunity to combine knowledge of APIs, streaming data processing, Docker, and databases into a cohesive whole, gaining practical experience in building streaming applications under real-world conditions.

All Course Lessons (40)

#   Lesson Title                                                   Duration
1   Introduction (demo)                                            01:13
2   Project overview                                               05:34
3   Docker Fundamentals                                            01:44
4   The Dataset We Use                                             02:49
5   Transform CSV to JSONs                                         10:52
6   API Schema                                                     03:43
7   Creating the API with FastAPI                                  09:42
8   Testing the API with Postman                                   06:11
9   Apache Kafka Goals                                             02:34
10  Kafka Docker Compose Explained                                 03:36
11  Startup Kafka Compose File                                     02:47
12  Kafka Topics Setup                                             07:12
13  Preparing the API Docker Build                                 04:14
14  Build the API                                                  03:26
15  Deploy the API                                                 02:49
16  Test the API Container with Kafka                              02:07
17  Recap API & Kafka                                              01:38
18  Apache Spark Compose Config                                    04:39
19  Startup Spark with Kafka & API                                 02:27
20  Spark Ingest Kafka & Produce Kafka                             06:35
21  Setup Test Configuration                                       03:02
22  Test Spark Streaming Kafka                                     05:43
23  Spark UI Monitoring                                            02:31
24  MongoDB Goals                                                  04:23
25  MongoDB Docker Compose Config                                  03:59
26  MongoDB Startup                                                02:45
27  Prepare MongoDB Database & Collection                          01:46
28  Spark Code Streaming to MongoDB                                06:32
29  Transformations 1: Writing Kafka Message as String to MongoDB  03:26
30  Transformations 2: Writing Complete Kafka Message to MongoDB   02:35
31  Transformations 3: Writing Nested Document to MongoDB          04:29
32  Transformations 4: Writing Messages as Document                02:14
33  Spark Streaming Conclusion                                     02:53
34  Writing the API Client                                         04:05
35  Create Test Data & Run Client                                  05:37
36  Streamlit Intro & Goals                                        06:19
37  Query Customer Invoices                                        04:08
38  Query Invoice Documents                                        04:16
39  Project Summary                                                03:26
40  Outlook                                                        06:24
