Streaming with Kafka & Spark

2h 46m 25s
English
Paid

This course is a full-fledged project covering a complete cycle of real-time data processing. You will work with data from an online store: customer invoices and the items on those invoices. The aim of the course is to set up streaming processing of invoices as they arrive and to visualize them in a convenient interface.

You will use technologies such as FastAPI, Apache Kafka, Apache Spark, MongoDB, and Streamlit: tools you are already familiar with from other courses. We strongly recommend completing basic courses on these technologies, as well as a course on Docker fundamentals, since the project relies heavily on a Docker environment.


What you can expect in the course:

  • Introduction to the Project
    • You will understand the architecture of the end-to-end pipeline and see how the data visualization is built. Step by step, you will learn how the project is assembled and which technology is used at each stage.
  • Data Preparation
    • You will load and transform a dataset from Kaggle: first saving it in CSV format, then converting the data into JSON for further work (see the pandas sketch after this list).
  • API with FastAPI
    • You will get acquainted with the general API scheme, create an API with FastAPI, set it up to receive data, and test it through Postman (a minimal endpoint is sketched after this list).
  • Apache Kafka and API as Docker Services
    • You will install Apache Kafka through Docker, set up topics, extend the API so that it sends data to Kafka, and deploy it in a Docker container (see the producer sketch below).
  • Data Streaming through Spark into Kafka
    • You will prepare a container with Apache Spark, connect it to Kafka and the API, set up data processing through Spark Structured Streaming, and test the pipeline (a Kafka-to-Kafka streaming sketch follows this list).
  • Data Storage in MongoDB
    • You will set up MongoDB and Mongo-Express through Docker, prepare a database and collection for storing the data, and link Spark with MongoDB (a small collection-setup sketch follows this list).
  • Data Streaming from Kafka to MongoDB
    • You will learn how to use Spark Structured Streaming to write streaming data from Kafka into MongoDB as nested JSON documents (see the foreachBatch sketch below).
  • API Client in Python
    • You will write a Python client script that sends data to the API in JSON format and verify that the data is successfully recorded in MongoDB (a client sketch follows this list).
  • Visualization Interface in Streamlit
    • You will build an interactive dashboard in Streamlit to view customer invoices and products (see the dashboard sketch below).
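
The sketches below illustrate how the main pipeline stages might look in code; all file names, column names, hosts, ports, topics, schemas, and credentials are assumptions for illustration, not the course's exact code. First, the data preparation step: a minimal pandas sketch that groups a Kaggle-style retail CSV into one JSON document per invoice.

```python
import json

import pandas as pd

# Load the raw Kaggle export (file and column names are illustrative).
df = pd.read_csv("online_retail.csv")

# Group line items by invoice number and emit one JSON document per line,
# so the API client can replay them later.
with open("invoices.json", "w") as out:
    for invoice_no, items in df.groupby("InvoiceNo"):
        doc = {
            "InvoiceNo": str(invoice_no),
            "Items": items.drop(columns=["InvoiceNo"]).to_dict(orient="records"),
        }
        out.write(json.dumps(doc, default=str) + "\n")
```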
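
For the API module, a minimal FastAPI sketch of an endpoint that accepts invoice JSON; the model fields and the /invoices path are assumed, not taken from the course.

```python
from fastapi import FastAPI, status
from pydantic import BaseModel

app = FastAPI()


class InvoiceItem(BaseModel):
    StockCode: str
    Description: str
    Quantity: int
    UnitPrice: float


class Invoice(BaseModel):
    InvoiceNo: str
    CustomerID: str
    Items: list[InvoiceItem]


@app.post("/invoices", status_code=status.HTTP_201_CREATED)
async def create_invoice(invoice: Invoice):
    # Later in the project the validated payload is forwarded to Kafka;
    # for now the endpoint just acknowledges receipt.
    return {"received": invoice.InvoiceNo}
```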
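
For the Kafka module, a hedged sketch of how the API could forward incoming invoices to a topic, here using the kafka-python client; the broker address kafka:9092 and the topic name ingest are assumptions.

```python
import json

from kafka import KafkaProducer  # kafka-python package

# Producer created once at API startup; "kafka:9092" is the assumed service
# name of the Kafka container on the shared Docker network.
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)


def send_invoice(invoice: dict) -> None:
    # Send to the assumed "ingest" topic and wait for the broker to confirm.
    producer.send("ingest", value=invoice).get(timeout=10)
```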
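
For the Spark streaming module, a sketch of reading one Kafka topic and writing to another with Spark Structured Streaming (PySpark); topic names and the checkpoint path are assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-kafka").getOrCreate()

# Subscribe to the input topic; Kafka rows arrive with binary key/value columns.
stream_df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "ingest")
    .option("startingOffsets", "earliest")
    .load()
)

# Keep the payload as a plain string and forward it to the output topic.
out_df = stream_df.selectExpr(
    "CAST(key AS STRING) AS key",
    "CAST(value AS STRING) AS value",
)

query = (
    out_df.writeStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("topic", "spark-output")
    .option("checkpointLocation", "/tmp/checkpoints/kafka-to-kafka")
    .start()
)
query.awaitTermination()
```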
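
For the MongoDB module, a small pymongo sketch that prepares a database and collection up front; the connection string and the names docstreaming/invoices are assumptions.

```python
from pymongo import MongoClient

# Credentials and names are placeholders for whatever the Docker Compose file defines.
client = MongoClient("mongodb://root:example@localhost:27017/")
db = client["docstreaming"]

# MongoDB creates databases and collections lazily on first insert, but an
# explicit create makes the empty collection visible in Mongo-Express right away.
if "invoices" not in db.list_collection_names():
    db.create_collection("invoices")
```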
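
For streaming from Kafka into MongoDB, one possible approach is to parse the JSON payload and insert each micro-batch with pymongo via foreachBatch; the course may instead use the MongoDB Spark connector, and the schema, URI, and names below are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-to-mongo").getOrCreate()

# Assumed invoice schema; only a few fields are shown for brevity.
invoice_schema = StructType([
    StructField("InvoiceNo", StringType()),
    StructField("CustomerID", StringType()),
    StructField("Items", ArrayType(StructType([
        StructField("StockCode", StringType()),
        StructField("Description", StringType()),
    ]))),
])

# Parse the Kafka value column from JSON into a nested structure.
parsed = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "ingest")
    .load()
    .select(from_json(col("value").cast("string"), invoice_schema).alias("doc"))
    .select("doc.*")
)


def write_to_mongo(batch_df, batch_id):
    # Runs on the driver for every micro-batch; collecting is fine for a demo-sized stream.
    from pymongo import MongoClient
    docs = [row.asDict(recursive=True) for row in batch_df.collect()]
    if docs:
        client = MongoClient("mongodb://root:example@mongo:27017/")
        client["docstreaming"]["invoices"].insert_many(docs)


query = (
    parsed.writeStream.foreachBatch(write_to_mongo)
    .option("checkpointLocation", "/tmp/checkpoints/kafka-to-mongo")
    .start()
)
query.awaitTermination()
```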
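
For the API client, a short sketch that replays the prepared JSON lines against the FastAPI endpoint; the URL is an assumption.

```python
import json

import requests

# Assumed address of the API container published on the host.
API_URL = "http://localhost:80/invoices"

# Replay the JSON-lines file produced in the data preparation step.
with open("invoices.json") as f:
    for line in f:
        response = requests.post(API_URL, json=json.loads(line), timeout=5)
        response.raise_for_status()
```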
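
Finally, a minimal Streamlit dashboard sketch that looks up a customer's invoices in MongoDB; field names and connection details mirror the assumptions above.

```python
import pandas as pd
import streamlit as st
from pymongo import MongoClient

# Connection details and names are placeholders, matching the earlier sketches.
client = MongoClient("mongodb://root:example@localhost:27017/")
collection = client["docstreaming"]["invoices"]

st.title("Customer invoices")

customer_id = st.text_input("Customer ID")
if customer_id:
    docs = list(collection.find({"CustomerID": customer_id}, {"_id": 0}))
    if docs:
        st.dataframe(pd.DataFrame(docs))
    else:
        st.write("No invoices found for this customer.")
```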

This project is a great opportunity to combine knowledge of APIs, stream processing, Docker, and databases into a cohesive whole, gaining practical experience in building streaming applications under real-world conditions.

Watch Online: Streaming with Kafka & Spark

# Title Duration
1 Introduction 01:13
2 Project overview 05:34
3 Docker Fundamentals 01:44
4 The Dataset we use 02:49
5 Transform CSV to JSONs 10:52
6 API Schema 03:43
7 Creating the API with FastAPI 09:42
8 Testing the API with Postman 06:11
9 Apache Kafka Goals 02:34
10 Kafka Docker Compose Explained 03:36
11 Startup Kafka Compose File 02:47
12 Kafka Topics Setup 07:12
13 Preparing the API Docker build 04:14
14 Build the API 03:26
15 Deploy the API 02:49
16 Test the API Container with Kafka 02:07
17 Recap API & Kafka 01:38
18 Apache Spark Compose Config 04:39
19 Startup Spark with Kafka & API 02:27
20 Spark Ingest Kafka & Produce Kafka 06:35
21 Setup Test configuration 03:02
22 Test Spark Streaming Kafka 05:43
23 Spark UI Monitoring 02:31
24 MongoDB Goals 04:23
25 MongoDB Docker Compose Config 03:59
26 MongoDB Startup 02:45
27 Prepare MongoDB Database & Collection 01:46
28 Spark Code Streaming To MongoDB 06:32
29 Transformations 1: Writing Kafka Message as String to MongoDB 03:26
30 Transformations 2: Writing complete Kafka message to MongoDB 02:35
31 Transformations 3: Writing Nested Document to MongoDB 04:29
32 Transformations 4: Writing Messages as Document 02:14
33 Spark Streaming Conclusion 02:53
34 Writing the API Client 04:05
35 Create Test Data & Run Client 05:37
36 Streamlit Intro & Goals 06:19
37 Query Customer Invoices 04:08
38 Query Invoice Documents 04:16
39 Project Summary 03:26
40 Outlook 06:24

Similar courses to Streaming with Kafka & Spark

The Data Engineering Bootcamp: Zero to Mastery (zerotomastery.io)
Category: Data processing and analysis
Duration: 13 hours 23 minutes 15 seconds

Dockerized ETL With AWS, TDengine & Grafana (Andreas Kretz)
Category: Data processing and analysis
Duration: 29 minutes 12 seconds

Introduction to Data Engineering (zerotomastery.io)
Category: Data processing and analysis
Duration: 57 minutes 26 seconds

Case Study in A/B Testing (LunarTech)
Category: Data processing and analysis
Duration: 1 hour 56 minutes 17 seconds

MongoDB Fundamentals (Andreas Kretz)
Category: MongoDB, Data processing and analysis
Duration: 1 hour 23 minutes 19 seconds

Business Intelligence with Excel (zerotomastery.io)
Category: Data processing and analysis
Duration: 7 hours 41 minutes 24 seconds

Data Engineering on Databricks (Andreas Kretz)
Category: Data processing and analysis
Duration: 1 hour 27 minutes 29 seconds

dbt for Data Engineers (Andreas Kretz)
Category: Data processing and analysis
Duration: 1 hour 52 minutes 55 seconds

The Data Science Course: Complete Data Science Bootcamp 2023 (udemy)
Category: Data processing and analysis
Duration: 31 hours 14 minutes 14 seconds

Deep Learning A-Z™: Hands-On Artificial Neural Networks (udemy)
Category: Python, Data processing and analysis
Duration: 22 hours 36 minutes 30 seconds