Streaming with Kafka & Spark

2h 46m 25s
English
Paid

"Streaming with Kafka & Spark" is a project-based course on real-time data processing. You build a pipeline that ingests data from an online store, including client invoices, and visualize it through a simple interface. The project combines FastAPI, Apache Kafka, Apache Spark, MongoDB, and Streamlit. Prior familiarity with these tools is expected, and because Docker underpins the project environment, a foundational Docker course is recommended first.

Course Highlights

  • Introduction to the Project
    • Grasp the architecture of an end-to-end data pipeline and understand the steps to build a project, leveraging appropriate technologies at each stage.
  • Data Preparation
    • Load and transform a dataset sourced from Kaggle, initially saving it as CSV and then converting the data to JSON for further processing.
  • API Development with FastAPI
    • Learn to design a basic API, build it with FastAPI, configure it for data reception, and test functionality using Postman.
  • Implementing Apache Kafka and API as Docker Services
    • Install Apache Kafka using Docker, configure topics, create an API to send data to Kafka, and deploy it within a Docker container.
  • Streaming Data via Spark into Kafka
    • Set up an Apache Spark container, connect it to Kafka and the API, utilize Spark Structured Streaming for data processing, and validate the pipeline.
  • Data Storage with MongoDB
    • Configure MongoDB and Mongo-Express with Docker, prepare a data storage structure, and integrate Spark with MongoDB for seamless data flow.
  • Transferring Data from Kafka to MongoDB
    • Master Spark Structured Streaming to efficiently write streaming data from Kafka into MongoDB using nested JSON documents.
  • Building an API Client in Python
    • Create a Python script to send data to the API in JSON format, ensuring successful recording in MongoDB.
  • Creating a Visualization Interface with Streamlit
    • Develop an interactive dashboard using Streamlit to review customer invoices and product details.
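
The data-preparation and nested-document steps listed above can be illustrated with a minimal sketch in plain Python. It is not the course's code: the sample rows, the column names (`InvoiceNo`, `StockCode`, `Description`, `Quantity`, `UnitPrice`, `CustomerID`), and the helper names `csv_to_json_lines` and `nest_by_invoice` are assumptions made for illustration, and the real Kaggle dataset may use different fields. The sketch turns each CSV row into one JSON message and then groups those messages into one nested document per invoice, which is roughly the shape a document store such as MongoDB would hold.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample data; the real dataset's columns may differ.
SAMPLE_CSV = """InvoiceNo,StockCode,Description,Quantity,UnitPrice,CustomerID
536365,85123A,WHITE HANGING HEART,6,2.55,17850
536365,71053,WHITE METAL LANTERN,6,3.39,17850
536366,22633,HAND WARMER UNION JACK,6,1.85,17850
"""


def csv_to_json_lines(csv_text):
    """Turn each CSV row into one JSON string (one message per row)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row) for row in reader]


def nest_by_invoice(json_lines):
    """Group flat row messages into one nested document per invoice."""
    invoices = defaultdict(lambda: {"items": []})
    for line in json_lines:
        row = json.loads(line)
        doc = invoices[row["InvoiceNo"]]
        doc["InvoiceNo"] = row["InvoiceNo"]
        doc["CustomerID"] = row["CustomerID"]
        # CSV values arrive as strings, so numeric fields are cast here.
        doc["items"].append({
            "StockCode": row["StockCode"],
            "Description": row["Description"],
            "Quantity": int(row["Quantity"]),
            "UnitPrice": float(row["UnitPrice"]),
        })
    return list(invoices.values())


lines = csv_to_json_lines(SAMPLE_CSV)
documents = nest_by_invoice(lines)
print(json.dumps(documents[0], indent=2))
```

In the course's pipeline the per-row messages would travel through the API and Kafka, and the nesting would be done in Spark Structured Streaming rather than in a local loop. The sketch only shows the data shapes involved.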

This course offers a valuable opportunity to consolidate your understanding of APIs, data streaming, Docker, and database management into a unified project. Experience the creation of streaming applications in realistic situations and enhance your practical skills.

Additional

https://github.com/team-data-science/document-streaming

About the Author: Andreas Kretz

Andreas Kretz is a German data engineer and one of the most widely followed independent voices on data engineering as a career discipline. He runs the Plumbers of Data Science brand and has been publishing tutorial material continuously since the field consolidated around the modern lakehouse stack (Spark, Kafka, Snowflake, Databricks, Airflow).

His CourseFlix listing is the largest single-author catalog under this source — over thirty courses spanning data-pipeline construction, streaming architectures, the cloud-native data stack on AWS / Azure / GCP, the Python and Scala tooling that dominates the field, and the soft-skills / career side of breaking into data engineering. Material is paid and aimed at engineers transitioning into data work or already-working data engineers picking up specific tools.

All Course Lessons (40)
  1. Introduction 01:13 (Demo)
  2. Project overview 05:34
  3. Docker Fundamentals 01:44
  4. The Dataset we use 02:49
  5. Transform CSV to JSONs 10:52
  6. API Schema 03:43
  7. Creating the API with FastAPI 09:42
  8. Testing the API with Postman 06:11
  9. Apache Kafka Goals 02:34
  10. Kafka Docker Compose Explained 03:36
  11. Startup Kafka Compose File 02:47
  12. Kafka Topics Setup 07:12
  13. Preparing the API Docker build 04:14
  14. Build the API 03:26
  15. Deploy the API 02:49
  16. Test the API Container with Kafka 02:07
  17. Recap API & Kafka 01:38
  18. Apache Spark Compose Config 04:39
  19. Startup Spark with Kafka & API 02:27
  20. Spark Ingest Kafka & Produce Kafka 06:35
  21. Setup Test configuration 03:02
  22. Test Spark Streaming Kafka 05:43
  23. Spark UI Monitoring 02:31
  24. MongoDB Goals 04:23
  25. MongoDB Docker Compose Config 03:59
  26. MongoDB Startup 02:45
  27. Prepare MongoDB Database & Collection 01:46
  28. Spark Code Streaming To MongoDB 06:32
  29. Transformations 1: Writing Kafka Message as String to MongoDB 03:26
  30. Transformations 2: Writing complete Kafka message to MongoDB 02:35
  31. Transformations 3: Writing Nested Document to MongoDB 04:29
  32. Transformations 4: Writing Messages as Document 02:14
  33. Spark Streaming Conclusion 02:53
  34. Writing the API Client 04:05
  35. Create Test Data & Run Client 05:37
  36. Streamlit Intro & Goals 06:19
  37. Query Customer Invoices 04:08
  38. Query Invoice Documents 04:16
  39. Project Summary 03:26
  40. Outlook 06:24
Frequently asked questions

What is Streaming with Kafka & Spark about?
Dive into a comprehensive project on real-time data processing with our "Streaming with Kafka & Spark" course. Learn to handle data from an online store, including client invoices, and visualize this information through a convenient…
Who teaches Streaming with Kafka & Spark?
Streaming with Kafka & Spark is taught by Andreas Kretz. You can find more courses by this instructor on the corresponding source page.
How long is Streaming with Kafka & Spark?
Streaming with Kafka & Spark contains 40 lessons with a total runtime of 2 hours 46 minutes. All lessons are available to watch online at your own pace.
Is Streaming with Kafka & Spark free to watch?
Streaming with Kafka & Spark is part of CourseFlix's premium catalog. A CourseFlix subscription unlocks the full video player; the course description, table of contents, and preview information are available to everyone.
Where can I watch Streaming with Kafka & Spark online?
Streaming with Kafka & Spark is available to watch online on CourseFlix at https://courseflix.net/course/streaming-with-kafka-spark. The page hosts every lesson with the integrated video player; no download is required.