Dive into a comprehensive project on real-time data processing with our "Streaming with Kafka & Spark" course. Learn to handle data from an online store, including client invoices, and visualize the results through a convenient interface. The course integrates FastAPI, Apache Kafka, Apache Spark, MongoDB, and Streamlit. Some familiarity with these tools is essential; in particular, we recommend a foundational Docker course, since Docker underpins the project environment.
Course Highlights
- Introduction to the Project
- Grasp the architecture of an end-to-end data pipeline and understand the steps to build a project, leveraging appropriate technologies at each stage.
- Data Preparation
- Load and transform a dataset sourced from Kaggle, initially saving it as CSV and then converting the data to JSON for further processing.
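The CSV-to-JSON step can be sketched with Python's standard library alone. A minimal version, using illustrative column names rather than the dataset's exact schema:

```python
import csv
import json
from io import StringIO

def csv_to_json_records(csv_text: str) -> list[dict]:
    """Parse CSV text into a list of dicts, one dict per row."""
    reader = csv.DictReader(StringIO(csv_text))
    return [dict(row) for row in reader]

if __name__ == "__main__":
    # Hypothetical sample resembling an online-store invoice dataset.
    sample = "InvoiceNo,StockCode,Quantity\n536365,85123A,6\n"
    records = csv_to_json_records(sample)
    print(json.dumps(records, indent=2))
```

In the course project the input comes from a Kaggle CSV file rather than an in-memory string, but the transformation is the same.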
- API Development with FastAPI
- Learn to design a basic API, build it with FastAPI, configure it for data reception, and test functionality using Postman.
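A minimal sketch of such an API, assuming FastAPI is installed (`pip install fastapi uvicorn`); the route and field names are illustrative, not the course's exact schema:

```python
# Hypothetical invoice-receiving API sketch; field names are assumptions.

def validate_invoice(payload: dict) -> bool:
    """Check that the required invoice fields are present."""
    return {"invoice_no", "customer_id", "items"} <= payload.keys()

def create_app():
    from fastapi import FastAPI, HTTPException  # third-party; imported lazily

    app = FastAPI()

    @app.post("/invoices")
    async def receive_invoice(payload: dict):
        # Reject payloads that are missing required fields.
        if not validate_invoice(payload):
            raise HTTPException(status_code=422, detail="missing required fields")
        return {"status": "received", "invoice_no": payload["invoice_no"]}

    return app

# Serve with: uvicorn api:create_app --factory
# then POST JSON to http://localhost:8000/invoices via Postman.
```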
- Deploying Apache Kafka and the API as Docker Services
- Install Apache Kafka using Docker, configure topics, create an API to send data to Kafka, and deploy it within a Docker container.
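Sending data to a topic can be sketched with the `kafka-python` client (`pip install kafka-python`); the broker address and topic name below are assumptions, not the course's exact values:

```python
import json

TOPIC = "invoices"  # hypothetical topic created on the Dockerized broker

def encode(record: dict) -> bytes:
    """Serialize a record to UTF-8 JSON bytes for the Kafka message value."""
    return json.dumps(record).encode("utf-8")

def send_invoice(invoice: dict) -> None:
    from kafka import KafkaProducer  # third-party; imported lazily

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",  # port published by the Kafka container
        value_serializer=encode,
    )
    producer.send(TOPIC, invoice)
    producer.flush()  # block until the message is actually delivered

# send_invoice({"invoice_no": "536365", "quantity": 6})  # run with the broker up
```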
- Streaming Data from Kafka via Spark
- Set up an Apache Spark container, connect it to Kafka and the API, utilize Spark Structured Streaming for data processing, and validate the pipeline.
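The reading side can be sketched as a PySpark Structured Streaming job; the broker address, topic, schema fields, and Kafka package version are assumptions:

```python
def kafka_options(brokers: str, topic: str) -> dict:
    """Options for Spark's Kafka source."""
    return {"kafka.bootstrap.servers": brokers, "subscribe": topic}

def run_stream():
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import IntegerType, StringType, StructField, StructType

    spark = (
        SparkSession.builder.appName("invoice-stream")
        # Kafka source package; the version must match your Spark build.
        .config("spark.jars.packages",
                "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0")
        .getOrCreate()
    )

    # Illustrative schema for the JSON messages produced by the API.
    schema = StructType([
        StructField("invoice_no", StringType()),
        StructField("quantity", IntegerType()),
    ])

    raw = (spark.readStream.format("kafka")
           .options(**kafka_options("kafka:9092", "invoices"))
           .load())

    invoices = (raw.selectExpr("CAST(value AS STRING) AS json")
                .select(from_json(col("json"), schema).alias("data"))
                .select("data.*"))

    # Console sink to validate the pipeline end to end.
    invoices.writeStream.outputMode("append").format("console").start().awaitTermination()

# run_stream()  # uncomment inside the Spark container with Kafka reachable
```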
- Data Storage with MongoDB
- Configure MongoDB and Mongo-Express with Docker, prepare a data storage structure, and integrate Spark with MongoDB for seamless data flow.
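The nested document shape used for storage can be sketched with `pymongo` (`pip install pymongo`); the database, collection, and field names are illustrative:

```python
def make_invoice_doc(invoice_no: str, customer_id: str, items: list[dict]) -> dict:
    """Build a nested invoice document: one invoice embedding its line items."""
    return {
        "invoice_no": invoice_no,
        "customer": {"id": customer_id},
        "items": items,
        "total": sum(i["price"] * i["quantity"] for i in items),
    }

def store_invoice(doc: dict) -> None:
    from pymongo import MongoClient  # third-party; imported lazily

    client = MongoClient("mongodb://localhost:27017")  # port published by Docker
    client["store"]["invoices"].insert_one(doc)

# store_invoice(make_invoice_doc(
#     "536365", "17850",
#     [{"stock_code": "85123A", "price": 2.55, "quantity": 6}],
# ))  # run with the MongoDB container up; inspect the result in Mongo-Express
```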
- Transferring Data from Kafka to MongoDB
- Master Spark Structured Streaming to efficiently write streaming data from Kafka into MongoDB using nested JSON documents.
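A sketch of the sink side, assuming the MongoDB Spark Connector 10.x (streaming write via `format("mongodb")`); the connection URI, names, and nested schema are assumptions based on the earlier modules:

```python
def mongo_options(uri: str, database: str, collection: str) -> dict:
    """Sink options for the MongoDB Spark Connector (10.x config keys)."""
    return {
        "spark.mongodb.connection.uri": uri,
        "spark.mongodb.database": database,
        "spark.mongodb.collection": collection,
    }

def run_sink():
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import (ArrayType, DoubleType, IntegerType,
                                   StringType, StructField, StructType)

    # The Kafka source and MongoDB connector packages must be on the classpath.
    spark = SparkSession.builder.appName("kafka-to-mongo").getOrCreate()

    # Nested layout: one invoice document embedding an array of line items.
    schema = StructType([
        StructField("invoice_no", StringType()),
        StructField("items", ArrayType(StructType([
            StructField("stock_code", StringType()),
            StructField("price", DoubleType()),
            StructField("quantity", IntegerType()),
        ]))),
    ])

    invoices = (spark.readStream.format("kafka")
                .option("kafka.bootstrap.servers", "kafka:9092")
                .option("subscribe", "invoices")
                .load()
                .select(from_json(col("value").cast("string"), schema).alias("data"))
                .select("data.*"))

    (invoices.writeStream
     .format("mongodb")
     .options(**mongo_options("mongodb://mongo:27017", "store", "invoices"))
     .option("checkpointLocation", "/tmp/checkpoints")  # required for streaming sinks
     .outputMode("append")
     .start()
     .awaitTermination())

# run_sink()  # uncomment inside the Spark container with Kafka and MongoDB up
```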
- Building an API Client in Python
- Create a Python script to send data to the API in JSON format, ensuring successful recording in MongoDB.
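Such a client needs nothing beyond the standard library; the endpoint URL and payload fields below are assumptions carried over from the API sketch:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/invoices"  # hypothetical endpoint

def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying the invoice as UTF-8 JSON."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def post_invoice(payload: dict) -> None:
    req = build_request(API_URL, payload)
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.read().decode())

# post_invoice({"invoice_no": "536365", "customer_id": "17850", "items": []})
# run with the API container up, then verify the record landed in MongoDB
```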
- Creating a Visualization Interface with Streamlit
- Develop an interactive dashboard using Streamlit to review customer invoices and product details.
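A dashboard of this kind can be sketched in a few lines of Streamlit (`pip install streamlit pymongo`); the database, collection, and field names mirror the assumed earlier steps rather than the course's exact code:

```python
def format_total(items: list[dict]) -> float:
    """Compute an invoice total from its embedded line items."""
    return sum(i["price"] * i["quantity"] for i in items)

def render():
    import streamlit as st  # third-party; imported lazily
    from pymongo import MongoClient

    st.title("Invoice Dashboard")
    client = MongoClient("mongodb://localhost:27017")
    invoices = list(client["store"]["invoices"].find().limit(100))

    # Pick an invoice and show its total and product details.
    invoice_no = st.selectbox("Invoice", [d["invoice_no"] for d in invoices])
    selected = next(d for d in invoices if d["invoice_no"] == invoice_no)
    st.metric("Total", f'{format_total(selected["items"]):.2f}')
    st.table(selected["items"])

# Save as dashboard.py, call render() at the bottom, then launch with:
#   streamlit run dashboard.py
```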
This course is a valuable opportunity to consolidate your understanding of APIs, data streaming, Docker, and database management into a single end-to-end project. Build streaming applications under realistic conditions and sharpen your practical skills.