Data pipelines are a key element of any data science platform. Without them, you can neither load data nor run machine learning models. This practical 170-minute course teaches you how to create streaming, batch, and ML pipelines using proven templates and examples for popular cloud platforms.
Basic Module
Fundamentals of Platforms and Pipelines
Acquaint yourself with platform architectures and different types of pipelines. Learn how they differ, how they work, what a machine learning pipeline looks like, and how to integrate them within a single system.
Platform Architecture and End-to-End Pipeline
Understand the structure of a typical platform architecture: connection, buffering, processing, storage, and data visualization. By examining an end-to-end pipeline, you'll learn how to apply this structure in your work.
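As a rough illustration of the five stages named above, the following minimal sketch wires them together in plain Python. Everything in it is an assumption made for illustration: the `sensor,value` input format, the function names, the alert threshold of 30.0, and the use of an in-memory deque and dict as stand-ins for a real message queue and database.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List


def connect(raw_lines: Iterable[str]) -> Iterable[dict]:
    """Connect: parse incoming 'sensor,value' lines into records."""
    for line in raw_lines:
        sensor, value = line.split(",")
        yield {"sensor": sensor, "value": float(value)}


def process(record: dict) -> dict:
    """Processing: flag readings above a hypothetical threshold."""
    return {**record, "alert": record["value"] > 30.0}


def visualize(store: Dict[str, dict]) -> List[str]:
    """Visualization: render the stored records as report lines."""
    return [
        f"{sensor}: {row['value']} {'ALERT' if row['alert'] else 'ok'}"
        for sensor, row in sorted(store.items())
    ]


buffer: Deque[dict] = deque()  # Buffering: stands in for a message queue
store: Dict[str, dict] = {}    # Storage: stands in for a database

# Connect -> buffer
for record in connect(["a,21.0", "b,35.5"]):
    buffer.append(record)

# Buffer -> process -> store
while buffer:
    processed = process(buffer.popleft())
    store[processed["sensor"]] = processed

# Store -> visualize
report = visualize(store)
print(report)
```

In a real platform each stage would typically be a separate service, which is why the buffer sits between ingestion and processing: it lets the two run at different speeds.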
Push and Pull Pipelines
Understand the difference between the push and pull models of data transmission (the source sends data versus the consumer fetches it), complete with illustrative examples and diagrams.
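The contrast can be sketched in a few lines of Python. The class names `PushSource` and `PullSource` are hypothetical and exist only for this example. In the push case the source calls the consumer; in the pull case the consumer decides when to ask and how much to take.

```python
from collections import deque
from typing import Callable, Deque, List


class PushSource:
    """Push model: the source calls each subscriber as soon as data exists."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self._subscribers.append(handler)

    def emit(self, record: dict) -> None:
        for handler in self._subscribers:
            handler(record)


class PullSource:
    """Pull model: the source only holds data; the consumer fetches it."""

    def __init__(self) -> None:
        self._pending: Deque[dict] = deque()

    def add(self, record: dict) -> None:
        self._pending.append(record)

    def fetch(self, max_records: int) -> List[dict]:
        batch: List[dict] = []
        while self._pending and len(batch) < max_records:
            batch.append(self._pending.popleft())
        return batch


# Push: the consumer registers once and then receives records passively.
pushed: List[dict] = []
push_source = PushSource()
push_source.subscribe(pushed.append)
push_source.emit({"id": 1})

# Pull: the consumer controls timing and batch size.
pull_source = PullSource()
pull_source.add({"id": 2})
pull_source.add({"id": 3})
pulled = pull_source.fetch(max_records=10)
```

A webhook behaves like the push variant, while a scheduled job that queries an API or database behaves like the pull variant.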
Batch and Streaming Pipelines
Distinguish between batch and streaming processing and apply each where the scenario calls for it. This is one of the most important topics for a data engineer.
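To make the distinction concrete, here is a small sketch that computes the same revenue total both ways. The function names and sample order amounts are invented for the example. The batch version needs the complete dataset before it produces one answer; the streaming version updates its answer after every event.

```python
from typing import Iterable, Iterator, List


def batch_total(events: List[float]) -> float:
    """Batch: process the whole, bounded dataset in one run."""
    return sum(events)


def streaming_totals(events: Iterable[float]) -> Iterator[float]:
    """Streaming: emit an updated result as each event arrives."""
    total = 0.0
    for amount in events:
        total += amount
        yield total


orders = [10.0, 5.0, 2.5]
batch_result = batch_total(orders)
stream_results = list(streaming_totals(iter(orders)))

print(batch_result)    # one result after all data is available
print(stream_results)  # one result per incoming event
```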
Data Streams Visualization
Learn how to visualize data processing and storage, even without direct access to them. An example with Apache Spark reinforces the material.
Lambda Architecture
Discover how batch and stream pipelines are integrated within a single platform. This is particularly important for ML, where models are trained on batch data and applied to streaming data.
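The core idea of the Lambda architecture can be sketched as a batch view, a speed view, and a serving step that merges the two at query time. The page-view counting scenario and the names `build_batch_view`, `SpeedLayer`, and `serve` are assumptions made for this example, not part of any specific framework.

```python
from collections import Counter
from typing import List


def build_batch_view(history: List[str]) -> Counter:
    """Batch layer: periodically recompute counts over all historical events."""
    return Counter(history)


class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def on_event(self, page: str) -> None:
        self.view[page] += 1


def serve(batch_view: Counter, speed_view: Counter, page: str) -> int:
    """Serving layer: answer queries by merging both views."""
    return batch_view[page] + speed_view[page]


batch_view = build_batch_view(["home", "home", "pricing"])

speed = SpeedLayer()
speed.on_event("home")
speed.on_event("docs")

home_views = serve(batch_view, speed.view, "home")
docs_views = serve(batch_view, speed.view, "docs")
```

When the next batch run completes, the speed view would be reset, since the events it covered are now part of the batch view.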
Platform Examples
Study architecture templates on AWS, GCP, Azure, and Hadoop. See how tools like AWS Lambda, API Gateway, and DynamoDB fit into real infrastructure.
Advanced Module
Processing Models: Event-Driven, Batch, and Stream
Understand the differences between event-driven, batch, micro-batch, and stream processing. Learn how to choose the appropriate processing type for tasks like analytics, transactions, reverse ETL, and more.
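Micro-batching sits between the two extremes: the stream is cut into small groups and each group is processed like a tiny batch. The sketch below groups by record count so that the result is deterministic. That is a simplifying assumption, since real engines usually cut micro-batches by a time interval. The helper name `micro_batches` is hypothetical.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


batches = list(micro_batches(range(5), batch_size=2))
print(batches)
```

With a batch size of 1 this degenerates into event-at-a-time processing, and with an unbounded batch size it becomes a classic batch job, which is one way to see the three models as points on the same scale.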
Targeted Design and Platform Schema Replication
Revisit platform schema design principles and learn to align business goals and data types with architectural solutions. Move beyond selecting tools "by feel" and design the system starting from the task it has to solve.
Modern Architectures: Lakehouse and Medallion
Learn how the Lakehouse combines file storage with transactional tables, and how the bronze, silver, and gold layers of the Medallion architecture help maintain order and scalability.
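The layering idea can be shown without any lakehouse engine at all. In this minimal sketch, plain Python lists stand in for tables; the sample rows, the cleaning rules, and the function names `to_silver` and `to_gold` are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Bronze: raw data exactly as ingested, including bad rows.
bronze = [
    {"user": "ann", "amount": "10.5"},
    {"user": "bob", "amount": "oops"},
    {"user": "ann", "amount": "4.5"},
    {"user": None, "amount": "3"},
]


def to_silver(rows: List[dict]) -> List[dict]:
    """Silver: validated, typed records; invalid rows are dropped."""
    silver = []
    for row in rows:
        if not row.get("user"):
            continue
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue
        silver.append({"user": row["user"], "amount": amount})
    return silver


def to_gold(rows: List[dict]) -> Dict[str, float]:
    """Gold: business-level aggregate, here total amount per user."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["user"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Because bronze is kept unchanged, the silver and gold layers can be rebuilt at any time if the cleaning rules or the aggregation change.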
Machine Learning and Generative AI (GenAI)
Explore how machine learning pipelines integrate into platforms: where training, inference, and deployment occur. Gain insights into semantic search and Retrieval-Augmented Generation (RAG), the foundation of many modern AI applications.
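The retrieval half of RAG is a semantic search: documents and the query are represented as vectors, and the closest documents are placed into the prompt. The sketch below uses hand-written three-dimensional vectors in place of a real embedding model, and stops before calling any language model. The documents, vectors, and function names are all invented for the example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], index: List[dict], k: int = 1) -> List[str]:
    """Return the texts of the k documents most similar to the query."""
    ranked = sorted(index, key=lambda doc: cosine(query_vec, doc["vector"]), reverse=True)
    return [doc["text"] for doc in ranked[:k]]


def build_prompt(question: str, context: List[str]) -> str:
    """Augment the question with retrieved context before generation."""
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question


# Toy index: the vectors are made up, not produced by an embedding model.
index = [
    {"text": "Kafka buffers events between services.", "vector": [1.0, 0.0, 0.0]},
    {"text": "Gold tables hold business aggregates.", "vector": [0.0, 1.0, 0.0]},
    {"text": "Spark processes data in parallel.", "vector": [0.0, 0.0, 1.0]},
]

query_vector = [0.9, 0.1, 0.0]  # pretend embedding of the question
top = retrieve(query_vector, index, k=1)
prompt = build_prompt("What buffers events?", top)
```

In a platform, the index would usually live in a vector store that a pipeline keeps up to date as source documents change.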
Platform Testing
Focus on testing strategies for pipelines at every stage, from loading and processing to data transformation, in this brief but important lesson.
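At the transformation stage, the simplest strategy is a unit test around a pure function. The following sketch assumes a hypothetical `normalize_record` transformation and uses plain `assert` statements; the same test functions would also be picked up by a test runner such as pytest.

```python
def normalize_record(raw: dict) -> dict:
    """Hypothetical transformation step: cast the id and clean the email."""
    return {
        "user_id": int(raw["user_id"]),
        "email": raw["email"].strip().lower(),
    }


def test_normalize_record_cleans_fields() -> None:
    result = normalize_record({"user_id": "42", "email": "  Ann@Example.COM "})
    assert result == {"user_id": 42, "email": "ann@example.com"}


def test_normalize_record_rejects_missing_id() -> None:
    try:
        normalize_record({"email": "ann@example.com"})
    except KeyError:
        pass  # expected: a record without user_id must not pass silently
    else:
        raise AssertionError("expected KeyError for a missing user_id")


test_normalize_record_cleans_fields()
test_normalize_record_rejects_missing_id()
```

Tests for the loading and storage stages follow the same pattern but usually need a test double or a temporary instance of the external system.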
This course will give you a comprehensive understanding of platforms and pipelines and teach you how to build efficient architectures that apply to real cloud solutions. It is ideal for beginner engineers and for those looking to advance to the next level.