Data Platform & Pipeline Design

1h 59m 5s
English
Paid

Data pipelines are a key element of any data science platform. Without them, neither loading data nor running machine learning models would be possible. This practical course, roughly two hours long, teaches you how to create streaming, batch, and ML pipelines using proven templates and examples for popular cloud platforms.

Basic Module

Fundamentals of Platforms and Pipelines

Acquaint yourself with platform architectures and different types of pipelines. Learn how they differ, how they work, what a machine learning pipeline looks like, and how to integrate them within a single system.

Platform Architecture and End-to-End Pipeline

Understand the structure of a typical platform architecture: connection, buffering, processing, storage, and data visualization. By examining an end-to-end pipeline, you'll learn how to apply this structure in your work.

Push and Pull Pipelines

Understand the difference between the push and pull models of data transmission—sending versus fetching—complete with illustrative examples and diagrams.

Batch and Streaming Pipelines

Distinguish between batch and stream processing and apply each where the scenario calls for it. This is one of the most important topics for a data engineer.

Data Streams Visualization

Learn how to visualize data from processing and storage, even without direct access to either. An example with Apache Spark helps reinforce the material.

Lambda Architecture

Discover how batch and stream pipelines are integrated within a single platform. This is particularly important for ML, where models are trained on batch data and applied to streaming data.

Platform Examples

Study architecture templates on AWS, GCP, Azure, and Hadoop. See how tools like Lambda, API Gateway, and DynamoDB fit into real infrastructure.

Advanced Module

Processing Models: Event-Driven, Batch, and Stream

Understand the differences between event-driven, batch, micro-batch, and streaming processing. Learn how to choose the appropriate processing type for tasks like analytics, transactions, reverse ETL, and more.

Targeted Design and Platform Schema Replication

Revisit platform schema design principles and learn to align business goals and data types with architectural solutions. Move beyond selecting tools "by feel" and design the system starting from the task it has to solve.

Modern Architectures: Lakehouse and Medallion

Learn how the Lakehouse combines file storage with transactional tables, and how the bronze, silver, and gold layers of the Medallion architecture help maintain order and scalability.

Machine Learning and Generative AI (GenAI)

Explore how machine learning pipelines integrate into platforms: where training, inference, and deployment occur. Gain insights into semantic search and Retrieval-Augmented Generation (RAG) – the foundation of modern AI applications.

Platform Testing

Focus on testing strategies for pipelines at all stages—from loading and processing to data transformation—in this brief but important module.

This course will give you a comprehensive understanding of platforms and pipelines and teach you how to build efficient architectures for real cloud solutions. It is ideal for beginner engineers and those looking to advance to the next level.

About the Author: Andreas Kretz

I am a senior data engineer and trainer, a tech enthusiast, and a father. For more than ten years, I have been passionate about Data Engineering. Initially, I became a self-taught data engineer and then led a team of data engineers at a large company. When I realized the great demand for education in this field, I followed my passion and founded my own Data Engineering Academy. Since then, I have helped over 2,000 students achieve their goals.

Watch Online 26 lessons

All Course Lessons (26)
| # | Lesson Title | Duration | Access |
|---|---|---|---|
| 1 | Introduction & Contents | 03:14 | Demo |
| 2 | The Platform Blueprint | 10:12 | |
| 3 | Data Engineering Tools Guide | 02:45 | |
| 4 | End to End Pipeline Example | 06:19 | |
| 5 | Push Ingestion Pipelines | 03:43 | |
| 6 | Pull Ingestion Pipelines | 03:35 | |
| 7 | Batch Pipelines | 03:08 | |
| 8 | Streaming Pipelines | 03:35 | |
| 9 | Stream Analytics | 02:27 | |
| 10 | Lambda Architecture | 04:03 | |
| 11 | Visualization Pipelines | 03:48 | |
| 12 | Visualization with Hive & Spark on Hadoop | 06:22 | |
| 13 | Visualization Data via Spark Thrift Server | 03:28 | |
| 14 | Part 2 Introduction | 01:17 | |
| 15 | Core Use Cases in Platform Design: Transactions, Analytics, and Reverse ETL | 02:58 | |
| 16 | Blueprint Recap: Mapping Tools Across the Modern Data Platform | 03:32 | |
| 17 | Demystifying Event-Driven, Batch, and Streaming Workflows in Data Platforms | 08:11 | |
| 18 | Micro-Batching vs. Streaming: What’s the Real Difference? | 04:56 | |
| 19 | Connecting Sources to Goals: Batch and Stream Processing in a Data Platform | 06:29 | |
| 20 | Building Blocks of a Modern Data Platform: Components, Storage, and Processing | 03:10 | |
| 21 | Before the Tech: How Data and Goals Shape Your Data Platform | 10:10 | |
| 22 | Lakehouse Architecture Explained: From Raw Files to Transactional Tables | 03:35 | |
| 23 | How Machine Learning Fits into Data Platforms: Training, Inference, and Deployment | 06:24 | |
| 24 | From Embeddings to Answers: Understanding Semantic Search and Retrieval-Augmented Generation | 06:07 | |
| 25 | Testing in the Modern Data Platform: From Ingestion to Transformation | 03:11 | |
| 26 | Understanding the Medallion Architecture: Bronze, Silver, and Gold Layers in Data Warehousing | 02:26 | |

Books

Read Book Data Platform & Pipeline Design

| # | Title |
|---|---|
| 1 | Hadoop Course Contents |
| 2 | GCP Course Contents.key |
| 3 | Platform & Pipeline Design questions |
| 4 | Tools Guide Academy |