Data Platform & Pipeline Design

Name: Data Platform & Pipeline Design
Price: 12.75 USD
Availability: InStock
Rating: 5 (1 review)

1h 59m 5s

English

Paid

June 20, 2025

Data Platform & Pipeline Design is a self-paced course by Andreas Kretz with 26 lessons and a total runtime of 1 hour 59 minutes.

Course facts

Lessons: 26
Duration: 1 hour 59 minutes
Level: All levels
Language: English
Instructor: Andreas Kretz
Price: Premium

Data pipelines are a key element of any Data Science platform. Without them, neither loading data nor running machine learning models would be possible. This practical course, which runs 1 hour 59 minutes, teaches you how to create streaming, batch, and ML pipelines using proven templates and examples for popular cloud platforms.

Basic Module

Fundamentals of Platforms and Pipelines

Acquaint yourself with platform architectures and different types of pipelines. Learn how they differ, how they work, what a machine learning pipeline looks like, and how to integrate them within a single system.

Platform Architecture and End-to-End Pipeline

Understand the structure of a typical platform architecture: connection, buffering, processing, storage, and data visualization. By examining an end-to-end pipeline, you'll learn how to apply this structure in your work.
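
As a rough illustration of the five stages named above, here is a minimal Python sketch. It is not taken from the course: the function names, the in-memory queue, and the dict standing in for a database are all assumptions made for the example.

```python
from collections import deque

def connect(raw_events):
    """Connection: accept events from a source and drop empty ones."""
    return [e for e in raw_events if e is not None]

def buffer(events):
    """Buffering: queue events so producers and consumers are decoupled."""
    return deque(events)

def process(queue):
    """Processing: drain the buffer and total the amounts per user."""
    totals = {}
    while queue:
        event = queue.popleft()
        totals[event["user"]] = totals.get(event["user"], 0) + event["amount"]
    return totals

def store(totals, table):
    """Storage: persist results (a dict stands in for a database table)."""
    table.update(totals)
    return table

def visualize(table):
    """Visualization: render stored results as a simple text bar chart."""
    return [f"{user}: {'#' * amount}" for user, amount in sorted(table.items())]

# Hypothetical sample events; None simulates a malformed message.
events = [
    {"user": "alice", "amount": 2},
    None,
    {"user": "bob", "amount": 1},
    {"user": "alice", "amount": 1},
]
table = store(process(buffer(connect(events))), {})
chart = visualize(table)
print("\n".join(chart))
```

In a real platform each stage would typically be a separate service or tool; chaining plain functions only shows the order in which data moves through them.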

Push and Pull Pipelines

Understand the difference between the push and pull models of data transmission—sending versus fetching—complete with illustrative examples and diagrams.
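
The sending-versus-fetching distinction can be sketched in a few lines of Python. The `Source` class and its method names are hypothetical and exist only for this example; they do not come from the course or from any library.

```python
class Source:
    """A hypothetical data source that supports both delivery models."""

    def __init__(self):
        self._records = []
        self._subscribers = []

    def subscribe(self, callback):
        """Register a consumer that wants records pushed to it."""
        self._subscribers.append(callback)

    def emit(self, record):
        """Push: the source initiates delivery to every subscriber."""
        self._records.append(record)  # also retained so it can be pulled later
        for callback in self._subscribers:
            callback(record)

    def fetch(self, offset):
        """Pull: the consumer initiates and chooses where to read from."""
        return self._records[offset:]

source = Source()

pushed = []
source.subscribe(pushed.append)    # push consumer receives records as they occur

source.emit("a")
source.emit("b")

pulled = source.fetch(0)           # pull consumer asks when it is ready
source.emit("c")
pulled += source.fetch(len(pulled))  # and resumes from its own offset
```

Both consumers end up with the same records. The difference is who decides when data moves: the source in the push case, the consumer in the pull case.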

Batch and Streaming Pipelines

Distinguish and apply batch and streaming processing depending on the scenario. This is one of the most important blocks for a data engineer.

Data Streams Visualization

Learn how to visualize data processing and storage—even without direct access to them. An example with Apache Spark will help reinforce the material.

Lambda Architecture

Discover how batch and stream pipelines are integrated within a single platform. This is particularly important for ML, where models are trained on batch data and applied to streaming data.
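
A minimal sketch of how a batch view and a real-time view might be combined, assuming the commonly described batch, speed, and serving layers of a Lambda Architecture. The page-view counting task, function names, and event shape are illustrative assumptions, not course material.

```python
from collections import Counter

def batch_view(historical_events):
    """Batch layer: recompute counts from the full historical data set."""
    return Counter(e["page"] for e in historical_events)

class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, event):
        self.view[event["page"]] += 1

def serve(batch, realtime, page):
    """Serving layer: answer a query by merging both views."""
    return batch[page] + realtime[page]

# Hypothetical data: three events already in storage, one arriving live.
historical = [{"page": "home"}, {"page": "home"}, {"page": "pricing"}]
batch = batch_view(historical)

speed = SpeedLayer()
speed.on_event({"page": "home"})

home_total = serve(batch, speed.view, "home")
```

When the next batch run absorbs the live events, the speed layer's view would be reset so the same event is not counted twice.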

Platform Examples

Study architecture templates on AWS, GCP, Azure, and Hadoop. See how tools like Lambda, API Gateway, and DynamoDB fit into the real infrastructure.

Advanced Module

Processing Models: Event-Driven, Batch, and Stream

Understand the differences between event-driven, batch, micro-batch, and streaming processing. Learn how to choose the appropriate processing type for tasks like analytics, transactions, reverse ETL, and more.
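
To make the micro-batch versus streaming distinction concrete, the following Python sketch processes the same events both ways. It assumes size-based batches for determinism; real engines often trigger micro-batches on a time interval instead. All names are hypothetical.

```python
def stream_process(events, handle):
    """Streaming: hand each event to the handler as soon as it arrives."""
    for event in events:
        handle([event])

def micro_batch_process(events, handle, batch_size):
    """Micro-batching: collect events into small groups and handle each group."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            handle(batch)
            batch = []
    if batch:  # flush the final, possibly smaller, group
        handle(batch)

stream_calls = []
stream_process(range(5), stream_calls.append)

micro_calls = []
micro_batch_process(range(5), micro_calls.append, batch_size=2)
```

Here `stream_calls` holds five single-event groups while `micro_calls` holds three groups. Fewer handler calls usually means less overhead per event, at the cost of each event waiting for its group to fill.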

Targeted Design and Platform Schema Replication

Revisit platform schema design principles and learn to align business goals and data types with architectural solutions. Move beyond selecting tools "by feel" and design the system starting from the task.

Modern Architectures: Lakehouse and Medallion

Learn how Lakehouse combines file storage and transactional tables and how bronze-silver-gold layers in Medallion architecture help maintain order and scalability.
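
The bronze-silver-gold layering can be sketched with plain Python lists. This assumes the usual reading of the layers (raw, cleaned, aggregated); the order records and cleaning rules are invented for the example.

```python
# Bronze: raw records exactly as ingested, including bad and duplicate rows.
bronze = [
    {"order_id": "1", "country": "de", "amount": "10.50"},
    {"order_id": "2", "country": "DE", "amount": "4.50"},
    {"order_id": "2", "country": "DE", "amount": "4.50"},  # duplicate
    {"order_id": "3", "country": "us", "amount": "oops"},  # invalid amount
    {"order_id": "4", "country": "US", "amount": "7.00"},
]

def to_silver(rows):
    """Silver: validated, typed, de-duplicated records."""
    seen, clean = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        if row["order_id"] in seen:
            continue  # drop repeated order ids
        seen.add(row["order_id"])
        clean.append({
            "order_id": int(row["order_id"]),
            "country": row["country"].upper(),
            "amount": amount,
        })
    return clean

def to_gold(rows):
    """Gold: business-level aggregate, here revenue per country."""
    revenue = {}
    for row in rows:
        revenue[row["country"]] = revenue.get(row["country"], 0.0) + row["amount"]
    return revenue

silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping the bronze data untouched means the silver and gold layers can be rebuilt whenever a cleaning rule changes.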

Machine Learning and Generative AI (GenAI)

Explore how machine learning pipelines integrate into platforms: where training, inference, and deployment occur. Gain insights into semantic search and Retrieval-Augmented Generation (RAG) – the foundation of modern AI applications.
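
The retrieval step behind semantic search and RAG can be illustrated with a toy example. The three-dimensional vectors below are hand-written stand-ins for real embeddings, and the documents, function names, and prompt wording are all assumptions made for this sketch.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical document store: text mapped to a made-up embedding.
DOCS = {
    "Kafka buffers events between producers and consumers.": [0.9, 0.1, 0.0],
    "Spark runs batch jobs over files in a data lake.": [0.1, 0.9, 0.1],
    "Dashboards visualize metrics for business users.": [0.0, 0.2, 0.9],
}

def retrieve(query_vector, k=1):
    """Semantic search: return the k documents closest to the query vector."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vector, DOCS[d]), reverse=True)
    return ranked[:k]

def build_prompt(question, query_vector):
    """RAG: place the retrieved text in the prompt that would go to a language model."""
    context = "\n".join(retrieve(query_vector))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# In practice an embedding model would produce this vector from the question.
query_vector = [0.8, 0.2, 0.0]
prompt = build_prompt("How are events buffered?", query_vector)
```

A production setup would replace the dict with a vector database and the hand-written vectors with model-generated embeddings, but the ranking-then-prompting flow stays the same.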

Platform Testing

Focus on testing strategies for pipelines at all stages—from loading and processing to data transformation—in this brief but important module.
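
As one possible shape for a transformation-stage test, here is a small Python sketch. The `normalize_record` function is a hypothetical transformation invented for the example, and the tests use plain assertions so they run with or without a test runner such as pytest.

```python
def normalize_record(raw):
    """Hypothetical transformation under test: clean the email, cast age to int."""
    return {"email": raw["email"].strip().lower(), "age": int(raw["age"])}

def test_normalizes_email_and_age():
    result = normalize_record({"email": " A@B.com ", "age": "42"})
    assert result == {"email": "a@b.com", "age": 42}

def test_rejects_non_numeric_age():
    try:
        normalize_record({"email": "a@b.com", "age": "n/a"})
    except ValueError:
        return  # expected: bad input must not pass through silently
    raise AssertionError("expected ValueError for a non-numeric age")

test_normalizes_email_and_age()
test_rejects_non_numeric_age()
```

Testing the failure path matters as much as the happy path, since a pipeline that silently accepts bad records pushes the problem downstream.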

This course will give you a comprehensive understanding of platforms and pipelines and teach you how to build efficient architectures applicable to real cloud solutions. It is ideal for beginner engineers and those looking to advance to the next level.

Who teaches Data Platform & Pipeline Design? Andreas Kretz

Andreas Kretz is a German data engineer and one of the most widely followed independent voices on data engineering as a career discipline. He runs the Plumbers of Data Science brand and has been publishing tutorial material continuously since the field consolidated around the modern lake-house stack (Spark, Kafka, Snowflake, Databricks, Airflow).

His CourseFlix listing is the largest single-author catalog under this source — over thirty courses spanning data-pipeline construction, streaming architectures, the cloud-native data stack on AWS / Azure / GCP, the Python and Scala tooling that dominates the field, and the soft-skills / career side of breaking into data engineering. Material is paid and aimed at engineers transitioning into data work or already-working data engineers picking up specific tools.

What lessons are included in Data Platform & Pipeline Design?

All Course Lessons (26)

#	Lesson Title	Duration
1	Introduction & Contents Demo	03:14
2	The Platform Blueprint	10:12
3	Data Engineering Tools Guide	02:45
4	End to End Pipeline Example	06:19
5	Push Ingestion Pipelines	03:43
6	Pull Ingestion Pipelines	03:35
7	Batch Pipelines	03:08
8	Streaming Pipelines	03:35
9	Stream Analytics	02:27
10	Lambda Architecture	04:03
11	Visualization Pipelines	03:48
12	Visualization with Hive & Spark on Hadoop	06:22
13	Visualization Data via Spark Thrift Server	03:28
14	Part 2 introduction	01:17
15	Core Use Cases in Platform Design: Transactions, Analytics, and Reverse ETL	02:58
16	Blueprint Recap: Mapping Tools Across the Modern Data Platform	03:32
17	Demystifying Event-Driven, Batch, and Streaming Workflows in Data Platforms	08:11
18	Micro-Batching vs. Streaming: What's the Real Difference?	04:56
19	Connecting Sources to Goals: Batch and Stream Processing in a Data Platform	06:29
20	Building Blocks of a Modern Data Platform: Components, Storage, and Processing	03:10
21	Before the Tech: How Data and Goals Shape Your Data Platform	10:10
22	Lakehouse Architecture Explained: From Raw Files to Transactional Tables	03:35
23	How Machine Learning Fits into Data Platforms: Training, Inference, and Deployment	06:24
24	From Embeddings to Answers: Understanding Semantic Search and Retrieval-Augmented Generation	06:07
25	Testing in the Modern Data Platform: From Ingestion to Transformation	03:11
26	Understanding the Medallion Architecture: Bronze, Silver, and Gold Layers in Data Warehousing	02:26

Books

Read Book Data Platform & Pipeline Design

#	Title	Type
1	Hadoop Course Contents	PDF
2	GCP Course Contents.key	PDF
3	Platform & Pipeline Design questions	PDF
4	Tools Guide Academy	PDF

What courses are similar to Data Platform & Pipeline Design?

Updated 1y ago
Case Study in A/B Testing
By: LunarTech
Unlock the Potential of A/B Testing - Dive into our course designed to equip you with the proven methodologies of designing, conducting.
1h 56m
Updated 2y ago
Machine Learning A-Z : Become Kaggle Master
By: Udemy
Want to become a good Data Scientist? Then this is the right course for you. This course has been designed by IIT professionals who have mastered in.
36h 23m
5/5
Updated 9mo ago
Semantic Log Indexing & Search
By: Andreas Kretz
Master semantic search with our course on generative AI. Learn to build a complete pipeline using FastAPI, qdrant, and Streamlit for advanced data processing
53m
Free
Updated 2mo ago
Introduction to Regression Analysis
By: Zero To Mastery
Learn core regression models and use them in Python. You study linear, logistic, log, and Cox models with clear steps and real data.
6h 20m
Updated 1y ago
Python for Data Engineers
By: Andreas Kretz
If you want to take your skills in Data Engineering to the next level, you are in the right place.
2h 21m
5/5
Updated 2y ago
TensorFlow Developer Certificate in 2023: Zero to Mastery
By: Zero To Mastery
Learn TensorFlow. Pass the TensorFlow Developer Certificate Exam. Get Hired as a TensorFlow developer. This course will take you from a TensorFlow beginner to b
62h 43m
Updated 1y ago
Schema Design Data Stores
By: Andreas Kretz
Schema design is a vital topic in data management, repeatedly highlighted during coaching sessions.
2h 30m
5/5
Updated 9mo ago
Fundamentals of Apache Airflow
By: Zero To Mastery
Enhance your data orchestration skills with Apache Airflow. Covering architecture basics to advanced techniques, this course helps build reliable data workflows
2h 21m

More courses by Andreas Kretz

Updated 1y ago
Docker Fundamentals
Docker is one of the most popular open-source platforms that every data engineer should know. It is a modern and lightweight alternative to virtual machines.
1h 17m
Updated 1y ago
Introduction to Data Engineering 2025
Welcome to your comprehensive introduction to Data Engineering, a foundational course designed to enhance your understanding of this pivotal field and the.
44m
5/5
Updated 1y ago
Becoming a Better Data Engineer
Data engineering is not just about moving information from one place to another.
1h 46m
5/5
Updated 1y ago
Computer Science Fundamentals
As in any field, strong fundamental knowledge forms the foundation for everything else. That is why this course is your first step on the path to a profession..
1h 30m
Updated 1y ago
Python for Data Engineers
If you want to take your skills in Data Engineering to the next level, you are in the right place.
2h 21m
5/5
Updated 1y ago
Successful Job Application
In today's competitive job market, it is extremely important to have the skills and knowledge that will help you stand out from the crowd and secure the.
3h 20m
5/5

Frequently asked questions

What prerequisites should I have before enrolling in this course?

Before enrolling in this course, you should have a basic understanding of data science and familiarity with cloud platforms. Knowledge of data engineering concepts will be beneficial, as the course delves into complex topics such as platform architectures, batch and streaming pipelines, and lambda architecture.

What types of pipelines will I learn to build in this course?

The course covers the creation of various types of data pipelines, including streaming, batch, and machine learning pipelines. You'll learn how to distinguish and apply batch and streaming processing, understand push and pull models of data transmission, and integrate these within a single platform using lambda architecture.

Who is the target audience for this course?

This course is designed for data engineers and data scientists who aim to enhance their skills in building robust data pipelines. It is also suitable for IT professionals looking to understand the integration of machine learning within data platforms.

How does the depth of this course compare to other similar courses?

The course offers a practical approach to data pipeline design, with templates and examples for popular cloud platforms. It goes beyond basic theory by providing detailed insights into platform architecture, data visualization using Apache Spark, and integrating batch and streaming pipelines within lambda architecture.

What specific tools or platforms are covered in the course?

The course includes examples and exercises using Apache Spark and Hive, particularly for data visualization tasks. These tools are illustrated in the context of Hadoop and Spark Thrift Server to help visualize data processing and storage.

What topics are not covered in this course?

The course does not cover introductory data science concepts or specific programming languages. It focuses on data pipeline and platform architecture rather than detailed machine learning algorithms or statistical analysis techniques within data science.

What is the estimated time commitment for completing the course?

The course has a total runtime of about 119 minutes, or just under two hours. This does not include additional time you may spend on exercises or reviewing the material, so plan accordingly.