Data Platform & Pipeline Design

1h 59m 5s
English
Paid

Course description

Data pipelines are a key element of any Data Science platform. Without them, neither loading data nor running machine learning models would be possible. This practical course, about two hours long, teaches you how to build streaming, batch, and ML pipelines using proven templates and examples for popular cloud platforms.

Basic Module

Fundamentals of Platforms and Pipelines

You will get acquainted with platform architectures and different types of pipelines. You will learn how they differ, how they work, what a machine learning pipeline looks like, and how to integrate them within a single system.

Platform Architecture and End-to-End Pipeline

You will understand the structure of a typical platform architecture: connection, buffering, processing, storage, and data visualization. By examining an end-to-end pipeline, you will learn how to apply this structure in your work.

Push and Pull Pipelines

You will understand the difference between the push and pull models of data transmission: the source sending data versus the consumer fetching it. Illustrative examples and diagrams are included.
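The push/pull distinction can be illustrated with a minimal Python sketch. It is not taken from the course: `PushSource`, `PullSource`, and their methods are hypothetical names invented for this illustration, and a real platform would use an API endpoint or message queue instead of in-memory lists.

```python
from collections import deque
from typing import Callable, Deque, List


class PushSource:
    """Hypothetical source that sends each record to subscribers as it is produced."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self._subscribers.append(handler)

    def emit(self, record: dict) -> None:
        # Push: the source decides when data moves.
        for handler in self._subscribers:
            handler(record)


class PullSource:
    """Hypothetical source that holds records until a consumer asks for them."""

    def __init__(self) -> None:
        self._pending: Deque[dict] = deque()

    def add(self, record: dict) -> None:
        self._pending.append(record)

    def fetch(self, max_records: int) -> List[dict]:
        # Pull: the consumer decides when data moves and how much it takes.
        batch: List[dict] = []
        while self._pending and len(batch) < max_records:
            batch.append(self._pending.popleft())
        return batch


# Push: records arrive at the handler without the consumer asking.
received_by_push: List[dict] = []
push_source = PushSource()
push_source.subscribe(received_by_push.append)
push_source.emit({"id": 1})
push_source.emit({"id": 2})

# Pull: the consumer polls on its own schedule, two records at a time.
pull_source = PullSource()
for i in range(3):
    pull_source.add({"id": i})
first_poll = pull_source.fetch(max_records=2)
second_poll = pull_source.fetch(max_records=2)
```

In the push case the consumer must be ready whenever the source emits. In the pull case the consumer controls the pace, at the cost of polling.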

Batch and Streaming Pipelines

This is one of the most important blocks for a data engineer. You will learn to distinguish and apply batch and streaming processing depending on the scenario.

Data Streams Visualization

You will understand how to visualize data from processing and storage, even if you don't have direct access to them. An example with Apache Spark will help reinforce the material.

Lambda Architecture

You will learn how batch and stream pipelines are integrated within a single platform. This is especially important for ML, where models are trained on batch data and applied to streaming data.
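As a rough illustration of the idea, here is a toy Lambda-style setup in Python: a batch view computed from historical data, a speed view updated per event, and a query that merges both. The event shape and the names `build_batch_view`, `SpeedLayer`, and `query` are assumptions made for this sketch, not material from the course.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[str, int]  # (key, amount); a made-up event shape for this sketch


def build_batch_view(historical_events: Iterable[Event]) -> Counter:
    """Batch layer: recompute totals from the full historical data set."""
    view: Counter = Counter()
    for key, amount in historical_events:
        view[key] += amount
    return view


class SpeedLayer:
    """Speed layer: update totals incrementally for events the batch has not seen."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        key, amount = event
        self.view[key] += amount


def query(key: str, batch_view: Counter, speed_view: Counter) -> int:
    """Serving step: merge the batch view with the recent real-time view."""
    # Counter returns 0 for keys it has never seen, so missing keys are safe.
    return batch_view[key] + speed_view[key]


historical = [("page_a", 3), ("page_b", 5), ("page_a", 2)]
batch_view = build_batch_view(historical)

speed = SpeedLayer()
speed.on_event(("page_a", 1))  # arrived after the last batch run
speed.on_event(("page_c", 4))

total_a = query("page_a", batch_view, speed.view)
total_b = query("page_b", batch_view, speed.view)
total_c = query("page_c", batch_view, speed.view)
```

In a real system the speed view would be reset whenever a new batch run has absorbed its events. That bookkeeping is omitted here.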

Platform Examples

You will study architecture templates on AWS, GCP, Azure, and Hadoop, where you will see how tools like Lambda, API Gateway, and DynamoDB fit into the real infrastructure.

Advanced Module

Processing Models: Event-Driven, Batch, and Stream

You will understand the differences between event-driven, batch, micro-batch, and streaming processing, and learn how to choose the appropriate type for tasks such as analytics, transactions, and reverse ETL.
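The difference between per-event streaming and micro-batching can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the helper `micro_batches` is a hypothetical name, and it cuts batches by record count, whereas real engines typically cut them by time interval.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream into small fixed-size batches (count-based for simplicity)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


events = list(range(1, 8))  # stand-in for an incoming event stream

# Streaming: the running total is updated after every single event.
streaming_totals: List[int] = []
running = 0
for event in events:
    running += event
    streaming_totals.append(running)

# Micro-batching: the running total is updated once per small batch.
micro_batch_totals: List[int] = []
running = 0
for batch in micro_batches(events, batch_size=3):
    running += sum(batch)
    micro_batch_totals.append(running)
```

Both approaches reach the same final total. The micro-batch version emits fewer intermediate results and sees each event a little later.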

Goal-Driven Design and Platform Blueprint Recap

You will revisit the platform schema and learn to align business goals and data types with architectural solutions. Instead of choosing tools "by feel," you will learn to design the system starting from the task.

Modern Architectures: Lakehouse and Medallion

You will learn how Lakehouse combines file storage and transactional tables, and how bronze-silver-gold layers in the Medallion architecture help maintain order and scalability.
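The bronze-silver-gold flow can be sketched with plain Python data structures. This is only an illustration under stated assumptions: the order records and the functions `to_silver` and `to_gold` are invented for this example, and a real Lakehouse would keep each layer in transactional tables, not in-memory lists.

```python
from collections import defaultdict
from typing import Dict, List

# Bronze: raw records kept exactly as they arrived (made-up sample data).
bronze: List[Dict[str, str]] = [
    {"order_id": "1", "country": "de", "amount": "10.50"},
    {"order_id": "2", "country": "DE", "amount": "4.50"},
    {"order_id": "3", "country": "us", "amount": "not-a-number"},
    {"order_id": "4", "country": "US", "amount": "20.00"},
]


def to_silver(raw_rows: List[Dict[str, str]]) -> List[dict]:
    """Silver: validated, typed, and normalized rows; invalid rows are dropped."""
    clean: List[dict] = []
    for row in raw_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # the raw row still exists in bronze for later inspection
        clean.append(
            {
                "order_id": int(row["order_id"]),
                "country": row["country"].upper(),
                "amount": amount,
            }
        )
    return clean


def to_gold(silver_rows: List[dict]) -> Dict[str, float]:
    """Gold: business-level aggregate, here revenue per country."""
    totals: Dict[str, float] = defaultdict(float)
    for row in silver_rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Each layer is derived only from the one before it, so a cleaning rule can be changed and the silver and gold layers rebuilt without re-ingesting the raw data.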

Machine Learning and Generative AI (GenAI)

You will learn how machine learning pipelines integrate into the platform and where training, inference, and deployment occur. You will also get acquainted with semantic search and Retrieval-Augmented Generation (RAG), the foundation of many modern AI applications.
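The retrieval step behind semantic search and RAG can be sketched as follows. This is a toy illustration, not the course's implementation: the three-dimensional vectors are hand-written stand-ins for embeddings that a real model would produce, the sample documents are invented, and `retrieve` and `build_prompt` are hypothetical helper names. The generation step, where a language model answers from the prompt, is left out.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Similarity of two vectors by the angle between them."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "vector store": text mapped to a hand-written stand-in embedding.
documents: Dict[str, List[float]] = {
    "Kafka buffers events between producers and consumers.": [0.9, 0.1, 0.0],
    "Spark runs batch jobs over files in a data lake.": [0.1, 0.9, 0.0],
    "Dashboards show metrics to business users.": [0.0, 0.1, 0.9],
}


def retrieve(query_vector: List[float], top_k: int) -> List[str]:
    """Semantic search: rank documents by similarity to the query vector."""
    ranked = sorted(
        documents,
        key=lambda text: cosine_similarity(query_vector, documents[text]),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, query_vector: List[float], top_k: int = 1) -> str:
    """RAG: put the retrieved text in front of the question as context."""
    context = "\n".join(retrieve(query_vector, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}"


# Assumed embedding of the question, written by hand for this sketch.
query_vector = [0.8, 0.2, 0.0]
top_match = retrieve(query_vector, top_k=1)
prompt = build_prompt("What sits between producers and consumers?", query_vector)
```

The design point is that matching happens on vectors instead of keywords, so a question can find a relevant document even when they share no exact words.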

Platform Testing

A brief but important module on testing strategies for pipelines at every stage, from ingestion and processing to data transformation.

This course will give you a comprehensive understanding of platforms and pipelines and teach you how to build efficient architectures that apply to real cloud solutions. It suits both beginner engineers and those who want to advance to the next level.

# Title Duration
1 Introduction & Contents 03:14
2 The Platform Blueprint 10:12
3 Data Engineering Tools Guide 02:45
4 End to End Pipeline Example 06:19
5 Push Ingestion Pipelines 03:43
6 Pull Ingestion Pipelines 03:35
7 Batch Pipelines 03:08
8 Streaming Pipelines 03:35
9 Stream Analytics 02:27
10 Lambda Architecture 04:03
11 Visualization Pipelines 03:48
12 Visualization with Hive & Spark on Hadoop 06:22
13 Visualization Data via Spark Thrift Server 03:28
14 Part 2 introduction 01:17
15 Core Use Cases in Platform Design: Transactions, Analytics, and Reverse ETL 02:58
16 Blueprint Recap: Mapping Tools Across the Modern Data Platform 03:32
17 Demystifying Event-Driven, Batch, and Streaming Workflows in Data Platforms 08:11
18 Micro-Batching vs. Streaming: What’s the Real Difference? 04:56
19 Connecting Sources to Goals: Batch and Stream Processing in a Data Platform 06:29
20 Building Blocks of a Modern Data Platform: Components, Storage, and Processing 03:10
21 Before the Tech: How Data and Goals Shape Your Data Platform 10:10
22 Lakehouse Architecture Explained: From Raw Files to Transactional Tables 03:35
23 How Machine Learning Fits into Data Platforms: Training, Inference, and Deployment 06:24
24 From Embeddings to Answers: Understanding Semantic Search and Retrieval-Augmented Generation 06:07
25 Testing in the Modern Data Platform: From Ingestion to Transformation 03:11
26 Understanding the Medallion Architecture: Bronze, Silver, and Gold Layers in Data Warehousing 02:26

Books

Read Book Data Platform & Pipeline Design

# Title
1 Hadoop Course Contents
2 GCP Course Contents.key
3 Platform & Pipeline Design questions
4 Tools Guide Academy

