Apache Iceberg Fundamentals

33m 32s
English
Paid

Unlock the potential of modern data platforms with Apache Iceberg, which masterfully combines the flexibility of data lakes with the reliability of data warehouses. In this course, you will delve into the workings of this powerful open table format, explore its architecture, and harness its key features such as schema evolution, "time travel," and high-performance analytics within Lakehouse systems.

Course Overview

Based on hands-on examples from real-world data engineering, this course guides you through setting up a local lab with Docker, Spark, and MinIO, and through creating and managing Iceberg tables. You'll acquire the skills to handle tasks ranging from writing data and analyzing metadata to optimizing queries and restructuring partitions, preparing you to apply Iceberg confidently in production environments.
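In practice, the lab setup described above comes down to a handful of Spark properties that enable Iceberg's SQL extensions and point a named catalog at the REST catalog service and MinIO. The sketch below is illustrative, not the course's exact configuration: the catalog name `demo`, the ports, and the bucket name are placeholder assumptions for a local Docker Compose lab.

```python
# Sketch of the Spark properties a local Iceberg lab typically needs.
# Catalog name ("demo"), endpoints, ports, and bucket are placeholders.
spark_conf = {
    # Enable Iceberg's SQL extensions (e.g. ALTER TABLE ... ADD PARTITION FIELD).
    "spark.sql.extensions":
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    # Register a catalog named "demo" backed by a REST catalog container.
    "spark.sql.catalog.demo": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.demo.type": "rest",
    "spark.sql.catalog.demo.uri": "http://localhost:8181",
    # Store table data and metadata in a MinIO bucket via S3FileIO.
    "spark.sql.catalog.demo.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.demo.warehouse": "s3://warehouse/",
    "spark.sql.catalog.demo.s3.endpoint": "http://localhost:9000",
}

# With pyspark installed, these would be applied when building the session:
# builder = SparkSession.builder.appName("iceberg-lab")
# for key, value in spark_conf.items():
#     builder = builder.config(key, value)
```

Keeping the properties in a plain dict makes it easy to reuse the same settings across notebooks in the lab.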

Learning Outcomes

By the end of this course, you will not only grasp Iceberg's internal structure but also have a working environment with ready-made notebooks for your own projects, along with a deep understanding of the table operations central to Lakehouse architecture.

Why Choose Iceberg?

Iceberg solves persistent big data challenges including slow queries, complex schema changes, and storage tightly coupled with computing systems. Discover why industry giants such as Netflix, Stripe, and Apple have adopted Iceberg and learn to integrate its methodologies into your own systems.

Course Activities

What you will do:

  1. Establish a local Lakehouse lab using Iceberg with Docker Compose, Spark, REST catalog, and MinIO.
  2. Get hands-on experience creating an Iceberg table with an engaging dataset, defining schemas, writing data with PySpark, and exploring Iceberg's metadata management.
  3. Gain expertise in schema evolution, including adding, renaming, and modifying column types, as well as employing advanced partitioning techniques.
  4. Perform row-level operations, such as deletes, and use the "time travel" feature to query past versions of your data.
  5. Explore Iceberg's architecture, covering Parquet files, manifests, snapshots, and catalogs.
  6. Use the MinIO UI to visualize physical storage of data and metadata.
  7. Execute analytical SQL queries on Iceberg tables through PySpark using common operations such as join, group by, and filter.
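Steps 3 and 4 above map onto Iceberg's SQL extensions in Spark. The statements below are a hedged sketch of that DDL/DML, assuming a catalog named `demo` and a table `demo.db.events` with columns `id`, `country`, and `event_ts`; none of these names come from the course itself.

```python
# Illustrative Iceberg SQL for schema evolution, partition evolution,
# row-level deletes, and time travel. Catalog, table, and column names
# are assumptions; in a configured session each would run via spark.sql(stmt).
statements = [
    # Schema evolution: add, rename, and widen columns without rewriting files.
    "ALTER TABLE demo.db.events ADD COLUMN country string",
    "ALTER TABLE demo.db.events RENAME COLUMN country TO region",
    "ALTER TABLE demo.db.events ALTER COLUMN id TYPE bigint",
    # Partition evolution: new writes are laid out by day; old files stay valid.
    "ALTER TABLE demo.db.events ADD PARTITION FIELD days(event_ts)",
    # Row-level delete, recorded as a new snapshot.
    "DELETE FROM demo.db.events WHERE region = 'unknown'",
    # Time travel: the timestamp here is a placeholder, not a real commit time.
    "SELECT * FROM demo.db.events TIMESTAMP AS OF '2024-01-01 00:00:00'",
]
```

Because every change above produces a new snapshot rather than mutating files in place, the delete remains reversible via time travel until the snapshot is expired.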
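The snapshot mechanism behind step 4 (and the architecture explored in step 5) can be illustrated with a toy model in plain Python. This is not the Iceberg API, just a sketch of the core idea: every commit adds an immutable snapshot, and readers can scan either the latest snapshot or any retained earlier one.

```python
# Toy model (plain Python, NOT the Iceberg API) of snapshot-based reads.
class ToyIcebergTable:
    def __init__(self):
        self.snapshots = []  # list of (snapshot_id, rows) in commit order

    def append(self, rows):
        """Commit a new snapshot containing the previous rows plus `rows`."""
        current = self.snapshots[-1][1] if self.snapshots else []
        snapshot_id = len(self.snapshots) + 1
        self.snapshots.append((snapshot_id, current + rows))
        return snapshot_id

    def scan(self, as_of=None):
        """Read the latest snapshot, or an older one when `as_of` is given."""
        if not self.snapshots:
            return []
        if as_of is None:
            return self.snapshots[-1][1]
        for snapshot_id, rows in self.snapshots:
            if snapshot_id == as_of:
                return rows
        raise KeyError(f"snapshot {as_of} not found")

table = ToyIcebergTable()
v1 = table.append([{"id": 1, "name": "penguin"}])
v2 = table.append([{"id": 2, "name": "walrus"}])
```

Real Iceberg stores each snapshot as metadata files pointing at manifests and Parquet data files, but the reader-visible behavior is the same: `scan()` sees two rows, while "time traveling" to the first snapshot sees one.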


About the Author: David Reger

David Reger is a Cloud Data Engineer at MSG Systems, where he develops scalable Lakehouse platforms based on Azure, Databricks, and open-source technologies such as Apache Spark and Iceberg. His experience spans IoT, data integration, and architecture design, enabling him to combine deep theoretical knowledge with practical approaches. David is passionate about helping engineers master modern data tools and sharing knowledge gained from real projects.

All Course Lessons (12)

  1. Intro Demo (01:07)
  2. Goals (01:03)
  3. Challenges (04:10)
  4. Iceberg & Lakehouses (01:42)
  5. Architecture Deep Dive (02:02)
  6. Iceberg Features (02:45)
  7. Architecture & Summary (02:51)
  8. Setup & Docker (03:31)
  9. Spark Iceberg Config (02:31)
  10. Write data to Iceberg (01:32)
  11. Inspect metadata & schema eval (08:41)
  12. Inspect data on MinIO & Outro (01:37)