Apache Iceberg Fundamentals

Duration: 33m 32s | Language: English | Access: Paid

Apache Iceberg is an open table format that combines the flexibility of data lakes with the reliability of data warehouses. In this course, you will learn how it works, explore its architecture, and use key features such as schema evolution, "time travel," and high-performance analytics in Lakehouse systems.

Course Overview

Based on hands-on examples from real-world data engineering, this course guides you through setting up a local lab with Docker, Spark, and MinIO, and through creating and managing Iceberg tables. You'll learn to handle tasks ranging from writing data and inspecting metadata to query optimization and partition restructuring, preparing you to apply Iceberg confidently in production environments.
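
As a rough illustration of what wiring Spark to such a lab can look like, the sketch below collects the Spark properties that register an Iceberg catalog. It is not taken from the course material: the catalog name `demo`, the hostnames `rest` and `minio`, the ports, and the `s3://warehouse/` location are all assumptions, as is the choice of a REST catalog. The helper `iceberg_spark_conf` and the wrapper `build_session` are hypothetical names; the property keys and class names are the ones used by Iceberg's Spark integration.

```python
# Sketch only: catalog name, hostnames, ports and warehouse path are assumptions,
# not values taken from the course.

def iceberg_spark_conf(catalog="demo",
                       rest_uri="http://rest:8181",
                       s3_endpoint="http://minio:9000",
                       warehouse="s3://warehouse/"):
    """Return Spark properties that register an Iceberg REST catalog stored on MinIO."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg-specific SQL such as ALTER TABLE ... ADD PARTITION FIELD.
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog under the chosen name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": rest_uri,
        # S3FileIO talks to any S3-compatible store, including MinIO.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.s3.endpoint": s3_endpoint,
        f"{prefix}.s3.path-style-access": "true",
        # Lets unqualified table names resolve against the Iceberg catalog.
        "spark.sql.defaultCatalog": catalog,
    }


def build_session(conf, app_name="iceberg-lab"):
    """Create a SparkSession from the properties above.

    Needs pyspark plus the Iceberg Spark runtime and AWS bundle jars on the
    classpath; the import is lazy so the helper above works without Spark.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In a Docker Compose lab, the MinIO credentials and region would typically reach Spark through the standard AWS environment variables rather than through these properties; that detail is likewise an assumption about the setup.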

Learning Outcomes

By the end of this course, you will understand the internal structure of Iceberg and have a working environment with ready-made notebooks for your own projects, along with a solid grasp of the table operations central to Lakehouse architecture.

Why Choose Iceberg?

Iceberg addresses persistent big data challenges, including slow queries, complex schema changes, and storage tightly coupled to compute. Discover why companies such as Netflix, Stripe, and Apple have adopted Iceberg, and learn to apply its approach in your own systems.

Course Activities

What you will do:

  1. Set up a local Lakehouse lab for Iceberg using Docker Compose, Spark, a REST catalog, and MinIO.
  2. Get hands-on experience by creating an Iceberg table from a sample dataset, defining its schema, writing data with PySpark, and exploring Iceberg's metadata management.
  3. Practice schema evolution, including adding columns, renaming columns, and changing column types, and apply advanced partitioning techniques.
  4. Perform row-level operations such as deletions, and use the "time travel" feature to query earlier versions of the data.
  5. Explore Iceberg's architecture, covering Parquet files, manifests, snapshots, and catalogs.
  6. Use the MinIO UI to visualize physical storage of data and metadata.
  7. Execute analytical SQL queries on Iceberg tables through PySpark using common operations such as join, group by, and filter.
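
The activities above map onto a handful of Spark SQL statements. The sketch below lists them as plain strings so they can be read, or passed one by one to `spark.sql(...)`. It is an illustration, not the course's notebook: the table `demo.db.trips`, its columns, the sample rows, and the helper names `lab_statements`, `time_travel_sql`, and `run_all` are hypothetical, and the snapshot ID used for time travel has to come from the table's own `snapshots` metadata table.

```python
# Sketch only: table name, columns and sample rows are hypothetical;
# the course's dataset and notebooks may differ.

def lab_statements(table="demo.db.trips"):
    """Return (label, sql) pairs in the order they would be run in a lab session."""
    return [
        ("create", f"CREATE TABLE {table} (id INT, ts TIMESTAMP, fare DOUBLE) "
                   "USING iceberg PARTITIONED BY (days(ts))"),
        ("insert", f"INSERT INTO {table} VALUES "
                   "(1, TIMESTAMP '2024-01-01 08:00:00', 12.5), "
                   "(2, TIMESTAMP '2024-01-02 09:30:00', 7.0)"),
        # Schema evolution: metadata-only changes, existing data files stay as they are.
        ("add_column", f"ALTER TABLE {table} ADD COLUMNS (tip DOUBLE)"),
        ("rename_column", f"ALTER TABLE {table} RENAME COLUMN fare TO fare_amount"),
        ("widen_type", f"ALTER TABLE {table} ALTER COLUMN id TYPE BIGINT"),
        # Partition evolution: requires the Iceberg SQL extensions to be enabled.
        ("add_partition_field", f"ALTER TABLE {table} ADD PARTITION FIELD bucket(16, id)"),
        # Row-level delete: commits a new snapshot, older snapshots remain queryable.
        ("delete", f"DELETE FROM {table} WHERE id = 2"),
        # Metadata table listing every snapshot of the table.
        ("snapshots", f"SELECT snapshot_id, committed_at, operation FROM {table}.snapshots"),
        # Ordinary analytical SQL: filter, group by, aggregate.
        ("aggregate", f"SELECT date(ts) AS day, count(*) AS trips, "
                      f"sum(fare_amount) AS revenue FROM {table} "
                      "WHERE fare_amount > 5 GROUP BY date(ts) ORDER BY day"),
    ]


def time_travel_sql(table, snapshot_id):
    """Build a query that reads the table as of a given snapshot ID."""
    return f"SELECT * FROM {table} VERSION AS OF {int(snapshot_id)}"


def run_all(spark, statements):
    """Run each statement on an existing SparkSession and return the DataFrames by label."""
    return {label: spark.sql(sql) for label, sql in statements}
```

A typical flow would be `results = run_all(spark, lab_statements())`, then picking a `snapshot_id` from `results["snapshots"]` and running `spark.sql(time_travel_sql("demo.db.trips", snapshot_id))` to see the rows as they were before the delete.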


Additional

https://github.com/team-data-science/iceberg

About the Author: David Reger

David Reger is a data engineer and educator focused on the modern data lakehouse stack, particularly Apache Iceberg and the shift to open table formats that has reshaped how analytical data warehouses are built.

On CourseFlix he teaches Apache Iceberg Fundamentals, which covers Iceberg's table format, schema evolution, partitioning model, and the catalog and engine integration that lets multiple query engines (Spark, Trino, Snowflake, Athena) work against the same underlying data.

The material is paid and aimed at data engineers picking up the Iceberg table format for analytical workloads on top of object storage. For broader data content, see CourseFlix's Data processing and analysis category page.

All Course Lessons (12)
#    Lesson Title                      Duration   Access
1    Intro                             01:07      Demo
2    Goals                             01:03
3    Challenges                        04:10
4    Iceberg & Lakehouses              01:42
5    Architecture Deep Dive            02:02
6    Iceberg Features                  02:45
7    Architecture & Summary            02:51
8    Setup & Docker                    03:31
9    Spark Iceberg Config              02:31
10   Write data to Iceberg             01:32
11   Inspect metadata & schema eval    08:41
12   Inspect data on MinIO & Outro     01:37

Frequently asked questions

What prerequisites should I have before enrolling in this course?
Before enrolling, you should have a basic understanding of data engineering concepts and familiarity with data lakes and data warehouses. Experience with Docker and Spark will be beneficial, as the course involves setting up a local lab environment using these tools. Familiarity with any programming language for scripting purposes would also be helpful.
What kinds of projects or exercises will I work on during the course?
During the course, you will engage in hands-on exercises such as setting up a local Lakehouse lab using Docker, Spark, and MinIO. You will also practice creating and managing Iceberg tables, performing metadata analysis, optimizing queries, and restructuring partitions, which are essential tasks for applying Iceberg in production environments.
Who is the target audience for this course?
The course is designed for data engineers and IT professionals who are looking to enhance their skills in managing modern data platforms. It is particularly beneficial for those interested in solving big data challenges related to slow queries and complex schema changes using the Apache Iceberg open table format.
How does this course compare to other courses on data lake technologies?
This course focuses specifically on Apache Iceberg and its integration into Lakehouse architectures. Unlike more general data lake courses, it provides an in-depth look at Iceberg's features such as schema evolution and 'time travel,' positioning it as a specialized course for tackling persistent big data challenges.
What specific tools or platforms are covered in the course?
The course covers the use of Docker for environment setup, Spark for data processing, and MinIO as a storage solution. These tools are integral to establishing a local Lakehouse lab and working with Apache Iceberg tables, allowing you to gain hands-on experience with a complete data platform setup.
What topics or skills are not covered in this course?
This course does not cover general data warehousing concepts beyond those directly related to Apache Iceberg, nor does it cover other big data technologies or platforms outside the context of Iceberg, Spark, and MinIO. It also does not include programming language tutorials; prior programming experience is assumed.
How can the skills gained from this course be applied to other areas or careers?
The skills acquired from this course, such as handling table operations and query optimization within Lakehouse systems, are applicable to various roles in data engineering and data management. Understanding how to integrate Apache Iceberg into data platforms can enhance your ability to work with scalable, efficient data solutions in any organization leveraging big data technologies.