
Apache Iceberg Fundamentals

33m 32s
English
Paid

Course description

Modern data platforms need the flexibility of data lakes and the reliability of warehouses, and Apache Iceberg combines both. In this course, you will learn how this open table format works, study its architecture, and use its key features: schema evolution, "time travel," and high-performance analytics in Lakehouse systems.

The course is built on practical data engineering examples. You will set up a local lab with Docker, Spark, and MinIO, then create and manage Iceberg tables. From writing data and analyzing metadata to optimizing queries and restructuring partitions, you will gain the experience needed to work with Iceberg confidently in a production environment.

By the end of the course, you will understand how Iceberg is structured internally and will have a working environment, ready-made notebooks for projects, and a solid understanding of the table operations that are critical to Lakehouse architecture.
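As a rough illustration of what wiring Spark to such a lab can look like, the sketch below builds the Spark properties for an Iceberg REST catalog backed by MinIO. The catalog name `lab`, the two localhost URLs, and the warehouse path are assumptions made for this example, not values taken from the course; the helper function itself is hypothetical. The property keys are the standard Iceberg Spark catalog settings.

```python
def iceberg_spark_conf(
    catalog="lab",                        # assumed catalog name
    rest_uri="http://localhost:8181",     # assumed REST catalog address
    s3_endpoint="http://localhost:9000",  # assumed MinIO address
    warehouse="s3://warehouse/",          # assumed bucket for table data
):
    """Return Spark properties that register an Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg-specific SQL such as partition evolution.
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": rest_uri,
        # S3FileIO talks to any S3-compatible store, including MinIO.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.s3.endpoint": s3_endpoint,
        # MinIO is usually addressed by path rather than by bucket subdomain.
        f"{prefix}.s3.path-style-access": "true",
        "spark.sql.defaultCatalog": catalog,
    }


if __name__ == "__main__":
    for key, value in sorted(iceberg_spark_conf().items()):
        print(f"{key}={value}")
```

Each pair would typically be passed to `SparkSession.builder.config(key, value)`. The Iceberg Spark runtime and AWS bundle JARs must also be on the classpath, and S3 credentials for MinIO must be supplied separately; neither is shown here.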

Why Iceberg?

Iceberg addresses long-standing big data problems: slow queries, complex schema changes, and the tight coupling between storage and compute engines. You'll learn why companies like Netflix, Stripe, and Apple have chosen Iceberg for their platforms and how to apply these approaches in your own setup.

What you will do:

  1. Build a local Lakehouse lab based on Iceberg using Docker Compose, Spark, REST catalog, and MinIO.
  2. Create your first Iceberg table using a fun dataset (like one with Pokémon), define the schema, write data through PySpark, and explore how Iceberg manages metadata.
  3. Master schema evolution: adding and renaming columns and changing column types, as well as advanced partitioning techniques.
  4. Learn to perform row-level operations (such as deleting rows) and use the "time travel" feature to query past versions of the data.
  5. Dive into Iceberg's architecture: Parquet files, manifests, snapshots, and catalogs.
  6. Use the MinIO UI to see how data and metadata are physically stored.
  7. Run analytical SQL queries on Iceberg tables through PySpark, using familiar operations like join, group by, and filter.
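
To make the schema evolution, row-level delete, and time travel steps above more concrete, here is a minimal sketch of the Spark SQL statements involved, built as plain strings so it runs without a cluster. The table name `lab.demo.pokemon` and the column names (`generation`, `type_1`, `hp`) are hypothetical and not taken from the course dataset. The statement forms follow Iceberg's Spark SQL support; `VERSION AS OF` and `TIMESTAMP AS OF` require Spark 3.3 or later.

```python
TABLE = "lab.demo.pokemon"  # hypothetical catalog.namespace.table


def schema_evolution_statements(table):
    """Add a column, rename a column, and widen an int column to bigint."""
    return [
        f"ALTER TABLE {table} ADD COLUMN generation int",
        f"ALTER TABLE {table} RENAME COLUMN type_1 TO primary_type",
        # Iceberg only allows safe type promotions such as int -> bigint.
        f"ALTER TABLE {table} ALTER COLUMN hp TYPE bigint",
    ]


def row_level_delete(table, predicate):
    """Delete the rows matching a SQL predicate."""
    return f"DELETE FROM {table} WHERE {predicate}"


def snapshots_query(table):
    """List the table's snapshots through the `snapshots` metadata table."""
    return f"SELECT snapshot_id, committed_at, operation FROM {table}.snapshots"


def time_travel_query(table, snapshot_id=None, timestamp=None):
    """Read the table as of exactly one of: a snapshot id or a timestamp."""
    if (snapshot_id is None) == (timestamp is None):
        raise ValueError("pass exactly one of snapshot_id or timestamp")
    if snapshot_id is not None:
        return f"SELECT * FROM {table} VERSION AS OF {int(snapshot_id)}"
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"


if __name__ == "__main__":
    statements = schema_evolution_statements(TABLE)
    statements.append(row_level_delete(TABLE, "hp < 10"))
    statements.append(snapshots_query(TABLE))
    statements.append(time_travel_query(TABLE, timestamp="2024-01-01 00:00:00"))
    for sql in statements:
        print(sql)
```

In a PySpark session configured for Iceberg, each string would be passed to `spark.sql(...)`. A snapshot id returned by the snapshots query can then be fed to `time_travel_query` to read the table as it was before the delete.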



All Course Lessons (12)

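| #  | Lesson Title                   | Duration | Access |
|----|--------------------------------|----------|--------|
| 1  | Intro                          | 01:07    | Demo   |
| 2  | Goals                          | 01:03    |        |
| 3  | Challenges                     | 04:10    |        |
| 4  | Iceberg & Lakehouses           | 01:42    |        |
| 5  | Architecture Deep Dive         | 02:02    |        |
| 6  | Iceberg Features               | 02:45    |        |
| 7  | Architecture & Summary         | 02:51    |        |
| 8  | Setup & Docker                 | 03:31    |        |
| 9  | Spark Iceberg Config           | 02:31    |        |
| 10 | Write data to Iceberg          | 01:32    |        |
| 11 | Inspect metadata & schema eval | 08:41    |        |
| 12 | Inspect data on MinIO & Outro  | 01:37    |        |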

