Master and Build Large Language Models
The best way to understand how Large Language Models (LLMs) work is to build your own, and that is exactly what you will do here. In this video course, AI expert Sebastian Raschka guides you step by step through every stage of creating an LLM, with hands-on practice and explanations in liveVideo format. Working alongside the author, you will implement the project from his bestseller Build a Large Language Model (From Scratch).
In this course, you will learn to:
- Plan the architecture and write the code for all LLM components
- Prepare a dataset suitable for training a language model
- Fine-tune an LLM for text classification tasks and work with your own data
- Utilize human feedback to improve instruction following
- Load pretrained weights into your model (a brief PyTorch sketch follows this list)
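To give a flavor of what this looks like in code, here is a minimal sketch of saving and loading model weights with PyTorch's state_dict API. The GPTModel class and its dimensions are simplified stand-ins for the model you will build in the course, not the course's actual code.

```python
import torch
import torch.nn as nn

# Hypothetical, simplified stand-in for the GPT-style model built in the course.
class GPTModel(nn.Module):
    def __init__(self, vocab_size=50257, emb_dim=768):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)
        self.out_head = nn.Linear(emb_dim, vocab_size, bias=False)

    def forward(self, idx):
        return self.out_head(self.tok_emb(idx))

model = GPTModel()

# Save only the learned parameters (the state dict), not the whole Python object.
torch.save(model.state_dict(), "model_weights.pth")

# Later: re-create the same architecture and load the saved weights into it.
restored = GPTModel()
restored.load_state_dict(torch.load("model_weights.pth", map_location="cpu"))
restored.eval()  # switch to inference mode (disables dropout, etc.)
```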
This video course is perfect for:
- Developers who want to take the initiative on AI-related projects
- Data scientists and ML researchers who need to be able to configure or create LLMs from scratch
The course also includes a block of six required introductory videos by Abhinav Kimothi, an artificial intelligence expert and the author of the book A Simple Guide to Retrieval Augmented Generation. He covers everything you need to know before starting, from Python basics to advanced operations in PyTorch. Whatever your level of preparation, you will gain a solid foundation for working with large language models.
Watch Online: Master and Build Large Language Models
# | Title | Duration |
---|---|---|
1 | 1.1. Python Environment Setup | 21:10 |
2 | 1.2. Foundations to Build a Large Language Model (From Scratch) | 06:28 |
3 | 2.1. Prerequisites to Chapter 2 | 01:07:40 |
4 | 2.2. Tokenizing text | 14:10 |
5 | 2.3. Converting tokens into token IDs | 09:59 |
6 | 2.4. Adding special context tokens | 06:36 |
7 | 2.5. Byte pair encoding | 13:40 |
8 | 2.6. Data sampling with a sliding window | 23:16 |
9 | 2.7. Creating token embeddings | 08:37 |
10 | 2.8. Encoding word positions | 12:23 |
11 | 3.1. Prerequisites to Chapter 3 | 01:14:17 |
12 | 3.2. A simple self-attention mechanism without trainable weights, Part 1 | 41:10 |
13 | 3.3. A simple self-attention mechanism without trainable weights, Part 2 | 11:43 |
14 | 3.4. Computing the attention weights step by step | 20:00 |
15 | 3.5. Implementing a compact self-attention Python class | 08:31 |
16 | 3.6. Applying a causal attention mask | 11:37 |
17 | 3.7. Masking additional attention weights with dropout | 05:38 |
18 | 3.8. Implementing a compact causal self-attention class | 08:53 |
19 | 3.9. Stacking multiple single-head attention layers | 12:05 |
20 | 3.10. Implementing multi-head attention with weight splits | 16:47 |
21 | 4.1. Prerequisites to Chapter 4 | 01:11:23 |
22 | 4.2. Coding an LLM architecture | 14:00 |
23 | 4.3. Normalizing activations with layer normalization | 22:14 |
24 | 4.4. Implementing a feed forward network with GELU activations | 16:19 |
25 | 4.5. Adding shortcut connections | 10:52 |
26 | 4.6. Connecting attention and linear layers in a transformer block | 12:14 |
27 | 4.7. Coding the GPT model | 12:45 |
28 | 4.8. Generating text | 17:47 |
29 | 5.1. Prerequisites to Chapter 5 | 23:58 |
30 | 5.2. Using GPT to generate text | 17:32 |
31 | 5.3. Calculating the text generation loss: cross entropy and perplexity | 27:14 |
32 | 5.4. Calculating the training and validation set losses | 24:52 |
33 | 5.5. Training an LLM | 27:04 |
34 | 5.6. Decoding strategies to control randomness | 03:37 |
35 | 5.7. Temperature scaling | 13:43 |
36 | 5.8. Top-k sampling | 08:20 |
37 | 5.9. Modifying the text generation function | 10:51 |
38 | 5.10. Loading and saving model weights in PyTorch | 04:24 |
39 | 5.11. Loading pretrained weights from OpenAI | 20:04 |
40 | 6.1. Prerequisites to Chapter 6 | 39:21 |
41 | 6.2. Preparing the dataset | 26:58 |
42 | 6.3. Creating data loaders | 16:08 |
43 | 6.4. Initializing a model with pretrained weights | 10:11 |
44 | 6.5. Adding a classification head | 15:38 |
45 | 6.6. Calculating the classification loss and accuracy | 22:32 |
46 | 6.7. Fine-tuning the model on supervised data | 33:36 |
47 | 6.8. Using the LLM as a spam classifier | 11:07 |
48 | 7.1. Preparing a dataset for supervised instruction fine-tuning | 15:48 |
49 | 7.2. Organizing data into training batches | 23:45 |
50 | 7.3. Creating data loaders for an instruction dataset | 07:31 |
51 | 7.4. Loading a pretrained LLM | 07:48 |
52 | 7.5. Fine-tuning the LLM on instruction data | 20:02 |
53 | 7.6. Extracting and saving responses | 09:40 |
54 | 7.7. Evaluating the fine-tuned LLM | 21:57 |
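For a taste of the kind of code the chapters above build up to, below is a minimal, illustrative sketch of a single-head causal self-attention layer in PyTorch, in the spirit of the Chapter 3 videos (causal masking, dropout on the attention weights). The class name, dimensions, and hyperparameters are assumptions chosen for demonstration and do not reproduce the course's exact implementation.

```python
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    """Minimal single-head causal self-attention (illustrative sketch only)."""

    def __init__(self, d_in, d_out, context_length, dropout=0.1):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask: position i may not attend to positions j > i.
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1)
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        queries = self.W_query(x)
        keys = self.W_key(x)
        values = self.W_value(x)

        # Attention scores: (batch, num_tokens, num_tokens)
        attn_scores = queries @ keys.transpose(1, 2)
        attn_scores = attn_scores.masked_fill(
            self.mask.bool()[:num_tokens, :num_tokens], float("-inf")
        )
        # Scale by sqrt of the key dimension, then normalize with softmax.
        attn_weights = torch.softmax(attn_scores / keys.shape[-1] ** 0.5, dim=-1)
        attn_weights = self.dropout(attn_weights)
        return attn_weights @ values

# Quick usage check on random data.
torch.manual_seed(123)
x = torch.randn(2, 6, 16)  # batch of 2 sequences, 6 tokens each, 16-dim embeddings
attn = CausalSelfAttention(d_in=16, d_out=16, context_length=6)
print(attn(x).shape)       # torch.Size([2, 6, 16])
```

In the course itself, a single head like this is later extended to multi-head attention with weight splits (video 3.10).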
Similar courses to Master and Build Large Language Models

- RAG (Retrieval) by Mckay Wrigley (takeoff)
- Build AI Agents with CrewAI by zerotomastery.io
- Design and Code User Interfaces with Galileo and Claude AI by designcode.io
- Building Apps with o1 Pro Template System: Part 1 by Mckay Wrigley (takeoff)
- Learn how to use MCP (Model Context Protocol) by Kevin Kern (instructa.ai)
- Advanced AI: LLMs Explained with Math (Transformers, Attention Mechanisms & More) by zerotomastery.io
- Build SwiftUI apps for iOS 18 with Cursor and Xcode by designcode.io
- AI Engineering: Fine-Tuning LLMs by zerotomastery.io
