Master and Build Large Language Models

17h 15m 55s
English
Paid

The best way to understand how Large Language Models (LLMs) work is to build your own, and that is exactly what you will do in this course. In this video course, AI expert Sebastian Raschka guides you step by step through every stage of creating an LLM, combining hands-on practice with explanations in liveVideo format. You will implement the project from his bestseller Build a Large Language Model (From Scratch) alongside the author.

In this course, you will learn to:

  • Plan architecture and write code for all LLM components
  • Prepare a dataset suitable for training a language model
  • Fine-tune an LLM for text classification tasks and work with your own data
  • Utilize human feedback to improve instruction following
  • Load pre-trained weights into your model

This video course is perfect for:

  • Developers who want to take initiative in AI-related projects
  • Data scientists and ML researchers who need to be able to configure or create an LLM from scratch

The course also includes a block of 6 mandatory introductory videos by Abhinav Kimothi, an artificial intelligence expert and the author of the book A Simple Guide to Retrieval Augmented Generation. He explains everything you need to know before starting, from Python capabilities to advanced operations in PyTorch. Regardless of your level of preparation, you will gain a solid foundation for successful work with large language models.

About the Authors

Abhinav Kimothi

Abhinav Kimothi is a US-based AI engineer and educator focused on the production-engineering side of large language models — RAG pipelines, evals, prompt engineering, and the operational concerns around shipping LLM features into real products.

His CourseFlix listing carries Master and Build Large Language Models — covering the LLM fundamentals from the engineering side: how the models are trained, what fine-tuning involves, the prompt-engineering craft, and the production patterns for integrating LLMs into real applications.

Material is paid and aimed at engineers picking up applied LLM work. For broader content, see CourseFlix's LLMs & Fundamentals category page where this course sits alongside material from Sebastian Raschka and Towards AI.

Sebastian Raschka

Sebastian Raschka is a German-American machine-learning researcher (Lightning AI), University of Wisconsin-Madison faculty alumnus, and one of the most widely cited textbook authors in modern ML. His books — Python Machine Learning, Build a Large Language Model (From Scratch), and Build a Reasoning Model (From Scratch) — anchor the from-scratch implementation track in the field.

His CourseFlix listing carries three Sebastian Raschka courses: Build a Large Language Model (From Scratch), Master and Build Large Language Models, and Build a Reasoning Model (From Scratch). Material is paid and aimed at engineers and researchers who want to understand the math and code underneath modern LLMs rather than treat them as black boxes.

Watch Online 54 lessons

All Course Lessons (54)
#   Lesson Title   Duration   Access
1   1.1. Python Environment Setup Video   21:10   Demo
2   1.2. Foundations to Build a Large Language Model (From Scratch)   06:28
3   2.1. Prerequisites to Chapter 2 (1   01:07:40
4   2.2. Tokenizing text   14:10
5   2.3. Converting tokens into token IDs   09:59
6   2.4. Adding special context tokens   06:36
7   2.5. Byte pair encoding   13:40
8   2.6. Data sampling with a sliding window   23:16
9   2.7. Creating token embeddings   08:37
10   2.8. Encoding word positions   12:23
11   3.1. Prerequisites to Chapter 3 (1   01:14:17
12   3.2. A simple self-attention mechanism without trainable weights | Part 1   41:10
13   3.3. A simple self-attention mechanism without trainable weights | Part 2   11:43
14   3.4. Computing the attention weights step by step   20:00
15   3.5. Implementing a compact self-attention Python class   08:31
16   3.6. Applying a causal attention mask   11:37
17   3.7. Masking additional attention weights with dropout   05:38
18   3.8. Implementing a compact causal self-attention class   08:53
19   3.9. Stacking multiple single-head attention layers   12:05
20   3.10. Implementing multi-head attention with weight splits   16:47
21   4.1. Prerequisites to Chapter 4 (1   01:11:23
22   4.2. Coding an LLM architecture   14:00
23   4.3. Normalizing activations with layer normalization   22:14
24   4.4. Implementing a feed forward network with GELU activations   16:19
25   4.5. Adding shortcut connections   10:52
26   4.6. Connecting attention and linear layers in a transformer block   12:14
27   4.7. Coding the GPT model   12:45
28   4.8. Generating text   17:47
29   5.1. Prerequisites to Chapter 5   23:58
30   5.2. Using GPT to generate text   17:32
31   5.3. Calculating the text generation loss: cross entropy and perplexity   27:14
32   5.4. Calculating the training and validation set losses   24:52
33   5.5. Training an LLM   27:04
34   5.6. Decoding strategies to control randomness   03:37
35   5.7. Temperature scaling   13:43
36   5.8. Top-k sampling   08:20
37   5.9. Modifying the text generation function   10:51
38   5.10. Loading and saving model weights in PyTorch   04:24
39   5.11. Loading pretrained weights from OpenAI   20:04
40   6.1. Prerequisites to Chapter 6   39:21
41   6.2. Preparing the dataset   26:58
42   6.3. Creating data loaders   16:08
43   6.4. Initializing a model with pretrained weights   10:11
44   6.5. Adding a classification head   15:38
45   6.6. Calculating the classification loss and accuracy   22:32
46   6.7. Fine-tuning the model on supervised data   33:36
47   6.8. Using the LLM as a spam classifier   11:07
48   7.1. Preparing a dataset for supervised instruction fine-tuning   15:48
49   7.2. Organizing data into training batches   23:45
50   7.3. Creating data loaders for an instruction dataset   07:31
51   7.4. Loading a pretrained LLM   07:48
52   7.5. Fine-tuning the LLM on instruction data   20:02
53   7.6. Extracting and saving responses   09:40
54   7.7. Evaluating the fine-tuned LLM   21:57

Frequently asked questions

What are the prerequisites for enrolling in this course?
Before enrolling, students should have a fundamental understanding of Python programming and machine learning concepts. The course begins with a Python environment setup lesson, so familiarity with Python installations and packages will be beneficial. Additionally, knowledge of neural networks and basic AI concepts will help in understanding the more advanced mechanisms used in building a Large Language Model from scratch.
What will I be building by the end of the course?
By the end of the course, you will have built your own Large Language Model (LLM) from scratch. This includes components like tokenization, self-attention mechanisms, multi-head attention, and a transformer block, culminating in a fully functional GPT model capable of generating text and performing tasks such as spam classification.
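As a rough illustration of one of these components, the sketch below shows a simplified causal self-attention step without trainable weights, written in plain Python so it runs without any dependencies. It is not the course's code: the function names, the sample embeddings, and the scaling by the square root of the embedding size are assumptions chosen for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(inputs):
    """Simplified self-attention without trainable weights (illustrative only).

    inputs: list of token embedding vectors (lists of floats).
    Each output vector is a weighted sum of the current and all earlier
    input vectors; later positions are hidden, which is the causal mask.
    """
    dim = len(inputs[0])
    outputs = []
    for i, query in enumerate(inputs):
        visible = inputs[: i + 1]  # causal mask: only positions <= i
        # Scaled dot-product scores (the scaling is an assumption of this sketch).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        weights = softmax(scores)  # attention weights sum to 1
        context = [
            sum(w * vec[d] for w, vec in zip(weights, visible))
            for d in range(dim)
        ]
        outputs.append(context)
    return outputs

# Hypothetical 3-dimensional embeddings for a three-token input.
embeddings = [[0.43, 0.15, 0.89], [0.55, 0.87, 0.66], [0.57, 0.85, 0.64]]
context_vectors = causal_self_attention(embeddings)
```

The first token can only attend to itself, so its context vector equals its own embedding; later tokens blend in the embeddings that precede them. A full implementation would add trainable query, key, and value projections and run on tensors rather than Python lists.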
Who is the target audience for this course?
This course is aimed at individuals with a background in programming and machine learning who are interested in understanding and building Large Language Models. It is particularly useful for those who want to implement their own LLMs for research or application purposes, and for AI enthusiasts looking to deepen their understanding of how models like GPT are constructed and fine-tuned.
How does the depth of this course compare to other LLM courses?
This course offers a detailed, step-by-step guide to building an LLM from scratch, covering everything from tokenization to fine-tuning a model on supervised data. Unlike courses that focus solely on using prebuilt models, this course provides hands-on experience in coding the architecture and understanding the underlying mechanics, giving students a deeper insight into the construction of LLMs.
What specific tools or platforms are used in this course?
The course extensively uses Python and PyTorch for implementing the components of the Large Language Model. Students work with PyTorch for tasks such as loading and saving model weights and for the fine-tuning process. Specific techniques taught include byte pair encoding for tokenization, and temperature scaling and top-k sampling for controlling randomness in text generation.
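To give a sense of what temperature scaling and top-k sampling do during text generation, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the course's implementation: the function name `sample_next_token`, its parameters, and the example logits are all hypothetical.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Pick a token index from raw logits (illustrative sketch).

    temperature < 1 sharpens the distribution, temperature > 1 flattens it.
    top_k keeps only the k highest-scoring tokens before sampling.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    candidates = list(enumerate(logits))
    if top_k is not None:
        # Keep the k largest logits and discard everything else.
        candidates = sorted(candidates, key=lambda pair: pair[1], reverse=True)[:top_k]
    # Temperature scaling: divide the logits before applying softmax.
    scaled = [score / temperature for _, score in candidates]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    indices = [idx for idx, _ in candidates]
    return rng.choices(indices, weights=probs, k=1)[0]

# Hypothetical logits for a five-token vocabulary.
logits = [4.0, 1.0, 0.5, 3.5, -2.0]
rng = random.Random(123)  # seeded generator for repeatable sampling
samples = [sample_next_token(logits, temperature=0.7, top_k=2, rng=rng) for _ in range(20)]
```

With `top_k=2`, only the two highest-scoring tokens (indices 0 and 3 here) can ever be sampled, and lowering the temperature shifts more probability toward index 0.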
What topics or technologies are not covered in this course?
The course does not cover the deployment of Large Language Models in production environments or the use of cloud-based AI services. Because it focuses on building models from scratch, it also does not delve into prebuilt model APIs or proprietary language models from commercial AI providers; the one exception is loading pretrained weights from OpenAI.
How much time should I expect to commit to this course?
The course consists of 54 lessons with a total runtime of 17h 15m 55s. Beyond watching the videos, understanding each topic and implementing the code yourself will require a significant time investment. Students should be prepared to dedicate several weeks to complete the course, depending on their prior experience and familiarity with the material.