This course shows you how to build a small DeepSeek model from scratch. You learn each idea in clear steps. You write code, test it, and understand why each part works.
What You Will Build
You create a compact DeepSeek model that runs on a laptop. You start with core LLM ideas and the limits of a standard transformer. You then use the main DeepSeek methods to build a fast and lean model.
Core Ideas You Learn
Latent Attention
You compress the attention keys and values into a smaller latent space. This helps you shrink the key-value cache, cut memory use, and speed up the model.
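The core trick can be sketched in a few lines of NumPy. This is a simplified, single-head, non-causal version under illustrative sizes (all names and dimensions here are made up for the example): instead of caching full-width keys and values, you cache one small latent tensor and reconstruct keys and values from it.

```python
import numpy as np

# Illustrative sizes, not the course's real config.
d_model, d_latent, seq_len = 64, 16, 10
rng = np.random.default_rng(0)

x = rng.standard_normal((seq_len, d_model))

# Down-project hidden states into a small latent space. Only this
# latent tensor needs to be cached, not full-width K and V.
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_uk = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)
W_uv = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)
W_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

latent = x @ W_down        # (seq_len, d_latent) -- the cached part
k = latent @ W_uk          # keys reconstructed from the latent cache
v = latent @ W_uv          # values reconstructed from the latent cache
q = x @ W_q

scores = q @ k.T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ v

print(latent.shape, out.shape)  # (10, 16) (10, 64)
```

Here the cached latent is 4x smaller per token than a full key or value row, which is the memory saving the course builds on.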
Mixture of Experts
You add MoE layers. These layers route each token to a small set of expert networks. This gives you more model capacity without raising the total compute by much.
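Routing can be sketched as follows. This is a minimal NumPy version with made-up sizes: a gate scores every expert per token, only the top-k experts run, and their outputs are mixed by the normalized gate weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d = 4, 2, 8  # illustrative sizes
n_tokens = 5

x = rng.standard_normal((n_tokens, d))
gate_w = rng.standard_normal((d, n_experts))
# Each "expert" is just one linear layer here, to keep the sketch short.
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]

logits = x @ gate_w                       # (n_tokens, n_experts)
out = np.zeros_like(x)
for i, tok in enumerate(x):
    top = np.argsort(logits[i])[-top_k:]  # pick the top-k experts
    gate = np.exp(logits[i][top])
    gate /= gate.sum()                    # renormalize over chosen experts
    for g, e in zip(gate, top):
        out[i] += g * (tok @ experts[e])  # weighted mix of expert outputs

print(out.shape)  # (5, 8)
```

Only top_k of n_experts run per token, which is why capacity grows faster than compute.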
Multi-Token Prediction
You train the model to predict several future tokens at once. This gives the model a denser training signal per step and helps it learn stronger patterns.
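The target construction can be shown with a toy sequence. This sketch only builds the targets, not the extra prediction heads: at each position, the model is asked for the next `depth` tokens instead of just one (the numbers are arbitrary).

```python
import numpy as np

tokens = np.array([5, 9, 2, 7, 1, 4])  # toy token ids
depth = 2                              # predict 2 future tokens per position

# targets[t][d] is the token at position t + 1 + d, so each position
# supervises `depth` predictions instead of one.
targets = [[int(tokens[t + 1 + d]) for d in range(depth)]
           for t in range(len(tokens) - depth)]

print(targets)  # [[9, 2], [2, 7], [7, 1], [1, 4]]
```

In training, each depth gets its own loss and the losses are averaged, so every step carries roughly `depth` times as much signal.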
Quantization and Efficient Training
You set up an FP8 pipeline. You also learn how to use parallelism strategies to train on limited hardware.
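The key idea behind low-precision training is per-tensor scaling before the cast. The sketch below simulates it with symmetric 8-bit integers rather than real FP8 (e4m3/e5m2) formats, since the scaling logic is the same: find the absolute maximum, scale into the representable range, round, and scale back.

```python
import numpy as np

def quantize_dequantize(x, n_bits=8):
    # Per-tensor absmax scaling, the same idea FP8 pipelines use
    # before casting; simulated here with symmetric integers.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale, scale

x = np.random.default_rng(0).standard_normal(1000)
xq, scale = quantize_dequantize(x)
err = np.abs(x - xq).max()
print(err <= scale)  # rounding error stays within one quantization step
```

The per-tensor scale is what makes 8-bit storage usable: without it, most of the tensor would round to zero or clip.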
Post-Training Steps
Supervised Fine-Tuning
You guide the model with labeled examples. This helps shape its style and fix common errors.
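The core of SFT is a masked cross-entropy loss: you compute the usual next-token loss but only over the response tokens, not the prompt. A minimal NumPy sketch with made-up logits and labels:

```python
import numpy as np

vocab = 10
rng = np.random.default_rng(0)
logits = rng.standard_normal((6, vocab))   # model outputs for 6 positions
labels = np.array([3, 1, 4, 1, 5, 9])      # target token ids
mask = np.array([0, 0, 0, 1, 1, 1])        # 1 = response token, 0 = prompt

# Log-softmax, then pick the log-probability of each target token.
logp = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
token_loss = -logp[np.arange(len(labels)), labels]

# Average the loss over response tokens only; prompt tokens are ignored.
loss = (token_loss * mask).sum() / mask.sum()
print(float(loss) > 0)
```

Masking the prompt is what makes SFT shape the model's answers rather than its echo of the question.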
Reinforcement Learning for Reasoning
You try simple RL steps to improve the model’s reasoning. You see how reward design changes the model’s behavior.
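Reward design can be as simple as a rule-based function. The sketch below is one hypothetical design (the tag format and score values are made up for illustration): a small bonus for emitting the expected format, a larger reward for a correct answer.

```python
import re

def reward(completion: str, answer: str) -> float:
    # Illustrative rule-based reward: correctness dominates, and a
    # small bonus nudges the model toward the expected answer format.
    r = 0.0
    m = re.search(r"<answer>(.*?)</answer>", completion)
    if m:
        r += 0.1                            # format bonus
        if m.group(1).strip() == answer:
            r += 1.0                        # correctness reward
    return r

print(reward("<answer>42</answer>", "42"))  # 1.1
print(reward("the answer is 42", "42"))     # 0.0
```

Changing the relative sizes of these two terms is exactly the kind of reward-design choice whose effect on behavior you observe in this part of the course.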
How You Learn
The course uses short code blocks, drawings, and a clear problem-then-solution flow. You see each idea, try it, and check how it changes the model.
What You Get in the End
You finish with a working mini DeepSeek model. You know how to scale it, shrink it, and adapt it for research or small production tasks.