Nathan Lambert is a US AI researcher at the Allen Institute for AI (Ai2) and the author of The RLHF Book, a widely cited practitioner-focused reference on Reinforcement Learning from Human Feedback (RLHF), the post-training method that anchors how modern instruction-tuned LLMs (ChatGPT, Claude, Llama-Chat) are aligned to be useful and safe.
His CourseFlix listing carries The RLHF Book: Reinforcement Learning from Human Feedback, a comprehensive treatment of the RLHF pipeline, reward modeling, the PPO (Proximal Policy Optimization) and DPO (Direct Preference Optimization) training methods, and the engineering decisions behind production LLM alignment.
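To give a flavor of the methods the book covers, here is a minimal sketch of the DPO preference-tuning loss in plain PyTorch. The function name, tensor names, and beta value are illustrative assumptions for this listing, not code from the book itself.

```python
# Minimal sketch of the DPO loss (Rafailov et al., 2023), one of the
# training methods the book treats. Names and defaults are illustrative.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Each input is the summed log-probability of a response under the
    trainable policy or the frozen reference model; beta scales the
    implicit KL-style penalty."""
    # Implicit reward margins: how much more the policy favors each
    # response than the reference model does.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry-style objective: push the chosen margin above the
    # rejected one, averaged over the batch of preference pairs.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with a batch of two preference pairs (made-up log-probs).
loss = dpo_loss(torch.tensor([-12.3, -8.1]), torch.tensor([-14.0, -9.5]),
                torch.tensor([-12.5, -8.0]), torch.tensor([-13.5, -9.4]))
```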
The material is paid and aimed at ML engineers and researchers working on LLM training. For broader content, see CourseFlix's LLMs & Fundamentals category page.