Learn proven methods for quickly improving AI applications. Build AI systems that outperform competitors, regardless of the specific use case.
If you run into questions like these while working with AI:
How do you test applications whose results are probabilistic and require subjective evaluation?
If I change a prompt, how can I ensure nothing else breaks?
Where should engineering effort be directed? Is it necessary to test everything?
What do you do if you have no data or users yet, and where do you start?
Which metrics should be tracked? What tools should be used? Which models should be selected?
Is it possible to automate testing and evaluation? And if so, how can you trust it?
...then this course is for you.
This is a practical course for engineers and technical product managers. It is ideal for those who know how to program or "enjoy coding by intuition."
What to Expect
Expect intensive practice: exercises and hands-on work with code and data. We meet twice a week for four weeks and offer generous office hours. All sessions are recorded and available to watch asynchronously.
Course Content
Basics and lifecycle of LLM application evaluation
Systematic error analysis
Building effective metrics and automated evaluation pipelines
Collaborative practices and alignment of evaluation criteria
Testing strategies for different architectures (RAG, pipelines, multimodal systems, etc.)
Monitoring in production and continuous quality evaluation
Organizing an effective human-in-the-loop review process
Cost optimization and query routing
Learning Outcomes
Master the best tools for finding, diagnosing, and prioritizing errors in AI.
Learn how to use synthetic data before you have users, and how to get the most out of real data once you do.
Build a "data flywheel" that ensures your AI improves over time.
Learn to automate parts of the evaluation process and to trust the results.
Be able to customize AI to your preferences and requirements.
Avoid common mistakes, drawing on lessons from more than 35 AI projects.
Gain practical experience through end-to-end exercises, code, and analysis of real cases.
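As a rough illustration of what "trusting" an automated evaluation can mean in practice, the sketch below compares an automated evaluator's pass/fail verdicts against human labels on the same outputs. This is one common approach and not necessarily the method taught in the course; the function name `judge_agreement`, the `Agreement` type, and the sample labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Agreement:
    true_positive_rate: float  # share of human-passed items the evaluator also passed
    true_negative_rate: float  # share of human-failed items the evaluator also failed


def judge_agreement(human: list[bool], automated: list[bool]) -> Agreement:
    """Compare automated pass/fail verdicts against human labels on the same items."""
    if len(human) != len(automated):
        raise ValueError("label lists must be the same length")
    # Split the automated verdicts by what the human reviewer decided.
    on_human_pass = [a for h, a in zip(human, automated) if h]
    on_human_fail = [a for h, a in zip(human, automated) if not h]
    if not on_human_pass or not on_human_fail:
        raise ValueError("need at least one human pass and one human fail")
    tpr = sum(on_human_pass) / len(on_human_pass)
    tnr = sum(not a for a in on_human_fail) / len(on_human_fail)
    return Agreement(tpr, tnr)


# Hypothetical labels for six model outputs (True = acceptable).
human_labels = [True, True, True, False, False, True]
auto_labels = [True, True, False, False, True, True]

result = judge_agreement(human_labels, auto_labels)
print(result)
```

Reporting the two rates separately, rather than a single accuracy figure, shows whether the automated evaluator tends to miss real failures or to flag acceptable outputs, which matters when failures are rare.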
Hamel Husain is a US ML engineer (formerly at Airbnb and GitHub, now at Parlance Labs), a fast.ai contributor, and one of the most visible independent voices on the production-engineering side of LLM systems — particularly around evals, fine-tuning, and the workflow that connects model training to deployed product features.
His CourseFlix listing carries AI Evals For Engineers & PMs. Material is paid and aimed at engineers and product managers shipping LLM-powered features who need to evaluate model output systematically rather than by gut.
Shreya Shankar is a US ML engineer and PhD candidate (UC Berkeley, formerly Google Brain and Viaduct) focused on the production-engineering side of ML systems and LLM evals. She is one of the more cited independent voices on the eval discipline for AI applications.
Her CourseFlix listing carries AI Evals For Engineers & PMs — a structured treatment of the eval discipline applied to LLM applications: how to design eval datasets, choose appropriate metrics, run systematic comparisons, and use evals as a continuous-feedback tool rather than a one-off launch gate.
Material is paid and aimed at engineers and product managers shipping LLM-powered features. For broader content, see CourseFlix's AI for Business & Product category page.
Watch Online 41 lessons
Frequently asked questions
Who teaches AI Evals For Engineers & PMs?
AI Evals For Engineers & PMs is taught by Hamel Husain and Shreya Shankar. You can find more courses by these instructors on the corresponding source pages.
How long is AI Evals For Engineers & PMs?
AI Evals For Engineers & PMs contains 41 lessons with a total runtime of 29 hours 21 minutes. All lessons are available to watch online at your own pace.
Is AI Evals For Engineers & PMs free to watch?
AI Evals For Engineers & PMs is part of CourseFlix's premium catalog. A CourseFlix subscription unlocks the full video player; the course description, table of contents, and preview information are available to everyone.
Where can I watch AI Evals For Engineers & PMs online?
AI Evals For Engineers & PMs is available to watch online on CourseFlix at https://courseflix.net/course/ai-evals-for-engineers-pms. The page hosts every lesson with the integrated video player; no download is required.