
AI Evals For Engineers & PMs

29h 21m 38s
English
Paid

Learn proven methods for quickly improving AI applications. Build AI systems that outperform competitors, regardless of the specific use case.

If you run into questions like these while working with AI, this course is for you:

  1. How do you test applications whose results are probabilistic and require subjective evaluation?
  2. If I change a prompt, how can I make sure nothing else breaks?
  3. Where should engineering effort be directed? Is it necessary to test everything?
  4. Where do you start if there is no data and there are no users?
  5. Which metrics should be tracked? Which tools should be used? Which models should be selected?
  6. Is it possible to automate testing and evaluation? If so, how can you trust it?

This is a practical course for engineers and technical product managers. It is ideal for those who know how to program or who "enjoy coding by intuition."

What to Expect

Expect intensive practice: exercises and hands-on work with code and data. We meet twice a week for four weeks and offer generous office hours. All sessions are recorded and available asynchronously.

Course Content

  1. Basics and lifecycle of LLM application evaluation
  2. Systematic error analysis
  3. Building effective metrics and automated evaluation pipelines
  4. Collaborative practices and alignment of evaluation criteria
  5. Testing strategies for different architectures (RAG, pipelines, multimodal systems, etc.)
  6. Monitoring in production and continuous quality evaluation
  7. Organizing an effective human-in-the-loop review process
  8. Cost optimization and query routing


Learning Outcomes

  1. Master the best tools for finding, diagnosing, and prioritizing errors in AI.
  2. Learn how to use synthetic data before you have users and how to use real data as effectively as possible.
  3. Build a "data flywheel" that ensures your AI improves over time.
  4. Learn to automate parts of the evaluation processes and trust them.
  5. Be able to customize AI to your preferences and requirements.
  6. Avoid common mistakes, drawing on experience from more than 35 AI projects.
  7. Gain practical experience through end-to-end exercises, code, and analysis of real cases.


About the Authors

Hamel Husain

Hamel Husain is a US ML engineer (formerly at Airbnb and GitHub, now at Parlance Labs), a fast.ai contributor, and one of the most visible independent voices on the production-engineering side of LLM systems — particularly around evals, fine-tuning, and the workflow that connects model training to deployed product features.

On CourseFlix, he teaches AI Evals For Engineers & PMs. The material is paid and aimed at engineers and product managers shipping LLM-powered features who need to evaluate model output systematically rather than by gut feeling.

Shreya Shankar

Shreya Shankar is a US ML engineer and PhD candidate (UC Berkeley, formerly Google Brain and Viaduct) focused on the production-engineering side of ML systems and LLM evals. She is one of the more cited independent voices on the eval discipline for AI applications.

On CourseFlix, she teaches AI Evals For Engineers & PMs, a structured treatment of the eval discipline applied to LLM applications: how to design eval datasets, choose appropriate metrics, run systematic comparisons, and use evals as a continuous-feedback tool rather than a one-off launch gate.

Material is paid and aimed at engineers and product managers shipping LLM-powered features. For broader content, see CourseFlix's AI for Business & Product category page.

All Course Lessons (41)

  1. Lesson 1. Fundamentals & Lifecycle LLM Application Evaluation (56:41, demo)
  2. Lesson 2. Systematic Error Analysis (01:01:39)
  3. Braintrust Tutorial w Wayde Gilliam (43:03)
  4. Optional. Office Hours (01:40:14)
  5. Lesson 3. More Error Analysis & Collaborative Evaluation (59:34)
  6. Lesson 4. Automated Evaluators (01:00:35)
  7. Taming diffusion QR codes with evals and inference-time scaling w Charles Frye (44:43)
  8. 10x Your RAG Evaluation by Avoiding these Pitfalls w Skylar Payne (28:26)
  9. Optional. Office Hours (01:18:26)
  10. Optional. Office Hours (47:12)
  11. Lesson 5. More Automated Evaluators (05:13)
  12. Lesson 6. RAG & Complex Architectures (59:46)
  13. Scaling Inference-Time Compute for Better LLM Judges w Leonard Tang (31:09)
  14. Building custom eval tools with coding agents w Isaac Flath (46:39)
  15. From Vibe Checks to Evals to Feedback Loops - Case Studies in AI System Maturities w David Karam (30:03)
  16. A Playbook For Building AI Agents You Can Trust w Udi Menkes (38:26)
  17. AI Evals in Vertical Industries (such as healthcare, finance and law) w Dr Chris Lovejoy (34:16)
  18. Arize Phoenix tutorial w Mikyo King (49:03)
  19. Optional. Office Hours (22:32)
  20. Optional. Office Hours (24:20)
  21. Optional. Office Hours (55:49)
  22. Lesson 7. Efficient Continuous Human Review Systems (59:03)
  23. Lesson 8. Cost Optimization (01:03:11)
  24. Techniques for evaluating agents w SallyAnn DeLucia (Arize) (33:38)
  25. LangSmith Tutorial w Harrison Chase (48:24)
  26. From Noob to 5 Automated Evals in 4 Weeks (as a PM) w Teresa Torres (01:10:21)
  27. SolveIt. The Thinking Developer's Environment w Jeremy Howard & Johno Whitaker (01:42:26)
  28. Testing Real AI Products LIVE w Robert Ta (01:00:49)
  29. Fireside Chat with DSP Creator w Omar Khattab (45:00)
  30. Optional. Office Hours (01:06:31)
  31. Optional. Office Hours (Bonus) (01:05:26)
  32. HW 1&2 walkthrough with Braintrust (pre-recorded) 1 (10:50)
  33. HW 1&2 walkthrough with Braintrust (pre-recorded) 2 (05:13)
  34. HW 1&2 walkthrough with Phoenix (pre-recorded) (15:04)
  35. HW 1&2 walkthrough with LangSmith (pre-recorded) (22:41)
  36. HW 3 walkthrough with Braintrust (pre-recorded) (21:41)
  37. HW 3 walkthrough with Phoenix (pre-recorded) (16:40)
  38. HW 4 walkthrough with Braintrust (pre-recorded) (23:11)
  39. HW 4 walkthrough with Phoenix (pre-recorded) (16:39)
  40. HW 5 walkthrough with Braintrust (pre-recorded) (22:03)
  41. HW 5 walkthrough with Phoenix (pre-recorded) (14:58)

Books

Reading materials for AI Evals For Engineers & PMs:

  1. AIE - Braintrust Intro (PDF)
  2. Lesson 1 (PDF)
  3. Lesson 2 (PDF)
  4. Lesson 3 (PDF)
  5. Lesson 4 (PDF)
  6. Lesson 5 (PDF)
  7. Lesson 6 (PDF)
  8. Lesson 8 (PDF)
  9. LLM Evals Course Notes July (PDF)

Frequently asked questions

What are the prerequisites for this course?
The course is designed for engineers and technical product managers who have a foundational understanding of programming. It is ideal for individuals who enjoy coding by intuition and are looking to improve their skills in evaluating AI applications. No advanced AI knowledge is required, but familiarity with basic programming concepts will be beneficial.
What types of projects will I work on during the course?
The lesson list shows homework assignments HW 1 through HW 5, each with pre-recorded walkthroughs in Braintrust and Phoenix, plus a LangSmith walkthrough for HW 1 and 2. Guest sessions add further material, such as building custom eval tools with coding agents and the case-study talk 'From Vibe Checks to Evals to Feedback Loops'.
Who is the target audience for this course?
This course is targeted at engineers and technical product managers who are involved in the development and improvement of AI applications. It is particularly beneficial for those who face challenges in evaluating AI systems and are looking for methods to improve AI application performance and reliability.
How does this course compare in depth and scope to similar courses?
The course takes a practical approach to AI evaluation, focusing on real-world applications and systematic error analysis. Rather than giving a broad overview of AI, it goes deep on specific techniques such as automated evaluators and continuous human review systems across the evaluation lifecycle.
What specific tools or platforms are covered in the course?
The course covers a variety of tools and platforms essential for AI evaluation, including Braintrust, Arize Phoenix, and LangSmith. Tutorials and walkthroughs are provided for each, ensuring you gain hands-on experience in using these tools to streamline and automate the evaluation process.
What is not covered in the course?
The course does not cover foundational AI model building or basic programming instruction. It assumes a certain level of programming knowledge and focuses specifically on the evaluation and improvement of AI applications, rather than the initial development of AI models.
How much time should I expect to commit to this course?
The listed runtime is 29h 21m 38s across 41 lessons, which include optional office hours recordings and homework walkthroughs. Given the practical nature of the course, expect to spend additional time outside the core lessons on the exercises and on applying the techniques.