AI Evals For Engineers & PMs

29h 21m 38s
English
Paid

Course description

Learn proven methods for quickly improving AI applications. Build AI systems that outperform competitors, regardless of the specific use case.

This course is for you if you run into questions like these while working with AI:

  1. How do you test applications whose outputs are probabilistic and require subjective evaluation?
  2. If I change a prompt, how can I make sure nothing else breaks?
  3. Where should engineering effort be directed? Is it necessary to test everything?
  4. What do you do if there is no data and no users yet? Where do you start?
  5. Which metrics should be tracked? Which tools should be used? Which models should be selected?
  6. Is it possible to automate testing and evaluation? If so, how can you trust the results?

This is a practical course for engineers and technical product managers. Ideal for those who know how to program or "enjoy coding by intuition."

What to Expect

Expect intensive practice: exercises and hands-on work with code and data. We meet twice a week for four weeks and offer generous office hours. All sessions are recorded and available asynchronously.

Course Content

  1. Basics and lifecycle of LLM application evaluation
  2. Systematic error analysis
  3. Building effective metrics and automated evaluation pipelines
  4. Collaborative practices and alignment of evaluation criteria
  5. Testing strategies for different architectures (RAG, pipelines, multimodal systems, etc.)
  6. Monitoring in production and continuous quality evaluation
  7. Organizing an effective human-in-the-loop review process
  8. Cost optimization and query routing


Learning Outcomes

  1. Master the best tools for finding, diagnosing, and prioritizing errors in AI.
  2. Learn how to use synthetic data before you have users, and how to use real data as effectively as possible.
  3. Build a "data flywheel" that ensures your AI improves over time.
  4. Learn to automate parts of the evaluation process and trust the results.
  5. Be able to customize AI to your preferences and requirements.
  6. Avoid common mistakes, drawing on lessons from more than 35 AI projects.
  7. Gain practical experience through end-to-end exercises, code, and analysis of real cases.


All Course Lessons (41)

| # | Lesson Title | Duration | Access |
|---|---|---|---|
| 1 | Lesson 1. Fundamentals & Lifecycle LLM Application Evaluation | 56:41 | Demo |
| 2 | Lesson 2. Systematic Error Analysis | 01:01:39 | |
| 3 | Braintrust Tutorial w Wayde Gilliam | 43:03 | |
| 4 | Optional. Office Hours | 01:40:14 | |
| 5 | Lesson 3. More Error Analysis & Collaborative Evaluation | 59:34 | |
| 6 | Lesson 4. Automated Evaluators | 01:00:35 | |
| 7 | Taming diffusion QR codes with evals and inference-time scaling w Charles Frye | 44:43 | |
| 8 | 10x Your RAG Evaluation by Avoiding these Pitfalls w Skylar Payne | 28:26 | |
| 9 | Optional. Office Hours | 01:18:26 | |
| 10 | Optional. Office Hours | 47:12 | |
| 11 | Lesson 5. More Automated Evaluators | 05:13 | |
| 12 | Lesson 6. RAG & Complex Architectures | 59:46 | |
| 13 | Scaling Inference-Time Compute for Better LLM Judges w Leonard Tang | 31:09 | |
| 14 | Building custom eval tools with coding agents w Isaac Flath | 46:39 | |
| 15 | From Vibe Checks to Evals to Feedback Loops - Case Studies in AI System Maturities w David Karam | 30:03 | |
| 16 | A Playbook For Building AI Agents You Can Trust w Udi Menkes | 38:26 | |
| 17 | AI Evals in Vertical Industries (such as healthcare, finance and law) w Dr Chris Lovejoy | 34:16 | |
| 18 | Arize Phoenix tutorial w Mikyo King | 49:03 | |
| 19 | Optional. Office Hours | 22:32 | |
| 20 | Optional. Office Hours | 24:20 | |
| 21 | Optional. Office Hours | 55:49 | |
| 22 | Lesson 7. Efficient Continuous Human Review Systems | 59:03 | |
| 23 | Lesson 8. Cost Optimization | 01:03:11 | |
| 24 | Techniques for evaluating agents w SallyAnn DeLucia (Arize) | 33:38 | |
| 25 | LangSmith Tutorial w Harrison Chase | 48:24 | |
| 26 | From Noob to 5 Automated Evals in 4 Weeks (as a PM) w Teresa Torres | 01:10:21 | |
| 27 | SolveIt. The Thinking Developer's Environment w Jeremy Howard & Johno Whitaker | 01:42:26 | |
| 28 | Testing Real AI Products LIVE w Robert Ta | 01:00:49 | |
| 29 | Fireside Chat with DSP Creator w Omar Khattab | 45:00 | |
| 30 | Optional. Office Hours | 01:06:31 | |
| 31 | Optional. Office Hours (Bonus) | 01:05:26 | |
| 32 | HW 1&2 walkthrough with Braintrust (pre-recorded) 1 | 10:50 | |
| 33 | HW 1&2 walkthrough with Braintrust (pre-recorded) 2 | 05:13 | |
| 34 | HW 1&2 walkthrough with Phoenix (pre-recorded) | 15:04 | |
| 35 | HW 1&2 walkthrough with LangSmith (pre-recorded) | 22:41 | |
| 36 | HW 3 walkthrough with Braintrust (pre-recorded) | 21:41 | |
| 37 | HW 3 walkthrough with Phoenix (pre-recorded) | 16:40 | |
| 38 | HW 4 walkthrough with Braintrust (pre-recorded) | 23:11 | |
| 39 | HW 4 walkthrough with Phoenix (pre-recorded) | 16:39 | |
| 40 | HW 5 walkthrough with Braintrust (pre-recorded) | 22:03 | |
| 41 | HW 5 walkthrough with Phoenix (pre-recorded) | 14:58 | |

Books

Read Book AI Evals For Engineers & PMs

| # | Title |
|---|---|
| 1 | AIE - Braintrust Intro |
| 2 | Lesson 1 |
| 3 | Lesson 2 |
| 4 | Lesson 3 |
| 5 | Lesson 4 |
| 6 | Lesson 5 |
| 7 | Lesson 6 |
| 8 | Lesson 8 |
| 9 | LLM Evals Course Notes July |
