AI Evals For Engineers & PMs

29h 21m 38s
English
Paid

Course description

Learn proven methods for quickly improving AI applications. Build AI systems that outperform competitors, regardless of the specific use case.

This course is for you if you run into questions like these while working with AI:

  1. How do you test applications whose outputs are probabilistic and require subjective evaluation?
  2. If I change a prompt, how can I be sure nothing else breaks?
  3. Where should engineering effort be directed? Is it necessary to test everything?
  4. What do you do when there is no data and there are no users? Where do you start?
  5. Which metrics should be tracked? What tools should be used? Which models should be selected?
  6. Is it possible to automate testing and evaluation? And if so, how can you trust it?

This is a practical course for engineers and technical product managers. It is ideal for those who know how to program or who "enjoy coding by intuition."

What to Expect

Expect intensive practice: exercises and hands-on work with code and data. We meet twice a week for four weeks and offer generous office hours. All sessions are recorded and available for asynchronous viewing.

Course Content

  1. Basics and lifecycle of LLM application evaluation
  2. Systematic error analysis
  3. Building effective metrics and automated evaluation pipelines
  4. Collaborative practices and alignment of evaluation criteria
  5. Testing strategies for different architectures (RAG, pipelines, multimodal systems, etc.)
  6. Monitoring in production and continuous quality evaluation
  7. Organizing an effective human-in-the-loop review process
  8. Cost optimization and query routing


Learning Outcomes

  1. Master the best tools for finding, diagnosing, and prioritizing errors in AI.
  2. Learn how to use synthetic data before you have users and how to use real data as effectively as possible.
  3. Build a "data flywheel" that ensures your AI improves over time.
  4. Learn to automate parts of the evaluation processes and trust them.
  5. Be able to customize AI to your preferences and requirements.
  6. Avoid common mistakes, drawing on experience from more than 35 AI projects.
  7. Gain practical experience through end-to-end exercises, code, and analysis of real cases.


All Course Lessons (41)

| # | Lesson Title | Duration | Access |
| 1 | Lesson 1. Fundamentals & Lifecycle LLM Application Evaluation | 56:41 | Demo |
| 2 | Lesson 2. Systematic Error Analysis | 01:01:39 | |
| 3 | Braintrust Tutorial w Wayde Gilliam | 43:03 | |
| 4 | Optional. Office Hours | 01:40:14 | |
| 5 | Lesson 3. More Error Analysis & Collaborative Evaluation | 59:34 | |
| 6 | Lesson 4. Automated Evaluators | 01:00:35 | |
| 7 | Taming diffusion QR codes with evals and inference-time scaling w Charles Frye | 44:43 | |
| 8 | 10x Your RAG Evaluation by Avoiding these Pitfalls w Skylar Payne | 28:26 | |
| 9 | Optional. Office Hours | 01:18:26 | |
| 10 | Optional. Office Hours | 47:12 | |
| 11 | Lesson 5. More Automated Evaluators | 05:13 | |
| 12 | Lesson 6. RAG & Complex Architectures | 59:46 | |
| 13 | Scaling Inference-Time Compute for Better LLM Judges w Leonard Tang | 31:09 | |
| 14 | Building custom eval tools with coding agents w Isaac Flath | 46:39 | |
| 15 | From Vibe Checks to Evals to Feedback Loops - Case Studies in AI System Maturities w David Karam | 30:03 | |
| 16 | A Playbook For Building AI Agents You Can Trust w Udi Menkes | 38:26 | |
| 17 | AI Evals in Vertical Industries (such as healthcare, finance and law) w Dr Chris Lovejoy | 34:16 | |
| 18 | Arize Phoenix tutorial w Mikyo King | 49:03 | |
| 19 | Optional. Office Hours | 22:32 | |
| 20 | Optional. Office Hours | 24:20 | |
| 21 | Optional. Office Hours | 55:49 | |
| 22 | Lesson 7. Efficient Continuous Human Review Systems | 59:03 | |
| 23 | Lesson 8. Cost Optimization | 01:03:11 | |
| 24 | Techniques for evaluating agents w SallyAnn DeLucia (Arize) | 33:38 | |
| 25 | LangSmith Tutorial w Harrison Chase | 48:24 | |
| 26 | From Noob to 5 Automated Evals in 4 Weeks (as a PM) w Teresa Torres | 01:10:21 | |
| 27 | SolveIt. The Thinking Developer's Environment w Jeremy Howard & Johno Whitaker | 01:42:26 | |
| 28 | Testing Real AI Products LIVE w Robert Ta | 01:00:49 | |
| 29 | Fireside Chat with DSP Creator w Omar Khattab | 45:00 | |
| 30 | Optional. Office Hours | 01:06:31 | |
| 31 | Optional. Office Hours (Bonus) | 01:05:26 | |
| 32 | HW 1&2 walkthrough with Braintrust (pre-recorded) 1 | 10:50 | |
| 33 | HW 1&2 walkthrough with Braintrust (pre-recorded) 2 | 05:13 | |
| 34 | HW 1&2 walkthrough with Phoenix (pre-recorded) | 15:04 | |
| 35 | HW 1&2 walkthrough with LangSmith (pre-recorded) | 22:41 | |
| 36 | HW 3 walkthrough with Braintrust (pre-recorded) | 21:41 | |
| 37 | HW 3 walkthrough with Phoenix (pre-recorded) | 16:40 | |
| 38 | HW 4 walkthrough with Braintrust (pre-recorded) | 23:11 | |
| 39 | HW 4 walkthrough with Phoenix (pre-recorded) | 16:39 | |
| 40 | HW 5 walkthrough with Braintrust (pre-recorded) | 22:03 | |
| 41 | HW 5 walkthrough with Phoenix (pre-recorded) | 14:58 | |

Books

Read Book AI Evals For Engineers & PMs

| # | Title |
| 1 | AIE - Braintrust Intro |
| 2 | Lesson 1 |
| 3 | Lesson 2 |
| 4 | Lesson 3 |
| 5 | Lesson 4 |
| 6 | Lesson 5 |
| 7 | Lesson 6 |
| 8 | Lesson 8 |
| 9 | LLM Evals Course Notes July |
