AI Evals For Engineers & PMs

29h 21m 38s
English
Paid

Course description

Learn proven methods for quickly improving AI applications. Build AI systems that outperform competitors, regardless of the specific use case.

If you run into questions like these while working with AI:

  1. How do you test applications whose results are probabilistic and require subjective evaluation?
  2. If I change a prompt, how can I ensure nothing else breaks?
  3. Where should engineering effort be directed? Is it necessary to test everything?
  4. What should you do if there is no data or users, and where do you start?
  5. Which metrics should be tracked? What tools should be used? Which models should be selected?
  6. Is it possible to automate testing and evaluation? If so, how can you trust it?

then this course is for you.

This is a practical course for engineers and technical product managers. It is ideal for those who know how to program or "enjoy coding by intuition."

What to Expect

Expect intensive practice: exercises and hands-on work with code and data. We meet twice a week for four weeks and offer generous office hours. All sessions are recorded and available for asynchronous viewing.

Course Content

  1. Basics and lifecycle of LLM application evaluation
  2. Systematic error analysis
  3. Building effective metrics and automated evaluation pipelines
  4. Collaborative practices and alignment of evaluation criteria
  5. Testing strategies for different architectures (RAG, pipelines, multimodal systems, etc.)
  6. Monitoring in production and continuous quality evaluation
  7. Organizing an effective human-in-the-loop review process
  8. Cost optimization and query routing


Learning Outcomes

  1. Master the best tools for finding, diagnosing, and prioritizing errors in AI.
  2. Learn how to use synthetic data before user engagement and how to use real data as effectively as possible.
  3. Build a "data flywheel" that ensures your AI improves over time.
  4. Learn to automate parts of the evaluation process and trust the results.
  5. Be able to customize AI to your preferences and requirements.
  6. Avoid common mistakes, drawing on experience from more than 35 AI projects.
  7. Gain practical experience through end-to-end exercises, code, and analysis of real cases.



All Course Lessons (41)

  1. Lesson 1. Fundamentals & Lifecycle LLM Application Evaluation - 56:41 (Demo)
  2. Lesson 2. Systematic Error Analysis - 01:01:39
  3. Braintrust Tutorial w Wayde Gilliam - 43:03
  4. Optional. Office Hours - 01:40:14
  5. Lesson 3. More Error Analysis & Collaborative Evaluation - 59:34
  6. Lesson 4. Automated Evaluators - 01:00:35
  7. Taming diffusion QR codes with evals and inference-time scaling w Charles Frye - 44:43
  8. 10x Your RAG Evaluation by Avoiding these Pitfalls w Skylar Payne - 28:26
  9. Optional. Office Hours - 01:18:26
  10. Optional. Office Hours - 47:12
  11. Lesson 5. More Automated Evaluators - 05:13
  12. Lesson 6. RAG & Complex Architectures - 59:46
  13. Scaling Inference-Time Compute for Better LLM Judges w Leonard Tang - 31:09
  14. Building custom eval tools with coding agents w Isaac Flath - 46:39
  15. From Vibe Checks to Evals to Feedback Loops - Case Studies in AI System Maturities w David Karam - 30:03
  16. A Playbook For Building AI Agents You Can Trust w Udi Menkes - 38:26
  17. AI Evals in Vertical Industries (such as healthcare, finance and law) w Dr Chris Lovejoy - 34:16
  18. Arize Phoenix tutorial w Mikyo King - 49:03
  19. Optional. Office Hours - 22:32
  20. Optional. Office Hours - 24:20
  21. Optional. Office Hours - 55:49
  22. Lesson 7. Efficient Continuous Human Review Systems - 59:03
  23. Lesson 8. Cost Optimization - 01:03:11
  24. Techniques for evaluating agents w SallyAnn DeLucia (Arize) - 33:38
  25. LangSmith Tutorial w Harrison Chase - 48:24
  26. From Noob to 5 Automated Evals in 4 Weeks (as a PM) w Teresa Torres - 01:10:21
  27. SolveIt. The Thinking Developer's Environment w Jeremy Howard & Johno Whitaker - 01:42:26
  28. Testing Real AI Products LIVE w Robert Ta - 01:00:49
  29. Fireside Chat with DSP Creator w Omar Khattab - 45:00
  30. Optional. Office Hours - 01:06:31
  31. Optional. Office Hours (Bonus) - 01:05:26
  32. HW 1&2 walkthrough with Braintrust (pre-recorded) 1 - 10:50
  33. HW 1&2 walkthrough with Braintrust (pre-recorded) 2 - 05:13
  34. HW 1&2 walkthrough with Phoenix (pre-recorded) - 15:04
  35. HW 1&2 walkthrough with LangSmith (pre-recorded) - 22:41
  36. HW 3 walkthrough with Braintrust (pre-recorded) - 21:41
  37. HW 3 walkthrough with Phoenix (pre-recorded) - 16:40
  38. HW 4 walkthrough with Braintrust (pre-recorded) - 23:11
  39. HW 4 walkthrough with Phoenix (pre-recorded) - 16:39
  40. HW 5 walkthrough with Braintrust (pre-recorded) - 22:03
  41. HW 5 walkthrough with Phoenix (pre-recorded) - 14:58

Books

Read Book AI Evals For Engineers & PMs

  1. AIE - Braintrust Intro
  2. Lesson 1
  3. Lesson 2
  4. Lesson 3
  5. Lesson 4
  6. Lesson 5
  7. Lesson 6
  8. Lesson 8
  9. LLM Evals Course Notes July
