AI Evals For Engineers & PMs

29h 21m 38s
English
Paid

Course description

Learn proven methods for quickly improving AI applications. Build AI systems that outperform competitors, regardless of the specific use case.

If you encounter questions like these while working with AI:

  1. How do you test applications whose results are probabilistic and require subjective evaluation?
  2. If I change a prompt, how can I ensure nothing else breaks?
  3. Where should engineering efforts be directed? Is it necessary to test everything?
  4. What do you do if you have no data or users yet? Where do you start?
  5. Which metrics should be tracked? What tools should be used? Which models should be selected?
  6. Is it possible to automate testing and evaluation? And if yes, how can you trust it?

- then this course is for you.

This is a practical course for engineers and technical product managers. Ideal for those who can program or are comfortable "vibe coding."

What to Expect

Expect intensive practice: exercises and hands-on work with code and data. We meet twice a week for four weeks and offer generous office hours. All sessions are recorded and available for asynchronous viewing.

Course Content

  1. Basics and lifecycle of LLM application evaluation
  2. Systematic error analysis
  3. Building effective metrics and automated evaluation pipelines
  4. Collaborative practices and alignment of evaluation criteria
  5. Testing strategies for different architectures (RAG, pipelines, multimodal systems, etc.)
  6. Monitoring in production and continuous quality evaluation
  7. Organizing an effective human-in-the-loop review process
  8. Cost optimization and query routing


Learning Outcomes

  1. Master the best tools for finding, diagnosing, and prioritizing errors in AI.
  2. Learn how to use synthetic data before user engagement and how to use real data as effectively as possible.
  3. Build a "data flywheel" that ensures your AI improves over time.
  4. Learn to automate parts of the evaluation process and to trust the results.
  5. Be able to customize AI to your preferences and requirements.
  6. Avoid common mistakes, drawing on lessons from more than 35 AI projects.
  7. Gain practical experience through end-to-end exercises, code, and analysis of real cases.
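
As an illustration of what "trusting" an automated evaluator can mean in practice, here is a minimal sketch that compares an automated judge's pass/fail verdicts against human labels on the same examples. It is not taken from the course materials: the function name `judge_agreement`, the `Agreement` class, and the sample labels are hypothetical, and real projects would likely use a larger labeled set.

```python
from dataclasses import dataclass


@dataclass
class Agreement:
    """Agreement between an automated judge and human reviewers (hypothetical helper)."""
    true_positive_rate: float  # share of human "pass" examples the judge also passed
    true_negative_rate: float  # share of human "fail" examples the judge also failed


def judge_agreement(human_labels: list[bool], judge_labels: list[bool]) -> Agreement:
    """Compare judge verdicts to human labels, where True means "pass"."""
    if len(human_labels) != len(judge_labels):
        raise ValueError("label lists must have the same length")

    positives = sum(human_labels)
    negatives = len(human_labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one human 'pass' and one human 'fail'")

    pairs = list(zip(human_labels, judge_labels))
    true_pos = sum(1 for h, j in pairs if h and j)
    true_neg = sum(1 for h, j in pairs if not h and not j)
    return Agreement(true_pos / positives, true_neg / negatives)


# Made-up labels for five examples, purely for illustration.
human = [True, True, True, False, False]
judge = [True, True, False, False, True]
result = judge_agreement(human, judge)
print(f"TPR={result.true_positive_rate:.2f}, TNR={result.true_negative_rate:.2f}")
```

Reporting the two rates separately, rather than a single accuracy number, shows whether the judge tends to miss failures or to flag good outputs, which matters when failures are rare.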


Watch Online

# Title Duration
1 Lesson 1. Fundamentals & Lifecycle of LLM Application Evaluation 56:41
2 Lesson 2. Systematic Error Analysis 01:01:39
3 Braintrust Tutorial w Wayde Gilliam 43:03
4 Optional. Office Hours 01:40:14
5 Lesson 3. More Error Analysis & Collaborative Evaluation 59:34
6 Lesson 4. Automated Evaluators 01:00:35
7 Taming diffusion QR codes with evals and inference-time scaling w Charles Frye 44:43
8 10x Your RAG Evaluation by Avoiding these Pitfalls w Skylar Payne 28:26
9 Optional. Office Hours 01:18:26
10 Optional. Office Hours 47:12
11 Lesson 5. More Automated Evaluators 05:13
12 Lesson 6. RAG & Complex Architectures 59:46
13 Scaling Inference-Time Compute for Better LLM Judges w Leonard Tang 31:09
14 Building custom eval tools with coding agents w Isaac Flath 46:39
15 From Vibe Checks to Evals to Feedback Loops - Case Studies in AI System Maturities w David Karam 30:03
16 A Playbook For Building AI Agents You Can Trust w Udi Menkes 38:26
17 AI Evals in Vertical Industries (such as healthcare, finance and law) w Dr Chris Lovejoy 34:16
18 Arize Phoenix tutorial w Mikyo King 49:03
19 Optional. Office Hours 22:32
20 Optional. Office Hours 24:20
21 Optional. Office Hours 55:49
22 Lesson 7. Efficient Continuous Human Review Systems 59:03
23 Lesson 8. Cost Optimization 01:03:11
24 Techniques for evaluating agents w SallyAnn DeLucia (Arize) 33:38
25 LangSmith Tutorial w Harrison Chase 48:24
26 From Noob to 5 Automated Evals in 4 Weeks (as a PM) w Teresa Torres 01:10:21
27 SolveIt. The Thinking Developer's Environment w Jeremy Howard & Johno Whitaker 01:42:26
28 Testing Real AI Products LIVE w Robert Ta 01:00:49
29 Fireside Chat with DSPy Creator w Omar Khattab 45:00
30 Optional. Office Hours 01:06:31
31 Optional. Office Hours (Bonus) 01:05:26
32 HW 1&2 walkthrough with Braintrust (pre-recorded) 1 10:50
33 HW 1&2 walkthrough with Braintrust (pre-recorded) 2 05:13
34 HW 1&2 walkthrough with Phoenix (pre-recorded) 15:04
35 HW 1&2 walkthrough with LangSmith (pre-recorded) 22:41
36 HW 3 walkthrough with Braintrust (pre-recorded) 21:41
37 HW 3 walkthrough with Phoenix (pre-recorded) 16:40
38 HW 4 walkthrough with Braintrust (pre-recorded) 23:11
39 HW 4 walkthrough with Phoenix (pre-recorded) 16:39
40 HW 5 walkthrough with Braintrust (pre-recorded) 22:03
41 HW 5 walkthrough with Phoenix (pre-recorded) 14:58

Books

# Title
1 AIE - Braintrust Intro
2 Lesson 1
3 Lesson 2
4 Lesson 3
5 Lesson 4
6 Lesson 5
7 Lesson 6
8 Lesson 8
9 LLM Evals Course Notes July
