AI Evals For Engineers & PMs
Course description
Learn proven methods for quickly improving AI applications. Build AI systems that outperform competitors, regardless of the specific use case.
If you run into questions like these while working with AI, this course is for you:
- How do you test applications whose results are probabilistic and require subjective evaluation?
- If I change a prompt, how can I make sure nothing else breaks?
- Where should engineering effort be directed? Is it necessary to test everything?
- What should you do if you have no data or users, and where do you start?
- Which metrics should be tracked? Which tools should be used? Which models should be selected?
- Is it possible to automate testing and evaluation? If so, how can you trust it?
This is a practical course for engineers and technical product managers. It is ideal for those who know how to program or who "enjoy coding by intuition."
What to Expect
Expect intensive practice: exercises and hands-on work with code and data. We meet twice a week for four weeks and offer generous office hours. All sessions are recorded and available to watch asynchronously.
Course Content
- Basics and lifecycle of LLM application evaluation
- Systematic error analysis
- Building effective metrics and automated evaluation pipelines
- Collaborative practices and alignment of evaluation criteria
- Testing strategies for different architectures (RAG, pipelines, multimodal systems, etc.)
- Monitoring in production and continuous quality evaluation
- Organizing an effective human-in-the-loop review process
- Cost optimization and query routing
Learning Outcomes
- Master the best tools for finding, diagnosing, and prioritizing errors in AI.
- Learn how to use synthetic data before you have users and how to use real data as effectively as possible.
- Build a "data flywheel" that ensures your AI improves over time.
- Learn to automate parts of the evaluation process and trust the results.
- Be able to customize AI to your preferences and requirements.
- Avoid common mistakes, drawing on lessons from more than 35 AI projects.
- Gain practical experience through end-to-end exercises, code, and analysis of real cases.
Watch Online
All Course Lessons (41)
| # | Lesson Title | Duration | Access |
|---|---|---|---|
| 1 | 1. Lesson 1. Fundamentals & Lifecycle LLM Application Evaluation Demo | 56:41 | |
| 2 | 2. Lesson 2. Systematic Error Analysis | 01:01:39 | |
| 3 | 3. Braintrust Tutorial w Wayde Gilliam | 43:03 | |
| 4 | 4. Optional. Office Hours | 01:40:14 | |
| 5 | 5. Lesson 3. More Error Analysis & Collaborative Evaluation | 59:34 | |
| 6 | 6. Lesson 4. Automated Evaluators | 01:00:35 | |
| 7 | 7. Taming diffusion QR codes with evals and inference-time scaling w Charles Frye | 44:43 | |
| 8 | 8. 10x Your RAG Evaluation by Avoiding these Pitfalls w Skylar Payne | 28:26 | |
| 9 | 9. Optional. Office Hours | 01:18:26 | |
| 10 | 10. Optional. Office Hours | 47:12 | |
| 11 | 11. Lesson 5. More Automated Evaluators | 05:13 | |
| 12 | 12. Lesson 6. RAG & Complex Architectures | 59:46 | |
| 13 | 13. Scaling Inference-Time Compute for Better LLM Judges w Leonard Tang | 31:09 | |
| 14 | 14. Building custom eval tools with coding agents w Isaac Flath | 46:39 | |
| 15 | 15. From Vibe Checks to Evals to Feedback Loops - Case Studies in AI System Maturities w David Karam | 30:03 | |
| 16 | 16. A Playbook For Building AI Agents You Can Trust w Udi Menkes | 38:26 | |
| 17 | 17. AI Evals in Vertical Industries (such as healthcare, finance and law) w Dr Chris Lovejoy | 34:16 | |
| 18 | 18. Arize Phoenix tutorial w Mikyo King | 49:03 | |
| 19 | 19. Optional. Office Hours | 22:32 | |
| 20 | 20. Optional. Office Hours | 24:20 | |
| 21 | 21. Optional. Office Hours | 55:49 | |
| 22 | 22. Lesson 7. Efficient Continuous Human Review Systems | 59:03 | |
| 23 | 23. Lesson 8. Cost Optimization | 01:03:11 | |
| 24 | 24. Techniques for evaluating agents w SallyAnn DeLucia (Arize) | 33:38 | |
| 25 | 25. LangSmith Tutorial w Harrison Chase | 48:24 | |
| 26 | 26. From Noob to 5 Automated Evals in 4 Weeks (as a PM) w Teresa Torres | 01:10:21 | |
| 27 | 27. SolveIt. The Thinking Developer's Environment w Jeremy Howard & Johno Whitaker | 01:42:26 | |
| 28 | 28. Testing Real AI Products LIVE w Robert Ta | 01:00:49 | |
| 29 | 29. Fireside Chat with DSP Creator w Omar Khattab | 45:00 | |
| 30 | 30. Optional. Office Hours | 01:06:31 | |
| 31 | 31. Optional. Office Hours (Bonus) | 01:05:26 | |
| 32 | HW 1&2 walkthrough with Braintrust (pre-recorded) 1 | 10:50 | |
| 33 | HW 1&2 walkthrough with Braintrust (pre-recorded) 2 | 05:13 | |
| 34 | HW 1&2 walkthrough with Phoenix (pre-recorded) | 15:04 | |
| 35 | HW 1&2 walkthrough with LangSmith (pre-recorded) | 22:41 | |
| 36 | HW 3 walkthrough with Braintrust (pre-recorded) | 21:41 | |
| 37 | HW 3 walkthrough with Phoenix (pre-recorded) | 16:40 | |
| 38 | HW 4 walkthrough with Braintrust (pre-recorded) | 23:11 | |
| 39 | HW 4 walkthrough with Phoenix (pre-recorded) | 16:39 | |
| 40 | HW 5 walkthrough with Braintrust (pre-recorded) | 22:03 | |
| 41 | HW 5 walkthrough with Phoenix (pre-recorded) | 14:58 | |
Books
Read Book AI Evals For Engineers & PMs
| # | Title |
|---|---|
| 1 | AIE - Braintrust Intro |
| 2 | Lesson 1 |
| 3 | Lesson 2 |
| 4 | Lesson 3 |
| 5 | Lesson 4 |
| 6 | Lesson 5 |
| 7 | Lesson 6 |
| 8 | Lesson 8 |
| 9 | LLM Evals Course Notes July |