Hamel Husain

Independent AI Consultant and Educator

AI consultant and co-creator of the definitive online course on evals (the number one course on Maven), who has taught over 2,000 PMs and engineers across 500 companies including OpenAI and Anthropic how to systematically measure and improve AI applications.

The Growth Scientist

Dimension Profile

Strategic Vision 50%

Execution & Craft 65%

Data & Experimentation 95%

Growth & Distribution 20%

Team & Leadership 30%

User Empathy & Research 50%

Key Themes

→ evals as the most important AI product skill
→ systematic measurement of AI applications
→ error analysis before test writing
→ benevolent dictator for eval taste
→ vibe checks versus systematic evals
→ open coding for AI quality

Episode Summary

Hamel Husain (alongside Shreya Shankar) turns evals from an obscure AI concept into a practical, approachable skill for product builders. He walks through a real-world example of building evals for a real estate AI assistant, explains why you should start with error analysis before writing any tests, introduces the benevolent dictator concept for eval taste, and addresses major misconceptions, including the idea that you can simply have the AI evaluate itself. The episode serves as the definitive primer on building evals for AI products.

Leadership Principles

→ Evals is really data analytics on your LLM application — a systematic way of looking at data and creating metrics so you can iterate and improve with confidence
→ Don't jump straight to writing tests — start with error analysis and data analysis to ground what you should even test, because LLMs have much more surface area than traditional software
→ Appoint a benevolent dictator whose taste you trust for eval criteria — it should be the person with domain expertise, often the product manager, not a committee
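
To make the "data analytics on your LLM application" idea concrete, here is a minimal sketch of the step that follows open coding: one trusted reviewer tags each trace with free-form failure notes, and the notes are tallied to show which failure modes are most common and therefore worth measuring first. The trace IDs, the failure labels, and the `failure_mode_rates` helper are all hypothetical, invented for illustration; they are not taken from the episode or from any particular tool.

```python
from collections import Counter

# Hypothetical open-coding output: a single reviewer (the "benevolent
# dictator") reads each trace and attaches zero or more failure labels.
# All IDs and labels below are invented for illustration only.
coded_traces = [
    {"trace_id": "t1", "codes": ["missed handoff to human"]},
    {"trace_id": "t2", "codes": []},
    {"trace_id": "t3", "codes": ["wrong listing details", "missed handoff to human"]},
    {"trace_id": "t4", "codes": ["wrong listing details"]},
    {"trace_id": "t5", "codes": ["missed handoff to human"]},
]


def failure_mode_rates(traces):
    """Return (label, share of traces affected) pairs, most frequent first."""
    # set() ensures a label is counted at most once per trace.
    counts = Counter(code for t in traces for code in set(t["codes"]))
    total = len(traces)
    return [(code, n / total) for code, n in counts.most_common()]


rates = failure_mode_rates(coded_traces)
for code, rate in rates:
    print(f"{code}: {rate:.0%} of traces")
```

The ranked list is what grounds the later decision about which tests or metrics to write: the most frequent failure modes get targeted evals first, rather than guessing up front what to test.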

Notable Quotes

"Evals is a way to systematically measure and improve an AI application. It really is, at its core, data analytics on your LLM application and a systematic way of looking at that data."
— On demystifying what evals actually are

"The top misconception is, 'We live in the age of AI. Can't the AI just eval it?' But it doesn't work."
— On the most common misconception about AI evals

"When you're doing this open coding, a lot of teams get bogged down in having a committee do this. You can appoint one person whose taste that you trust. It should be the person with domain expertise. Oftentimes, it is the product manager."
— On the benevolent dictator approach to eval criteria
