Mastering AI Evals
A hands-on workshop on testing AI outputs properly, so your models work reliably in production, not just in demos.
You'll work through the specific testing methods that separate AI projects that hold up in production from those that only shine in demos. Peter Hanssens walks you through why standard software testing fails for generative AI, then moves to the practical side: defining what quality means for your use case, building evaluation datasets, and using judge prompts and A/B testing to catch problems before users do. This is a hands-on preview, so bring a laptop and, ideally, an AI project you're actively working on. Most teams shipping AI features skip evals entirely, which is why so many deployments fail quietly; the engineers who master them ship faster and with far fewer surprises.
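To give a flavour of the judge-prompt approach the workshop covers, here is a minimal sketch of an LLM-as-judge eval loop. It is illustrative only, not taken from the workshop materials: it assumes the OpenAI Python SDK with an API key in the environment, and the model name, rubric, and tiny dataset are placeholders you would replace with your own.

```python
# Minimal LLM-as-judge eval sketch (illustrative, not from the workshop).
# Assumes the OpenAI Python SDK and OPENAI_API_KEY in the environment;
# the judge model, rubric, and dataset below are placeholders.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an AI assistant's answer.

Question: {question}
Answer: {answer}

Criteria: the answer must be factually correct and directly address the question.
Reply with exactly one word: PASS or FAIL."""

# A tiny hand-built evaluation dataset: inputs plus the outputs under test.
eval_cases = [
    {"question": "What does HTTP status 404 mean?",
     "answer": "The server could not find the requested resource."},
    {"question": "What does HTTP status 500 mean?",
     "answer": "The request succeeded."},  # deliberately wrong, should FAIL
]

def judge(case: dict) -> bool:
    """Ask a judge model to grade one case; returns True on PASS."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whatever judge model you have
        temperature=0,        # keep grading as deterministic as possible
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(**case)}],
    )
    verdict = response.choices[0].message.content.strip().upper()
    return verdict.startswith("PASS")

if __name__ == "__main__":
    results = [judge(case) for case in eval_cases]
    print(f"pass rate: {sum(results)}/{len(results)}")
```

In practice you would grow the dataset from real user failures and run the same loop over two prompt or model variants to get an A/B comparison, which is the kind of workflow the session walks through hands-on.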