How to Argue With a Language Model (And Win)
Learn structured experimentation for AI agents. Discover A/B testing, dataset curation, and evaluation methods to improve your agent's performance beyond guesswork.
You’ve built an AI agent, it mostly works, and now you’re stuck in a loop of tweaking prompts and hoping for the best. Sound familiar? In this talk we’ll move past vibes-based development and into structured experimentation. We’ll cover how to set up A/B tests for your agents, build and curate datasets from captured interactions or static data, and wire up evals that actually tell you whether your changes made things better or worse.
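To make the idea concrete, here is a minimal sketch of what an A/B comparison with an eval might look like. It is not taken from the talk: the dataset, the two stand-in agents (`agent_a`, `agent_b`), the `exact_match` scorer, and the `paired_bootstrap` helper are all hypothetical names chosen for illustration, and a real setup would call your actual agent and likely use a richer scorer.

```python
import random
from typing import Callable

# Hypothetical dataset of (input, expected output) pairs, e.g. curated
# from captured interactions or from static data.
DATASET = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("3*3", "9"),
    ("opposite of hot", "cold"),
]

def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the output matches, ignoring case and whitespace."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def run_eval(agent: Callable[[str], str], dataset, scorer) -> list[float]:
    """Run one agent variant over every example and return per-example scores."""
    return [scorer(agent(prompt), expected) for prompt, expected in dataset]

def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0) -> float:
    """Fraction of resampled datasets on which variant B outscores variant A."""
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_b[i] - scores_a[i] for i in idx) > 0:
            wins += 1
    return wins / n_resamples

# Stand-in "agents" (assumptions): lookup tables in place of real model calls.
def agent_a(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "paris"}.get(prompt, "unknown")

def agent_b(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris", "3*3": "9"}.get(prompt, "unknown")

scores_a = run_eval(agent_a, DATASET, exact_match)
scores_b = run_eval(agent_b, DATASET, exact_match)
mean_a = sum(scores_a) / len(scores_a)
mean_b = sum(scores_b) / len(scores_b)
p_b_better = paired_bootstrap(scores_a, scores_b)

print(f"A: {mean_a:.2f}  B: {mean_b:.2f}  P(B beats A on resample): {p_b_better:.2f}")
```

Both variants are scored on the same examples, so the comparison is paired: the bootstrap resamples examples rather than treating the two runs as independent. With only four examples the estimate is very noisy, which is one reason a larger curated dataset matters.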