Episode notes
What you’ll learn:
• How reinforcement learning can reduce AI agent error rates by up to 60% and drastically lower inference costs.
• The critical difference between supervised fine-tuning and RL for agentic workflows, and why RL is essential for true agent reliability.
• A practical, code-level walkthrough of building and training an email search agent, on a 14-billion-parameter open-source model, that outperforms OpenAI’s GPT-3.5.
• Strategies for generating high-quality synthetic data and designing nuanced reward functions with ‘partial credit’ to effectively train your agents (see the sketch after this list).
• Key use cases where RL fine-tuning delivers the most significant benefits, including real-time voice agents and high-volume applications.
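To make the ‘partial credit’ idea concrete, here is a minimal, hypothetical sketch of a reward function for an email search agent. The field names, weights, and scoring rules are assumptions for illustration only, not OpenPipe’s actual implementation: the point is that the agent earns graded reward for finding the right source emails and staying efficient, rather than an all-or-nothing score on the final answer.

```python
# Hypothetical "partial credit" reward function for an email search agent.
# Field names, weights, and thresholds are illustrative assumptions.

from dataclasses import dataclass, field


@dataclass
class AgentResult:
    """What the agent produced for one email-search task."""
    answer: str                              # final answer text
    cited_message_ids: list = field(default_factory=list)  # emails cited as sources
    num_tool_calls: int = 0                  # how many search queries it issued


def partial_credit_reward(result: AgentResult,
                          gold_answer: str,
                          gold_message_ids: set) -> float:
    """Return a reward in [0, 1] that grants partial credit, so the policy
    still gets a useful training signal when the final answer is wrong."""
    reward = 0.0

    # Main component: did the agent get the final answer right?
    if result.answer.strip().lower() == gold_answer.strip().lower():
        reward += 0.6

    # Partial credit: did it find and cite the correct source emails?
    if gold_message_ids:
        overlap = len(set(result.cited_message_ids) & gold_message_ids)
        reward += 0.3 * (overlap / len(gold_message_ids))

    # Small efficiency bonus: fewer tool calls means cheaper inference.
    if result.num_tool_calls <= 3:
        reward += 0.1

    return min(reward, 1.0)
```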
Kyle Corbitt is the founder of OpenPipe, a platform dedicated to helping enterprises build ...