Agent Evaluation
Agent Eval generates test cases, runs them, scores results across metrics, and identifies failures for remediation.
Core concepts
Environments are named test configurations for one agent. Scenarios are situations the agent should handle. Personas are simulated user types that make test inputs realistic. Each test case combines a scenario and a persona into an input and an expected outcome.
Metrics
- Task completion
- Hallucination
- Bias
- Toxicity
- Faithfulness
- Reflection
- LLM-as-a-Judge
- Tool call accuracy
- KB retrieval precision
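
To make the concepts above concrete, here is a minimal Python sketch of how scenarios and personas might be combined into test cases, and how two of the listed metrics could be computed. It is an illustration under stated assumptions, not Agent Eval's actual API: every name here (Scenario, Persona, AgentTestCase, build_test_cases, retrieval_precision, tool_call_accuracy) is hypothetical, pairing every scenario with every persona is only one possible generation strategy, and the metric formulas are common textbook definitions that the product may not use.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scenario:
    """A situation the agent should handle (hypothetical shape)."""
    name: str
    expected_outcome: str


@dataclass(frozen=True)
class Persona:
    """A simulated user type (hypothetical shape)."""
    name: str
    style: str  # how this simulated user tends to phrase requests


@dataclass(frozen=True)
class AgentTestCase:
    """A scenario and persona combined into an input and expected outcome."""
    scenario: Scenario
    persona: Persona
    input_text: str
    expected_outcome: str


def build_test_cases(scenarios, personas):
    """Pair every scenario with every persona (an assumed strategy)."""
    return [
        AgentTestCase(
            scenario=s,
            persona=p,
            # Placeholder input: a real generator would write a realistic
            # user message in the persona's voice.
            input_text=f"[{p.style}] {s.name}",
            expected_outcome=s.expected_outcome,
        )
        for s, p in product(scenarios, personas)
    ]


def retrieval_precision(retrieved_ids, relevant_ids):
    """Fraction of retrieved KB documents that are relevant (assumed definition)."""
    if not retrieved_ids:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in retrieved_ids if doc_id in relevant)
    return hits / len(retrieved_ids)


def tool_call_accuracy(actual_calls, expected_calls):
    """Share of positions where the actual tool call matches the expected one
    (assumed definition; extra or missing calls count as mismatches)."""
    if not actual_calls and not expected_calls:
        return 1.0
    matches = sum(1 for a, e in zip(actual_calls, expected_calls) if a == e)
    return matches / max(len(actual_calls), len(expected_calls))
```

With two scenarios and two personas, build_test_cases returns four test cases. Metrics such as hallucination, bias, or faithfulness typically need a model-based judge and are left out of this sketch.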