# Agent Evaluation

> Systematically test agents before production deployment.

Agent Eval generates test cases, runs them, scores results across metrics, and identifies failures for remediation.

## Core concepts

**Environments** are named test configurations for one agent.

**Scenarios** are situations the agent should handle.

**Personas** are simulated user types that make test inputs realistic.

**Test cases** combine a scenario and a persona into an input and an expected outcome.
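
One way to picture how these four concepts relate is as a small data model. The sketch below is illustrative only: the class names (`Environment`, `Scenario`, `Persona`, `EvalCase`), their fields, and the `build_cases` helper are assumptions made for this example and are not part of the Lyzr API. It pairs every scenario with every persona, which is one plausible way to combine them.

```python
from dataclasses import dataclass, field
from itertools import product


# Hypothetical data model: names and fields are illustrative, not the Lyzr API.
@dataclass(frozen=True)
class Scenario:
    name: str
    description: str  # the situation the agent should handle


@dataclass(frozen=True)
class Persona:
    name: str
    traits: str  # how this simulated user type tends to behave


@dataclass(frozen=True)
class EvalCase:
    scenario: Scenario
    persona: Persona
    user_input: str
    expected_outcome: str


@dataclass
class Environment:
    name: str
    agent_id: str  # an environment targets exactly one agent
    scenarios: list = field(default_factory=list)
    personas: list = field(default_factory=list)

    def build_cases(self):
        """Pair every scenario with every persona (an assumed strategy)."""
        cases = []
        for scenario, persona in product(self.scenarios, self.personas):
            cases.append(
                EvalCase(
                    scenario=scenario,
                    persona=persona,
                    user_input=f"[{persona.name}] {scenario.description}",
                    expected_outcome=f"Agent handles: {scenario.name}",
                )
            )
        return cases


env = Environment(
    name="support-regression",
    agent_id="agent-123",  # placeholder identifier
    scenarios=[
        Scenario("refund", "Customer asks for a refund"),
        Scenario("outage", "Customer reports a service outage"),
    ],
    personas=[
        Persona("novice", "unfamiliar with the product"),
        Persona("frustrated", "impatient and terse"),
    ],
)
cases = env.build_cases()
print(len(cases))  # 2 scenarios x 2 personas
```

In practice the input text and expected outcome would be generated rather than templated; the template strings here only keep the sketch self-contained.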

## Metrics

* Task completion
* Hallucination
* Bias
* Toxicity
* Faithfulness
* Reflection
* LLM-as-a-Judge
* Tool call accuracy
* KB retrieval precision
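
This page does not specify how each metric is computed. As a rough illustration, the sketch below scores the two metrics that have simple deterministic forms, using common textbook definitions as assumptions: retrieval precision as the fraction of retrieved documents that are relevant, and tool call accuracy as the fraction of position-wise matching calls. The function names and input shapes are hypothetical.

```python
def retrieval_precision(retrieved_ids, relevant_ids):
    """Fraction of retrieved documents that are relevant (assumed definition)."""
    if not retrieved_ids:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in retrieved_ids if doc_id in relevant)
    return hits / len(retrieved_ids)


def tool_call_accuracy(expected_calls, actual_calls):
    """Fraction of calls that match position by position (assumed definition).

    Each call is a (tool_name, arguments) tuple. Missing or extra calls
    lower the score because the longer sequence is the denominator.
    """
    if not expected_calls and not actual_calls:
        return 1.0
    matches = sum(1 for exp, act in zip(expected_calls, actual_calls) if exp == act)
    return matches / max(len(expected_calls), len(actual_calls))


precision = retrieval_precision(["a", "b", "c", "d"], ["a", "c"])
accuracy = tool_call_accuracy(
    expected_calls=[("search_orders", {"order_id": 12}), ("issue_refund", {"order_id": 12})],
    actual_calls=[("search_orders", {"order_id": 12})],
)
print(precision, accuracy)  # 0.5 0.5
```

Metrics such as hallucination, bias, or LLM-as-a-Judge typically rely on a model-based grader and are not reducible to a formula like these.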

## Workflow

```text
Create environment
  -> generate scenarios and personas
  -> generate test cases
  -> select metrics
  -> run tests
  -> review scores
  -> improve the agent
```
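
The run, score, and review steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not how Agent Eval is implemented: `run_evaluation`, `contains_expected`, `toy_agent`, and the `0.5` pass threshold are all hypothetical, and the agent is modeled as a plain function from input text to output text.

```python
from statistics import mean


def contains_expected(output, expected):
    """Toy metric: 1.0 if the expected phrase appears in the output, else 0.0."""
    return 1.0 if expected.lower() in output.lower() else 0.0


def run_evaluation(agent, cases, metrics, threshold=0.5):
    """Run every case, score it with each selected metric, and collect failures.

    agent:   callable taking the user input and returning the agent's reply
    cases:   list of (user_input, expected_outcome) pairs
    metrics: dict mapping metric name -> callable(output, expected) -> float in [0, 1]
    """
    results = []
    for user_input, expected in cases:
        output = agent(user_input)
        scores = {name: fn(output, expected) for name, fn in metrics.items()}
        results.append(
            {
                "input": user_input,
                "output": output,
                "scores": scores,
                "passed": all(score >= threshold for score in scores.values()),
            }
        )
    # Average each metric across all cases; 0.0 when there are no cases.
    summary = {
        name: mean(r["scores"][name] for r in results) if results else 0.0
        for name in metrics
    }
    failures = [r for r in results if not r["passed"]]
    return summary, failures


def toy_agent(user_input):
    # Stand-in for a real agent call.
    if "refund" in user_input.lower():
        return "Your refund has been issued."
    return "Sorry, I cannot help with that."


cases = [
    ("I want a refund for order 12", "refund has been issued"),
    ("Please reset my password", "password reset link"),
]
summary, failures = run_evaluation(toy_agent, cases, {"contains_expected": contains_expected})
print(summary)        # {'contains_expected': 0.5}
print(len(failures))  # 1
```

The failures list is what feeds the final "improve the agent" step: each entry keeps the input, the output, and the per-metric scores needed to diagnose the problem.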

## Agent Hardening

Agent Hardening analyzes failed test cases and recommends changes to instructions, model selection, tools, memory, or guardrails.
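
To make the idea concrete, the sketch below groups failed test cases by the area most likely to need a change. It is only an illustration: the mapping from metric to remediation area, the `summarize_failures` helper, and the assumption that every score is normalized so that higher is better are all hypothetical, and the actual analysis performed by Agent Hardening is not described on this page.

```python
from collections import Counter

# Assumed mapping from a failing metric to the area to revisit.
# Illustrative only; not taken from the product.
REMEDIATION_AREA = {
    "tool_call_accuracy": "tools",
    "toxicity": "guardrails",
    "bias": "guardrails",
    "hallucination": "instructions",
    "task_completion": "instructions",
}


def summarize_failures(failed_cases, threshold=0.5):
    """Count low-scoring metrics per remediation area, most frequent first.

    Assumes every score is normalized to [0, 1] with higher meaning better.
    """
    counts = Counter()
    for case in failed_cases:
        for metric, score in case["scores"].items():
            if score < threshold:
                counts[REMEDIATION_AREA.get(metric, "unmapped")] += 1
    return counts.most_common()


failed_cases = [
    {"scores": {"tool_call_accuracy": 0.2, "task_completion": 0.9}},
    {"scores": {"toxicity": 0.1, "tool_call_accuracy": 0.4}},
    {"scores": {"hallucination": 0.3}},
]
ranked = summarize_failures(failed_cases)
print(ranked[0])  # ('tools', 2)
```

A ranking like this only points at where to look first; deciding the specific change to instructions, model, tools, memory, or guardrails still requires reading the failing transcripts.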