The world model
A-Sim structures testing around a world model made up of two dimensions: personas and scenarios. A persona is a user archetype that defines who is interacting with the agent. Examples include a first-time user unfamiliar with the product, an experienced power user with technical knowledge, or an adversarial user trying to bypass the agent’s guardrails. A scenario is a task type that defines what the user is trying to accomplish. Examples include a basic policy inquiry, a complex compliance issue, or a request the agent is supposed to refuse. A-Sim combines every persona with every scenario to produce a set of simulations, which are synthetic test conversations. For example, three personas and three scenarios yield nine simulations. This cross-product approach tests the agent against every combination of persona and scenario in the world model, so coverage is as broad as the personas and scenarios you define.
How it works
- You create an environment, which is an isolated clone of your agent used for safe evaluation without affecting the production version.
- A-Sim generates personas and scenarios automatically using the agent’s role and goal, or you define them manually.
- A-Sim combines personas and scenarios into simulations and runs each one against the agent.
- An evaluation scores each simulation response across the metrics you select.
- Simulations that fail are passed to agent hardening, which analyzes the failure patterns and produces an improved set of agent instructions.
- You start a new evaluation round with the improved instructions and repeat the cycle until all simulations pass.
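The cross product and the evaluate-harden-repeat cycle described above can be sketched in a few lines of Python. This is an illustrative sketch only, not A-Sim's API: `Simulation`, `build_simulations`, `run_rounds`, `toy_evaluate`, and `toy_harden` are hypothetical names, and the toy evaluation and hardening rules are stand-ins chosen so the example runs on its own.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Simulation:
    persona: str   # who is interacting with the agent
    scenario: str  # what the user is trying to accomplish


def build_simulations(personas: List[str], scenarios: List[str]) -> List[Simulation]:
    """Pair every persona with every scenario (the cross product)."""
    return [Simulation(p, s) for p, s in product(personas, scenarios)]


def run_rounds(
    simulations: List[Simulation],
    instructions: str,
    evaluate: Callable[[Simulation, str], bool],
    harden: Callable[[str, List[Simulation]], str],
    max_rounds: int = 5,
) -> Tuple[str, int, bool]:
    """Evaluate, harden on failures, and repeat until every simulation passes.

    Returns (final instructions, rounds run, whether all simulations passed).
    """
    for round_number in range(1, max_rounds + 1):
        failures = [sim for sim in simulations if not evaluate(sim, instructions)]
        if not failures:
            return instructions, round_number, True
        # Failed simulations feed the hardening step, which rewrites the instructions.
        instructions = harden(instructions, failures)
    return instructions, max_rounds, False


# Hypothetical stand-ins for the real evaluation and hardening steps.
def toy_evaluate(sim: Simulation, instructions: str) -> bool:
    # Pretend a simulation passes once the instructions mention its scenario.
    return sim.scenario in instructions


def toy_harden(instructions: str, failures: List[Simulation]) -> str:
    # Pretend hardening appends guidance for each failed scenario.
    missing = sorted({sim.scenario for sim in failures})
    return instructions + " Handle: " + ", ".join(missing) + "."


personas = ["first-time user", "power user", "adversarial user"]
scenarios = ["basic policy inquiry", "complex compliance issue", "request to refuse"]

simulations = build_simulations(personas, scenarios)
final_instructions, rounds, all_passed = run_rounds(
    simulations, "Answer politely.", toy_evaluate, toy_harden
)
print(len(simulations), rounds, all_passed)
```

The `max_rounds` cap is an addition for the sketch so the loop always terminates; the cycle as described simply repeats until all simulations pass.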
Evaluation metrics
Each evaluation run scores responses against one or more of the following metrics. Each simulation receives a final judgment of PASS or FAIL.
| Metric | What it measures |
|---|---|
| task_completion | Whether the agent accomplished what the user asked |
| hallucinations | Whether the agent fabricated facts not present in its knowledge |
| answer_relevancy | Whether the response is on-topic and directly addresses the query |
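
As a rough illustration of how per-metric results might roll up into a single judgment, the sketch below assumes each selected metric yields a pass/fail result and that a simulation gets PASS only when every selected metric passes. That aggregation rule is an assumption made for the example, not documented A-Sim behavior, and `final_judgment` is a hypothetical helper.

```python
from typing import Dict, Iterable

# Metric names taken from the table above.
KNOWN_METRICS = ("task_completion", "hallucinations", "answer_relevancy")


def final_judgment(metric_passed: Dict[str, bool], selected: Iterable[str]) -> str:
    """Return "PASS" only if every selected metric passed (assumed rule)."""
    selected = list(selected)
    if not selected:
        raise ValueError("select at least one metric")
    unknown = [m for m in selected if m not in KNOWN_METRICS]
    if unknown:
        raise ValueError(f"unknown metrics: {unknown}")
    missing = [m for m in selected if m not in metric_passed]
    if missing:
        raise ValueError(f"no result for metrics: {missing}")
    return "PASS" if all(metric_passed[m] for m in selected) else "FAIL"


results = {"task_completion": True, "hallucinations": False, "answer_relevancy": True}

# Only the metrics you select count toward the judgment.
print(final_judgment(results, ["task_completion", "answer_relevancy"]))
print(final_judgment(results, KNOWN_METRICS))
```

Here a value of `True` means the response passed that metric's check; for `hallucinations`, that would mean no fabricated facts were detected.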