Creating a policy
- Select Safety and Evaluations > Responsible AI in the sidebar.
- Select Create New Policy and give it a name.
- Enable and configure the checks you need across the available categories.
- Select Save in the top right.
- Select Start Testing in the right panel to validate the policy against sample interactions.
AWS Bedrock Guardrails
Lyzr supports connecting AWS Bedrock Guardrails as an external content governance layer. This option requires your own AWS credentials and is not enabled by default.
| Filter | What it blocks |
|---|---|
| Sexual Content | Sexually explicit material |
| Violence | Violent content |
| Hate Speech | Hate speech and discrimination |
| Insults | Insulting language |
| Misconduct | Content promoting illegal activities |
| Prompt Attack | Prompt injection attempts |

Toxicity detection
Lyzr validates every LLM output for toxicity before it reaches the user. The system scores responses between 0 and 1. Responses above the configured threshold are blocked, and the LLM is asked to regenerate until a safe response is produced. Default threshold: 0.4. Values closer to 1 allow more content through; lower values are stricter.
Prompt injection protection
Lyzr checks every incoming user message for prompt injection attempts before it is sent to the LLM. The system assigns a risk score from 0 to 1. Messages above the threshold are blocked before they reach the model. Default threshold: 0.3. Lower values are stricter.
Secrets detection
Lyzr automatically detects and redacts sensitive credentials from both inputs and outputs. Detected values are masked before being stored, displayed, or transmitted. Covered: API keys, authentication tokens, JWTs, private keys, and certificate data.Allowed topics
Restrict the agent to responding only to queries within explicitly approved topic domains. Configure by providing comma-separated values:Banned topics
Prevent the agent from discussing specific prohibited topics. Configure by providing comma-separated values:NSFW detection
Detects and blocks not-safe-for-work or inappropriate content before it is processed or returned.
- Sentence-by-sentence: scans each sentence individually for higher precision.
- Full text: evaluates the entire response as a whole for contextual detection.
Keyword management
Block or redact specific words and phrases from both inputs and outputs.

- Literal: exact substring match.
- Regex: regular expression for format-based patterns.
- Cucumber: parameter extraction for advanced logical matching.
- Blocked: the interaction stops if the keyword is detected.
- Redacted: the keyword is masked and the conversation continues.
Personally Identifiable Information (PII)
Configure how the agent handles each category of personal data. Each type can be independently set to Disabled, Blocked, or Redacted.| Data type | Description |
|---|---|
| Credit card numbers | 13 to 16 digit card number patterns |
| Email addresses | Standard email format |
| Phone numbers | International and local formats |
| Names (person) | Common personal name patterns |
| Locations | City, state, country, address |
| IP addresses | IPv4 and IPv6 |
| Social Security Numbers | U.S. SSN format XXX-XX-XXXX |
| URLs | Standard web address patterns |
| Dates and times | Temporal references and specific dates |
Use case reference
| Use case | Checks to enable |
|---|---|
| Customer support chatbot | Toxicity, Secrets, PII (email, phone) |
| Internal HR agent | Allowed Topics (HR/policy), Keywords (names/projects), PII (SSN, names) |
| Public-facing financial assistant | Prompt Injection, Banned Topics (politics), URL redaction, Credit card blocking |
| Legal document Q&A | Secrets, Credit card blocking, Topic control |