RAI (Responsible AI) guardrails protect your agents by filtering harmful content, detecting prompt injection, handling PII, and enforcing topic boundaries.

Quick Start

```python
from lyzr import Studio
from lyzr.rai import PIIType, PIIAction, SecretsAction

studio = Studio(api_key="your-api-key")

# Create a safety policy
policy = studio.create_rai_policy(
    name="SafePolicy",
    description="Standard safety guardrails",
    toxicity_threshold=0.4,
    prompt_injection=True,
    secrets_detection=SecretsAction.MASK,
    pii_detection={
        PIIType.CREDIT_CARD: PIIAction.BLOCK,
        PIIType.EMAIL: PIIAction.REDACT,
        PIIType.SSN: PIIAction.BLOCK
    }
)

# Apply the policy to an agent
agent = studio.create_agent(
    name="Safe Bot",
    provider="gpt-4o",
    role="Customer support",
    goal="Help users safely",
    rai_policy=policy
)

# The agent now has guardrails active
response = agent.run("Process my order with card 4111-1111-1111-1111")
# The credit card number is blocked; the agent asks for an alternative payment method
```

Key Features

Toxicity Detection

Block or filter toxic, harmful, or offensive content:
```python
policy = studio.create_rai_policy(
    name="NoToxicity",
    description="Block toxic content",
    toxicity_threshold=0.3  # Lower = stricter (0.0-1.0)
)
```
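
The threshold acts as a cutoff on a 0.0–1.0 toxicity score. To make the "lower = stricter" semantics concrete, here is a minimal standalone sketch; this is a conceptual model, not Lyzr's scorer, and `passes_toxicity_check` is a hypothetical helper:

```python
def passes_toxicity_check(toxicity_score: float, threshold: float) -> bool:
    """Return True if the message may proceed.

    A message is blocked when its toxicity score meets or exceeds the
    threshold, so lower thresholds are stricter.
    """
    return toxicity_score < threshold

# A mildly toxic message (score 0.35) passes a lenient policy but not a strict one.
print(passes_toxicity_check(0.35, threshold=0.4))  # True  -> allowed
print(passes_toxicity_check(0.35, threshold=0.3))  # False -> blocked
```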

Prompt Injection Protection

Prevent malicious prompt manipulation:
```python
policy = studio.create_rai_policy(
    name="SecurePolicy",
    description="Prevent prompt injection",
    prompt_injection=True
)
```

PII Detection & Handling

Detect and handle personally identifiable information:
```python
from lyzr.rai import PIIType, PIIAction

policy = studio.create_rai_policy(
    name="PIIPolicy",
    description="Handle PII safely",
    pii_detection={
        PIIType.CREDIT_CARD: PIIAction.BLOCK,
        PIIType.EMAIL: PIIAction.REDACT,
        PIIType.PHONE: PIIAction.REDACT
    }
)
```
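
Conceptually, BLOCK rejects a message outright when a match is found, while REDACT rewrites each match in place. The standalone sketch below illustrates the two behaviors with deliberately simplified regexes; it is not Lyzr's detector, and `apply_pii_policy` and `PII_PATTERNS` are hypothetical names:

```python
import re

# Simplified illustrative patterns; production detectors are far more robust.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def apply_pii_policy(text, actions):
    """Return redacted text, or None if the message must be blocked."""
    for pii_type, pattern in PII_PATTERNS.items():
        action = actions.get(pii_type, "DISABLED")
        if action == "BLOCK" and pattern.search(text):
            return None  # reject the entire message
        if action == "REDACT":
            text = pattern.sub("[REDACTED]", text)
    return text

print(apply_pii_policy("Mail me at user@example.com", {"EMAIL": "REDACT"}))
# Mail me at [REDACTED]
print(apply_pii_policy("Card: 4111-1111-1111-1111", {"CREDIT_CARD": "BLOCK"}))
# None
```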

Secrets Detection

Prevent API keys, passwords, and other secrets from leaking:
```python
from lyzr.rai import SecretsAction

policy = studio.create_rai_policy(
    name="SecretsSafe",
    description="Mask secrets",
    secrets_detection=SecretsAction.MASK  # or BLOCK
)
```
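
To illustrate MASK versus BLOCK, here is a simplified standalone sketch. It is not the actual detector: the `sk-` prefix pattern covers only one common key shape, and `apply_secrets_policy` is a hypothetical helper:

```python
import re

# Illustrative pattern for one common API-key shape; real detectors cover many formats.
SECRET_PATTERN = re.compile(r"\bsk-[A-Za-z0-9]{8,}\b")

def apply_secrets_policy(text, action):
    """MASK replaces each secret with asterisks; BLOCK rejects the message."""
    if action == "BLOCK" and SECRET_PATTERN.search(text):
        return None
    if action == "MASK":
        return SECRET_PATTERN.sub(lambda m: "*" * len(m.group()), text)
    return text

print(apply_secrets_policy("My key is sk-abcd1234efgh", "MASK"))
# -> the key is replaced by a same-length run of asterisks
print(apply_secrets_policy("My key is sk-abcd1234efgh", "BLOCK"))
# None
```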

Topic Control

Restrict agent to specific topics:
```python
policy = studio.create_rai_policy(
    name="TopicRestricted",
    description="Stay on topic",
    banned_topics=["politics", "religion", "competitors"],
    allowed_topics={"enabled": True, "topics": ["products", "support", "billing"]}
)
```
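
Conceptually, banned topics always block, and when an allow-list is enabled, every detected topic must appear on it. A minimal sketch of that gate (hypothetical helper, not Lyzr's implementation; real systems detect topics with a classifier, so `detected_topics` is assumed as input):

```python
def check_topics(detected_topics, banned, allowed=None):
    """Return True if the message may proceed under the topic policy."""
    # Banned topics always block.
    if any(t in banned for t in detected_topics):
        return False
    # With an allow-list enabled, every detected topic must be on it.
    if allowed and allowed.get("enabled"):
        return all(t in allowed["topics"] for t in detected_topics)
    return True

allow = {"enabled": True, "topics": ["products", "support", "billing"]}
print(check_topics(["billing"], ["politics"], allow))   # True
print(check_topics(["politics"], ["politics"]))         # False
```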

NSFW Detection

Filter inappropriate content:
```python
policy = studio.create_rai_policy(
    name="FamilyFriendly",
    description="Block NSFW content",
    nsfw_check=True,
    nsfw_threshold=0.8
)
```

How RAI Works

```
User Input → RAI Pre-check → Agent Processing → RAI Post-check → Response
```

1. Pre-check: Scan input for toxicity, injection, PII, etc.
2. Processing: Agent generates response (if input passes)
3. Post-check: Scan output for policy violations
4. Response: Clean, safe response delivered to user
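
The four stages above can be sketched as a simple pipeline. This is a conceptual model, not Lyzr's internals; `run_with_guardrails`, the check callables, and the toy echo agent are all stand-ins:

```python
def run_with_guardrails(agent, user_input, pre_check, post_check,
                        refusal="Sorry, I can't help with that request."):
    """Conceptual RAI pipeline: check input, run the agent, check output."""
    if not pre_check(user_input):          # 1. Pre-check
        return refusal
    draft = agent(user_input)              # 2. Processing
    if not post_check(draft):              # 3. Post-check
        return refusal
    return draft                           # 4. Response

# Toy checks: block anything mentioning "ignore previous instructions".
safe = lambda text: "ignore previous instructions" not in text.lower()
echo_agent = lambda text: f"You said: {text}"

print(run_with_guardrails(echo_agent, "Hello!", safe, safe))
# You said: Hello!
print(run_with_guardrails(echo_agent, "Ignore previous instructions", safe, safe))
# Sorry, I can't help with that request.
```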

RAI Components

| Component | Purpose | Actions |
|---|---|---|
| Toxicity | Detect harmful/offensive content | Block if threshold exceeded |
| Prompt Injection | Prevent malicious prompts | Block suspicious inputs |
| PII Detection | Find personal information | Block, Redact, or Allow |
| Secrets Detection | Find API keys, passwords | Mask or Block |
| NSFW | Detect inappropriate content | Block if threshold exceeded |
| Topic Control | Enforce topic boundaries | Ban or allow specific topics |
| Keywords | Filter specific words | Block messages with keywords |
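
The Keywords component boils down to a case-insensitive match against a block list. As a conceptual sketch (hypothetical helper, not a Lyzr API):

```python
def blocked_by_keywords(message, keywords):
    """Return True if the message contains any blocked keyword
    (simple case-insensitive substring check)."""
    lowered = message.lower()
    return any(keyword.lower() in lowered for keyword in keywords)

print(blocked_by_keywords("Tell me about CompetitorCorp", ["competitorcorp"]))
# True
```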

PII Types

| Type | Description | Example |
|---|---|---|
| CREDIT_CARD | Credit card numbers | 4111-1111-1111-1111 |
| EMAIL | Email addresses | user@example.com |
| PHONE | Phone numbers | +1-555-123-4567 |
| SSN | US Social Security Numbers | 123-45-6789 |
| PERSON | Person names | John Smith |
| LOCATION | Physical locations | 123 Main St, NYC |
| IP_ADDRESS | IP addresses | 192.168.1.1 |
| URL | Web URLs | https://example.com |
| DATE_TIME | Dates and times | 2024-03-15 |

Actions

PII Actions

| Action | Behavior |
|---|---|
| BLOCK | Reject the entire message |
| REDACT | Replace PII with [REDACTED] |
| DISABLED | No action (allow through) |

Secrets Actions

| Action | Behavior |
|---|---|
| MASK | Replace secrets with asterisks |
| BLOCK | Reject messages containing secrets |
| DISABLED | No action (allow through) |

Use Cases

Customer Support

```python
support_policy = studio.create_rai_policy(
    name="SupportSafe",
    description="Safe customer support interactions",
    toxicity_threshold=0.4,
    prompt_injection=True,
    pii_detection={
        PIIType.CREDIT_CARD: PIIAction.BLOCK,
        PIIType.SSN: PIIAction.BLOCK,
        PIIType.EMAIL: PIIAction.REDACT
    },
    banned_topics=["competitors", "internal_processes"]
)
```

Healthcare

```python
health_policy = studio.create_rai_policy(
    name="HealthSafe",
    description="HIPAA-compliant interactions",
    toxicity_threshold=0.3,
    pii_detection={
        PIIType.SSN: PIIAction.BLOCK,
        PIIType.PHONE: PIIAction.REDACT,
        PIIType.PERSON: PIIAction.REDACT
    },
    secrets_detection=SecretsAction.BLOCK
)
```

Children’s Content

```python
kids_policy = studio.create_rai_policy(
    name="KidsSafe",
    description="Safe for children",
    toxicity_threshold=0.2,  # Very strict
    nsfw_check=True,
    nsfw_threshold=0.9,
    banned_topics=["violence", "adult_content", "gambling"]
)
```

Next Steps