Quick Start
Key Features
Toxicity Detection
Block or filter toxic, harmful, or offensive content.
Prompt Injection Protection
Prevent malicious prompt manipulation.
PII Detection & Handling
Detect and handle personally identifiable information.
Secrets Detection
Prevent API keys, passwords, and other secrets from leaking.
Topic Control
Restrict the agent to specific topics.
NSFW Detection
Filter inappropriate content.
How RAI Works
RAI Components
| Component | Purpose | Actions |
|---|---|---|
| Toxicity | Detect harmful/offensive content | Block if threshold exceeded |
| Prompt Injection | Prevent malicious prompts | Block suspicious inputs |
| PII Detection | Find personal information | Block, Redact, or Allow |
| Secrets Detection | Find API keys, passwords | Mask or Block |
| NSFW | Detect inappropriate content | Block if threshold exceeded |
| Topic Control | Enforce topic boundaries | Ban or allow specific topics |
| Keywords | Filter specific words | Block messages with keywords |
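
The two simplest rules in the table, "block if threshold exceeded" and "block messages with keywords", can be sketched in a few lines. This is an illustration only, not the library's API: `Verdict`, `check_threshold`, and `check_keywords` are hypothetical names, and the score is assumed to come from some classifier that returns a value between 0 and 1.

```python
import re
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Verdict:
    """Hypothetical result type: whether to block, and why."""
    blocked: bool
    reason: str = ""


def check_threshold(component: str, score: float, threshold: float) -> Verdict:
    """Block when a classifier score (assumed 0..1) exceeds the threshold."""
    if score > threshold:
        return Verdict(True, f"{component} score {score:.2f} exceeds {threshold:.2f}")
    return Verdict(False)


def check_keywords(message: str, banned: Iterable[str]) -> Verdict:
    """Block when the message contains a banned keyword as a whole word."""
    for word in banned:
        # Word boundaries keep "cheap" from matching inside "cheaper".
        if re.search(rf"\b{re.escape(word)}\b", message, re.IGNORECASE):
            return Verdict(True, f"banned keyword: {word}")
    return Verdict(False)
```

Whether a real deployment should match whole words or substrings depends on the keyword list; whole-word matching is an assumption made here to reduce false positives.
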
PII Types
| Type | Description | Example |
|---|---|---|
| CREDIT_CARD | Credit card numbers | 4111-1111-1111-1111 |
| EMAIL | Email addresses | user@example.com |
| PHONE | Phone numbers | +1-555-123-4567 |
| SSN | US Social Security Numbers | 123-45-6789 |
| PERSON | Person names | John Smith |
| LOCATION | Physical locations | 123 Main St, NYC |
| IP_ADDRESS | IP addresses | 192.168.1.1 |
| URL | Web URLs | https://example.com |
| DATE_TIME | Dates and times | 2024-03-15 |
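
To make the pattern-shaped types above concrete, here is a minimal regex-based sketch that recognizes four of them using the example values from the table. It is not how the library detects PII: the patterns and the `find_pii` helper are assumptions for illustration, and types such as PERSON and LOCATION generally need a named-entity model, not a regular expression.

```python
import re
from typing import Dict, List

# Simplified, illustrative patterns; production detectors are stricter
# (for example, validating credit card checksums or IP octet ranges).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_pii(text: str) -> Dict[str, List[str]]:
    """Return a mapping of PII type to the matching substrings in text."""
    found: Dict[str, List[str]] = {}
    for pii_type, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[pii_type] = matches
    return found


if __name__ == "__main__":
    sample = "Contact user@example.com, SSN 123-45-6789, from 192.168.1.1"
    print(find_pii(sample))
```
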
Actions
PII Actions
| Action | Behavior |
|---|---|
| BLOCK | Reject the entire message |
| REDACT | Replace PII with [REDACTED] |
| DISABLED | No action (allow through) |
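
The three PII actions can be sketched as a single function. This is a hedged illustration, not the library's implementation: `PIIAction`, `apply_pii_action`, and the email-only detector are hypothetical, and returning `None` to signal a rejected message is an assumption.

```python
import re
from enum import Enum
from typing import Optional


class PIIAction(Enum):
    BLOCK = "block"
    REDACT = "redact"
    DISABLED = "disabled"


# Stand-in detector: emails only, to keep the sketch short.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def apply_pii_action(message: str, action: PIIAction) -> Optional[str]:
    """Return the message to forward, or None if it is rejected."""
    if action is PIIAction.DISABLED:
        return message  # no action: allow through unchanged
    if not EMAIL_RE.search(message):
        return message  # nothing detected, nothing to do
    if action is PIIAction.BLOCK:
        return None  # reject the entire message
    return EMAIL_RE.sub("[REDACTED]", message)  # REDACT


if __name__ == "__main__":
    print(apply_pii_action("mail user@example.com now", PIIAction.REDACT))
```
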
Secrets Actions
| Action | Behavior |
|---|---|
| MASK | Replace secrets with asterisks |
| BLOCK | Reject messages containing secrets |
| DISABLED | No action (allow through) |
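
A comparable sketch for the secrets actions follows. Everything in it is assumed for illustration: the `sk-` key format, the `apply_secrets_action` name, and the choice to mask with one asterisk per character (the table only says secrets are replaced with asterisks, not how many).

```python
import re
from typing import Optional

# Hypothetical secret format: "sk-" followed by 8+ alphanumeric characters.
SECRET_RE = re.compile(r"\bsk-[A-Za-z0-9]{8,}\b")


def apply_secrets_action(message: str, action: str) -> Optional[str]:
    """Apply MASK, BLOCK, or DISABLED; None means the message is rejected."""
    if action == "DISABLED":
        return message  # no action: allow through unchanged
    if action == "BLOCK":
        return None if SECRET_RE.search(message) else message
    if action == "MASK":
        # Length-preserving mask: one asterisk per secret character.
        return SECRET_RE.sub(lambda m: "*" * len(m.group()), message)
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    print(apply_secrets_action("key sk-abcd1234efgh here", "MASK"))
```

A fixed-width mask would avoid revealing the secret's length; the length-preserving version is used here only because it is easy to verify by eye.
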