Create Guard
A guard is a set of protections applied to your AI responses
Choose a starting point
Content Moderator
Block toxic, profane, and NSFW text
Security Guard
Detect prompt injection and SQL attacks
Privacy Shield
Redact PII and sensitive personal data
Quality Assurance
Ensure factual, accurate, and relevant answers
Start Blank
Pick protections yourself from scratch
Lowercase letters, numbers and hyphens only
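The naming rule above can be checked before the form is submitted. The following is a minimal sketch, assuming the rule means "one or more lowercase letters, digits, or hyphens"; the names `GUARD_NAME_PATTERN`, `is_valid_guard_name`, and `slugify_guard_name` are hypothetical and not part of the product.

```python
import re

# Assumed rule: one or more lowercase letters, digits, or hyphens.
# The form's real validation may be stricter (for example, length limits).
GUARD_NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_guard_name(name: str) -> bool:
    """Return True if name uses only lowercase letters, digits, and hyphens."""
    return GUARD_NAME_PATTERN.fullmatch(name) is not None


def slugify_guard_name(label: str) -> str:
    """Turn a free-form label into a name that passes the check above."""
    lowered = label.strip().lower()
    # Collapse every run of disallowed characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", lowered)
    return slug.strip("-")
```

For example, `slugify_guard_name("Content Moderator")` yields a name made only of the allowed characters, which can then be passed to `is_valid_guard_name`.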
Toggle on any protections you want active. Choose the action for each: Block rejects the input/output, Replace rewrites it, Log Only records it without blocking.
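The three actions differ only in what happens once a protection is triggered. The sketch below illustrates that difference under stated assumptions: `Action`, `Outcome`, and `apply_action` are hypothetical names, and the product's real behaviour may differ in detail.

```python
import logging
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

logger = logging.getLogger("guard")


class Action(Enum):
    BLOCK = "block"
    REPLACE = "replace"
    LOG_ONLY = "log_only"


@dataclass
class Outcome:
    allowed: bool         # False only when the text was rejected
    text: Optional[str]   # text to pass along, or None if blocked


def apply_action(
    text: str,
    violates: Callable[[str], bool],
    action: Action,
    rewrite: Callable[[str], str] = lambda t: "[removed]",
) -> Outcome:
    """Apply one protection's action to a piece of input or output text."""
    if not violates(text):
        return Outcome(True, text)            # nothing triggered
    if action is Action.BLOCK:
        return Outcome(False, None)           # reject the text outright
    if action is Action.REPLACE:
        return Outcome(True, rewrite(text))   # pass along a rewritten version
    # Log Only: record the event but let the text through unchanged.
    logger.warning("protection triggered: %r", text)
    return Outcome(True, text)
```

Log Only is a reasonable way to trial a new protection, since it shows how often the check fires without affecting users.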
Content Safety
Toxic Language
Detects toxic, offensive, or harmful language in text output.
Profanity Free
Ensures text is free from profanity and vulgar language.
NSFW Text
Detects not-safe-for-work content in text responses.
Gibberish
Detects nonsensical or gibberish text in model outputs.
Unusual Prompt
Flags prompts that are unusually structured or potentially adversarial.
Bias Check
Checks for various forms of bias (gender, race, political) in text.
Hate Speech
Identifies hate speech targeting individuals or groups.
Security
Prompt Injection
Detects prompt injection attempts in user inputs.
SQL Injection
Detects potential SQL injection strings in text.
Secrets Present
Detects API keys, passwords, tokens, or other secrets in output.
LLM Critique
Uses a second LLM call to critique and validate the first response.
Privacy
Detect PII
Detects personally identifiable information (names, emails, SSNs, etc.).
Anonymize PII
Detects PII and anonymizes it by replacing it with placeholders.
Sensitive Topics
Flags mentions of pre-configured sensitive topics.
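To make the placeholder idea behind Anonymize PII concrete, here is a minimal sketch that covers only email addresses and US-style SSNs with regular expressions. The patterns, the `<LABEL>` placeholder format, and the function name `anonymize_pii` are assumptions for illustration; the actual protection presumably detects many more PII types.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def anonymize_pii(text: str) -> tuple[str, list[str]]:
    """Replace each match with a <LABEL> placeholder and report labels found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"<{label}>", text)
        if count:
            found.append(label)
    return text, found
```

A detect-only variant would return just the list of labels and leave the text untouched.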
Format
Valid JSON
Ensures the output is valid, parseable JSON.
Valid Python
Ensures the output is syntactically valid Python code.
Valid SQL
Ensures the output is a valid SQL statement.
Regex Match
Validates that text matches a user-specified regular expression.
Valid Length
Ensures text stays within minimum and maximum character limits.
Ends With
Validates that text ends with a specific string.
One Line
Ensures the response is a single line with no newlines.
Valid Choices
Ensures output is one of a pre-defined set of allowed values.
Valid Range
Ensures numeric output falls within a specified min/max range.
Valid OpenAPI
Validates that the output is a valid OpenAPI specification.
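Several of the format protections above reduce to short, deterministic checks. The sketch below shows how Valid JSON, One Line, Valid Length, Valid Choices, and Valid Range could be expressed in plain Python; the function names and parameters are hypothetical and do not describe the product's implementation.

```python
import json


def is_valid_json(text: str) -> bool:
    """True if the text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def is_one_line(text: str) -> bool:
    """True if the text contains no line breaks."""
    return "\n" not in text and "\r" not in text


def has_valid_length(text: str, min_chars: int, max_chars: int) -> bool:
    """True if the character count lies within the inclusive limits."""
    return min_chars <= len(text) <= max_chars


def is_valid_choice(text: str, choices: set[str]) -> bool:
    """True if the text is exactly one of the allowed values."""
    return text in choices


def is_in_range(text: str, low: float, high: float) -> bool:
    """True if the text is a number within the inclusive range."""
    try:
        value = float(text)
    except ValueError:
        return False
    return low <= value <= high
```

Checks like these are cheap to run, so they are natural candidates for the Block action, whereas LLM-based protections add a second model call.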
Quality
QA Relevance
Validates that an answer is relevant to the question asked.
Sycophancy Check
Detects sycophantic or excessively agreeable AI responses.
On Topic
Ensures the response stays on the expected topic or domain.
Reading Time
Validates that text falls within an expected reading time window.
Similar to Document
Checks that the response is semantically similar to a reference document.
Response Evaluator
Uses an LLM to evaluate response quality against custom criteria.
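Of the quality checks above, Reading Time is the simplest to reason about: it comes down to a word count divided by a reading speed. The sketch below assumes 200 words per minute and whitespace-separated words; both are illustrative assumptions, and the function names are hypothetical.

```python
def estimate_reading_minutes(text: str, words_per_minute: int = 200) -> float:
    """Estimate reading time as word count divided by reading speed."""
    return len(text.split()) / words_per_minute


def within_reading_window(
    text: str,
    min_minutes: float,
    max_minutes: float,
    words_per_minute: int = 200,  # assumed default, not taken from the product
) -> bool:
    """True if the estimated reading time lies within the inclusive window."""
    minutes = estimate_reading_minutes(text, words_per_minute)
    return min_minutes <= minutes <= max_minutes
```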
Business
Competitor Check
Flags responses that mention a competitor brand.
Restrict to Topic
Limits responses to a specific business domain or topic area.
Translation Quality
Validates the quality of translated text using back-translation.
URL Reachability
Checks that URLs in the response are valid and reachable.
Custom Validators
persona_policy_validator.py
Custom validator script
prompt_injection_validator.py
Custom validator script
validatorplus.py
Custom validator script
Add validator by ID (advanced)
0 protections selected
Guard Summary
Name
-
Description
-
Use Case
-
Protections
0
Active Protections
No protections selected yet.