New Guard — Guardrail Manager

Create Guard

A guard is a set of protections applied to your AI responses.

Guard Details

Protections

Review & Create

Choose a starting point

🛡️ Content Moderator

Block toxic, profane, and NSFW text

🔒 Security Guard

Detect prompt injection and SQL attacks

🔐 Privacy Shield

Redact PII and sensitive personal data

✨ Quality Assurance

Ensure factual, accurate, and relevant answers

📄 Start Blank

Pick protections yourself from scratch

Guard Name *

Lowercase letters, numbers and hyphens only
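The naming rule above can be checked before the form is submitted. The following is a minimal sketch, assuming the rule means "one or more characters, each a lowercase ASCII letter, digit, or hyphen". The function name `is_valid_guard_name` is hypothetical, and any length limit or restriction on leading hyphens that the product may enforce is not modeled here.

```python
import re

# Assumed pattern: one or more lowercase ASCII letters, digits, or hyphens.
# Length limits and leading/trailing hyphen rules are not stated in the form,
# so they are deliberately left out.
GUARD_NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_guard_name(name: str) -> bool:
    """Return True if the whole name uses only the allowed characters."""
    return GUARD_NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_guard_name("support-bot-1")` is accepted, while `"Support Bot"` is rejected because of the uppercase letters and the space.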

Description

Use Case

Toggle on the protections you want active, then choose an action for each: Block rejects the input or output, Replace rewrites it, and Log Only records it without blocking.
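The three actions can be pictured as a small dispatch on the result of a protection check. This is an illustrative sketch only: the names `Action`, `Outcome`, and `apply_action`, the `"[removed]"` replacement text, and the choice to log every violation are assumptions, not the product's actual behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    BLOCK = "block"
    REPLACE = "replace"
    LOG_ONLY = "log_only"


@dataclass
class Outcome:
    text: Optional[str]  # text passed downstream, or None when rejected
    blocked: bool        # True when the input/output was rejected
    logged: bool         # True when a violation was recorded


def apply_action(action: Action, text: str, violation: bool,
                 replacement: str = "[removed]") -> Outcome:
    """Apply the configured action to text given a check result."""
    if not violation:
        # Nothing detected: pass the text through untouched.
        return Outcome(text, blocked=False, logged=False)
    if action is Action.BLOCK:
        # Block: reject the text entirely.
        return Outcome(None, blocked=True, logged=True)
    if action is Action.REPLACE:
        # Replace: pass a rewritten version instead of the original.
        return Outcome(replacement, blocked=False, logged=True)
    # Log Only: record the violation but let the text through.
    return Outcome(text, blocked=False, logged=True)
```

Log Only is useful for trying out a new protection: violations are recorded, but users still receive the original text.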

🛡️ Content Safety

Toxic Language

Detects toxic, offensive, or harmful language in text output.

Profanity Free

Ensures text is free from profanity and vulgar language.

NSFW Text

Detects not-safe-for-work content in text responses.

Gibberish

Detects nonsensical or gibberish text in model outputs.

Unusual Prompt

Flags prompts that are unusually structured or potentially adversarial.

Bias Check

Checks for various forms of bias (gender, race, political) in text.

Hate Speech

Identifies hate speech targeting individuals or groups.

🔒 Security

Prompt Injection

Detects prompt injection attempts in user inputs.

SQL Injection

Detects potential SQL injection strings in text.

Secrets Present

Detects API keys, passwords, tokens, or other secrets in output.

LLM Critique

Uses a second LLM call to critique and validate the first response.
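As an illustration of what a check like Secrets Present might look for, here is a rough pattern-based sketch. It assumes only two signatures: the `AKIA` prefix format of AWS access key IDs and PEM private key headers. The names `SECRET_PATTERNS` and `find_secrets` are hypothetical, and a real detector would cover many more secret types.

```python
import re

# Two illustrative signatures only; not an exhaustive secret catalogue.
SECRET_PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM headers such as "-----BEGIN RSA PRIVATE KEY-----".
    "pem_private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[str]:
    """Return the labels of all secret patterns found in text."""
    return [label for label, pattern in SECRET_PATTERNS.items()
            if pattern.search(text)]
```

An empty list means none of the sketched patterns matched. It does not guarantee the text is free of secrets.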

🔐 Privacy

Detect PII

Detects personally identifiable information (names, emails, SSNs, etc.).

Anonymize PII

Detects PII and anonymizes it by replacing it with placeholders.

Sensitive Topics

Flags mentions of pre-configured sensitive topics.
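Placeholder replacement, as described under Anonymize PII, can be sketched with regular expressions. This example assumes only two PII types (email addresses and US Social Security numbers in `NNN-NN-NNNN` form) and a hypothetical `<LABEL>` placeholder format. Detecting names generally needs a named-entity model and is out of scope for this sketch.

```python
import re

# Illustrative patterns only; real PII detection covers far more types.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def anonymize_pii(text: str) -> str:
    """Replace each detected PII span with a <LABEL> placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```

With these patterns, `"Mail jane@example.com, SSN 123-45-6789."` becomes `"Mail <EMAIL>, SSN <SSN>."`.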

📝 Format

Valid JSON

Ensures the output is valid, parseable JSON.

Valid Python

Ensures the output is syntactically valid Python code.

Valid SQL

Ensures the output is a valid SQL statement.

Regex Match

Validates that text matches a user-specified regular expression.

Valid Length

Ensures text stays within minimum and maximum character limits.

Ends With

Validates that text ends with a specific string.

One Line

Ensures the response is a single line with no newlines.

Valid Choices

Ensures output is one of a pre-defined set of allowed values.

Valid Range

Ensures numeric output falls within a specified min/max range.

Valid OpenAPI

Validates that the output is a valid OpenAPI specification.
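Several of the Format protections above correspond to simple, deterministic checks. The sketch below shows how a few of them could be approximated with the Python standard library. The function names are hypothetical, and the actual protections may apply stricter or different rules.

```python
import ast
import json


def is_valid_json(text: str) -> bool:
    """Valid JSON: the text parses with json.loads."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False
    return True


def is_valid_python(text: str) -> bool:
    """Valid Python: the text parses as Python source (it is not executed)."""
    try:
        ast.parse(text)
    except (SyntaxError, ValueError):
        return False
    return True


def is_one_line(text: str) -> bool:
    """One Line: no newline or carriage-return characters."""
    return "\n" not in text and "\r" not in text


def has_valid_length(text: str, min_len: int, max_len: int) -> bool:
    """Valid Length: character count within [min_len, max_len]."""
    return min_len <= len(text) <= max_len


def is_valid_choice(text: str, choices: set[str]) -> bool:
    """Valid Choices: the text equals one of the allowed values."""
    return text in choices


def is_in_range(text: str, minimum: float, maximum: float) -> bool:
    """Valid Range: the text parses as a number within [minimum, maximum]."""
    try:
        value = float(text)
    except ValueError:
        return False
    return minimum <= value <= maximum
```

Each function returns a boolean, so its result can feed directly into whichever action (Block, Replace, or Log Only) is configured for that protection.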

✨ Quality

QA Relevance

Validates that an answer is relevant to the question asked.

Sycophancy Check

Detects sycophantic or excessively agreeable AI responses.

On Topic

Ensures the response stays on the expected topic or domain.

Reading Time

Validates that text falls within an expected reading time window.
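A reading-time check can be approximated from the word count. The sketch below assumes a reading speed of 200 words per minute and whitespace-based word splitting. Both are assumptions, as are the function names; the product's actual estimate may differ.

```python
def reading_time_minutes(text: str, words_per_minute: int = 200) -> float:
    """Estimate reading time from a whitespace-delimited word count."""
    return len(text.split()) / words_per_minute


def within_reading_time(text: str, min_minutes: float, max_minutes: float,
                        words_per_minute: int = 200) -> bool:
    """Return True if the estimate falls inside the expected window."""
    estimate = reading_time_minutes(text, words_per_minute)
    return min_minutes <= estimate <= max_minutes
```

Under these assumptions, a 400-word response is estimated at 2 minutes, so it would pass a 1 to 3 minute window.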