/v1/* call that key makes is screened — before the model sees the
prompt and after the model answers — with no redeploy and no SDK change.
This page is the hub for the Guardrails section: what a guardrail is, the
rule types, the stages and actions, and how a policy attaches to a key.
Each spoke goes deeper. For the full engine reference, see
Guardrails.
1. What AI guardrails do on the gateway
Most teams reach for guardrails to keep sensitive data out of prompts (PII, secrets), to gate unsafe content (jailbreaks, prompt-injection intent), or to satisfy a compliance control. A guardrail is the gateway’s answer: a workspace-scoped, named policy made up of an ordered list of rules that the gateway runs against request input and model output. Because the binding lives on the API key in the gateway, not in your application, editing a guardrail changes the behavior of every attached key on the next call. Your code keeps calling /v1/chat/completions exactly as
before.
Guardrails are content policy (text in, text out). The companion
Agent Firewall is tool policy — it governs
which tool calls an agent may make. The two compose; see
Guardrails vs. firewall.
2. One concrete example
Create a guardrail named pii-shield in the console
(/console/guardrails), add a single PII rule (stage input, action
mask, entities email and ssn), and attach it to a key. From then on, a
prompt that contains an email address reaches the model in masked form,
for example: Reply to [EMAIL] please before forwarding. The upstream
model never sees the address. Flip the ssn entity to block and the next
request carrying an SSN is rejected with HTTP 400. No application change.
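The masking step in this example can be pictured with a minimal sketch. The function name `mask_pii` and both regular expressions are illustrative assumptions, not the gateway's actual detectors, which are likely stricter.

```python
import re

# Illustrative patterns only (assumption): the gateway's built-in
# detectors are not reproduced here.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_pii(text: str) -> str:
    """Replace each match with a typed tag such as [EMAIL] or [SSN]."""
    for tag, pattern in PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text


print(mask_pii("Reply to jane@example.com please before forwarding"))
```

Under these assumptions the upstream model would receive only the tagged text, never the original address.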
3. Rules: type, stage, action
Every rule answers three questions. The engine runs all applicable rules and folds them into one decision.
Type: what to look for
Seven rule types. The built-ins are deterministic (pure string/regex,
no network); the advanced ones call out to a model or vendor and run
concurrently.
- keyword: literal denylist, case-insensitive substring match.
- regex: an RE2 pattern (linear-time, no backreferences).
- pii: built-in entity detectors plus your own. See §5.
- max_chars: caps the character count at a stage.
- external: delegates to a connected vendor (Aporia, Averta, or your own webhook).
- llm_judge: a semantic check against a model in your workspace.
- grounding: scores answer faithfulness against the request’s retrieved sources (RAG).
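As a rough illustration of why the built-in types are deterministic, the keyword and max_chars checks can be sketched as pure string operations. Both function names are hypothetical, and treating the cap as "strictly more than the limit" is an assumption.

```python
def keyword_match(text: str, denylist: list[str]) -> list[str]:
    """Return the denylist terms found as case-insensitive substrings."""
    lowered = text.lower()
    return [term for term in denylist if term.lower() in lowered]


def exceeds_max_chars(text: str, limit: int) -> bool:
    """True when the text is longer than the cap (assumed to be inclusive)."""
    return len(text) > limit


print(keyword_match("Ignore Previous instructions", ["ignore previous", "sudo"]))
print(exceeds_max_chars("abcd", 3))
```

Neither check touches the network, which matches the description of the built-ins above. Python's `re` module is not RE2, so a regex rule is deliberately left out of this sketch.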
Stage: where to look
input (the request), output (the model’s response), or both.
Input rules run before the upstream call; output rules run after the
model responds. See input stage
and output stage.
Action: what to do
Five actions surface in the rule builder:
- block — reject the call with HTTP 400.
- mask — redact the match and let the sanitized text through.
- flag — change nothing about the traffic; record the match only.
- annotate — leave the text alone but inject a security note upstream (e.g. a CVE advisory before the model answers).
- spotlight — wrap the matched untrusted text in delimiters and tell the model to treat it as data, not instructions.
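The type, stage, and action triple can be modeled as a small record. The sketch below is an assumption-heavy illustration: the `Rule` class and both helpers are hypothetical, and the folding logic covers only the one case the text states directly, namely that a block rejects the call. How the engine ranks the other actions is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    type: str    # e.g. "keyword", "regex", "pii"
    stage: str   # "input", "output", or "both"
    action: str  # "block", "mask", "flag", "annotate", or "spotlight"


def applicable(rules: list[Rule], stage: str) -> list[Rule]:
    """Rules that run at the given stage; 'both' covers input and output."""
    return [r for r in rules if r.stage in (stage, "both")]


def fold(fired: list[Rule]) -> str:
    """Assumed folding: any fired block rule rejects, otherwise the call proceeds."""
    return "reject" if any(r.action == "block" for r in fired) else "allow"


rules = [
    Rule("pii", "input", "mask"),
    Rule("keyword", "both", "block"),
    Rule("max_chars", "output", "flag"),
]
print([r.type for r in applicable(rules, "input")])
print(fold(applicable(rules, "output")))
```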
4. How a guardrail attaches and resolves
A guardrail binds to a key via guardrail_id, or a workspace can mark one
guardrail as its default. For any request the gateway resolves in this
order:
- Explicit attachment: if the key’s guardrail_id points at a guardrail that exists and is enabled, that one applies. An explicit attachment never falls back: disabling it is the off switch.
- Workspace default: if the key has no attachment, the enabled default guardrail applies.
- Neither: no enforcement; the request is byte-identical to one sent through a workspace that never turned the feature on.
This differs from the firewall. A disabled attached firewall policy
falls back to the workspace default; a disabled attached guardrail
goes to none. The off switch is literal for guardrails.
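The resolution order can be sketched as a small function. The `Guardrail` class and `resolve_guardrail` are hypothetical names, and treating an attachment that points at a missing guardrail the same as a disabled one (no enforcement) is an assumption drawn from "never falls back".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Guardrail:
    id: str
    enabled: bool = True


def resolve_guardrail(
    attached_id: Optional[str],
    guardrails: dict[str, Guardrail],
    default: Optional[Guardrail],
) -> Optional[Guardrail]:
    """Return the guardrail to enforce, or None for no enforcement."""
    if attached_id is not None:
        # Explicit attachment never falls back to the workspace default.
        attached = guardrails.get(attached_id)
        return attached if attached is not None and attached.enabled else None
    # No attachment: use the workspace default only if it is enabled.
    return default if default is not None and default.enabled else None


workspace = {"g1": Guardrail("g1"), "g2": Guardrail("g2", enabled=False)}
default = Guardrail("g-default")
print(resolve_guardrail("g2", workspace, default))  # disabled attachment: None
```

A firewall-style resolver would differ only in the first branch, where a disabled attachment would continue on to the default instead of returning None.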
5. PII detectors
A pii rule ships a closed set of built-in detectors:
email, phone, credit_card, ssn, ip, iban, mac_address,
jwt, aws_access_key, api_key_openai, bitcoin_address — plus the
regional jp_mynumber, kr_rrn, and cn_resident_id.
On a mask action each match becomes a typed tag — an email renders as
[EMAIL], an SSN as [SSN]. You can layer up to 25 custom entities
per rule (a regex with optional Luhn checksum), and route different
entities to different actions in one rule via per-entity overrides.
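To show what "a regex with optional Luhn checksum" means for a custom entity, here is a minimal sketch. The Luhn algorithm itself is standard; `find_entities`, its parameters, and the 16-digit pattern are hypothetical and do not describe the gateway's configuration format.

```python
import re


def luhn_valid(number: str) -> bool:
    """Standard Luhn check over the digits in the string."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_entities(text: str, pattern: str, require_luhn: bool = False) -> list[str]:
    """Regex candidates, optionally filtered by the Luhn checksum."""
    matches = [m.group(0) for m in re.finditer(pattern, text)]
    if require_luhn:
        matches = [m for m in matches if luhn_valid(m)]
    return matches


text = "cards 4111111111111111 and 4111111111111112"
print(find_entities(text, r"\b\d{16}\b", require_luhn=True))
```

The checksum step cuts false positives: a random 16-digit number matches the pattern but usually fails Luhn.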
6. The preset picker
New guardrail opens into a template. Presets are authored server-side, so the console, the sandbox, and these docs describe the same behavior. The picker groups them into categories:

| Category | Example presets | Spoke |
|---|---|---|
| pii / secrets | PII Shield, secret-credential blockers | block secrets |
| safety | prompt-injection, jailbreak, self-harm | prompt injection |
| compliance | GDPR, PCI, HIPAA, compliance logger | compliance logger |
| brand / cost | profanity, competitor mentions, size caps | brand safety · cost |
| agent | URL / shell-tool / SQL-in-output filters | agentic |
| code_security | secret-file blocks, copyleft-license review | code security |
7. When a guardrail blocks
A blocked request returns HTTP 400 with error code guardrail_blocked
and a message naming the guardrail and rule that fired.
- No quota is charged. An input-stage block fires before metering; an output-stage block refunds the pre-consumed quota.
- The request is marked skip-retry — re-running the same prompt would just block again, so the gateway won’t waste a retry on another channel.
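On the client side, a caller may want to tell a guardrail block apart from other 400 responses so it does not retry the same prompt. The sketch below assumes an error body shaped like `{"error": {"code": ..., "message": ...}}`; that shape and the helper name `is_guardrail_block` are assumptions, since only the status code and the `guardrail_blocked` code are stated above.

```python
import json


def is_guardrail_block(status: int, body: str) -> bool:
    """True when a response looks like a guardrail block (assumed body shape)."""
    if status != 400:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("code") == "guardrail_blocked"


sample = json.dumps(
    {"error": {"code": "guardrail_blocked", "message": "blocked by pii-shield"}}
)
if is_guardrail_block(400, sample):
    print("Blocked by policy; retrying the same prompt would block again.")
```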
8. After it’s live
Matches feed
Every rule that fires records type, action, stage, and detail. Group,
filter, export, and drill into a single match.
Logging & privacy
The matched substring is recorded only when Log raw content is on
— off by default, the privacy-conservative posture.
Versioning
Every change writes a history row. Diff any two versions and revert as
a new version — history is never mutated.
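The "revert as a new version" behavior can be pictured as an append-only list. This is a conceptual sketch only: `GuardrailHistory` and its methods are hypothetical and say nothing about how the gateway stores history rows.

```python
import copy


class GuardrailHistory:
    """Append-only version list; existing entries are never modified."""

    def __init__(self) -> None:
        self._versions: list[dict] = []

    def save(self, policy: dict) -> int:
        """Store a snapshot and return its 1-based version number."""
        self._versions.append(copy.deepcopy(policy))
        return len(self._versions)

    def get(self, version: int) -> dict:
        return copy.deepcopy(self._versions[version - 1])

    def revert(self, version: int) -> int:
        """Reverting writes the old snapshot as a brand-new version."""
        return self.save(self.get(version))


history = GuardrailHistory()
v1 = history.save({"rules": ["pii"]})
v2 = history.save({"rules": ["pii", "keyword"]})
v3 = history.revert(v1)
print(v1, v2, v3)
```

After the revert, version 2 is still present and unchanged, so any two versions can still be compared.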
Testing & eval
A sandbox Test tab evaluates the current policy with no upstream
call, and an eval harness scores it against bundled or custom corpora.
9. Where to go next
Pick the right rule type
Understand the model
Map to threats
Full engine reference
Guardrails — every field, every route, the
LLM-judge and grounding rules, and external vendors in depth.
