Every step here is a console action on the hosted gateway
(
api.orcarouter.ai). Guardrail configuration runs under your own
session; only the final /v1/* call uses an sk-orca-... relay key.
Creating and editing guardrails requires Developer+ in the workspace.1. How to add LLM guardrails in five steps
Here is the whole loop at a glance — each step is expanded below.Create a guardrail
In the console, open Guardrails and click New guardrail. Give
it a name (≤ 64 chars), e.g.
pii-shield.Test it in the sandbox
Open the Test tab, paste a sample, and run the policy locally —
no upstream call, no quota.
Attach it to a key
Edit an API key and pick the guardrail from the Guardrail
dropdown. The binding lives on the key.
2. Create the guardrail
In the console, open Guardrails and click New guardrail. A guardrail is a workspace-scoped, named content policy — an ordered list of rules the gateway runs against request input and model output. Name itpii-shield and save.
3. Add a rule
Each rule decides three things — what to look for (a rule type), where to look (a stage), and what to do (an action). Add one rule:- Type: PII detection (
pii) - Stage: Input (the request)
- Action: Mask — redact the match
- Entities:
email,phone,ssn
[EMAIL], an SSN becomes [SSN]. The seven rule types
(keyword, regex, pii, max_chars, external, llm_judge,
grounding) and the five actions (block, mask, flag, annotate,
spotlight) are covered in the
reference. For this first guardrail,
one masking rule is enough.
Masking is live on both stages. Input-stage rules mask the request
before the model ever sees it; output-stage rules mask the model’s
response — on non-streaming responses and chunk-by-chunk on streaming
ones — before the client receives it. Block is enforced on both
stages too. If you want to gate model responses, set the rule’s stage
to
output (or both); see
Output-stage rules.4. Test it in the sandbox
Before attaching the guardrail to any key, prove it does what you expect. Open the Test tab inside the editor, paste a sample, pick theinput
stage, and run:
5. Attach it to a key
A guardrail does nothing until a key points at it. Two ways to bind:Per key
Edit an API key and pick the guardrail from the Guardrail
dropdown. This sets
guardrail_id on the key. See
Attach to a key.Workspace default
Mark the guardrail as the workspace default so any key without an
explicit attachment inherits it. See
Account default.
| Order | What applies |
|---|---|
| 1 | The key’s explicit guardrail_id (if it exists and is enabled). |
| 2 | The workspace default (if the key has no attachment). |
| 3 | None — the request is byte-identical to a workspace with no policy. |
6. Send a request
Using a key bound topii-shield, call OrcaRouter exactly as before — no
SDK change, no new headers:
[EMAIL] before forwarding — the upstream
model never sees the address. Swap the rule’s action to block and the
very next request that contains the entity is rejected with HTTP 400
guardrail_blocked. A blocked request costs no quota (an input block
fires before metering; an output block refunds the pre-consumed quota)
and is marked skip-retry. See the
guardrail_blocked error
for the full response shape.
7. Where to go next
See what fired
See what fired
Every rule that fires records a match — type, action, stage, and
a detail string. The matched substring is recorded only when Log
raw content is on (off by default). See the
Matches feed and
Logging & privacy.
Mask more than basics
Mask more than basics
PII detection covers
email, phone, credit_card, ssn, ip,
iban, mac_address, jwt, aws_access_key, api_key_openai,
bitcoin_address (plus regional entities), and you can author your
own. See PII Shield,
Custom PII entities, and
Masking formats.Catch secrets and injection
Catch secrets and injection
Add a Secrets blocker or the
Prompt-Injection basics
preset — the latter flags common jailbreak phrases for review. To
catch injection intent semantically rather than by phrase, add an
llm_judge rule alongside it.Roll back a change
Roll back a change
Every edit writes a version history row. Open History to diff and
revert. See Versioning.
Gate tool calls, not just text
Gate tool calls, not just text
Guardrails screen content. To govern an agent’s tool calls — deny
destructive actions, cap cost, require approval — use the
Firewall. Start with
Securing AI agents and the
dangerous-tool-calls threat.
