Create your first guardrail

The fastest way to put a content policy in front of every model call is a guardrail — a workspace-scoped, named policy you author once in the console and attach to an API key. The gateway then screens request input and model output on the next call, with no redeploy and no SDK change. This page walks the end-to-end loop: create a guardrail, add a rule, test it in the sandbox, attach it to a key, and send a real request. For the full engine reference — every rule type, field, and route — see the Guardrails reference.

Every step here is a console action on the hosted gateway (api.orcarouter.ai). Guardrail configuration runs under your own session; only the final /v1/* call uses an sk-orca-... relay key. Creating and editing guardrails requires the Developer role or higher in the workspace.

1. How to add LLM guardrails in five steps

Here is the whole loop at a glance — each step is expanded below.

Create a guardrail

In the console, open Guardrails and click New guardrail. Give it a name (≤ 64 chars), e.g. pii-shield.

Add a rule

Add one PII detection rule at the input stage with the mask action.

Test it in the sandbox

Open the Test tab, paste a sample, and run the policy locally — no upstream call, no quota.

Attach it to a key

Edit an API key and pick the guardrail from the Guardrail dropdown. The binding lives on the key.

Send a request

Call /v1/chat/completions with that key. The gateway applies the policy before forwarding.

2. Create the guardrail

In the console, open Guardrails and click New guardrail. A guardrail is a workspace-scoped, named content policy — an ordered list of rules the gateway runs against request input and model output. Name it pii-shield and save.

The New guardrail split-button can also open straight into a template. The PII Shield preset is a single pii rule that masks email, phone, ssn, credit_card, and ip. A preset is a seed, not a lock: edit it freely after applying it. Browse the preset templates for more starting points.

3. Add a rule

Each rule decides three things — what to look for (a rule type), where to look (a stage), and what to do (an action). Add one rule:

Type: PII detection (pii)
Stage: Input (the request)
Action: Mask — redact the match
Entities: email, phone, ssn

On a mask action, each match is replaced with a typed tag — an email becomes [EMAIL], an SSN becomes [SSN]. The seven rule types (keyword, regex, pii, max_chars, external, llm_judge, grounding) and the five actions (block, mask, flag, annotate, spotlight) are covered in the reference. For this first guardrail, one masking rule is enough.

Masking is live on both stages. Input-stage rules mask the request before the model ever sees it; output-stage rules mask the model’s response — on non-streaming responses and chunk-by-chunk on streaming ones — before the client receives it. Block is enforced on both stages too. If you want to gate model responses, set the rule’s stage to output (or both); see Output-stage rules.
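
To make the typed-tag substitution concrete, here is a rough local approximation of what a mask action does to a string. It is only a sketch: the function name mask_pii and both regular expressions are assumptions made for illustration, not the gateway's actual detectors, which are likely stricter and cover more formats.

```shell
#!/usr/bin/env bash
# Illustrative only: approximates the mask action for two entities.
# mask_pii and these patterns are hypothetical, not the gateway's detectors.
mask_pii() {
  printf '%s\n' "$1" | sed -E \
    -e 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/[EMAIL]/g' \
    -e 's/[0-9]{3}-[0-9]{2}-[0-9]{4}/[SSN]/g'
}

mask_pii "Reply to jane@acme.com please"
# prints: Reply to [EMAIL] please
```

Use this only to build intuition about typed tags; the Test tab remains the way to check what a real policy does.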

4. Test it in the sandbox

Before attaching the guardrail to any key, prove it does what you expect. Open the Test tab inside the editor, paste a sample, pick the input stage, and run:

Reply to jane@acme.com please

The sandbox evaluates the current policy locally and returns the verdict plus the rendered text:

Reply to [EMAIL] please

Nothing is sent upstream and nothing is metered. For an A/B grid against a corpus of inputs, the Eval harness lives one tab over.

5. Attach it to a key

A guardrail does nothing until a key points at it. Two ways to bind:

Per key

Edit an API key and pick the guardrail from the Guardrail dropdown. This sets guardrail_id on the key. See Attach to a key.

Workspace default

Mark the guardrail as the workspace default so any key without an explicit attachment inherits it. See Account default.

Resolution is explicit and predictable:

Order	What applies
1	The key’s explicit guardrail_id, if that guardrail exists and is enabled.
2	The workspace default, if the key has no attachment.
3	None. The request is forwarded byte-identical, as in a workspace with no policy.

An explicit attachment never silently falls back. Disabling an attached guardrail is the off switch — it does not drop through to the workspace default. (Firewall policies differ here; see Guardrails vs. firewall.)
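
The resolution order can be written out as a small decision function. This is a sketch of the rules as stated on this page, not the gateway's code: resolve_guardrail and its three arguments are hypothetical names, and the workspace default is assumed to be enabled whenever it is set.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the documented resolution order.
# Args: attached guardrail name (may be empty), whether it is enabled
# ("true"/"false"), and the workspace default name (may be empty).
resolve_guardrail() {
  local attached="$1" attached_enabled="$2" workspace_default="$3"
  if [ -n "$attached" ]; then
    # An explicit attachment never falls back: a disabled one means no policy.
    if [ "$attached_enabled" = "true" ]; then
      printf '%s\n' "$attached"
    else
      printf 'none\n'
    fi
  elif [ -n "$workspace_default" ]; then
    # No attachment on the key, so the workspace default is inherited.
    printf '%s\n' "$workspace_default"
  else
    printf 'none\n'
  fi
}

resolve_guardrail "pii-shield" "false" "ws-default"
# prints: none
```

The example call shows the off-switch case: the key's attached guardrail is disabled, so nothing applies even though a workspace default exists.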

6. Send a request

Using a key bound to pii-shield, call OrcaRouter exactly as before — no SDK change, no new headers:

curl https://api.orcarouter.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-orca-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Reply to jane@acme.com please"}
    ]
  }'

The gateway masks the email to [EMAIL] before forwarding — the upstream model never sees the address. Swap the rule’s action to block and the very next request that contains the entity is rejected with HTTP 400 guardrail_blocked. A blocked request costs no quota (an input block fires before metering; an output block refunds the pre-consumed quota) and is marked skip-retry. See the guardrail_blocked error for the full response shape.
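
On the client side, a blocked request is worth telling apart from other failures so that it is not retried. The sketch below is one way to do that under stated assumptions: is_guardrail_block and send_prompt are hypothetical helper names, ORCA_API_KEY is an assumed environment variable holding the sk-orca-... key, and the check matches the string guardrail_blocked anywhere in the body rather than guessing the exact response shape.

```shell
#!/usr/bin/env bash
# Returns 0 when a response looks like a guardrail block: HTTP 400 with
# "guardrail_blocked" somewhere in the body (substring match is an assumption).
is_guardrail_block() {
  local status="$1" body="$2"
  [ "$status" = "400" ] && printf '%s' "$body" | grep -q 'guardrail_blocked'
}

# Hypothetical wrapper: sends one JSON payload ($1) and does not retry a block.
send_prompt() {
  local body_file status
  body_file="$(mktemp)"
  status="$(curl -sS -o "$body_file" -w '%{http_code}' \
    https://api.orcarouter.ai/v1/chat/completions \
    -H "Authorization: Bearer ${ORCA_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$1")"
  if is_guardrail_block "$status" "$(cat "$body_file")"; then
    echo "blocked by guardrail; not retrying" >&2
    rm -f "$body_file"
    return 2
  fi
  cat "$body_file"
  rm -f "$body_file"
}
```

Calling send_prompt with the JSON payload from the curl example prints the response body on success and returns exit code 2 on a block, which a caller can use to skip its retry loop.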

7. Where to go next

See what fired

Every rule that fires records a match — type, action, stage, and a detail string. The matched substring is recorded only when Log raw content is on (off by default). See the Matches feed and Logging & privacy.

Mask more than the basics

PII detection covers email, phone, credit_card, ssn, ip, iban, mac_address, jwt, aws_access_key, api_key_openai, bitcoin_address (plus regional entities), and you can author your own. See PII Shield, Custom PII entities, and Masking formats.

Catch secrets and injection

Add a Secrets blocker or the Prompt-Injection basics preset — the latter flags common jailbreak phrases for review. To catch injection intent semantically rather than by phrase, add an llm_judge rule alongside it.

Roll back a change

Every edit writes a version history row. Open History to diff and revert. See Versioning.

Gate tool calls, not just text

Guardrails screen content. To govern an agent’s tool calls — deny destructive actions, cap cost, require approval — use the Firewall. Start with Securing AI agents and the dangerous-tool-calls threat.

Read the Guardrails reference for the complete engine — rule fields, external vendors, the eval harness, and the full API — or the security quickstart to wire guardrails and firewall together for an agent baseline.
