Security error codes

When a guardrail or the firewall stops a request, OrcaRouter returns a typed error your code can branch on, not a vague 400. Three security codes cover the cases you’ll see: a screened prompt or response, a denied tool call, and a tool call held for human approval. This page is the reference for those codes: the use case for each, the exact HTTP status, what it costs you, and the one rule that matters most, which is that retry logic must special-case them. All three are marked skip-retry; blindly re-running the same call just trips the same control again.

These are enforcement codes: the gateway deciding not to forward your call. They are distinct from upstream provider errors (a model 429, a context overflow) and from auth failures. To trace why a specific request was stopped, see Why was this blocked?

1. The LLM security error codes at a glance

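Every security block returns HTTP 400 with an OpenAI-shaped error body, where error.code is one of the typed strings below. The one exception is the mcp surface, where a firewall deny comes back as a tool error rather than an HTTP error (see §3). On native Claude (/v1/messages) routes the same code travels in the Claude error shape, so SDK routing is deterministic across protocols.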

Code	Stops	Quota cost
`guardrail_blocked`	A prompt or response that hit a `block` rule	None
`firewall_blocked`	A denied tool call / advertisement	No model tokens
`firewall_approval_pending`	A tool call held for a human reviewer	No model tokens

Branch on error.code, never on the message string. Messages name the specific guardrail, rule, or tool and will change; the codes are a stable contract.
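
The rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not OrcaRouter client code: the function name security_code is hypothetical, and it only reads the OpenAI-shaped body (error.code); the Claude error shape is not handled here.

```python
import json
from typing import Optional

# The three security codes listed in the table above.
SECURITY_CODES = frozenset({
    "guardrail_blocked",
    "firewall_blocked",
    "firewall_approval_pending",
})


def security_code(status: int, body: str) -> Optional[str]:
    """Return the security code carried by an error response, or None.

    Hypothetical helper: it looks only at error.code and never at the
    message text, which names specific rules and may change.
    """
    if status != 400:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    code = error.get("code") if isinstance(error, dict) else None
    return code if code in SECURITY_CODES else None
```

Anything that returns None here is some other kind of failure (an upstream provider error, an auth failure) and can go through your normal error handling.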

2. `guardrail_blocked` — a screened prompt or response

Returned when a guardrail rule with the block action fires — a denylisted keyword, a regex hit, a PII or secret entity you chose to block rather than mask, an llm_judge verdict, or a failed grounding check. HTTP 400. The message names the guardrail and the rule that fired.

Quota impact: none

A blocked request costs no quota. An input-stage block fires before metering, so nothing is ever billed. An output-stage block runs after the model responds, so the gateway refunds the pre-consumed quota before returning the error. Either way you pay nothing for a blocked call.

Why it's skip-retry

The verdict is a property of the content, not the channel. Re-running the same prompt — even against a different model — produces the same block. Fix the input (or the policy) instead of retrying.

What a mask looks like instead

mask rules do not return this code. A masked match (e.g. jane@acme.com → [EMAIL]) is redacted in place and the call proceeds normally: you get a 200, just with the sensitive span removed. Only the block action surfaces guardrail_blocked. (The flag action changes nothing about the traffic at all.)

A block, by contrast, returns a body like this:

{
  "error": {
    "type": "openai_error",
    "code": "guardrail_blocked",
    "message": "request blocked by guardrail \"pii-shield\": rule ssn (block)"
  }
}

For the rule types, stages, and actions behind this code, see Guardrails. For the field-by-field error envelope, see Webhook & error payloads.

3. `firewall_blocked` — a denied tool call

Returned when the firewall resolves a deny verdict for a tool call — a destructive shell command, an SSRF-shaped fetch, an egress destination on a deny list, or a skill in block mode. How the deny surfaces depends on the enforcement surface:

inbound / response / egress

HTTP 400 with error.code = firewall_blocked. The body carries structured error.metadata (reason_code, risk factors, risk_score) so you can explain the block, not just see it.

mcp surface

Returned as a tool error (firewall deny: <reason>), not a transport failure — so the model sees the rejection and can pick another tool, ask the user, or stop, instead of crashing the run.

sanitize is not a block. A sanitize verdict redacts matched substrings from the tool-call arguments and forwards the cleaned call — it never returns firewall_blocked. (The one exception: on the inbound surface, where there are no call-time arguments yet, sanitize escalates to a deny.)

{
  "error": {
    "type": "openai_error",
    "code": "firewall_blocked",
    "message": "tool \"shell.exec\" blocked by firewall: denied tool",
    "metadata": {
      "reason_code": "FW-TOOL-001",
      "risk_score": 92,
      "factors": ["denied_tool"]
    }
  }
}
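
One way to turn that metadata into a short explanation for logs or a user-facing message is sketched below. The helper name explain_firewall_block and the output format are illustrative assumptions; the field names (reason_code, risk_score, factors) are the ones in the sample body above.

```python
import json


def explain_firewall_block(body: str) -> str:
    """Summarize a firewall_blocked error body using its error.metadata.

    Hypothetical helper; missing fields fall back to placeholders so a
    body without metadata still produces a readable line.
    """
    error = json.loads(body)["error"]
    if error.get("code") != "firewall_blocked":
        raise ValueError("not a firewall_blocked error")
    meta = error.get("metadata") or {}
    factors = ", ".join(meta.get("factors", [])) or "none listed"
    return (
        f"{meta.get('reason_code', 'unknown')} "
        f"(risk {meta.get('risk_score', 'n/a')}): {factors}"
    )
```

For the sample body above this yields "FW-TOOL-001 (risk 92): denied_tool".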

Quota-wise, an inbound block fires before the upstream model call, so it costs no model tokens. See Verdict glossary for every verdict, and Dangerous tool calls for the threats this code defends against.

4. `firewall_approval_pending` — held for a human

Returned the instant a tool call hits a pending_approval verdict. A human-in-the-loop gate can’t be a blocking inline wait, so the gateway returns a held response immediately rather than long-polling. HTTP 400. The error carries the approval id so your agent knows which hold to resolve. This is the one code you respond to by resolving and re-submitting — not by treating it as a terminal failure:

Read the approval id off the held error

The id is recoverable from the error body. Don’t retry the call yet — a naive retry just re-holds.

Wait for a decision

A reviewer resolves it from the console (Developer+), or your approval system gets an HMAC-signed webhook callback. Your agent polls GET /api/v1/firewall/approvals/:id for the state.

Re-submit with the approval token

Once approved, re-issue the original call carrying the single-use X-OrcaRouter-Firewall-Approval header. The gateway recognizes the id and lets that one call through.

The approval routes (/api/v1/firewall/approvals/*) run on a firewall-gateway-scoped key, not your console session. See Human approval (HITL) for the full loop and Webhook payloads for the callback signature.
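
The wait-and-resubmit steps above can be sketched as follows. Several details are assumptions, since this page does not specify them: fetch_state is a hypothetical callable you would implement around GET /api/v1/firewall/approvals/:id, the state string "pending" is an assumed value, and the sketch assumes the approval header carries the approval id. Check Human approval (HITL) for the actual response shape before relying on any of it.

```python
import time
from typing import Callable, Dict

APPROVAL_HEADER = "X-OrcaRouter-Firewall-Approval"


def wait_for_decision(
    approval_id: str,
    fetch_state: Callable[[str], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the approval leaves the (assumed) "pending" state.

    fetch_state is a hypothetical wrapper around the approvals route
    that returns the current state as a string.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch_state(approval_id)
        if state != "pending":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"approval {approval_id} is still pending")
        sleep(interval_s)


def resubmit_headers(base_headers: Dict[str, str], approval_id: str) -> Dict[str, str]:
    """Copy the original headers and attach the single-use approval header.

    Assumption: the header value is the approval id itself.
    """
    headers = dict(base_headers)
    headers[APPROVAL_HEADER] = approval_id
    return headers
```

Because the header is single-use, build it once per approved call and re-issue the original request exactly once with it.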

5. Why all three skip retry

Generic retry logic assumes a failed request might succeed on a second try. These codes break that assumption: the block is deterministic, so a blind retry wastes a round trip and (for held calls) silently re-queues an approval.

What 'skip-retry' means in practice

OrcaRouter’s own internal retry/fallback machinery never re-attempts a call that returns one of these codes against another channel. Mirror that in your client: on a security code, stop and act on the verdict instead of looping.

The right reaction per code

guardrail_blocked → fix the input or relax the policy; surface the refusal to the user. Don’t retry.
firewall_blocked → the action is disallowed; have the agent choose a different tool or ask for help. Don’t retry.
firewall_approval_pending → resolve the hold, then re-submit once with the approval header (§4). A retry without the header re-holds.
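
A client-side retry wrapper that mirrors these reactions might look like the sketch below. It is illustrative only: call_with_retry, SecurityBlock, and the send callable (which returns an HTTP status and a parsed JSON body) are hypothetical names, and the choice to retry only 429 and 5xx responses is an assumption about your client, not documented gateway behavior.

```python
import time
from typing import Callable, Tuple

SKIP_RETRY_CODES = frozenset({
    "guardrail_blocked",
    "firewall_blocked",
    "firewall_approval_pending",
})


class SecurityBlock(Exception):
    """Raised when the gateway returns one of the skip-retry security codes."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def call_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 3,
    backoff_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Retry transient failures, but surface security codes immediately."""
    for attempt in range(1, max_attempts + 1):
        status, payload = send()
        if status < 400:
            return payload
        error = payload.get("error") if isinstance(payload, dict) else None
        error = error if isinstance(error, dict) else {}
        code = error.get("code")
        if code in SKIP_RETRY_CODES:
            # Deterministic verdict: stop and let the caller act on it.
            raise SecurityBlock(code, error.get("message", ""))
        retryable = status == 429 or status >= 500
        if not retryable or attempt == max_attempts:
            raise RuntimeError(f"request failed with HTTP {status}")
        sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError("max_attempts must be at least 1")
```

The caller catches SecurityBlock and branches on exc.code: surface the refusal, pick a different tool, or start the approval flow from §4.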

6. Quota & billing summary

A security block never bills you for the blocked unit of work.

Code	When it fires	Billing outcome
`guardrail_blocked` (input)	Before the model call	Never metered
`guardrail_blocked` (output)	After the model responds	Pre-consumed quota refunded
`firewall_blocked` (inbound)	Before the model call	No model tokens
`firewall_approval_pending`	Before dispatch	No model tokens

A guardrail’s llm_judge or grounding rule does call a model to reach its verdict, and those judge tokens are billed as a separate judge sub-line — even when the verdict is a block. That’s the cost of the check, not of the blocked request itself.

7. Related references

Why was this blocked?

Trace a single block to the exact rule, surface, and reason that produced it.

Verdict glossary

Every firewall verdict — allow, audit, deny, sanitize, pending_approval, cap_cost — and what each emits.

Webhook & error payloads

The full error envelope, error.metadata fields, and the approval-callback signature.

Enforcement modes

Shadow, observe, and enforce — when a verdict actually changes traffic.

For the controls that produce these codes, see Guardrails and Firewall; for the vocabulary, see the Concepts glossary.
