OrcaRouter applies four enforcement layers to every request in a fixed order: key validation, input guardrails, the Agent Firewall, and output guardrails. Each layer is independent, workspace-scoped, and attaches to an API key with no code change. This page walks one request through all four in sequence, together with the model call and the audit trail that sit alongside them, then covers policy resolution order and fail-open/fail-closed behavior. For a broader introduction, see Securing AI agents with OrcaRouter.

1. Layer 1 — Scoped API key

The key is the first gate. Before any content is inspected or any model is called, the gateway resolves the calling key and decides whether the request is even permitted. What the key carries:
  • model_limits — the set of models the key may call. A request for a model outside the list is rejected immediately.
  • allow_ips — optional IP allowlist. A request from an unlisted source is rejected.
  • credit_limit_usd — a hard spend cap. A request that would push the key over the cap is rejected.
  • expiry — a hard expiry date. Expired keys are rejected.
  • environment — a tag (production, staging, dev, …) to organize and identify the key by deployment environment.
  • guardrail_id — the guardrail policy that binds to this key (see Layer 2 and Layer 5).
  • firewall_policy_id — the firewall policy that binds to this key (see Layer 4).
  • is_firewall_gateway — flags the key as a firewall-gateway-scoped token; required for the evaluate and MCP gateway routes.
A request that fails key validation never reaches a model — and is never metered. Where to configure: Console → API Keys, or the token API. Requires Developer+ to create or edit; is_firewall_gateway and plaintext key reads require Admin. For the full key model see Scope, keys, policies, and workspaces.
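The checks listed above can be sketched as a single validation function. This is a minimal illustration, not OrcaRouter's implementation: the `ScopedKey` class, the `spent_usd` field, the rejection reason strings, the order of the checks, and the choice to treat an empty `model_limits` or `allow_ips` as "no restriction" are all assumptions made for the example. Only the field names `model_limits`, `allow_ips`, `credit_limit_usd`, and `expiry` come from the list above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class ScopedKey:
    """Hypothetical in-memory view of a key; not the real token schema."""
    model_limits: list[str] = field(default_factory=list)  # empty = unrestricted (assumed)
    allow_ips: list[str] = field(default_factory=list)     # empty = unrestricted (assumed)
    credit_limit_usd: Optional[float] = None
    expiry: Optional[datetime] = None
    spent_usd: float = 0.0                                  # hypothetical running total


def validate_key(key: ScopedKey, model: str, source_ip: str,
                 estimated_cost_usd: float,
                 now: Optional[datetime] = None) -> Optional[str]:
    """Return a rejection reason, or None if the request may proceed."""
    now = now or datetime.now(timezone.utc)
    if key.expiry is not None and now >= key.expiry:
        return "key_expired"
    if key.model_limits and model not in key.model_limits:
        return "model_not_allowed"
    if key.allow_ips:
        addr = ip_address(source_ip)
        # Each entry may be a single address or a CIDR block.
        if not any(addr in ip_network(entry, strict=False) for entry in key.allow_ips):
            return "ip_not_allowed"
    if (key.credit_limit_usd is not None
            and key.spent_usd + estimated_cost_usd > key.credit_limit_usd):
        return "credit_limit_exceeded"
    return None
```

Because every check runs before any model call, a non-`None` result here corresponds to a request that is rejected without being metered.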

2. Layer 2 — Input guardrails

Once the key is validated, the gateway runs the bound guardrail’s input-stage rules against the request text, before any upstream model call.
What it sees: The caller’s messages as submitted. (A prompt injected from a registry prompt is appended later, in the routing stage; input rules do not see it.)
Rule types available: keyword, regex, pii, max_chars, external, llm_judge, grounding.
Actions a rule can produce:
| Action | What happens |
|---|---|
| block | Request is rejected with HTTP 400 guardrail_blocked. No quota is charged. Marked skip-retry. |
| mask | Match is redacted (e.g. jane@acme.com → [EMAIL]). The sanitized text continues to the model. |
| flag | Match is recorded; traffic is unchanged. |
A block at this stage means the model is never called. Cost: zero. The caller sees a structured error naming the guardrail and the rule that fired. Where to configure: Console → Guardrails, or the guardrail API. Requires Developer+ to create or modify. See Guardrails for the full rule reference.
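The three actions can be illustrated with a small rule loop. This is a sketch under stated assumptions: the `Rule` class, the `GuardrailBlocked` exception, and the in-order evaluation are hypothetical, and a plain regular expression stands in for the keyword, regex, and pii rule types. How the real engine orders rules or detects PII is not specified here.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule shape; a regex stands in for keyword/regex/pii types."""
    name: str
    pattern: str
    action: str                      # "block", "mask", or "flag"
    placeholder: str = "[REDACTED]"  # used only by mask


class GuardrailBlocked(Exception):
    """Raised on a block; the gateway would answer HTTP 400 guardrail_blocked."""

    def __init__(self, rule_name: str):
        super().__init__(f"guardrail_blocked: {rule_name}")
        self.rule_name = rule_name


def apply_input_rules(text: str, rules: list[Rule]) -> tuple[str, list[str]]:
    """Return the (possibly masked) text and the names of rules that matched."""
    matched = []
    for rule in rules:  # evaluation order is an assumption of this sketch
        if not re.search(rule.pattern, text):
            continue
        matched.append(rule.name)
        if rule.action == "block":
            raise GuardrailBlocked(rule.name)  # the model is never called
        if rule.action == "mask":
            text = re.sub(rule.pattern, rule.placeholder, text)
        # "flag": the match is recorded and the text is left unchanged
    return text, matched
```

A caller would forward the returned text upstream only when no `GuardrailBlocked` is raised, which matches the zero-cost property of an input block described above.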

3. Layer 3 — Model runs

If the key is valid and input guardrails pass, the gateway forwards the request to the upstream model. This is the only layer with no configurable enforcement — it is simply the model doing its job. The firewall operates on the actions the model produces (Layer 3 → Layer 4 below), not on the model itself. Routing, fallbacks, and load balancing happen here transparently.

4. Layer 4 — Agent Firewall (tool calls and egress)

After the model responds — or inline, as tool calls are emitted — the Agent Firewall judges every action the model requests. Four enforcement surfaces:
| Surface | What the firewall sees |
|---|---|
| inbound | Tool definitions the agent advertises to the model. Block a dangerous tool before the model can choose it. |
| response | tool_calls the model emits in its reply. |
| mcp | A tools/call dispatched through the Firewall MCP gateway or the evaluate hook. |
| egress | An outbound network destination (host / IP / CIDR) reported by a tool: the SSRF and data-exfiltration surface. |
Verdicts a rule (or the default) can produce:
| Verdict | What it does |
|---|---|
| allow | Call proceeds. Logged. |
| audit | Call proceeds; recorded for review. The default default_verdict. |
| deny | Call blocked: HTTP 400 firewall_blocked on the inbound surface; a tool error on mcp. |
| sanitize | Matched substrings are redacted from tool arguments; the cleaned call proceeds. On inbound (no arguments yet), escalates to deny. |
| pending_approval | Call is held; an out-of-band reviewer approves or rejects; the agent re-submits with a single-use approval token. |
| cap_cost | Deny once the agent run’s accumulated spend exceeds a per-rule cents cap. |
A deny on the inbound surface costs no model tokens — the block fires before the upstream call. A pending_approval hold returns HTTP 400 firewall_approval_pending with an approval id the client polls. Where to configure: Console → Firewall, or the firewall API. Requires Developer+ to create or modify policies and rules. See Firewall and Firewall rules for the full rule language.
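How a verdict might be chosen for one tool call can be sketched as follows. The `FirewallRule` class, the use of shell-style globs via `fnmatch`, first-match-wins ordering, and the decision to keep evaluating later rules when a cap_cost rule is still under its cap are all assumptions of this example. The verdict names, the audit default, and the sanitize-to-deny escalation on inbound come from the tables above.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class FirewallRule:
    """Hypothetical rule shape for illustration only."""
    tool_glob: str                   # e.g. "shell.*"; glob syntax is assumed
    verdict: str                     # allow | audit | deny | sanitize | pending_approval | cap_cost
    cap_cents: Optional[int] = None  # only meaningful for cap_cost


def evaluate(rules: list[FirewallRule], surface: str, tool_name: str,
             run_spend_cents: int = 0, default_verdict: str = "audit") -> str:
    """Return the effective verdict for one tool call (first match wins; assumed)."""
    for rule in rules:
        if not fnmatchcase(tool_name, rule.tool_glob):
            continue
        if rule.verdict == "sanitize" and surface == "inbound":
            return "deny"  # no arguments exist yet, so sanitize escalates
        if rule.verdict == "cap_cost":
            if rule.cap_cents is not None and run_spend_cents > rule.cap_cents:
                return "deny"  # accumulated run spend exceeded the cap
            continue  # under the cap: fall through to later rules (assumed)
        return rule.verdict
    return default_verdict
```

With no matching rule, the call falls through to the default verdict, so an unconfigured tool is recorded for review rather than blocked.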

5. Layer 5 — Output guardrails

After the model responds (and after any tool-call cycle completes), the gateway runs the bound guardrail’s output-stage rules against the response text before it reaches the caller. The same rule types and actions apply as in Layer 2. An output block returns HTTP 400 guardrail_blocked and refunds the pre-consumed quota — the caller pays nothing.
Streaming and output masking. A block action is enforced on both streaming and non-streaming responses — on a stream, the scanner cuts mid-flight and emits a replacement. A mask action on output currently applies only to non-streaming responses; on a streaming response the original chunk passes through unmasked. Verify your stage/stream combination in the guardrail sandbox before depending on it.

6. Layer 6 — Audit

Every match, verdict, and approval decision is written to the audit trail, correlated to the agent run and session that caused it. This is not a separate enforcement step — it runs in parallel with Layers 2–5 — but it is the layer that makes the others accountable. What is logged:
  • Guardrail matches: rule type, action, stage, detail string, and (if Log raw content is enabled) the matched substring.
  • Firewall events: surface, tool name, verdict, matched rule, reason code, risk factors, and the run/session the call belongs to.
  • Approval decisions: who approved or rejected, when, and whether the underlying rule changed between hold and decision.
  • Policy changes: every create, update, delete, and autonomy-level change writes a versioned audit row.
Where to review: Console → Guardrails → Matches; Console → Firewall → Events, Runs & Sessions, Audit. The guardrail Matches feed is open to any workspace Member; the firewall Events and Runs & Sessions feeds require Developer+.

7. Summary table

| Layer | What it controls | What it sees | Outcome on a hit | Where to configure |
|---|---|---|---|---|
| 1. Scoped key | Identity, model access, spend, IP, expiry | The request’s auth token | HTTP 4xx before anything runs; no metering | Console → API Keys (Developer+) |
| 2. Input guardrails | Request text content | Caller’s messages | Block (HTTP 400 guardrail_blocked, no charge), mask, or flag | Console → Guardrails (Developer+) |
| 3. Model | | | | Routing / channel config |
| 4. Agent Firewall | Tool calls, MCP dispatch, egress | Tool name, arguments, destination | allow / audit / deny / sanitize / pending_approval / cap_cost | Console → Firewall (Developer+) |
| 5. Output guardrails | Response text content | Model’s reply | Block (HTTP 400, quota refunded), mask, or flag | Console → Guardrails (Developer+) |
| 6. Audit | Attribution and trail | All of the above | Immutable log entry | Console → Matches (Member) / Events & Runs (Developer+) |

8. Policy resolution order

For any request, the active guardrail and firewall policy are resolved independently:
  1. Key attachment — if the key carries an explicit guardrail_id or firewall_policy_id, that policy binds (when it exists and is enabled).
  2. Workspace default — if the key has no attachment, the workspace’s enabled is_default guardrail or policy applies.
  3. Neither — no enforcement. The request is byte-identical to a workspace that never enabled the feature.
The two planes differ when an attached policy is disabled: a disabled guardrail attachment turns guardrails off for that key (no fallback), while a disabled firewall attachment falls back to the workspace default firewall policy. At most one guardrail and one firewall policy per workspace can be the default. Promoting a new default demotes the old one in the same transaction.
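The resolution order, including the difference between the two planes, can be sketched as one function. The `Policy` class and the `resolve` signature are hypothetical. The text above does not say what happens when an attached policy no longer exists; this sketch falls back to the workspace default in that case, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    """Hypothetical policy record; works for either plane."""
    id: str
    enabled: bool = True
    is_default: bool = False


def resolve(plane: str, attached_id: Optional[str],
            policies: list[Policy]) -> Optional[Policy]:
    """Resolve the active policy for one plane: "guardrail" or "firewall"."""
    by_id = {p.id: p for p in policies}
    default = next((p for p in policies if p.is_default and p.enabled), None)

    if attached_id is None:
        return default      # step 2, or None for step 3 when no default exists
    attached = by_id.get(attached_id)
    if attached is not None and attached.enabled:
        return attached     # step 1: an explicit, enabled attachment binds
    if attached is not None and plane == "guardrail":
        return None         # disabled guardrail attachment: off, no fallback
    # Disabled firewall attachment: fall back to the workspace default.
    # A missing attachment also lands here (assumption, see lead-in).
    return default
```

Calling `resolve` once per plane mirrors the statement that the guardrail and the firewall policy are resolved independently.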

9. Fail-open and fail-closed

Two behaviors, applied to different cases.
Fail-open (transient errors): If policy resolution hits a transient error (a database hiccup, a network blip on the way to an external vendor), the gateway degrades to no enforcement rather than taking traffic down. Safety degrades; availability is preserved. Configure fail_open: false on external or llm_judge rules when a missed check is unacceptable for your policy.
Fail-closed (ambiguous cases): Where not enforcing would defeat the rule, the engine fails closed: an egress report with an unresolvable destination is denied; an unreachable approval store holds the call rather than passing it; a skill whose ownership cannot be resolved is blocked. Availability is preserved on the happy path; safety is not silently skipped on the edge cases that matter.
See Enforcement modes for the full decision tree, and How OrcaRouter inspects requests for the relay-path mechanics.
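The effect of the fail_open setting on a rule that calls an outside service can be sketched like this. The `ExternalRule` class, the `TransientError` exception, and the three result strings are hypothetical. The `fail_open` default of true is inferred from the fail-open description above, not taken from a documented schema.

```python
from dataclasses import dataclass
from typing import Callable


class TransientError(Exception):
    """Stands in for a timeout or dropped connection to an external checker."""


@dataclass
class ExternalRule:
    """Hypothetical shape for an external or llm_judge rule."""
    name: str
    check: Callable[[str], bool]  # returns True when the text violates the rule
    fail_open: bool = True        # default inferred from the text above


def run_rule(rule: ExternalRule, text: str) -> str:
    """Return "pass", "block", or "skipped" for one rule."""
    try:
        return "block" if rule.check(text) else "pass"
    except TransientError:
        # fail_open=True: degrade to no enforcement and keep traffic flowing.
        # fail_open=False: treat the missed check as a block.
        return "skipped" if rule.fail_open else "block"
```

A "skipped" result is the case where safety degrades and availability is preserved; setting `fail_open=False` trades that availability for a guaranteed check.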

10. Deep dives

Guardrails

Full rule reference — types, actions, PII entities, LLM judge, grounding, and the test sandbox.

Firewall

Policy model, verdicts, surfaces, shadow mode, HITL approval, and anomaly detection.

Firewall rules

The rule matching language — tool globs, argument clauses, egress lists, and sanitizers.

Guardrails vs. Firewall

Which layer catches which threat — and when you need both.

Scope, keys, and policies

The full key model: what a key carries and how policies resolve.

Enforcement modes

Fail-open vs. fail-closed — the full decision tree.
Every call through OrcaRouter passes all four enforcement layers in order — key validation, input screening, firewall judgment, output screening — with a full audit trail written across all of them.