shell.exec,
queries a database, fetches a URL, dispatches a tool through an MCP server.
Each of those is a real-world action that prompt-level guardrails
can’t see. The agent firewall is the action-layer plane that governs
them: a workspace-scoped, named policy the gateway evaluates on every tool
call, before it reaches the tool.
This page is the hub for the Firewall section — the policy model, the
verdicts, the surfaces, and how a policy attaches to a key. Each spoke goes
deeper, and the full engine reference lives in
Firewall and Firewall Rules.
1. What the agent firewall does
You author a policy once in your workspace console, attach an API key to it (or set one as the workspace default), and from then on every tool call that key issues is checked against the policy at the gateway. There is no redeploy and no agent-code change: your agent keeps issuing tool calls exactly as before, and editing the policy takes effect on the next call for every key attached to it.

A policy is an ordered list of rules. Each rule decides which tool calls it applies to and what to do about them. The engine walks the rules in priority order, first match wins, and falls back to the policy’s default verdict if nothing matches.

Detection happens at the gateway, on first use, not at install time. A tool, MCP server, or skill an agent self-installs is caught the first time its call crosses the gateway. That’s the one choke point that sees every provider, every agent, and every tool call regardless of how the capability got there.
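The rule walk described above (priority order, first match wins, fall back to the default verdict) can be pictured with a minimal Python sketch. This is an illustration, not the gateway's engine: the `Rule` class, the `tool_glob` field name, and the convention that a lower priority number is checked first are all assumptions made for the example, and Python's `fnmatchcase` only approximates whatever glob dialect the gateway uses.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Rule:
    priority: int   # assumption: a lower number is checked first
    tool_glob: str  # hypothetical field name for the tool-name glob
    verdict: str    # for example "allow", "audit", or "deny"


def evaluate(rules, default_verdict, tool_name):
    """Return the verdict of the first matching rule, else the default."""
    for rule in sorted(rules, key=lambda r: r.priority):
        # fnmatchcase compares case-sensitively on every platform
        if fnmatchcase(tool_name, rule.tool_glob):
            return rule.verdict
    return default_verdict


rules = [
    Rule(priority=20, tool_glob="shell.*", verdict="audit"),
    Rule(priority=10, tool_glob="shell.exec", verdict="deny"),
]
print(evaluate(rules, "audit", "shell.exec"))  # deny (priority 10 wins)
print(evaluate(rules, "allow", "db.query"))    # allow (no rule matched)
```

Because the first match wins, a narrow rule only takes effect if it is ordered ahead of any broader rule that would also match the same call.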
2. A concrete example
Suppose you want to block destructive shell commands but let everything else through under audit. In the console you create a policy with default_verdict = audit and one rule:
args_match_json is a JSON-encoded string (the gateway validates it
against the clause schema on save): path is a JSONPath into the call’s
arguments, op is one of eq, contains, regex, in, cidr_match,
gt, lt.
Attach a key to the policy (set firewall_policy_id on the key). Now when
an agent emits shell.exec with command: "rm -rf /", the gateway returns
HTTP 400 with error code firewall_blocked and a reason naming the
tool — and the call never reaches the shell. Every other tool call is
allowed and recorded for review.
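A rule of the kind described in this example might look like the following Python sketch. Only `args_match_json`, `path`, and `op` come from the text; the `tool_glob`, `verdict`, and `value` field names are assumptions, and the tiny evaluator covers three operators with a dotted-path lookup standing in for real JSONPath. Treating a missing path as "no match" is a choice of this sketch, not documented gateway behaviour.

```python
import json
import re

# Hypothetical rule shape: only args_match_json, path, and op are named in
# the text; every other field name here is an assumption.
rule = {
    "tool_glob": "shell.exec",
    "verdict": "deny",
    "args_match_json": json.dumps(
        {"path": "$.command", "op": "regex", "value": r"\brm\s+-rf\b"}
    ),
}


def lookup(args, path):
    """Resolve a simple dotted path such as $.command or $.a.b."""
    node = args
    for key in path.removeprefix("$.").split("."):
        node = node[key]
    return node


def clause_matches(clause, args):
    """Evaluate eq, contains, or regex; a failed lookup counts as no match."""
    try:
        actual = lookup(args, clause["path"])
    except (KeyError, TypeError):
        return False
    op, expected = clause["op"], clause["value"]
    if op == "eq":
        return actual == expected
    if op == "contains":
        return expected in actual
    if op == "regex":
        return re.search(expected, actual) is not None
    raise ValueError(f"operator not covered by this sketch: {op}")


clause = json.loads(rule["args_match_json"])  # decode the JSON-encoded string
print(clause_matches(clause, {"command": "rm -rf /"}))  # True
print(clause_matches(clause, {"command": "ls -la"}))    # False
```

Keeping the clause as a JSON-encoded string mirrors the text: the rule carries an opaque string that is decoded and checked against a schema before it is used.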
3. Policy, rules, and resolution
A policy is workspace-scoped and named, with enabled, is_default, a
default_verdict (allow / audit / deny, default audit), and a
shadow_mode flag. A rule is one check inside it — see
Create a policy and
Rule schema.
For any tool call the gateway resolves the active policy in order:
- Key attachment: the calling key’s firewall_policy_id, when that policy exists and is enabled.
- Workspace default: otherwise the enabled is_default policy.

If neither resolves to a policy, the outcome depends on the firewall_observe_mode setting: with observe mode on, the call is allowed but logged as a coverage gap (it surfaces in Discovered Tools); with it off, the call is allowed silently.
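The resolution order, together with the firewall_observe_mode behaviour, can be sketched in Python as follows. The `Policy` class and both function names are hypothetical, and reading observe mode as applying only when no policy resolves is this sketch's interpretation of the text rather than a confirmed detail of the gateway.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Policy:
    id: str
    enabled: bool = True
    is_default: bool = False
    default_verdict: str = "audit"


def resolve_policy(key_policy_id: Optional[str],
                   policies: Dict[str, Policy]) -> Optional[Policy]:
    """Key attachment first, then the enabled workspace default, else None."""
    attached = policies.get(key_policy_id) if key_policy_id else None
    if attached is not None and attached.enabled:
        return attached
    for policy in policies.values():
        if policy.is_default and policy.enabled:
            return policy
    return None


def no_policy_outcome(observe_mode: bool) -> dict:
    """With no policy the call is allowed; observe mode only adds logging."""
    return {"allowed": True, "logged_as_coverage_gap": observe_mode}


policies = {
    "p-strict": Policy(id="p-strict", default_verdict="deny"),
    "p-base": Policy(id="p-base", is_default=True),
    "p-off": Policy(id="p-off", enabled=False),
}
print(resolve_policy("p-strict", policies).id)  # p-strict
print(resolve_policy("p-off", policies).id)     # p-base (attached one is disabled)
print(resolve_policy(None, policies).id)        # p-base
print(no_policy_outcome(observe_mode=True))
```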
4. Verdicts
A rule, or the default, produces one of:

| Verdict | What it does |
|---|---|
| allow | Let through, logged. |
| audit | Allow + record for review. The usual default. |
| deny | Block. HTTP 400 firewall_blocked on inbound; tool error on MCP. |
| sanitize | Redact matched substrings from the tool arguments and forward. |
| pending_approval | Hold for a human; HTTP 400 firewall_approval_pending. |
| cap_cost | Deny once the run’s spend crosses a per-rule cap. |

The sanitize verdict redacts call arguments only, never the content a
tool returns. On the inbound surface, where there are no call-time args
yet, sanitize escalates to a block. See
Verdicts and
Sanitize responses.
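To make the sanitize behaviour concrete, here is a minimal Python sketch that redacts matched substrings from call arguments and escalates to a block on the inbound surface. The function names, the `[REDACTED]` placeholder, and the idea that the redaction pattern is a regular expression are assumptions for illustration; the text does not specify how the gateway marks redacted spans.

```python
import re


def sanitize_args(args, pattern, placeholder="[REDACTED]"):
    """Return a copy of args with pattern matches in string values replaced."""
    compiled = re.compile(pattern)

    def scrub(value):
        if isinstance(value, str):
            return compiled.sub(placeholder, value)
        if isinstance(value, dict):
            return {k: scrub(v) for k, v in value.items()}
        if isinstance(value, list):
            return [scrub(v) for v in value]
        return value  # numbers, booleans, None pass through unchanged

    return scrub(args)


def apply_sanitize(surface, args, pattern):
    """On inbound there are no call-time args yet, so sanitize becomes a block."""
    if surface == "inbound":
        return {"verdict": "deny", "args": None}
    return {"verdict": "sanitize", "args": sanitize_args(args, pattern)}


EMAIL = r"[\w.+-]+@[\w-]+\.[\w.]+"
call_args = {"query": "email alice@example.com now", "limit": 5}
print(apply_sanitize("response", call_args, EMAIL))
print(apply_sanitize("inbound", call_args, EMAIL))
```

The sketch returns a new structure instead of mutating the original arguments, which keeps the unredacted call available for the audit record if one is needed.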
5. The four enforcement surfaces
Every tool call is evaluated against exactly one surface. Pin a rule to one with the stage field, or leave it empty to apply to all.
- inbound. The tools an agent advertises to the model on the request. Block a dangerous tool before the model can even choose it.
- response. The tool_calls the model emits in its reply.
- mcp. A tools/call dispatched through the MCP gateway. See MCP servers.
- egress. An outbound host / IP / CIDR a tool reaches: the SSRF and data-exfiltration surface.
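For the egress surface, a CIDR deny list of the kind an operator might author can be sketched with Python's standard `ipaddress` module. The list contents (RFC 1918 private ranges plus the link-local block where cloud metadata endpoints commonly live) and the `egress_verdict` function are illustrative assumptions; resolving hostnames to addresses is left out of the sketch.

```python
import ipaddress

# Example deny list: RFC 1918 private ranges plus the link-local block
# that hosts common cloud metadata endpoints (169.254.169.254).
DENY_CIDRS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16")
]


def egress_verdict(ip: str, default: str = "allow") -> str:
    """Deny when the destination address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    for net in DENY_CIDRS:
        if addr in net:
            return "deny"
    return default


print(egress_verdict("169.254.169.254"))  # deny
print(egress_verdict("10.1.2.3"))         # deny
print(egress_verdict("8.8.8.8"))          # allow
```

Checking the parsed address against networks, instead of string-matching the host, avoids bypasses through alternate spellings of the same IP.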
6. Matching
Rules express which tool calls they catch with a small, deterministic vocabulary that’s safe on the hot path:
- Tool & skill name globs. A case-sensitive glob on the tool name (shell.*, *.delete), optionally AND-ed with a glob on the owning skill. See Glob syntax and Tool allow-listing.
- Argument clauses. JSONPath predicates over the call’s arguments with operators eq, contains, regex, in, cidr_match, gt, lt: the difference between “block shell.exec” and “block it only when the command is rm -rf.” Clauses fail closed (the rule, not the request). See Validate arguments.
- Egress lists. Host / IP / CIDR allow and deny lists on the egress surface. You can author your own deny rule for cloud-metadata or RFC-1918 ranges. See Egress control.
- Sequences & cost caps. A sequence rule matches an ordered chain of calls across a window (enforced reactively); a cap_cost rule denies once an agent run’s accumulated spend crosses a cents ceiling. See Sequence rules and Cap cost.
7. Human approval, autonomy, and anomalies
- Human-in-the-loop. A pending_approval verdict holds the call and returns its approval id. A reviewer resolves it in the console (Developer+) or via an HMAC-signed webhook callback; the agent polls and re-submits with a single-use X-OrcaRouter-Firewall-Approval header. See Approvals and Approval webhook.
- Autonomy levels. One switch sets your whole posture: tight (default-deny + deny destructive shell + deny fetch-shaped tools (http_fetch/web_search/fetch_url/request, the SSRF vector) + PII Shield + Secrets Blocker enforced), balanced (default audit, deny destructive shell, PII Shield audit-only), or permissive (observe only). Each writes real, editable policy and guardrail rows, with one-click undo from the audit snapshot.
- Anomaly detection. Beyond static rules, the firewall scores tool use against a learned hour-of-week baseline (14-day) and flags rate/cost spikes, retry_loop, and novel_path on a Member-readable feed you can snooze for up to 7 days.
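An hour-of-week baseline like the one mentioned under anomaly detection can be illustrated with a short Python sketch. Only the 168-bucket idea and the 14-day window come from the text; the averaging method, the `factor` threshold, and all function names are assumptions, and the real scoring may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def hour_of_week(ts: datetime) -> int:
    """Bucket 0..167: Monday 00:00 is 0, Sunday 23:00 is 167."""
    return ts.weekday() * 24 + ts.hour


def build_baseline(calls, now, window_days=14):
    """Mean calls per active day for each hour-of-week bucket in the window."""
    cutoff = now - timedelta(days=window_days)
    per_slot = defaultdict(lambda: defaultdict(int))  # bucket -> date -> count
    for ts in calls:
        if cutoff <= ts < now:
            per_slot[hour_of_week(ts)][ts.date()] += 1
    return {bucket: mean(days.values()) for bucket, days in per_slot.items()}


def is_rate_spike(baseline, ts, observed, factor=3.0):
    """Assumed rule: flag when observed exceeds factor times the bucket mean."""
    expected = baseline.get(hour_of_week(ts), 0.0)
    return observed > factor * max(expected, 1.0)


now = datetime(2024, 1, 15)  # a Monday
calls = [datetime(2024, 1, 1, 9, m) for m in range(4)]   # 4 calls, Monday 09:xx
calls += [datetime(2024, 1, 8, 9, m) for m in range(6)]  # 6 calls, next Monday
baseline = build_baseline(calls, now)
print(baseline[hour_of_week(datetime(2024, 1, 15, 9))])       # 5
print(is_rate_spike(baseline, datetime(2024, 1, 15, 9), 20))  # True
```

Bucketing by hour of week keeps a quiet Sunday night from being compared against a busy Monday morning, which is the usual reason to prefer it over a single global average.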
8. Where the firewall fits
The firewall is the action-layer peer of two adjacent planes:

| Plane | Governs | When to reach for it |
|---|---|---|
| Guardrails | Prompt & response text | PII, secrets, jailbreaks, injection intent |
| Agent firewall | Tool actions | Which tools, MCP calls, hosts, and cost |
| Compliance | Evidence & frameworks | SOC 2 / HIPAA / EU AI Act readiness |
9. Attaching and connecting
A policy binds to a key via firewall_policy_id (configured in the console; the binding lives on the key in the gateway). A tool call reaches the engine in one of two ways, both requiring a firewall-gateway-scoped key (is_firewall_gateway = true); a regular relay key gets 403 on these routes:
- MCP gateway: point your MCP client at the unified ANY /api/v1/firewall/mcp endpoint; every tools/call is evaluated inline. See MCP servers.
- Evaluate hook: call POST /api/v1/firewall/evaluate (or /evaluate_plan for a multi-step plan) from your own agent loop before dispatching, and act on the verdict.
Console management routes live under /api/workspace/firewall/*. Reads of policies, settings, discovered
tools, the read-only autonomy simulator, and the anomaly feed are open to
every workspace member; the dry-run Test sandbox, the Events / Runs
log, and all writes require Developer+. See
Gateway keys and
Test rules.
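The evaluate hook pattern (ask for a verdict, then dispatch only if the call is let through) could be wired into an agent loop roughly as in this Python sketch. Only the POST /api/v1/firewall/evaluate path comes from the text: the request body shape, the Bearer authorization header, the `gateway.example` host, and both function names are assumptions, and the response format is deliberately left to an injected `evaluate` callable because the text does not describe it.

```python
import json
import urllib.request

EVALUATE_PATH = "/api/v1/firewall/evaluate"


def build_evaluate_request(base_url, api_key, tool, args):
    """Build (but do not send) the evaluate call; body shape is assumed."""
    body = json.dumps({"tool": tool, "arguments": args}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + EVALUATE_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def guarded_dispatch(evaluate, dispatch, tool, args):
    """Dispatch the tool only when the verdict lets the call through."""
    verdict = evaluate(tool, args)
    if verdict in ("allow", "audit"):
        return dispatch(tool, args)
    raise PermissionError(f"firewall verdict for {tool}: {verdict}")


req = build_evaluate_request(
    "https://gateway.example/", "fw-key", "shell.exec", {"command": "ls"}
)
print(req.get_method(), req.full_url)

# Stub evaluator and dispatcher stand in for the network call and the tool.
result = guarded_dispatch(lambda t, a: "audit", lambda t, a: f"ran {t}",
                          "shell.exec", {"command": "ls"})
print(result)  # ran shell.exec
```

Injecting the evaluator keeps the agent loop testable without a live gateway and makes it easy to swap in handling for verdicts such as pending_approval later.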
10. Threats this plane addresses
- Dangerous tool calls. Deny destructive shell, DB drops, and risky verbs by glob + args.
- Data exfiltration. Egress lists and read-then-export sequence rules.
- MCP tool poisoning. Per-call evaluation on the mcp surface before dispatch.
- Excessive agency. Approvals, cost caps, and default-deny posture.
Next steps
- Create a policy. Author your first policy and rule.
- Verdicts. What each verdict does on the wire.
- Events log. Read what the firewall decided and why.
