Rules
The matching language — tool globs, argument clauses, egress lists,
sanitizers, and sequences.
MCP servers
Register and govern Model Context Protocol servers behind a single
audited gateway.
Skills
Scan and risk-score the capabilities your agents install before they
can run.
1. What is the Firewall
An AI agent doesn’t just generate text: it acts. It calls shell.exec,
queries db.query, fetches a URL, loads a community skill, or routes a
tool call through a third-party MCP server. Each of those is an action
with real-world consequences, and prompt-level guardrails can’t see them.
The Firewall is a workspace-scoped, named policy that the gateway
evaluates on every tool call. You author a policy once, attach an API key
to it (or set one as the workspace default), and from then on every tool
call that key issues is checked against the policy — before it reaches
the tool.
Each policy is an ordered list of rules. A rule decides one thing —
which tool calls it applies to (a tool-name glob, optionally scoped to a
skill and to an enforcement surface) and what to do about them (a
verdict: allow, audit, deny, sanitize, hold for approval, or cap
cost). The engine walks the rules in priority order, first match wins,
and falls back to the policy’s default verdict if nothing matches.
Editing a policy takes effect on every key attached to it on the next
call. No redeploy. No agent-code change. The policy is enforced at the
gateway — your agent keeps issuing tool calls exactly as before.
Detection happens at the gateway, on first use. The Firewall sits on
the LLM-relay path, not inside your agent’s package manager or
filesystem. A tool, MCP server, or skill an agent self-installs is caught
the first time its call crosses the gateway — not at install time. This is
deliberate: it’s the one choke point that sees every provider, every
agent, and every tool call regardless of how the capability got there.
2. The four enforcement surfaces
Every tool call is evaluated against exactly one surface, the point in the request lifecycle where the firewall sees it:

| Surface | What it sees |
|---|---|
| inbound | The tools an agent advertises to the model on the request (tool definitions). Lets you block a dangerous tool before the model can even choose it. |
| response | The tool_calls the model emits in its reply. |
| mcp | A tools/call dispatched through the Firewall MCP gateway or evaluated via the SDK hook. |
| egress | An outbound network destination (host / IP / CIDR) reported by a tool: the SSRF and data-exfiltration surface. |
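
The egress surface matches destinations given as a host, an IP, or a CIDR range. The following is a minimal sketch of how such a check could work, using Python's standard `ipaddress` and `fnmatch` modules. The function name `egress_blocked`, the blocklist format, and the host-glob syntax are illustrative assumptions, not the Firewall's actual implementation.

```python
import ipaddress
from fnmatch import fnmatch


def egress_blocked(destination: str, blocked: list[str]) -> bool:
    """Return True if a destination (hostname or IP literal) hits the blocklist.

    Blocklist entries may be an IP, a CIDR range, or a host glob
    (hypothetical format chosen for this sketch).
    """
    if not destination:
        # Assumption: an egress report with no usable destination fails closed.
        return True
    try:
        addr = ipaddress.ip_address(destination)
    except ValueError:
        addr = None  # not an IP literal, so treat it as a hostname

    for entry in blocked:
        if addr is not None:
            try:
                net = ipaddress.ip_network(entry, strict=False)
            except ValueError:
                continue  # entry is a host glob and cannot match an IP
            if addr.version == net.version and addr in net:
                return True
        elif fnmatch(destination.lower(), entry.lower()):
            return True
    return False


# Example blocklist: link-local metadata range, a private range, internal hosts.
BLOCKED = ["169.254.0.0/16", "10.0.0.0/8", "*.internal"]
```

With this blocklist, a request to the cloud metadata address `169.254.169.254` or to `db.internal` is flagged, while `8.8.8.8` and `example.com` are not.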
3. Core concepts
| Concept | Definition |
|---|---|
| Policy | A named, workspace-scoped set of rules. Has enabled, is_default, a default_verdict, and a shadow_mode flag. |
| Rule | One check inside a policy: a priority, a tool/skill match, an optional surface, an optional argument predicate, and a verdict. See Rules. |
| Verdict | The action a rule (or the default) produces — see §4. |
| Default verdict | Applied when no rule matches. One of allow, audit (default), or deny. |
| Shadow mode | The policy evaluates and logs but never blocks — every enforcing verdict is downgraded to audit and the reason is prefixed [shadow] would …. Your safe-rollout switch. |
| Observe mode | A workspace-level setting. When a request resolves to no policy and observe mode is on, the call is allowed but logged as a coverage gap — that’s what populates the Discovered-tools view. |
Scoping and resolution
Policies resolve exactly like Guardrails and API keys: workspace-shared when you have an active workspace. For any tool call the gateway resolves the policy in this order:
- Key attachment: if the calling key has a firewall_policy_id, that policy applies (when it exists and is enabled).
- Workspace default: otherwise the workspace’s enabled is_default policy applies.
- Neither: no enforcement. With observe mode on, the call is allowed and logged as a gap; with it off, the call is allowed silently (byte-identical to a workspace that never enabled the feature).
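The resolution order above can be sketched as a small function. This is an illustration only: the `Policy` dataclass, the `resolve_policy` name, and the in-memory dictionary are hypothetical. The sketch also assumes that a key whose attached policy is missing or disabled falls through to the workspace default, which the text above does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    id: str
    enabled: bool = True
    is_default: bool = False


def resolve_policy(
    key_policy_id: Optional[str], policies: dict[str, Policy]
) -> Optional[Policy]:
    """Pick the policy for a call: key attachment, then workspace default."""
    # 1. Key attachment: applies only when the policy exists and is enabled.
    if key_policy_id is not None:
        attached = policies.get(key_policy_id)
        if attached is not None and attached.enabled:
            return attached
    # 2. Workspace default: the enabled policy flagged is_default.
    for policy in policies.values():
        if policy.is_default and policy.enabled:
            return policy
    # 3. Neither: no enforcement.
    return None


def unresolved_outcome(observe_mode: bool) -> tuple[str, bool]:
    """Outcome when nothing resolves: (verdict, logged_as_coverage_gap)."""
    return ("allow", observe_mode)
```

In this sketch, `unresolved_outcome` always allows the call; observe mode only decides whether the call is recorded as a coverage gap.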
Fail-open on the unknown, fail-closed on the ambiguous. If policy
resolution hits a transient error the gateway degrades to observe/allow
rather than taking traffic down. But where not enforcing would defeat
the rule — an egress report with no usable destination, an approval store
that’s unreachable, a skill whose ownership can’t be resolved — the engine
fails closed (deny or hold). Availability is preserved; safety isn’t
silently skipped on the cases that matter.
4. Verdicts
A rule (or the default verdict) produces one of:

| Verdict | What it does |
|---|---|
| allow | Let the call through. Logged. |
| audit | Allow, but record it for review. The default default_verdict: observe everything, block nothing, until you’re ready. |
| deny | Block the call. The agent sees a tool error (or HTTP 400 on the inbound surface). |
| sanitize | Redact matched substrings from the tool arguments (secrets, PII) and forward the cleaned call. See sanitizers. On the inbound surface, where there are no call-time args yet, sanitize escalates to a block. |
| pending_approval | Hold the call for a human. The agent gets a “held” response; a reviewer approves or rejects out-of-band; the agent re-submits with a single-use approval token. See §7. |
| cap_cost | Deny once the agent run’s accumulated spend exceeds a per-rule cents cap. A circuit-breaker for runaway loops. |
In shadow mode, deny / sanitize / pending_approval are all downgraded
to audit so you can measure a policy’s impact before it changes traffic.
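A minimal sketch of the shadow-mode downgrade, assuming verdicts are plain strings. The `apply_shadow` name is hypothetical, and the exact wording that follows `[shadow] would` is not specified here, so the reason format below is illustrative. The sketch covers only the three verdicts named above.

```python
# Verdicts that shadow mode downgrades, per the text above.
ENFORCING = {"deny", "sanitize", "pending_approval"}


def apply_shadow(verdict: str, reason: str, shadow_mode: bool) -> tuple[str, str]:
    """Downgrade an enforcing verdict to audit when the policy is in shadow mode."""
    if shadow_mode and verdict in ENFORCING:
        # Reason format after the prefix is an assumption for illustration.
        return "audit", f"[shadow] would {verdict}: {reason}"
    return verdict, reason
```

For example, `apply_shadow("deny", "destructive shell command", True)` yields an `audit` verdict whose reason starts with `[shadow] would`, while the same call with `shadow_mode=False` is returned unchanged.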
5. How a tool call is evaluated
- A tool call reaches the gateway (advertised inbound, emitted in a response, dispatched through the MCP gateway, or reported as egress).
- The engine resolves the active policy (§3).
- It walks the policy’s rules in priority order (lower priority first; ties broken by rule id). A rule matches when its surface, its tool-name glob, its optional skill-name glob, its optional argument clauses, and its optional egress scope all match.
- First match wins → the rule’s verdict applies. If no rule matches → the policy’s default_verdict.
- If the call is owned by a governed skill, the skill’s enforcement mode is applied on top: a skill in block mode forces a deny; a skill in quarantine mode escalates anything short of deny to pending_approval.
- The decision is logged as a firewall event (unless it’s a dry run), correlated to the agent run and session.
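The evaluation steps above can be sketched as follows. This is a simplified illustration, not the engine's code: the `Rule` fields and the `evaluate` signature are hypothetical, globs are matched with Python's `fnmatchcase` (the real glob dialect may differ), and skill-name globs, argument clauses, and egress scope are left out.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Rule:
    id: int
    priority: int
    tool_glob: str
    verdict: str
    surface: Optional[str] = None  # None means the rule applies on any surface


def evaluate(
    rules: list[Rule],
    default_verdict: str,
    tool: str,
    surface: str,
    skill_mode: Optional[str] = None,
) -> str:
    """First-match-wins evaluation with a skill enforcement mode on top."""
    verdict = default_verdict
    # Lower priority first; ties broken by rule id.
    for rule in sorted(rules, key=lambda r: (r.priority, r.id)):
        if rule.surface not in (None, surface):
            continue
        if fnmatchcase(tool, rule.tool_glob):
            verdict = rule.verdict
            break  # first match wins
    # Skill enforcement mode is applied after the rule verdict.
    if skill_mode == "block":
        return "deny"
    if skill_mode == "quarantine" and verdict != "deny":
        return "pending_approval"
    return verdict
```

With rules `shell.*` → deny (priority 10) and `*` → allow (priority 20), a call to `shell.exec` is denied, `db.query` is allowed, and the same `db.query` call from a quarantined skill is held for approval.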
6. What a block looks like
A denied call on the inbound surface returns HTTP 400 with an OpenAI-shaped error body, error code firewall_blocked, and a message
naming the tool and the reason — e.g. tool "shell.exec" blocked by firewall: destructive shell command. The error carries structured
metadata (reason code, risk factors, score) and is marked skip-retry
(re-running the same call would just block again).
A call dispatched through the MCP gateway is blocked as a tool error
(firewall deny: <reason>) rather than a transport failure, so the model
sees the rejection and can react — pick another tool, ask the user, or
stop — instead of crashing.
A held call (pending_approval) returns HTTP 400 with code
firewall_approval_pending and an approval id the client polls on.
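A client can branch on the two error codes described above. The sketch below assumes the common OpenAI-style envelope `{"error": {"code": ..., "message": ...}}`; the exact field layout of the Firewall's error body is not spelled out here, so treat the field names and the `classify_firewall_error` helper as hypothetical.

```python
def classify_firewall_error(status: int, body: dict) -> str:
    """Map an HTTP 400 firewall response to 'blocked', 'held', or 'other'."""
    if status != 400:
        return "other"
    # Assumed envelope: {"error": {"code": "...", "message": "..."}}
    code = (body.get("error") or {}).get("code")
    if code == "firewall_blocked":
        return "blocked"  # skip-retry: the same call would block again
    if code == "firewall_approval_pending":
        return "held"  # poll the approval id, then re-submit
    return "other"
```

A caller would typically stop retrying on `blocked` and switch to polling on `held`.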
7. Human approval (HITL)
A pending_approval verdict turns a tool call into an out-of-band review:
- The engine enqueues an approval record and returns a “held” response carrying its id; the call does not reach the tool.
- A reviewer resolves it — from the console (Developer+), or via an HMAC-signed webhook callback to your own approval system.
- Your agent (or the MCP SDK) polls the approval id; once approved it re-submits the original call with a single-use X-OrcaRouter-Firewall-Approval header, and the gateway lets it through that one time.
rule_changed so reviewers
know the context shifted.
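The poll-then-resubmit loop from the steps above might look like the sketch below. The transport is injected as two callables so that no HTTP client or response schema is assumed. The `status` and `token` field names, the `wait_and_resubmit` helper, and the polling interval are all hypothetical; only the header name comes from the text above.

```python
import time
from typing import Callable

APPROVAL_HEADER = "X-OrcaRouter-Firewall-Approval"


def wait_and_resubmit(
    approval_id: str,
    get_state: Callable[[str], dict],
    resubmit: Callable[[dict], object],
    interval: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
):
    """Poll a held call until it resolves, then re-submit it once with the token."""
    for _ in range(max_polls):
        state = get_state(approval_id)  # e.g. wraps GET .../approvals/:id
        status = state.get("status")  # assumed field name
        if status == "approved":
            # Single-use token goes in the approval header on the retry.
            return resubmit({APPROVAL_HEADER: state["token"]})
        if status == "rejected":
            raise PermissionError(f"approval {approval_id} was rejected")
        sleep(interval)
    raise TimeoutError(f"approval {approval_id} still pending after {max_polls} polls")
```

Because the token is single-use, the sketch re-submits exactly once and leaves any further retry policy to the caller.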
8. Autonomy levels: one switch for your whole posture
Tuning policies rule-by-rule is the precise path; autonomy levels are the fast one. A single control atomically replaces your workspace’s Firewall and Guardrails posture in one transaction, with one-click undo:

| Level | Posture |
|---|---|
| tight | Block destructive shell, secrets in arguments, and SSRF egress (default deny); PII Shield + Secrets Blocker guardrails on; observe mode off. |
| balanced | Audit destructive shell, flag PII; observe mode off. The recommended starting posture. |
| permissive | No enforcing policy, no guardrails; observe mode on so you still see everything. |
9. Anomaly detection
Beyond static rules, the Firewall learns each workspace’s normal tool-use shape and flags deviations on a viewer-readable feed:
- Rate / cost spikes: per-tool activity is scored against a learned hour-of-week baseline (a 14-day rolling average), so “100 db.query calls at 3am Sunday” stands out even if each call is individually allowed.
- retry_loop: an agent hammering the same failing tool.
- novel_path: a tool-to-tool transition this workspace has never made before.
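To make the hour-of-week baseline concrete, here is a small sketch. It buckets a timestamp into one of 168 weekly hours and compares the observed call count with the learned average for that bucket. The scoring formula, the `floor` parameter, and the function names are assumptions for illustration; the actual scoring method is not described here.

```python
from datetime import datetime


def hour_of_week(ts: datetime) -> int:
    """Bucket index 0..167, where Monday 00:00 is bucket 0."""
    return ts.weekday() * 24 + ts.hour


def spike_ratio(
    observed: int, baseline: dict[int, float], ts: datetime, floor: float = 1.0
) -> float:
    """Ratio of observed calls to the learned average for this hour of the week.

    `floor` avoids division by zero for buckets with little or no history.
    """
    expected = max(baseline.get(hour_of_week(ts), 0.0), floor)
    return observed / expected
```

With a learned average of 2 calls for Sunday 03:00, seeing 100 `db.query` calls in that hour gives a ratio of 50, even though each call is individually allowed.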
10. Observability
The Firewall leaves a trail you can act on, all workspace-scoped:

| Surface | What it gives you |
|---|---|
| Events | Every evaluation, filterable by verdict, surface, tool, run, and session. The raw record behind everything else. |
| Runs & sessions | Events rolled up by agent run or conversation — verdict breakdown, distinct tools and models, first/last seen. The “what did this agent actually do” view. |
| Discovered tools | Every tool the workspace has seen, flagged covered (a rule applies) or gap (nothing does). Drives policy authoring from real traffic. |
| Simulate | Preview what an autonomy level would change before you apply it. |
| Test | Dry-run a policy against a sample tool call and see the verdict, the matched rule, and the reason — nothing is persisted, nothing is dispatched. |
| Audit | Every policy, rule, and settings change writes an audit row (workspace + central) after the change commits. Secrets and rule blobs are never logged. |
11. Relationship to the rest of the gateway
| Surface | Composes with the Firewall how? |
|---|---|
| Guardrails | Complementary planes. Guardrails screen prompt/response text; the Firewall governs tool actions. Both can apply to one request. Autonomy levels set both at once. |
| Routing | Independent. Routing picks the model/channel; the firewall judges the tool calls regardless of which model served them. |
| API keys | A key attaches to a policy via firewall_policy_id; the binding lives on the key in the gateway. No attachment falls back to the workspace default. |
| MCP gateway | The firewall is the MCP gateway — every server you register dispatches its tools/call through the engine. |
| Skills | A governed skill’s enforcement mode rides on top of the rule verdict, so a quarantined skill is held even if no rule names its tools. |
12. Connecting an agent to the Firewall gateway
There are two ways a tool call reaches the engine:
- MCP gateway: point your MCP client (Claude Desktop, Cursor, an agent framework) at https://api.orcarouter.ai/api/v1/firewall/mcp. The gateway exposes every reachable registered server’s tools, namespaced <server>.<tool>, and evaluates each tools/call inline. See MCP servers.
- Evaluate hook: call POST /api/v1/firewall/evaluate from your own agent loop before dispatching a tool call, and act on the verdict.
403 on these routes.
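For the evaluate hook, a pre-dispatch call could be assembled as in the sketch below, using only the standard library. The request body fields (`tool`, `arguments`), the Bearer authorization scheme, and the assumption that the evaluate route lives on the same host as the MCP endpoint are not confirmed by this page; check the API reference before relying on them. The request is built but not sent.

```python
import json
import urllib.request

# Assumed: same host as the MCP gateway URL given above.
EVALUATE_URL = "https://api.orcarouter.ai/api/v1/firewall/evaluate"


def build_evaluate_request(token: str, tool: str, arguments: dict) -> urllib.request.Request:
    """Build (but do not send) a pre-dispatch evaluate request."""
    # Hypothetical payload shape; the real schema may differ.
    payload = json.dumps({"tool": tool, "arguments": arguments}).encode("utf-8")
    return urllib.request.Request(
        EVALUATE_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def should_dispatch(verdict: str) -> bool:
    """Dispatch the tool call unchanged only on non-blocking verdicts."""
    return verdict in {"allow", "audit"}
```

An agent loop would send the request with `urllib.request.urlopen`, read the verdict from the response, and dispatch the tool only when `should_dispatch` returns true. A `sanitize` verdict would need the cleaned arguments from the response, which this sketch does not model.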
13. API reference
All console routes are workspace-scoped via the workspace context and enforce RBAC consistently: reads and the test/simulate sandboxes are open to every member; writes require Developer+.
Policies & settings
| Method & path | Role | Purpose |
|---|---|---|
| GET /api/workspace/firewall/settings | Member | Read workspace firewall settings (observe mode, defaults). |
| PUT /api/workspace/firewall/settings | Developer+ | Update settings. |
| GET /api/workspace/firewall/policies | Member | List policies (with rule + attached-key counts). |
| GET /api/workspace/firewall/policies/:id | Member | Single policy detail. |
| POST /api/workspace/firewall/policies | Developer+ | Create a policy. |
| PUT /api/workspace/firewall/policies | Developer+ | Update a policy. |
| DELETE /api/workspace/firewall/policies/:id | Developer+ | Delete a policy (409 if keys are still attached). |
Posture, presets & sandboxes
| Method & path | Role | Purpose |
|---|---|---|
| GET /api/workspace/firewall/presets | Member | Built-in rule presets. |
| POST /api/workspace/firewall/autonomy | Developer+ | Apply an autonomy level. |
| POST /api/workspace/firewall/autonomy/undo/:audit_id | Developer+ | Undo an autonomy change. |
| GET /api/workspace/firewall/simulate | Member | Preview an autonomy level (?level=). |
| POST /api/workspace/firewall/test | Developer+ | Dry-run a policy against a sample tool call. |
Observability
| Method & path | Role | Purpose |
|---|---|---|
| GET /api/workspace/firewall/discovered-tools | Member | Tools seen, flagged covered / gap. |
| GET /api/workspace/firewall/events | Developer+ | List firewall events (filterable). |
| GET /api/workspace/firewall/events/by-request/:request_id | Developer+ | Events for one request. |
| GET /api/workspace/firewall/events/aggregate | Developer+ | Runs / sessions rollup. |
| GET /api/workspace/firewall/trace/by-run | Developer+ | Trace nodes for a run (?run_id=). |
| GET /api/workspace/firewall/anomalies | Member | Anomaly feed (?window=). |
| POST /api/workspace/firewall/anomalies/snooze | Developer+ | Snooze the anomaly feed. |
Gateway (machine-to-machine)
These run on a firewall-gateway-scoped token, not the console session:

| Method & path | Purpose |
|---|---|
| POST /api/v1/firewall/evaluate | Pre-dispatch verdict for one tool call. |
| POST /api/v1/firewall/evaluate_plan | Pre-execution check for a multi-step plan. |
| ANY /api/v1/firewall/mcp | The unified MCP gateway endpoint. |
| GET /api/v1/firewall/approvals/:id | Poll a held call’s approval state. |
| POST /api/v1/firewall/approvals/:id/callback | HMAC-signed approval callback. |
14. FAQ
What if no policy resolves on a tool call?
With observe mode off, behavior is byte-identical to a workspace
that never enabled the feature — nothing is blocked or logged. With
observe mode on, the call is allowed but recorded as a coverage
gap so it shows up in Discovered tools.
How do I roll out a policy safely?
Turn on shadow mode. The policy evaluates and logs exactly as it
would in production, but every enforcing verdict is downgraded to
audit and the reason is prefixed [shadow] would …. Watch the
events and runs views, confirm it fires on what you expect and nothing
you don’t, then turn shadow mode off to start enforcing.
Does a blocked tool call cost quota?
An inbound block fires before the upstream model call, so it costs no
model tokens. Audit / allow verdicts don’t change billing. A
cap_cost rule is itself a billing control — it denies once the run’s
spend crosses your cents cap.
Firewall or Guardrails — which do I use?
Both, for different layers. Guardrails screen the text in prompts
and responses (PII, secrets, jailbreaks). The Firewall governs the
actions an agent takes (which tools, which MCP servers, which hosts).
A request can pass through both. The
tight autonomy level configures
them together.
Is enforcement guaranteed for every tool an agent runs?
The Firewall enforces on tool calls that cross the gateway — the
relay path, the MCP gateway, and the evaluate hook. A tool your agent
executes entirely inside its own process, never touching the gateway,
is outside the firewall’s view. The design goal is to make the gateway
the single audited path for the calls that matter (model-mediated
tools, MCP dispatch, network egress); route those through it and they
are governed.
See also
Going deeper on agent security? The Secure Your Agents (Zero Trust) guides put this feature in a zero-trust workflow.
Secure your agents (Zero Trust)
The zero-trust agent firewall playbook — tool allow-lists, argument checks, and egress control.
Secure Agents baseline
One switch that sets your Firewall and Guardrails posture together.
