inbound surface has no destination
to check; an argument clause on inbound has no call-time arguments yet.
This page is the focused guide to the four agent firewall stages: what each
surface observes, when a rule should target it, and the one concrete way
the same intent is expressed at different stages. For the full rule
vocabulary, see Firewall rules; for the policy
model around it, Firewall.
1. The four stages at a glance
Every evaluation is stamped with exactly one stage. A rule with no stage ("") applies to all of them; a rule pinned to one stage only fires
there.
| Stage | What the surface sees |
|---|---|
| inbound | Tools the agent advertises on the request |
| response | tool_calls the model emits in its reply |
| mcp | A tools/call dispatched through the MCP gateway |
| egress | An outbound host / IP / CIDR a tool reaches |
stage property when authoring through the
API.
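The scoping behaviour described in this section (an empty stage applies everywhere, a pinned stage fires only on its own surface) can be sketched as below. This is a minimal illustration, not OrcaRouter's implementation: the `Rule` class, its `name` field, and `rule_applies` are hypothetical; only the four stage names and the empty-string convention come from this page.

```python
from dataclasses import dataclass

# The four stage names listed in the table above.
STAGES = ("inbound", "response", "mcp", "egress")


@dataclass(frozen=True)
class Rule:
    name: str        # hypothetical label, for illustration only
    stage: str = ""  # "" means the rule applies to every stage


def rule_applies(rule: Rule, evaluation_stage: str) -> bool:
    """Return True if the rule is in scope for an evaluation on this stage."""
    if evaluation_stage not in STAGES:
        raise ValueError(f"unknown stage: {evaluation_stage!r}")
    return rule.stage == "" or rule.stage == evaluation_stage


everywhere = Rule(name="deny-shell-everywhere")
pinned = Rule(name="deny-shell-on-response", stage="response")

# For each stage, the names of the rules that would be considered.
applicable = {
    stage: [r.name for r in (everywhere, pinned) if rule_applies(r, stage)]
    for stage in STAGES
}
```

Under these assumptions the unpinned rule appears under all four stages, while the pinned rule appears only under `response`.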
Stage governs what data is in scope, not how strict the verdict is. A deny is a deny on any stage; what changes is whether the rule has the arguments, the tool name, or the destination it needs to match on.
2. inbound — the tools an agent advertises
The earliest surface. Before the model ever runs, your agent sends a list
of tool definitions it’s willing to let the model call. The inbound
stage sees that advertised toolset and can block a dangerous tool before
the model can even choose it.
There are no call-time arguments at this stage — the model hasn’t decided
how to call anything yet — so inbound rules match on the tool name (and
optionally its owning skill), not on args_match_json.
A deny here returns HTTP 400 with code firewall_blocked; the error names
the tool and the reason and is marked skip-retry.
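As a rough sketch of an inbound check, the function below matches advertised tool names against denied glob patterns and builds a 400-style error. The HTTP status and the `firewall_blocked` code come from this page; everything else is an assumption made for illustration, including the function name, the shape of the tool definitions, and the body field names (`tool`, `reason`, `skip_retry`). Glob matching uses Python's `fnmatch`, which may differ from the product's own glob semantics.

```python
from fnmatch import fnmatchcase


def check_inbound(advertised_tools, denied_patterns):
    """Return (status, body) for the first denied tool, or None if all pass.

    advertised_tools: list of dicts with at least a "name" key (assumed shape).
    denied_patterns:  dict mapping a glob pattern to a human-readable reason.
    """
    for tool in advertised_tools:
        name = tool["name"]
        for pattern, reason in denied_patterns.items():
            # Only the tool name is inspected: no call-time arguments exist yet.
            if fnmatchcase(name, pattern):
                body = {
                    "code": "firewall_blocked",
                    "tool": name,          # hypothetical field name
                    "reason": reason,      # hypothetical field name
                    "skip_retry": True,    # hypothetical field name
                }
                return 400, body
    return None


tools = [{"name": "calendar.read"}, {"name": "shell.exec"}]
blocked = check_inbound(tools, {"shell.*": "shell access is not permitted"})
allowed = check_inbound(tools, {"db.drop_*": "destructive database tools"})
```

Because the check runs against the advertised list, the request is rejected before any model call is made.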
3. response — the tool calls the model emits
Once the model replies, it may emit one or more tool_calls — concrete
invocations with real arguments. The response stage sees those, so this
is where argument-level rules belong: not “block shell.exec” but “block
shell.exec only when the command is rm -rf.”
sanitize works here —
it redacts matched substrings from the call’s arguments and forwards the
cleaned call. (Sanitize redacts tool-call arguments only; it never
touches the content a tool returns.)
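The difference between denying on an argument and sanitizing it can be sketched as follows. This is an illustrative model only: the rule dictionary (`tool`, `pattern`, `verdict`), the use of a regular expression as the argument matcher, and the `[REDACTED]` placeholder are assumptions; the real args_match_json syntax is described in Firewall rules and is not reproduced here.

```python
import re


def evaluate_call(call, rule):
    """Evaluate one emitted tool call against one hypothetical rule.

    Returns (verdict, call_to_forward). A denied call forwards nothing.
    """
    if call["name"] != rule["tool"]:
        return "allow", call

    pattern = re.compile(rule["pattern"])
    args = call["arguments"]
    matched = any(isinstance(v, str) and pattern.search(v) for v in args.values())
    if not matched:
        return "allow", call

    if rule["verdict"] == "deny":
        return "deny", None
    if rule["verdict"] == "sanitize":
        # Redact only the matched substrings, then forward the cleaned call.
        cleaned = {
            k: pattern.sub("[REDACTED]", v) if isinstance(v, str) else v
            for k, v in args.items()
        }
        return "sanitize", {**call, "arguments": cleaned}
    raise ValueError(f"unsupported verdict in this sketch: {rule['verdict']!r}")


deny_rm = {"tool": "shell.exec", "pattern": r"\brm\s+-rf\b", "verdict": "deny"}
strip_key = {"tool": "http.post", "pattern": r"sk-[A-Za-z0-9]+", "verdict": "sanitize"}

dangerous = {"name": "shell.exec", "arguments": {"command": "rm -rf /tmp/build"}}
harmless = {"name": "shell.exec", "arguments": {"command": "ls -la"}}
leaky = {"name": "http.post", "arguments": {"body": "token=sk-abc123 ok", "retries": 2}}

deny_result = evaluate_call(dangerous, deny_rm)
allow_result = evaluate_call(harmless, deny_rm)
sanitize_result = evaluate_call(leaky, strip_key)
```

The same tool is allowed or denied depending on how it is called, which is only possible on a stage that sees the model's chosen arguments.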
4. mcp — calls dispatched through the gateway
When an agent reaches a tool through OrcaRouter’s
MCP gateway, every tools/call is evaluated on
the mcp stage before it’s dispatched to the registered server. This is
the surface that governs Model Context Protocol traffic — the same
glob / argument / verdict vocabulary as response, applied to MCP
dispatch.
A block here surfaces as a tool error (firewall deny: <reason>)
rather than a transport failure, so the model sees the rejection and can
react — pick another tool, ask the user, or stop.
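One plausible shape for such a rejection is a normal JSON-RPC response whose result is flagged as a tool error, which is how the Model Context Protocol represents tool execution failures (`isError` on the tools/call result). Whether the gateway emits exactly this payload is an assumption; only the `firewall deny: <reason>` text comes from this page, and `firewall_deny_result` is a hypothetical helper.

```python
import json


def firewall_deny_result(request_id, reason):
    """Build a tools/call response that reports a firewall block as a tool error.

    The response is a successful JSON-RPC reply (no "error" member), so the
    client passes the message to the model instead of treating it as a
    transport failure.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "content": [{"type": "text", "text": f"firewall deny: {reason}"}],
            "isError": True,
        },
    }


response = firewall_deny_result(7, "tool not permitted for this agent")
wire = json.dumps(response)
```

Keeping the rejection inside `result` is what lets the model read the reason and choose a different action.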
5. egress — the outbound destination a tool reaches
The last surface. When a tool reports an outbound network destination, the
egress stage matches on it — the SSRF and data-exfiltration surface.
Egress rules don’t match on a tool name pattern alone; they match on a
host / IP / CIDR list:
The cloud-metadata IP (169.254.169.254) and RFC-1918 ranges are the canonical things to deny.
See Firewall rules §6
for the allow/deny polarity.
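To make destination matching concrete, here is a minimal sketch of a deny-list check over hosts, IPs, and CIDR blocks using Python's standard `ipaddress` module. The list contents, the `destination_denied` name, and the exact-match handling of hostnames are assumptions for illustration; the sketch covers deny polarity only and does not resolve DNS.

```python
import ipaddress

# Hypothetical deny list: the cloud-metadata IP, the RFC 1918 private
# ranges, and one example hostname.
DENIED = [
    "169.254.169.254",
    "10.0.0.0/8",
    "172.16.0.0/12",
    "192.168.0.0/16",
    "internal.example.com",
]


def destination_denied(destination, denied=DENIED):
    """Return True if the destination matches any host / IP / CIDR entry."""
    try:
        addr = ipaddress.ip_address(destination)
    except ValueError:
        addr = None  # not an IP literal, so treat it as a hostname

    for entry in denied:
        try:
            # A bare IP parses as a single-address network (/32 for IPv4).
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            # Entry is a hostname: compare case-insensitively, exact match only.
            if addr is None and destination.lower() == entry.lower():
                return True
            continue
        if addr is not None and addr in network:
            return True
    return False


metadata_blocked = destination_denied("169.254.169.254")
private_blocked = destination_denied("10.1.2.3")
public_allowed = not destination_denied("8.8.8.8")
```

A hostname that resolves to a denied address would pass this sketch, so a production check would also need to evaluate the resolved IP.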
No preset ships CIDR rules. The tight autonomy level’s SSRF posture denies fetch-shaped tool names (e.g. http_fetch, web_search, fetch_url); a destination-based egress deny is something you author for the hosts and ranges your agents must never reach.
6. Choosing the right stage
The same security goal often has a best stage. Match the intent to the surface that actually carries the data you need:
Stop a tool from ever being offered → inbound
If the model should never even see a tool, deny it on inbound. The block lands before the model call, so it costs no model tokens.
Allow a tool but constrain its arguments → response (or mcp)
Argument clauses need the model’s chosen arguments, which only exist on response and mcp. Deny on a dangerous argument, or sanitize to strip a secret or PII value the agent put in an argument.
Govern Model Context Protocol traffic → mcp
Calls routed through the MCP gateway are evaluated on mcp before dispatch, the choke point for every registered server’s tools.
Block where an agent can connect → egress
Destination-based rules (block the cloud-metadata IP, deny a CIDR, allowlist your approved hosts) only make sense on egress.
Apply to every surface → leave the stage empty
A rule with no stage runs on all four. Use it for a blanket default_verdict-style rule, or a tool you deny everywhere it appears.
7. Stages and shadow mode
A policy’s shadow_mode flag is independent of stage. Turn it on and
every enforcing verdict — on any stage — is downgraded to audit and the
reason is prefixed [shadow] would …, so you can confirm a rule fires on
the right surface before it changes live traffic. See
Shadow mode and
Enforcement modes.
8. Where stages fit the bigger picture
The four stages are the where of enforcement; the rest of the model is the what and who.
Verdicts
What each stage can do once it matches: allow, audit, deny, sanitize, hold for approval, cap cost.
Tool allow-listing
Use inbound to constrain the toolset an agent advertises.
Validate arguments
Author response / mcp argument clauses that gate a tool by how it’s called.
Egress control
Block outbound destinations on the egress surface, the exfiltration boundary.