shell.exec, edits files, fetches URLs, and
loads community skills — any one of which can rm -rf a volume, read a
.env, or exfiltrate to an attacker host. This recipe locks that surface
down with the Firewall: deny destructive shell,
validate the arguments of the calls you do allow, fence egress, and
hold the genuinely risky operations for a human. None of it touches your
agent’s code — the policy lives in the gateway and is enforced on the
next call.
Everything below is configured in the console (Firewall → Posture /
Policies). Those management routes use your console session, not a relay
key. Only the
/v1/* calls your agent makes carry an sk-orca-… key.
Policy edits require the Developer role.
1. Start by watching, not blocking — secure coding agent baseline
Don’t author rules blind. Give the agent its sk-orca-… key, then open
Firewall → Posture and apply the balanced
autonomy level. In one
transaction this audits every tool call, flags PII, and denies
destructive shell — so the worst action is already fenced while you
learn the rest of the agent’s behavior from real traffic.
Let it run, then read Firewall → Discovered tools: every tool the
workspace has seen, flagged covered (a rule applies) or gap
(nothing does). That list is your allow-list draft. When the feed looks
right, move to tight (default-deny) or author the targeted policy below.
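To make the covered / gap split concrete, here is a minimal local sketch of the idea. It is not the console's implementation: the tool names are invented for illustration, and it assumes the gateway's globs behave like Python's fnmatch patterns, which this recipe does not confirm.

```python
from fnmatch import fnmatchcase


def coverage(discovered, rule_globs):
    """Label each discovered tool 'covered' if any rule glob matches it, else 'gap'."""
    return {
        tool: "covered" if any(fnmatchcase(tool, g) for g in rule_globs) else "gap"
        for tool in discovered
    }


# Globs borrowed from the destructive-shell preset named in this recipe;
# the discovered tool names are hypothetical.
rule_globs = ["shell.*", "bash", "*.shell.*"]
discovered = ["shell.exec", "github.shell.run", "file.write", "db.query"]

report = coverage(discovered, rule_globs)
gaps = sorted(tool for tool, status in report.items() if status == "gap")
print(gaps)
```

Every name left in gaps is a tool that no rule speaks for yet, so each one needs an explicit decision before you move to default-deny.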
2. Deny destructive shell — the non-negotiable floor
The single most important rule for a coding agent is no destructive shell. The balanced and tight autonomy levels already ship this as
the Block destructive shell preset, which materializes real,
editable deny rules covering both the workspace-direct tool names
(shell.*, bash, cmd.*, powershell.*, exec.*) and the
MCP-namespaced forms a registered server exposes (*.shell.*,
*.cmd.*, …).
If you’d rather scope it tighter than “deny all shell”, author one rule
that only denies the destructive commands and audits the rest. A rule
matches on a tool-name glob plus an optional argument predicate
(JSONPath against the call’s arguments):
Deny rm -rf but allow other shell calls
In Firewall → Policies, add a rule above your default:
- Tool glob: shell.exec
- Args match (JSONPath clause): a regex clause on $.command
- Verdict: deny
Clause operators: eq, contains, regex, in, cidr_match, gt, lt. A call whose $.command matches the
regex is blocked; everything else falls through to the next rule.
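The matching model described here (tool glob, optional argument clause, fall-through to the next rule) can be sketched locally as follows. This is an illustrative model under stated assumptions, not the gateway's evaluator: the regex value, the "audit" and "allow" verdict names, fnmatch-style globs, and the simplified path resolver are all hypothetical.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical rules, evaluated top to bottom. The clause shape follows the
# {"path", "op", "value"} clauses shown in this recipe; the regex is an
# illustrative assumption and does not catch every destructive form.
RULES = [
    {
        "glob": "shell.exec",
        "clause": {"path": "$.command", "op": "regex",
                   "value": r"\brm\s+-(?=\w*r)(?=\w*f)\w+"},
        "verdict": "deny",
    },
    {"glob": "shell.*", "clause": None, "verdict": "audit"},  # assumed verdict name
]


def resolve(path, args):
    """Resolve a simple dotted path such as '$.command' (no filters or wildcards)."""
    keys = path[2:] if path.startswith("$.") else path
    node = args
    for key in keys.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def clause_matches(clause, args):
    """A rule without a clause matches on the glob alone."""
    if clause is None:
        return True
    value = resolve(clause["path"], args)
    if clause["op"] == "regex":
        return isinstance(value, str) and re.search(clause["value"], value) is not None
    if clause["op"] == "eq":
        return value == clause["value"]
    raise ValueError(f"operator not modeled in this sketch: {clause['op']}")


def evaluate(tool, args, rules, default="allow"):
    """Return the verdict of the first rule whose glob and clause both match."""
    for rule in rules:
        if fnmatchcase(tool, rule["glob"]) and clause_matches(rule["clause"], args):
            return rule["verdict"]
    return default  # assumed default; a default-deny posture would return "deny"


print(evaluate("shell.exec", {"command": "rm -rf /data"}, RULES))
print(evaluate("shell.exec", {"command": "ls -la"}, RULES))
```

Rule order matters in this model: the narrow deny sits above the broad shell.* rule, so a destructive command never reaches the fall-through.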
What the block looks like
A denied call on the inbound surface returns HTTP 400 with
error code
firewall_blocked and a message naming the tool and
reason. A call dispatched through the MCP gateway comes back as a
tool error (firewall deny: …) so the model can react instead of
crashing. Inbound blocks fire before the upstream model call, so they
cost no model tokens.
3. Validate arguments on the tools you keep
Allowing a tool isn’t the same as allowing every argument to it. The same JSONPath predicate that scopes a deny lets you constrain the shape of an allowed call, so db.query can’t carry a DROP, and file.write can’t
escape a directory.
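Before relying on a regex predicate, it can help to check locally what it does and does not match. The sketch below parses the db.query DROP clause used in this section as JSON and runs its pattern over a few invented SQL strings. It assumes the gateway's regex dialect agrees with Python's re module for this pattern, which this recipe does not state.

```python
import json
import re

# The clause as it appears in the rule. In JSON, "\\b" decodes to the regex
# token \b (word boundary).
clause = json.loads(r'{"path":"$.sql","op":"regex","value":"(?i)\\bdrop\\b"}')
pattern = re.compile(clause["value"])

# Hypothetical sample queries mapped to whether we expect the clause to match.
samples = {
    "SELECT * FROM users": False,
    "drop table users": True,
    "SELECT dropped_at FROM jobs": False,
    "DELETE FROM t; DROP TABLE t": True,
}

results = {sql: bool(pattern.search(sql)) for sql in samples}
for sql, matched in results.items():
    print(f"{'deny' if matched else 'pass'}  {sql}")
```

The word boundaries keep column names such as dropped_at from tripping the rule, but the pattern still matches the word drop inside a string literal, so treat it as a coarse filter rather than a SQL parser.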
Block a SQL DROP
Glob db.query, clause {"path":"$.sql","op":"regex","value":"(?i)\\bdrop\\b"}, verdict deny.
Redact a secret in args
Verdict
sanitize redacts matched substrings from the tool-call
arguments before the call is forwarded. It never touches what a
tool returns; on the inbound surface (no call-time args yet) it
escalates to a block.
4. Control egress — fence where the agent can reach
A coding agent that can fetch URLs can be steered into SSRF (hitting cloud-metadata or an internal 10.x host) or used to exfiltrate. The
tight autonomy level ships an SSRF preset that denies fetch-shaped
tool names (http_fetch, web_search, fetch_url, request, and
their <server>.* forms) outright.
For destination-level control, author an egress rule. Egress rules
scope by host or CIDR with allow / deny entries, evaluated on the
egress surface:
No preset ships CIDR-based egress rules — the SSRF preset matches
fetch-shaped tool names. The host/CIDR denylist above is one you author
yourself. See Stop exfiltration
for the full pattern.
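As a rough illustration of what a host/CIDR denylist decides, the sketch below checks a destination against a small deny set using Python's ipaddress module. The entries and the function are hypothetical and do not reflect the gateway's rule format: 169.254.169.254 is the common cloud-metadata address, 10.0.0.0/8 stands in for the internal 10.x range, and attacker.example is a placeholder host.

```python
import ipaddress

# Hypothetical deny entries, chosen to mirror the SSRF targets named above.
DENY_CIDRS = [ipaddress.ip_network(c) for c in ("169.254.169.254/32", "10.0.0.0/8")]
DENY_HOSTS = {"attacker.example"}


def egress_verdict(destination):
    """Return 'deny' for a denied hostname or an IP inside a denied CIDR."""
    host = destination.lower().rstrip(".")
    if host in DENY_HOSTS:
        return "deny"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A hostname that is not on the denylist. This sketch does no DNS
        # resolution, so a name that resolves to a denied IP is not caught.
        return "allow"
    return "deny" if any(addr in net for net in DENY_CIDRS) else "allow"


for dest in ("169.254.169.254", "10.1.2.3", "93.184.216.34", "attacker.example"):
    print(dest, egress_verdict(dest))
```

The DNS caveat in the comment is the main design point: a denylist that inspects only literal IPs can be sidestepped with a hostname, which is one reason to combine host entries with CIDR entries.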
5. Hold risky operations for a human (HITL)
Some operations shouldn’t be auto-allowed or auto-denied: a deploy, a git push, a destructive migration. For those, use the
pending_approval verdict. The call is held, the agent gets a “held”
response with an approval id, and a reviewer resolves it out-of-band:
- Author a rule (e.g. glob deploy.*, verdict pending_approval).
- The held call returns HTTP 400 firewall_approval_pending with an approval id.
- A reviewer approves it from the console (Developer+) or via an HMAC-signed webhook callback.
- The agent polls the approval, then re-submits the original call with a single-use X-OrcaRouter-Firewall-Approval header, and the gateway lets it through that once.
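On the agent side, the hold / poll / re-submit loop might look like the sketch below. Only the header name, the HTTP 400 status, and the firewall_approval_pending code come from this recipe. The response keys ("code", "approval_id"), the approval states, the use of the approval id as the header value, and the send / poll callables are assumptions made so the example runs without a real gateway.

```python
import time

APPROVAL_HEADER = "X-OrcaRouter-Firewall-Approval"  # header name from this recipe


def call_with_approval(send, poll, call, interval=0.0, max_polls=10):
    """Send a call; if it is held, wait for approval and re-submit it once.

    send(call, headers) -> (status, body) and poll(approval_id) -> state are
    injected stand-ins; body keys and state names are assumed, not documented.
    """
    status, body = send(call, {})
    if not (status == 400 and body.get("code") == "firewall_approval_pending"):
        return status, body
    approval_id = body["approval_id"]
    for _ in range(max_polls):
        state = poll(approval_id)
        if state == "approved":
            return send(call, {APPROVAL_HEADER: approval_id})
        if state == "rejected":
            return status, body
        time.sleep(interval)
    raise TimeoutError(f"approval {approval_id} is still pending")


class FakeGateway:
    """In-memory stand-in that holds every call until it carries an approval."""

    def __init__(self):
        self.polls = 0
        self.used = set()

    def send(self, call, headers):
        token = headers.get(APPROVAL_HEADER)
        if token == "apr_1" and token not in self.used:
            self.used.add(token)  # single-use: replaying the header is held again
            return 200, {"result": "deployed"}
        return 400, {"code": "firewall_approval_pending", "approval_id": "apr_1"}

    def poll(self, approval_id):
        self.polls += 1
        return "approved" if self.polls >= 2 else "pending"


gw = FakeGateway()
outcome = call_with_approval(gw.send, gw.poll, {"tool": "deploy.prod", "args": {}})
print(outcome)
```

Injecting send and poll keeps the control flow testable and avoids hard-coding endpoints this recipe does not specify; in a real agent they would wrap the /v1/* call and whatever approval-status lookup your deployment exposes.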
6. Govern the skills and MCP servers it loads
Coding agents pull in capabilities at runtime: community skills, BYO MCP servers. The Firewall governs both at the gateway:
- Skills are scanned into a risk band with an enforcement mode (allow/quarantine/block). An auto-detected skill is quarantined (held for approval) until a reviewer clears it. See Skills.
- MCP servers you register dispatch every tools/call through the gateway, which evaluates each one on the mcp surface before dispatch. Credentials are stored encrypted; a health probe reports ok/degraded/down. See MCP servers and Harden an MCP agent.
7. Verify and observe
Before you depend on a policy, dry-run it. The Test tab evaluates a sample tool call against the current policy and shows the verdict, the matched rule, and the reason; nothing is dispatched, nothing persisted. Once live, Firewall → Events / Runs is the record of every evaluation, filterable by verdict, surface, tool, and run, and the anomaly feed flags rate/cost spikes against the workspace’s learned baseline, retry_loop, and never-before-seen tool paths.
Recap
Firewall reference
The full policy plane — surfaces, verdicts, resolution, autonomy.
Firewall rules
The matching language: globs, argument clauses, egress, sequences.
Dangerous tool calls
The threat this recipe defends against.
Excessive agency
Why over-permissioned agents are the core agent risk.
Autonomous agent recipe
Lock down a fully autonomous agent loop end to end.
Stop exfiltration
The egress and lethal-trifecta patterns in depth.
