Lock down a coding agent and its tools

A coding agent is both the highest-leverage thing in your workspace and the most dangerous. It runs shell.exec, edits files, fetches URLs, and loads community skills, and any one of those can rm -rf a volume, read a .env, or exfiltrate data to an attacker-controlled host. This recipe locks that surface down with the Firewall: deny destructive shell, validate the arguments of the calls you do allow, fence egress, and hold the genuinely risky operations for a human. None of it touches your agent’s code: the policy lives in the gateway and is enforced on the next call.

Everything below is configured in the console (Firewall → Posture / Policies). Those management routes use your console session, not a relay key. Only the /v1/* calls your agent makes carry an sk-orca-… key. Policy edits require the Developer role.

1. Start by watching, not blocking — secure coding agent baseline

Don’t author rules blind. Give the agent its sk-orca-… key, then open Firewall → Posture and apply the balanced autonomy level. In one transaction this audits every tool call, flags PII, and denies destructive shell — so the worst action is already fenced while you learn the rest of the agent’s behavior from real traffic. Let it run, then read Firewall → Discovered tools: every tool the workspace has seen, flagged covered (a rule applies) or gap (nothing does). That list is your allow-list draft. When the feed looks right, move to tight (default-deny) or author the targeted policy below.

balanced is the recommended starting posture; permissive blocks nothing but still logs everything; tight is default-deny plus the secrets and SSRF presets. See the baseline for exactly what each one materializes.
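To reason about the covered / gap flags offline, a minimal sketch along these lines can help. It assumes tool globs behave like shell-style wildcards (Python's fnmatch), which the gateway's matcher may or may not mirror exactly; the tool names and rule list are hypothetical.

```python
from fnmatch import fnmatchcase


def coverage(tool_names, rule_globs):
    """Label each seen tool 'covered' if any rule glob matches it, else 'gap'."""
    return {
        name: "covered" if any(fnmatchcase(name, g) for g in rule_globs) else "gap"
        for name in tool_names
    }


# Hypothetical traffic and rule globs, for illustration only.
seen = ["shell.exec", "github.shell.run", "file.write", "db.query"]
rules = ["shell.*", "*.shell.*", "db.query"]
report = coverage(seen, rules)
```

Here file.write is the only gap, so it is the next tool that needs a rule before you move to a default-deny posture.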

2. Deny destructive shell — the non-negotiable floor

The single most important rule for a coding agent is no destructive shell. The balanced and tight autonomy levels already ship this as the Block destructive shell preset, which materializes real, editable deny rules covering both the workspace-direct tool names (shell.*, bash, cmd.*, powershell.*, exec.*) and the MCP-namespaced forms a registered server exposes (*.shell.*, *.cmd.*, …). If you’d rather scope it tighter than “deny all shell”, author one rule that only denies the destructive commands and audits the rest. A rule matches on a tool-name glob plus an optional argument predicate (JSONPath against the call’s arguments):

Deny rm -rf but allow other shell calls

In Firewall → Policies, add a rule above your default:

Tool glob: shell.exec
Args match (JSONPath clause):

{
  "clauses": [
    { "path": "$.command", "op": "regex", "value": "(?i)\\brm\\s+-[a-z]*[rf]" }
  ]
}

Verdict: deny

The argument operators are a closed set — eq, contains, regex, in, cidr_match, gt, lt. A call whose $.command matches the regex is blocked; everything else falls through to the next rule.
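Before saving a rule like this, it can be worth checking the regex against sample commands locally. The sketch below is not the gateway's evaluator: it assumes fnmatch-style globs, a simple dotted lookup for $.command, Python's regex dialect, and that every clause must match. The field name tool_glob is a label invented for this sketch.

```python
import re
from fnmatch import fnmatchcase

# Local stand-in for the rule above; "tool_glob" is a name used only in this sketch.
RULE = {
    "tool_glob": "shell.exec",
    "clauses": [
        {"path": "$.command", "op": "regex", "value": r"(?i)\brm\s+-[a-z]*[rf]"}
    ],
    "verdict": "deny",
}


def lookup(args, path):
    """Resolve a simple '$.a.b' path; return None when a key is missing."""
    node = args
    for key in path.split(".")[1:]:  # drop the leading "$"
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def rule_matches(rule, tool, args):
    """True when the tool name matches the glob and every regex clause matches."""
    if not fnmatchcase(tool, rule["tool_glob"]):
        return False
    for clause in rule["clauses"]:
        if clause["op"] != "regex":
            raise NotImplementedError(clause["op"])  # only regex is sketched here
        value = lookup(args, clause["path"])
        if not isinstance(value, str) or re.search(clause["value"], value) is None:
            return False
    return True


def verdict(command, tool="shell.exec"):
    """Return the rule's verdict, or 'fall through' when the rule does not match."""
    matched = rule_matches(RULE, tool, {"command": command})
    return RULE["verdict"] if matched else "fall through"
```

Under these assumptions, rm -rf /data and sudo rm -fr build are denied, while ls -la and rm -i notes.txt fall through. The long-option form rm --recursive --force /data also falls through, because the pattern only inspects a single-dash flag group; extend the regex if that form matters in your environment.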

What the block looks like

A denied call on the inbound surface returns HTTP 400 with error code firewall_blocked and a message naming the tool and reason. A call dispatched through the MCP gateway comes back as a tool error (firewall deny: …) so the model can react instead of crashing. Inbound blocks fire before the upstream model call, so they cost no model tokens.

See Firewall rules for the full matching language (tool globs, argument clauses, sequences, cost caps).

3. Validate arguments on the tools you keep

Allowing a tool isn’t the same as allowing every argument to it. The same JSONPath predicate that scopes a deny lets you constrain the shape of an allowed call — so db.query can’t carry a DROP, and file.write can’t escape a directory.

Block a SQL DROP

Glob db.query, clause {"path":"$.sql","op":"regex","value":"(?i)\\bdrop\\b"}, verdict deny.
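The word boundaries in that clause matter. A quick local check, assuming the gateway's regex dialect treats (?i) and \b the way Python's re module does, shows what the pattern does and does not catch; the sample queries are made up.

```python
import re

# Same pattern as the clause above, written as a Python raw string.
DROP = re.compile(r"(?i)\bdrop\b")


def is_denied(sql):
    """True when the statement contains DROP as a standalone word."""
    return DROP.search(sql) is not None


samples = {
    "DROP TABLE users": is_denied("DROP TABLE users"),
    "select * from raindrops": is_denied("select * from raindrops"),
    "SELECT dropped_at FROM jobs": is_denied("SELECT dropped_at FROM jobs"),
    "SELECT 'drop' AS label": is_denied("SELECT 'drop' AS label"),
}
```

Identifiers such as raindrops or dropped_at pass because the word boundary fails, but a string literal containing the bare word drop is still blocked. That false positive is usually an acceptable trade for a deny rule.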

Redact a secret in args

Verdict sanitize redacts matched substrings from the tool-call arguments before the call is forwarded. It never touches what a tool returns; on the inbound surface (no call-time args yet) it escalates to a block.
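As a rough picture of what redacting matched substrings means for a call's arguments, here is a minimal sketch. The secret pattern, the [REDACTED] placeholder, and the recursive walk are assumptions for illustration; the source does not specify how the gateway rewrites arguments.

```python
import re

# Hypothetical secret shape: "sk-" followed by 16 or more token characters.
SECRET = re.compile(r"sk-[A-Za-z0-9_-]{16,}")


def sanitize_args(value):
    """Recursively replace matched substrings in string arguments."""
    if isinstance(value, str):
        return SECRET.sub("[REDACTED]", value)
    if isinstance(value, dict):
        return {k: sanitize_args(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize_args(v) for v in value]
    return value  # numbers, booleans, None pass through unchanged


args = {
    "command": "curl -H 'Authorization: Bearer sk-abcdefghijklmnop1234' https://example.com",
    "retries": 3,
}
clean = sanitize_args(args)
```

Only the matched substring is replaced; the rest of the command and the non-string arguments are forwarded unchanged.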

The Firewall sanitizes tool-call arguments, not tool results. To stop a secret from ever entering a request in the first place, attach the Secrets Blocker guardrail to the key — that screens the prompt text itself before the model sees it. The two planes compose: guardrails screen text, the Firewall governs the action.

4. Control egress — fence where the agent can reach

A coding agent that can fetch URLs can be steered into SSRF (hitting cloud-metadata or an internal 10.x host) or used to exfiltrate. The tight autonomy level ships an SSRF preset that denies fetch-shaped tool names (http_fetch, web_search, fetch_url, request, and their <server>.* forms) outright. For destination-level control, author an egress rule. Egress rules scope by host or CIDR with allow / deny entries, evaluated on the egress surface:

{ "deny": ["169.254.169.254", "10.0.0.0/8", "*.internal"] }

That fires when a tool reports an outbound destination inside 10.0.0.0/8, at the cloud-metadata IP (169.254.169.254), or under an *.internal hostname, and lets public destinations through. The list covers only one private range: add 172.16.0.0/12 and 192.168.0.0/16 if your network uses them.
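To sanity-check a denylist before applying it, the matching can be approximated locally. This sketch assumes IP and CIDR entries are compared as networks and hostname entries as case-insensitive wildcards; the gateway's actual semantics (for example, whether it resolves hostnames before matching) are not stated in the source.

```python
import ipaddress
from fnmatch import fnmatchcase

DENY = ["169.254.169.254", "10.0.0.0/8", "*.internal"]


def is_denied(destination, deny=DENY):
    """True when a literal IP falls in a denied network or a hostname matches a denied pattern."""
    try:
        addr = ipaddress.ip_address(destination)
    except ValueError:
        addr = None  # not an IP literal, treat as a hostname
    for entry in deny:
        try:
            net = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            net = None  # not an IP or CIDR, treat as a hostname pattern
        if addr is not None and net is not None:
            if addr.version == net.version and addr in net:
                return True
        elif addr is None and net is None:
            if fnmatchcase(destination.lower(), entry.lower()):
                return True
    return False
```

Note that this sketch never resolves DNS, so a public-looking hostname that resolves to a denied address would pass it. Whether the gateway closes that hole is something to confirm in Stop exfiltration.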

No preset ships CIDR-based egress rules — the SSRF preset matches fetch-shaped tool names. The host/CIDR denylist above is one you author yourself. See Stop exfiltration for the full pattern.

5. Hold risky operations for a human (HITL)

Some operations shouldn’t be auto-allowed or auto-denied — a deploy, a git push, a destructive migration. For those, use the pending_approval verdict. The call is held, the agent gets a “held” response with an approval id, and a reviewer resolves it out-of-band:

Author a rule (e.g. glob deploy.*, verdict pending_approval).
The held call returns HTTP 400 firewall_approval_pending with an approval id.
A reviewer approves it from the console (Developer+) or via an HMAC-signed webhook callback.
The agent polls the approval, then re-submits the original call with a single-use X-OrcaRouter-Firewall-Approval header — and the gateway lets it through that once.
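On the agent side, the steps above amount to a small retry loop. The sketch below keeps the transport abstract: submit and poll are hypothetical callables you would implement against your own client, the pending / approved / rejected states are assumed names, and sending the approval id as the header value is an assumption. Only the header name and the firewall_approval_pending code come from this recipe.

```python
import time

APPROVAL_HEADER = "X-OrcaRouter-Firewall-Approval"


def call_with_approval(submit, poll, max_polls=30, interval_s=2.0, sleep=time.sleep):
    """Submit a call; if it is held, wait for a reviewer and re-submit once.

    submit(headers) -> dict with 'ok' and, when held, 'error_code' and 'approval_id'.
    poll(approval_id) -> 'pending', 'approved', or 'rejected' (assumed state names).
    """
    first = submit({})
    if first.get("ok") or first.get("error_code") != "firewall_approval_pending":
        return first  # allowed outright, or failed for an unrelated reason
    approval_id = first["approval_id"]
    for _ in range(max_polls):
        state = poll(approval_id)
        if state == "approved":
            # Assumption: the header carries the approval id; it is single-use.
            return submit({APPROVAL_HEADER: approval_id})
        if state == "rejected":
            return {"ok": False, "error_code": "approval_rejected"}  # local to this sketch
        sleep(interval_s)
    return {"ok": False, "error_code": "approval_timeout"}  # local to this sketch
```

Bounding the poll count keeps a forgotten approval from stalling a build indefinitely; the agent can surface the timeout and let a human re-trigger the call.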

Roll any new policy out in shadow mode first. The policy evaluates and logs exactly as it would in production, but every enforcing verdict is downgraded to audit with a [shadow] would … reason — so you can prove it fires on what you expect before it can break a build.

6. Govern the skills and MCP servers it loads

Coding agents pull in capabilities at runtime — community skills, BYO MCP servers. The Firewall governs both at the gateway:

Skills are scanned into a risk band with an enforcement mode (allow / quarantine / block). An auto-detected skill is quarantined — held for approval — until a reviewer clears it. See Skills.
MCP servers you register dispatch every tools/call through the gateway, which evaluates each one on the mcp surface before dispatch. Credentials are stored encrypted; a health probe reports ok / degraded / down. See MCP servers and Harden an MCP agent.

7. Verify and observe

Before you depend on a policy, dry-run it. The Test tab evaluates a sample tool call against the current policy and shows the verdict, the matched rule, and the reason; nothing is dispatched and nothing is persisted. Once live, Firewall → Events / Runs is the record of every evaluation, filterable by verdict, surface, tool, and run. The anomaly feed flags rate and cost spikes against the workspace’s learned baseline, retry loops (retry_loop), and never-before-seen tool paths.

Recap

Firewall reference

The full policy plane — surfaces, verdicts, resolution, autonomy.

Firewall rules

The matching language: globs, argument clauses, egress, sequences.

Dangerous tool calls

The threat this recipe defends against.

Excessive agency

Why over-permissioned agents are the core agent risk.

Autonomous agent recipe

Lock down a fully autonomous agent loop end to end.

Stop exfiltration

The egress and lethal-trifecta patterns in depth.
