Test firewall rules safely

You wrote a firewall rule — a deny on shell.exec, an egress allow-list, an argument clause that only fires on rm -rf — and now you want to know it does exactly what you think before it changes a single production tool call. The firewall gives you three non-destructive ways to test firewall rules, each answering a different question:

Dry-run one call

The Test sandbox feeds one synthetic tool call through the real engine and returns the verdict — nothing dispatched, nothing logged. Developer+.

Replay a posture

Simulate replays an autonomy level against your recent traffic and counts how many calls it would block. Member-readable.

Run against live traffic

Shadow mode evaluates a whole policy on real calls but downgrades every enforcing verdict to audit. Zero blast radius.

All three are configured through the console, or through the /api/workspace/firewall/* management routes, which authenticate with your session or access token (not a relay sk-orca-… key). Your agent’s /v1/* relay calls never change while you test.

1. Test firewall rules with the dry-run Test sandbox

The Test sandbox is the tightest loop: hand it a single synthetic tool call and it runs the real evaluation engine — full policy resolution, rules walked in priority order, first-match-wins — then returns the verdict, the rule that produced it, and the human-readable reason. The call is a dry run: nothing is dispatched to any tool, and nothing is written to the events feed or the Discovered-tools inventory. It answers one question precisely: given this exact tool name and these arguments, what does my policy decide — and which rule decides it?
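To make "rules walked in priority order, first-match-wins" concrete, here is a toy model of that evaluation loop. It is a sketch, not the firewall's engine: the `Rule` and `Policy` classes, the `evaluate` function, the "lower number is walked first" priority convention, and the `observe_mode` flag are all hypothetical names chosen for illustration. The response keys mirror the fields the Test sandbox returns.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    label: str
    priority: int      # hypothetical convention: lower number is walked first
    tool_name: str
    verdict: str       # e.g. "deny", "audit", "allow"
    reason: str
    # Argument clause: True when the call's arguments match this rule.
    matches_args: Callable[[dict], bool] = lambda args: True


@dataclass
class Policy:
    name: str
    default_verdict: str
    rules: list = field(default_factory=list)


def evaluate(policy: Policy, tool_name: str, args: dict, observe_mode: bool = False) -> dict:
    """Walk rules in priority order; the first rule that matches decides."""
    for rule in sorted(policy.rules, key=lambda r: r.priority):
        if rule.tool_name == tool_name and rule.matches_args(args):
            return {"verdict": rule.verdict, "policy_name": policy.name,
                    "rule_label": rule.label, "reason": rule.reason, "gap": False}
    # No rule matched: fall through to the policy default. In observe mode the
    # docs report this as a gap, i.e. a coverage hole.
    return {"verdict": policy.default_verdict, "policy_name": policy.name,
            "rule_label": "", "reason": "policy default", "gap": observe_mode}


destructive = Rule("block destructive shell", 10, "shell.exec", "deny",
                   "destructive shell command",
                   matches_args=lambda a: "rm -rf" in a.get("command", ""))
audit_rm = Rule("audit any rm", 20, "shell.exec", "audit", "file removal",
                matches_args=lambda a: a.get("command", "").startswith("rm "))
# List order is irrelevant: priority decides which matching rule wins.
policy = Policy("prod-agents", "audit", [audit_rm, destructive])

print(evaluate(policy, "shell.exec", {"command": "rm -rf /data"})["rule_label"])  # block destructive shell
r = evaluate(policy, "shell.exec", {"command": "ls -la"})
print(r["verdict"], repr(r["rule_label"]))  # audit ''
```

Both rules match `rm -rf /data`, but the higher-priority deny wins; `ls -la` matches neither and falls to the policy default with an empty rule label.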

The Test sandbox requires Developer+. It can preview against an unsaved draft policy by ID, and its response reveals the matched policy name and rule label, so it sits closer to a write-surface preview than a plain read. Simulate and the other read views, by contrast, are open to every member.

One concrete dry run

Say you’ve added a rule that should deny shell.exec only when the command contains rm -rf. You want to confirm two things in one sitting: the dangerous command is denied, and an innocent one still passes.

Test the dangerous call

In Security → Firewall, open the Test tab, pick the response surface, enter tool name shell.exec and arguments {"command": "rm -rf /data"}, and run. The response names the verdict and the matched rule:

{
  "verdict": "deny",
  "policy_name": "prod-agents",
  "rule_label": "block destructive shell",
  "reason": "destructive shell command",
  "gap": false,
  "shadow_mode": false
}

Test the innocent call

Run it again with {"command": "ls -la"}. The argument clause no longer matches, so the call falls through to the policy default: you should see allow or audit and an empty rule_label. If rm -rf is denied and ls -la is not, your argument clause is scoped correctly.
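If you keep positive and negative dry runs side by side, the "scoped correctly" check can be written down once and reused. This is a minimal sketch: `argument_clause_scoped` is a hypothetical helper, the dangerous response is the sample JSON from the dangerous-call step, and the innocent response is an assumed fall-through result rather than real output.

```python
import json


def argument_clause_scoped(dangerous: dict, innocent: dict) -> bool:
    """True when the dangerous call is denied by a named rule and the innocent
    call falls through to an allow/audit default with no matched rule."""
    denied_by_rule = dangerous.get("verdict") == "deny" and bool(dangerous.get("rule_label"))
    passed_through = (innocent.get("verdict") in {"allow", "audit"}
                      and not innocent.get("rule_label"))
    return denied_by_rule and passed_through


# Sample response from the dangerous-call step.
dangerous = json.loads("""{"verdict": "deny", "policy_name": "prod-agents",
  "rule_label": "block destructive shell", "reason": "destructive shell command",
  "gap": false, "shadow_mode": false}""")
# Hypothetical fall-through response for {"command": "ls -la"}.
innocent = {"verdict": "audit", "policy_name": "prod-agents", "rule_label": "",
            "gap": False, "shadow_mode": False}

print(argument_clause_scoped(dangerous, innocent))  # True
```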

Preview a draft before you attach it

Pass a policy_id to evaluate against a specific draft policy instead of the one your traffic currently resolves to, so you can prove a new policy is right before you attach a key to it or promote it to workspace default.

Check gap in the response. gap: true means a policy resolved but no rule inside it matched the call (and the workspace is in observe mode): the tool slipped past every rule and fell to the default. That’s a coverage hole to close before you ship, not a verdict to trust.
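When you dry-run a batch of calls, it helps to pull the gap cases out explicitly instead of scanning each response. A small sketch, assuming you have collected responses into a dict keyed by a test label of your choosing; `coverage_holes` and the sample responses are hypothetical.

```python
def coverage_holes(results: dict) -> list:
    """results maps a test label to a dry-run response dict.
    Returns the labels whose response reported gap: true, i.e. calls that
    matched no rule and fell to the policy default."""
    return sorted(label for label, resp in results.items() if resp.get("gap") is True)


results = {
    "shell.exec rm -rf": {"verdict": "deny", "rule_label": "block destructive shell", "gap": False},
    "http.fetch internal": {"verdict": "audit", "rule_label": "", "gap": True},
}
print(coverage_holes(results))  # ['http.fetch internal']
```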

The Test sandbox uses the same surfaces as live evaluation (inbound, response, mcp, and egress; the default is inbound), so test each rule on the surface it’s pinned to. On inbound there are no call-time arguments, so a sanitize rule escalates to a block there exactly as it would in production; see stages for why surface matters.
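The surface rule can be stated as a small function. This is a toy model, not the engine: `effective_verdict` is hypothetical, and representing "escalates to a block" as the `deny` verdict is an assumption about how the block is reported.

```python
SURFACES = ("inbound", "response", "mcp", "egress")


def effective_verdict(rule_verdict: str, surface: str = "inbound") -> str:
    """Verdict a rule produces once the surface is taken into account."""
    if surface not in SURFACES:
        raise ValueError(f"unknown surface: {surface!r}")
    # Inbound carries no call-time arguments to redact, so sanitize cannot
    # sanitize there; it escalates to a block (modelled here as "deny").
    if surface == "inbound" and rule_verdict == "sanitize":
        return "deny"
    return rule_verdict


for s in SURFACES:
    print(s, effective_verdict("sanitize", s))
```

Running the same sanitize rule across all four surfaces is a quick reminder of why a rule should be dry-run on the surface it is pinned to.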

2. Simulate an autonomy level before you apply it

The Test sandbox checks one call. Simulate answers the posture-level question: if I switched this whole workspace to a stricter autonomy level, how much of my recent traffic would it block? Simulate replays a candidate level’s deny rules against your trailing firewall events and returns the would-be impact — tool names and counts only, never arguments. It is read-only and Member-readable, so anyone on the team can preview the blast radius of tight before a Developer commits to it.

What the three levels would do

tight — default-deny, deny destructive shell, deny fetch-shaped tools (the SSRF vector), PII Shield + Secrets Blocker enforced. Simulate shows how much of your real traffic this floor would catch.
balanced — default audit, deny destructive shell, PII Shield in audit-only (flags PII). The recommended starting posture.
permissive — observe only; nothing enforced.
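Conceptually, Simulate is a replay-and-count over past events. The sketch below models that shape under deliberate simplifications: `simulate` is a hypothetical function, each deny rule is reduced to a predicate on the tool name (the real "deny destructive shell" rule also looks at arguments), and the two predicates for tight are illustrative stand-ins. Like Simulate, the output carries tool names and counts only.

```python
from collections import Counter
from typing import Iterable


def simulate(events: Iterable[dict], deny_rules: list) -> dict:
    """Replay deny rules over past events and count would-be blocks per tool."""
    blocked = Counter()
    for event in events:
        tool = event["tool_name"]  # only the tool name is read; arguments are ignored
        if any(rule(tool) for rule in deny_rules):
            blocked[tool] += 1
    return {"would_block": sum(blocked.values()), "by_tool": dict(blocked)}


# Hypothetical stand-ins for two of tight's deny rules.
tight_deny = [
    lambda t: t == "shell.exec",  # stands in for "deny destructive shell"
    lambda t: "fetch" in t,       # stands in for "deny fetch-shaped tools"
]
events = [
    {"tool_name": "shell.exec", "args": {"command": "rm -rf /tmp/x"}},
    {"tool_name": "http.fetch"},
    {"tool_name": "docs.search"},
    {"tool_name": "http.fetch"},
]
print(simulate(events, tight_deny))
# {'would_block': 3, 'by_tool': {'shell.exec': 1, 'http.fetch': 2}}
```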

Simulate vs. apply

Simulate changes nothing — it’s a what-if over past events. Applying an autonomy level (a Developer+ write) materializes real, editable autonomy_* policy and guardrail rows, with one-click undo from the audit snapshot. Preview with Simulate, then apply when the count looks right.

3. Shadow mode: test against live traffic with no blast radius

The Test sandbox and Simulate are offline previews. Shadow mode is the live one: a per-policy flag that evaluates the policy on real agent traffic, walks every rule, picks a verdict — then downgrades every enforcing verdict (deny, sanitize, pending_approval) to audit and prefixes the reason [shadow] would …. The call always goes through; nothing is blocked, redacted, or held. That makes the events feed read like a production run with enforcement turned off. Filter for [shadow] and you have a complete list of every call the policy is about to start blocking — before it blocks one.
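The downgrade-and-filter behaviour can be modelled in a few lines. A sketch only: `apply_shadow` and `shadow_hits` are hypothetical helpers, and while the `[shadow] would` prefix comes from this page, the wording after it in the code is invented for illustration.

```python
ENFORCING = {"deny", "sanitize", "pending_approval"}


def apply_shadow(verdict: str, reason: str, shadow_mode: bool) -> tuple:
    """Return the (verdict, reason) actually recorded for a call."""
    if shadow_mode and verdict in ENFORCING:
        # Only the "[shadow] would" prefix comes from the docs; the rest of
        # this wording is hypothetical.
        return "audit", f"[shadow] would {verdict}: {reason}"
    return verdict, reason


def shadow_hits(events: list) -> list:
    """Events the shadowed policy would have enforced on."""
    return [e for e in events if e["reason"].startswith("[shadow]")]


events = []
for tool, verdict, reason in [("shell.exec", "deny", "destructive shell command"),
                              ("docs.search", "allow", "read-only tool")]:
    v, r = apply_shadow(verdict, reason, shadow_mode=True)
    events.append({"tool_name": tool, "verdict": v, "reason": r})

print([e["tool_name"] for e in shadow_hits(events)])  # ['shell.exec']
```

Every recorded verdict ends up non-enforcing, yet the filtered list still shows exactly which calls would have been stopped.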

Test method	Runs against	Question it answers
Test sandbox	One synthetic call	“What verdict does this exact call get, and which rule decides it?”
Simulate	Recent events	“How many calls would a stricter autonomy level block?”
Shadow mode	Live traffic	“What would this policy block across real production traffic?”

Shadow mode is the deeper of the three — full live coverage with zero blast radius. It has its own page: Roll out a firewall policy with shadow mode walks the toggle, the [shadow] would … reasons, and the flip to enforce.

4. A practical testing order

The three tools compose into one safe-rollout path — cheapest check first, widest coverage last:

Dry-run the rules you just wrote

Use Test to confirm each new rule fires on the calls it should and passes the ones it shouldn’t — including the negative cases. Fast, Developer+, nothing persisted.

Gauge the posture (optional)

If you’re reaching for an autonomy level rather than hand-written rules, Simulate the level and read the would-be-blocked count against real traffic before applying it.

Shadow against live traffic

Turn on shadow mode and let a representative window of real calls flow. Read the [shadow] would … events; tighten any rule that surfaces a false positive — still in shadow, zero blast radius.

Enforce

When the feed fires on what you expect and nothing you don’t, flip shadow off. The next call enforces for real.
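"Fires on what you expect and nothing you don't" can be turned into an explicit go/no-go check before you flip shadow off. This sketch works at tool-name granularity for simplicity and assumes you already have the shadowed events and a set of tools you expect the policy to stop; `rollout_report` and the sample data are hypothetical. An expected tool that never shows up may simply mean the window was not representative yet.

```python
def rollout_report(shadow_events: list, expected_tools: set) -> dict:
    """Compare what the shadowed policy would block against what you expect."""
    would_block = {e["tool_name"] for e in shadow_events}
    false_positives = sorted(would_block - expected_tools)  # fires on something you don't expect
    not_seen_yet = sorted(expected_tools - would_block)     # expected, but absent from this window
    return {
        "false_positives": false_positives,
        "not_seen_yet": not_seen_yet,
        "ready_to_enforce": not false_positives and not not_seen_yet,
    }


shadow_events = [
    {"tool_name": "shell.exec", "reason": "[shadow] would deny"},
    {"tool_name": "git.push", "reason": "[shadow] would deny"},
]
print(rollout_report(shadow_events, {"shell.exec", "http.fetch"}))
# {'false_positives': ['git.push'], 'not_seen_yet': ['http.fetch'], 'ready_to_enforce': False}
```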

Testing previews the policy, not governed skills. A skill in block or quarantine mode still enforces even under a shadowed policy — the skill’s review disposition wins. Shadowing a policy was never a request to un-quarantine a skill.
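The precedence in that note can be sketched as follows. `outcome` is a hypothetical helper, and the string it returns for skill enforcement is a placeholder, since this page does not say which verdict a blocked or quarantined skill reports.

```python
ENFORCING = {"deny", "sanitize", "pending_approval"}


def outcome(policy_verdict: str, policy_shadowed: bool, skill_mode: str = "") -> str:
    """Final result for a call, given the policy verdict and any skill disposition."""
    # The skill's review disposition takes precedence and ignores policy shadowing.
    if skill_mode in ("block", "quarantine"):
        return f"enforced by skill ({skill_mode})"  # placeholder label
    if policy_shadowed and policy_verdict in ENFORCING:
        return "audit"
    return policy_verdict


print(outcome("deny", policy_shadowed=True))                             # audit
print(outcome("allow", policy_shadowed=True, skill_mode="quarantine"))   # enforced by skill (quarantine)
```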

5. API reference

These management routes use your session / access token and are workspace-scoped:

Method & path	Role	Purpose
`POST /api/workspace/firewall/test`	Developer+	Dry-run one synthetic tool call against the resolved (or a draft `policy_id`) policy. Returns verdict, `policy_name`, `rule_label`, `reason`, `gap`, `shadow_mode`. Nothing dispatched or logged.
`GET /api/workspace/firewall/simulate?level=`	Member	Replay an autonomy level against recent events; returns would-be-blocked counts.
`GET /api/workspace/firewall/policies/:id`	Member	Read a policy’s current `shadow_mode` flag.
`PUT /api/workspace/firewall/policies`	Developer+	Toggle `shadow_mode` on the policy.

The Test request body takes surface (default inbound), a required tool_name, an optional args_json, and an optional policy_id that overrides policy resolution.
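For scripting against the management routes, here is a minimal sketch that builds (but does not send) the Test and Simulate requests with the standard library. Several details are assumptions not confirmed on this page: the placeholder base URL, sending the session or access token as a `Bearer` header, and encoding `args_json` as a JSON string. The guard against `sk-orca-` keys reflects the rule that relay keys do not authenticate these routes.

```python
import json
import urllib.parse
import urllib.request

SURFACES = {"inbound", "response", "mcp", "egress"}
LEVELS = {"tight", "balanced", "permissive"}


def _auth_headers(token: str) -> dict:
    if token.startswith("sk-orca-"):
        raise ValueError("relay keys do not authenticate management routes")
    return {"Authorization": f"Bearer {token}"}  # assumption: Bearer scheme


def build_test_request(base_url, token, tool_name, args=None, surface="inbound", policy_id=None):
    """POST /api/workspace/firewall/test: dry-run one synthetic call."""
    if surface not in SURFACES:
        raise ValueError(f"unknown surface: {surface!r}")
    body = {"surface": surface, "tool_name": tool_name}
    if args is not None:
        body["args_json"] = json.dumps(args)  # assumption: args_json is a JSON string
    if policy_id is not None:
        body["policy_id"] = policy_id         # preview a draft instead of the resolved policy
    headers = {**_auth_headers(token), "Content-Type": "application/json"}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/workspace/firewall/test",
        data=json.dumps(body).encode("utf-8"), headers=headers, method="POST")


def build_simulate_request(base_url, token, level):
    """GET /api/workspace/firewall/simulate?level=...: replay an autonomy level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    query = urllib.parse.urlencode({"level": level})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/workspace/firewall/simulate?{query}",
        headers=_auth_headers(token), method="GET")


req = build_test_request("https://example.invalid", "session-token", "shell.exec",
                         {"command": "rm -rf /data"}, surface="response")
print(req.get_method(), req.full_url)
```

Pass the result to `urllib.request.urlopen` once the base URL and token are real.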

Where to go next

Shadow mode

The live-traffic rollout: [shadow] would …, the events filter, and the flip to enforce.

Validate arguments

Scope a rule by its arguments: the clauses the Test sandbox lets you verify against rm -rf vs. ls -la.

Verdicts

What allow / audit / deny / sanitize / pending_approval / cap_cost each do when a test stops being a test.

Events log

Where shadowed verdicts land — filter, drill into runs and matched rules.

For the rule-matching grammar these tests exercise, see the full firewall rules reference; for where testing fits the broader model, see enforcement modes.
