deny on shell.exec, an egress
allow-list, an argument clause that only fires on rm -rf — and now you
want to know it does exactly what you think before it changes a single
production tool call. The firewall gives you three non-destructive ways to
test firewall rules, each answering a different question:
Dry-run one call
The Test sandbox feeds one synthetic tool call through the real
engine and returns the verdict — nothing dispatched, nothing logged.
Developer+.
Replay a posture
Simulate replays an autonomy level against your recent traffic and
counts how many calls it would block. Member-readable.
Run against live traffic
Shadow mode evaluates a whole policy on real calls but downgrades
every enforcing verdict to audit. Zero blast radius.
All three are configured through the console (or the
/api/workspace/firewall/* management routes, which authenticate with your
session / access token, not a relay sk-orca-… key). Your agent’s
/v1/* relay calls never change while you test.
1. Test firewall rules with the dry-run Test sandbox
The Test sandbox is the tightest loop: hand it a single synthetic tool call and it runs the real evaluation engine (full policy resolution, rules walked in priority order, first-match-wins), then returns the verdict, the rule that produced it, and the human-readable reason. The call is a dry run: nothing is dispatched to any tool, and nothing is written to the events feed or the Discovered-tools inventory. It answers one question precisely: given this exact tool name and these arguments, what does my policy decide, and which rule decides it?
One concrete dry run
Say you’ve added a rule that should deny shell.exec only when the
command contains rm -rf. You want to confirm two things in one sitting:
the dangerous command is denied, and an innocent one still passes.
Test the dangerous call
In Security → Firewall, open the Test tab, pick the
response surface, enter tool name shell.exec and arguments
{"command": "rm -rf /data"}, and run. The response names the verdict
and the matched rule.
Test the innocent call
Run it again with {"command": "ls -la"}. The argument clause no
longer matches, so the call falls through to the policy default: you
should see allow or audit and an empty rule_label. If rm -rf
is denied and ls -la isn’t, your argument
clause is scoped correctly.
The sandbox evaluates a call on one of four surfaces, inbound,
response, mcp, or egress (default inbound), so test each rule on the
surface it’s pinned to. On inbound there are no call-time arguments, so a
sanitize rule escalates to a block there exactly as it would in
production; see stages for why surface
matters.
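The dry run above can be pictured with a small local model of first-match-wins evaluation. This is only a sketch for intuition, not the firewall's engine: the `Rule` shape, the `deny-rm-rf` label, the "lower priority number is walked first" convention, and the substring-style argument clause are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    label: str            # hypothetical rule label
    priority: int         # assumption: lower number is walked first
    tool_name: str
    verdict: str
    # Optional argument clause; None means "match any arguments".
    arg_clause: Optional[Callable[[dict], bool]] = None


def evaluate(rules, default_verdict, tool_name, args):
    """Walk rules in priority order and return the first match."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.tool_name != tool_name:
            continue
        if rule.arg_clause is not None and not rule.arg_clause(args):
            continue
        return {"verdict": rule.verdict, "rule_label": rule.label}
    # No rule matched: fall through to the policy default.
    return {"verdict": default_verdict, "rule_label": ""}


rules = [
    Rule("deny-rm-rf", 10, "shell.exec", "deny",
         lambda a: "rm -rf" in a.get("command", "")),
]

dangerous = evaluate(rules, "audit", "shell.exec", {"command": "rm -rf /data"})
innocent = evaluate(rules, "audit", "shell.exec", {"command": "ls -la"})
print(dangerous)  # {'verdict': 'deny', 'rule_label': 'deny-rm-rf'}
print(innocent)   # {'verdict': 'audit', 'rule_label': ''}
```

The second call shows the negative case: the clause does not match, no rule claims the call, and the default verdict comes back with an empty rule label.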
2. Simulate an autonomy level before you apply it
The Test sandbox checks one call. Simulate answers the posture-level question: if I switched this whole workspace to a stricter autonomy level, how much of my recent traffic would it block? Simulate replays a candidate level’s deny rules against your trailing firewall events and returns the would-be impact as tool names and counts only, never arguments. It is read-only and Member-readable, so anyone on the team can preview the blast radius of tight before a Developer
commits to it.
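As a rough mental model of that replay, the sketch below counts how many past events a candidate level would have blocked, grouped by tool name. It is a simplification under stated assumptions: the event shape, the tool names `http.fetch` and `calendar.read`, and matching deny rules by tool name alone are all hypothetical, not the product's implementation.

```python
from collections import Counter


def simulate(events, denied_tools):
    """Count past events whose tool name the candidate level would deny.

    Only tool names and counts come back, never arguments.
    """
    return Counter(
        e["tool_name"] for e in events if e["tool_name"] in denied_tools
    )


# Hypothetical trailing events; only the tool name is needed here.
recent_events = [
    {"tool_name": "shell.exec"},
    {"tool_name": "http.fetch"},
    {"tool_name": "shell.exec"},
    {"tool_name": "calendar.read"},
]

would_block = simulate(recent_events, denied_tools={"shell.exec", "http.fetch"})
print(dict(would_block))  # {'shell.exec': 2, 'http.fetch': 1}
```

Because the function only reads the event list and returns a fresh counter, it mirrors the read-only nature of Simulate: nothing about the workspace changes.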
What the three levels would do
- tight: default-deny, deny destructive shell, deny fetch-shaped tools (the SSRF vector), PII Shield + Secrets Blocker enforced. Simulate shows how much of your real traffic this floor would catch.
- balanced: default audit, deny destructive shell, PII Shield in audit-only (flags PII). The recommended starting posture.
- permissive: observe only; nothing enforced.
Simulate vs. apply
Simulate changes nothing — it’s a what-if over past events.
Applying an autonomy level (a Developer+ write) materializes real,
editable autonomy_* policy and guardrail rows, with one-click undo
from the audit snapshot. Preview with Simulate, then apply when the
count looks right.
3. Shadow mode: test against live traffic with no blast radius
The Test sandbox and Simulate are offline previews. Shadow mode is the live one: a per-policy flag that evaluates the policy on real agent traffic, walks every rule, picks a verdict, then downgrades every enforcing verdict (deny, sanitize, pending_approval) to audit and
prefixes the reason with [shadow] would …. The call always goes through;
nothing is blocked, redacted, or held.
That makes the events feed read like a
production run with enforcement turned off. Filter for [shadow] and you
have a complete list of every call the policy is about to start blocking —
before it blocks one.
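The filtering step can be sketched offline against an exported list of events. The field names `tool_name`, `verdict`, and `reason`, and the sample tool names, are assumptions about the export shape; only the `[shadow]` prefix comes from the behaviour described above, and the rest of each reason is left elided.

```python
from collections import Counter

SHADOW_PREFIX = "[shadow]"


def shadowed(events):
    """Return the events whose reason carries the shadow prefix."""
    return [e for e in events if e.get("reason", "").startswith(SHADOW_PREFIX)]


# Hypothetical export of the events feed; field names are assumptions.
events = [
    {"tool_name": "shell.exec", "verdict": "audit", "reason": "[shadow] would …"},
    {"tool_name": "calendar.read", "verdict": "allow", "reason": ""},
    {"tool_name": "http.fetch", "verdict": "audit", "reason": "[shadow] would …"},
    {"tool_name": "shell.exec", "verdict": "audit", "reason": "[shadow] would …"},
]

# Group the shadowed calls by tool to see what enforcement would hit.
about_to_block = Counter(e["tool_name"] for e in shadowed(events))
print(dict(about_to_block))  # {'shell.exec': 2, 'http.fetch': 1}
```

Grouping by tool name gives a quick read on where a policy would bite once shadow mode is switched off.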
| Test method | Runs against | Question it answers |
|---|---|---|
| Test sandbox | One synthetic call | “What verdict does this exact call get, and which rule decides?” |
| Simulate | Recent events | “How many calls would a stricter autonomy level block?” |
| Shadow mode | Live traffic | “What would this policy block across real production traffic?” |
Shadow mode is the deepest of the three: full live coverage with zero
blast radius. It has its own page:
Roll out a firewall policy with shadow mode
walks through the toggle, the
[shadow] would … reasons, and the flip to enforce.
4. A practical testing order
The three tools compose into one safe-rollout path, cheapest check first, widest coverage last:
Dry-run the rules you just wrote
Use Test to confirm each new rule fires on the calls it should and
passes the ones it shouldn’t — including the negative cases. Fast,
Developer+, nothing persisted.
Gauge the posture (optional)
If you’re reaching for an autonomy level rather than hand-written
rules, Simulate the level and read the would-be-blocked count
against real traffic before applying it.
Shadow against live traffic
Turn on shadow mode and let a representative window of real calls
flow. Read the [shadow] would … events; tighten any rule that
surfaces a false positive while still in shadow, with zero blast radius.
5. API reference
These management routes use your session / access token and are workspace-scoped:

| Method & path | Role | Purpose |
|---|---|---|
| POST /api/workspace/firewall/test | Developer+ | Dry-run one synthetic tool call against the resolved (or a draft policy_id) policy. Returns verdict, policy_name, rule_label, reason, gap, shadow_mode. Nothing dispatched or logged. |
| GET /api/workspace/firewall/simulate?level= | Member | Replay an autonomy level against recent events; returns would-be-blocked counts. |
| GET /api/workspace/firewall/policies/:id | Member | Read a policy’s current shadow_mode flag. |
| PUT /api/workspace/firewall/policies | Developer+ | Toggle shadow_mode on the policy. |
The test route’s request body takes a surface (default inbound), a required tool_name,
optional args_json, and an optional policy_id to override resolution.
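A request to the test route might be assembled as in the sketch below, which builds the request without sending it. Several details are assumptions and should be checked against your workspace: the host `console.example.invalid` is a placeholder, the bearer-style `Authorization` header is a guess at how the session / access token is presented, and encoding `args_json` as a JSON string is inferred from the field name only.

```python
import json
import urllib.request

BASE_URL = "https://console.example.invalid"  # placeholder host (assumption)


def build_test_request(token, tool_name, args=None, surface="inbound",
                       policy_id=None):
    """Build (but do not send) a dry-run request for one synthetic call."""
    body = {"surface": surface, "tool_name": tool_name}
    if args is not None:
        # Assumption: args_json carries the arguments as a JSON string.
        body["args_json"] = json.dumps(args)
    if policy_id is not None:
        body["policy_id"] = policy_id  # overrides policy resolution
    return urllib.request.Request(
        BASE_URL + "/api/workspace/firewall/test",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            # Assumption: the access token is sent as a bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_test_request(
    "YOUR_ACCESS_TOKEN", "shell.exec",
    args={"command": "rm -rf /data"}, surface="response",
)
print(req.get_method(), req.full_url)
```

Passing the built request to `urllib.request.urlopen` would send it; the JSON reply is described above as carrying verdict, policy_name, rule_label, reason, gap, and shadow_mode.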
Where to go next
Shadow mode
The live-traffic rollout: [shadow] would …, the events filter, and
the flip to enforce.
Validate arguments
Scope a rule to specific arguments: the clauses the Test sandbox lets
you verify against rm -rf vs ls -la.
Verdicts
What allow / audit / deny / sanitize / pending_approval /
cap_cost each do when a test stops being a test.
Events log
Where shadowed verdicts land: filter, drill into runs and matched
rules.
