You have a policy you want to put in front of production. The fear isn’t the policy — it’s flipping it on and discovering it blocks a tool your agent actually needs, or masks a field your app depends on. The fix isn’t more testing in a sandbox; it’s rolling out against real traffic in a posture that can’t break anything, then tightening only once you’ve seen what fires. This recipe is that rollout: observe → shadow → enforce, with autonomy balanced before tight. You stay in the console (or the REST API) the whole way; the agent keeps calling https://api.orcarouter.ai/v1/... exactly as before.
New to the model? Read Enforcement modes for what each posture does mechanically, and the Secure Agents baseline for what each autonomy level sets. This page is the sequence — the order you flip the switches.

1. The AI security rollout in three moves

The whole rollout trades autonomy for safety in three steps, each verified against live traffic before the next:

Observe

Allow everything, log everything. Uncovered tool calls land in Discovered Tools; guardrail flag rules record matches without changing traffic. You learn your agent’s real shape.

Shadow

A real policy evaluates every call, but every enforcing verdict is downgraded to audit and logged as [shadow] would …. You see exactly what would block, with nothing actually blocked.

Enforce

Shadow off. deny blocks, mask redacts, pending_approval holds. Autonomy moves from the wider balanced level to tight, and your agent is governed.
The discipline is the point: you never enforce a rule you haven’t first watched fire in shadow against your own traffic.
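That discipline can be checked mechanically before you enforce. The sketch below is a minimal illustration, not the product's implementation: it assumes you have exported the tool names that fired in the shadow log, and it uses Python's `fnmatch` as a stand-in for the firewall's glob matching. The function name and the sample data are hypothetical.

```python
from fnmatch import fnmatchcase

def unverified_rules(rule_globs, shadow_fired_tools):
    """Return the rule globs that matched none of the tools seen in shadow."""
    return [
        glob for glob in rule_globs
        if not any(fnmatchcase(tool, glob) for tool in shadow_fired_tools)
    ]

# Hypothetical data: two rules, but only shell.exec showed up in the shadow log.
rules = ["shell.exec", "db.drop_*"]
seen_in_shadow = {"shell.exec", "fs.read"}

not_ready = unverified_rules(rules, seen_in_shadow)
print(not_ready)  # ['db.drop_*']
```

Any glob returned here is a rule you have not yet watched fire, so it is not ready to enforce.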

2. Move one — observe (autonomy = permissive)

Start as wide as it goes. Apply the permissive autonomy level from Firewall → Posture (Developer+) — or POST /api/workspace/firewall/autonomy. It enables workspace observe mode and leaves no enforcing policy, so every call is allowed and every uncovered call is logged as a coverage gap.
# Console: Firewall → Posture → apply "permissive"
# or, via the REST API on your session token:
curl https://api.orcarouter.ai/api/workspace/firewall/autonomy \
  -H "Authorization: Bearer <your console access token>" \
  -H "X-Workspace-Id: <workspace>" \
  -H "Content-Type: application/json" \
  -d '{"level": "permissive"}'
The /api/workspace/firewall/* routes run on your console session / access token, not on a relay sk-orca-... key. Applying an autonomy level is a workspace write — Developer+. Only /v1/* model calls use the relay key.
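If you script the rollout rather than use curl, the same call can be built with Python's standard library. This is a sketch that mirrors the curl request above; the helper name `build_autonomy_request` is hypothetical, and it only constructs the request so you can inspect it before sending.

```python
import json
import urllib.request

BASE_URL = "https://api.orcarouter.ai"
LEVELS = ("permissive", "balanced", "tight")

def build_autonomy_request(level, access_token, workspace_id):
    """Build (but do not send) the POST that applies an autonomy level."""
    if level not in LEVELS:
        raise ValueError(f"unknown autonomy level: {level!r}")
    return urllib.request.Request(
        f"{BASE_URL}/api/workspace/firewall/autonomy",
        data=json.dumps({"level": level}).encode("utf-8"),
        headers={
            # Console access token, not a relay sk-orca-... key.
            "Authorization": f"Bearer {access_token}",
            "X-Workspace-Id": workspace_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_autonomy_request("permissive", "<your console access token>", "<workspace>")
# Sending is left to the caller, for example: urllib.request.urlopen(req)
```

Rejecting unknown levels locally keeps a typo from reaching the API as a workspace write.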
Now point real traffic at it and let it run. Watch two feeds:
  • Firewall → Discovered Tools (Member) — every tool your agent calls, flagged covered or gap. This is the input for your rules: you’re about to write policy for traffic that actually happens, not hypotheticals.
  • Guardrails → Matches (Member) — if you’ve added any flag-action rules, every match they record, without touching the request.
Let it run until Discovered Tools stops surfacing new tools. That stable list is your rule-authoring spec.
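"Stops surfacing new tools" can be made concrete. The sketch below assumes you record the set of tool names shown in Discovered Tools at regular intervals (how you export them is up to you; the snapshot format here is hypothetical) and treats the list as stable once several consecutive snapshots add nothing new.

```python
def tools_stabilized(snapshots, quiet_checks=3):
    """True when the last `quiet_checks` snapshots contain no tool name
    that the earlier snapshots had not already shown."""
    if len(snapshots) <= quiet_checks:
        return False  # not enough history to judge
    seen = set().union(*snapshots[:-quiet_checks])
    return all(set(snap) <= seen for snap in snapshots[-quiet_checks:])

# Hypothetical history: one snapshot per check, oldest first.
history = [
    {"fs.read"},
    {"fs.read", "http.fetch"},
    {"fs.read", "http.fetch"},
    {"fs.read", "http.fetch"},
    {"fs.read", "http.fetch"},
]
print(tools_stabilized(history))  # True

# A new tool in the latest snapshot resets the wait.
late_arrival = history[:-1] + [{"fs.read", "http.fetch", "shell.exec"}]
print(tools_stabilized(late_arrival))  # False
```

Pick `quiet_checks` and the snapshot interval so the quiet window covers your agent's slowest recurring job; a weekly task will not show up in an hour of traffic.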

3. Move two — shadow (a real policy, zero blocking)

Now author the policy you actually want — tool globs, argument clauses, egress lists, a cap_cost ceiling — and turn on shadow_mode before you attach it. (Build rules from firewall rules; the full policy model is in the Firewall reference.)
{
  "name": "prod-rollout",
  "enabled": true,
  "shadow_mode": true,
  "default_verdict": "audit",
  "rules": [
    { "priority": 10, "tool_name_glob": "shell.exec", "verdict": "deny" },
    { "priority": 20, "tool_name_glob": "*",          "verdict": "cap_cost", "cap_cost_cents": 1000 }
  ]
}
With shadow_mode: true, that deny and that cap_cost are both downgraded to audit at evaluation time — the engine computes the real verdict, logs it prefixed [shadow] would …, and lets the call through. Attach the policy to the keys you’re rolling out (set firewall_policy_id on the key) or make it the workspace default. Then read Firewall → Events / Runs (Developer+) filtered to the [shadow] prefix and confirm both sides:
  • Every shell.exec call shows [shadow] would deny. Every run that crosses your cap shows [shadow] would cap_cost. The policy sees the traffic you built it for.
  • No legitimate tool shows up with a would-block verdict. This is the false-positive check, the reason shadow exists. If a tool you need is flagged, fix the rule and re-watch before you ever enforce.
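The shadow downgrade is easier to reason about as code. The following is a toy model of the behavior described above, not the engine itself. It makes three assumptions that the policy JSON does not state: the lowest priority number is evaluated first and the first match wins, globs behave like Python's `fnmatch`, and a cap_cost rule only fires once the run's cost exceeds the cap.

```python
from fnmatch import fnmatchcase

ENFORCING = {"deny", "mask", "cap_cost", "pending_approval"}

POLICY = {
    "shadow_mode": True,
    "default_verdict": "audit",
    "rules": [
        {"priority": 10, "tool_name_glob": "shell.exec", "verdict": "deny"},
        {"priority": 20, "tool_name_glob": "*", "verdict": "cap_cost",
         "cap_cost_cents": 1000},
    ],
}

def evaluate(policy, tool_name, run_cost_cents=0):
    """Return (effective_verdict, log_line) for one tool call."""
    verdict = policy["default_verdict"]
    # Assumption: lowest priority number first, first matching rule wins.
    for rule in sorted(policy["rules"], key=lambda r: r["priority"]):
        if not fnmatchcase(tool_name, rule["tool_name_glob"]):
            continue
        # Assumption: a cap only fires once the run has crossed it.
        if rule["verdict"] == "cap_cost" and run_cost_cents <= rule["cap_cost_cents"]:
            break
        verdict = rule["verdict"]
        break
    if policy["shadow_mode"] and verdict in ENFORCING:
        # The real verdict is computed, logged, and downgraded to audit.
        return "audit", f"[shadow] would {verdict}"
    return verdict, verdict

print(evaluate(POLICY, "shell.exec"))     # ('audit', '[shadow] would deny')
print(evaluate(POLICY, "fs.read", 1500))  # ('audit', '[shadow] would cap_cost')
print(evaluate(POLICY, "fs.read", 200))   # ('audit', 'audit')

enforced = {**POLICY, "shadow_mode": False}
print(evaluate(enforced, "shell.exec"))   # ('deny', 'deny')
```

The last call shows what flipping shadow_mode off changes: the same rule, the same match, but the verdict is no longer downgraded.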
Guardrails have no policy-level shadow. The equivalent is the per-rule flag action: it records a match in the Matches feed and changes nothing, so you can measure a content rule before switching it to block or mask. Run your guardrail rules as flag through this same move.

4. Move three — enforce (autonomy balanced, then tight)

When the shadow log looks right, enforce in two stages, not one. Don’t jump straight to default-deny.
First, balanced. This is the recommended first enforcing posture: the firewall default verdict is audit, but the most destructive actions (like destructive shell) are denied, and the PII Shield guardrail runs audit-only, so it flags PII without masking it yet. You’re now blocking the worst thing while still observing the rest. Turn shadow_mode off on your own policy in the same move so its deny / cap_cost verdicts go live alongside the baseline.
curl https://api.orcarouter.ai/api/workspace/firewall/autonomy \
  -H "Authorization: Bearer <your console access token>" \
  -H "X-Workspace-Id: <workspace>" \
  -H "Content-Type: application/json" \
  -d '{"level": "balanced"}'
Watch Events for an hour. Real blocks now appear without the [shadow] prefix. A denied tool call returns HTTP 400 firewall_blocked; it’s skip-retry and costs no model tokens.
Then, tight. Once balanced is quiet, go to default-deny. The tight level denies by default, denies destructive shell and SSRF egress, and enforces PII Shield + Secrets Blocker: PII is masked on the request before the model sees it, and secrets in your requests are blocked. A blocked prompt returns HTTP 400 guardrail_blocked, costs no quota, and is skip-retry.
| Stage | Firewall (actions) | Guardrails (text) | What you’re proving |
|---|---|---|---|
| permissive | Observe; nothing blocked | flag only | Real traffic shape |
| balanced | Default audit; destructive shell denied | PII flagged | Worst case is stopped |
| tight | Default-deny; shell + fetch-shaped tools (SSRF) denied | PII masked, secrets blocked | Full zero-trust |
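Once verdicts are live, your client should treat the two block responses described above as final rather than transient. The sketch below assumes you have already extracted the error code string from the response (the exact response body shape is not shown on this page) and reads skip-retry as "resending the identical request will not help". The fallback rule for other statuses is a generic heuristic, not product guidance.

```python
NON_RETRYABLE_CODES = {"firewall_blocked", "guardrail_blocked"}

def should_retry(status, error_code):
    """Decide whether a failed relay call is worth retrying unchanged."""
    if status == 400 and error_code in NON_RETRYABLE_CODES:
        # A policy verdict: the same request would be blocked again.
        return False
    # Generic transient-failure heuristic (an assumption, not from the docs).
    return status == 429 or status >= 500

print(should_retry(400, "firewall_blocked"))   # False
print(should_retry(400, "guardrail_blocked"))  # False
print(should_retry(503, None))                 # True
```

Surfacing a blocked call to the agent as a hard failure, instead of looping on it, also keeps the Events feed readable while you watch each stage.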
Streaming caveat for PII. Under tight, PII Shield masks PII on the request before the model sees it — that’s live. Output-side masking of a streaming response is not yet live; an output block is enforced on streaming (the scanner cuts the stream). If you depend on redacting model output, verify your stage/stream combination in the guardrail Test tab first. See Guardrails.

5. The escape hatch — one-click undo

Every autonomy change is a single transaction that snapshots your prior posture, so you can roll straight back from Firewall → Posture (or POST /api/workspace/firewall/autonomy/undo/:audit_id). You can also just re-apply a softer level — drop tight back to balanced, or balanced back to permissive — at any time.
Undo restores from the audit snapshot of the most recent apply. If you’ve made manual policy edits since the apply you’re undoing, that snapshot is no longer the latest unused one and undo declines rather than silently rolling those edits away. When that happens, re-apply a softer level instead — it’s always available.
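The undo rule above can be pictured with a small model. This is an illustrative sketch only; the class and method names are hypothetical and the real service may track snapshots differently. Each apply records the prior posture under an audit id, and undo declines when that id is no longer the most recent apply or when a manual edit has happened since.

```python
class PostureHistory:
    """Toy model of apply / undo with audit snapshots."""

    def __init__(self, posture):
        self.posture = posture
        self._audits = {}           # audit_id -> posture before that apply
        self._latest = None         # audit_id of the most recent apply
        self._edited_since = False  # manual edit after the latest apply?
        self._next_id = 1

    def apply(self, level):
        audit_id = self._next_id
        self._next_id += 1
        self._audits[audit_id] = self.posture
        self.posture = level
        self._latest, self._edited_since = audit_id, False
        return audit_id

    def manual_edit(self, posture):
        self.posture = posture
        self._edited_since = True

    def undo(self, audit_id):
        if audit_id != self._latest or self._edited_since:
            return False  # declined: the snapshot is stale
        self.posture = self._audits.pop(audit_id)
        self._latest = None
        return True

h = PostureHistory("permissive")
first = h.apply("balanced")
print(h.undo(first), h.posture)   # True permissive

second = h.apply("tight")
h.manual_edit("tight + custom rule")
print(h.undo(second), h.posture)  # False tight + custom rule

h.apply("balanced")               # re-applying a softer level always works
print(h.posture)                  # balanced
```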

6. Where each move’s verdicts come from

The rollout never blocks something you didn’t ask for, because each posture maps to an explicit, observable mechanism:
| Posture | Mechanism | Outcome |
|---|---|---|
| Observe | Workspace firewall_observe_mode on + guardrail flag | Allow + log gaps / matches |
| Shadow | Per-policy shadow_mode | Real verdict computed, downgraded to audit, logged [shadow] would … |
| Enforce | shadow_mode off + tight/balanced autonomy | deny / mask / cap_cost go live |
The four terms — observe mode, the audit verdict, the flag action, and shadow_mode — are distinct switches, documented side by side in Enforcement modes.

7. Next steps

Enforcement modes

The mechanism map behind observe, shadow, and enforce.

Secure Agents baseline

What each autonomy level sets, and how to simulate it first.

Tame an autonomous agent

The next step once you’ve enforced: cost caps, anomaly detection, and approvals.

Agent Firewall

Policies, rules, verdicts, shadow mode, and the MCP gateway in full.
A go-live you can trust is a rollout, not a switch. Observe what your agent does, shadow the policy against its real traffic, then enforce, balanced before tight, so that every rule is verified against production traffic before it ever blocks a call.