https://api.orcarouter.ai/v1/... exactly as before.
New to the model? Read Enforcement modes
for what each posture does mechanically, and the
Secure Agents baseline for
what each autonomy level sets. This page is the sequence — the order you
flip the switches.
1. The AI security rollout in three moves
The whole rollout trades autonomy for safety in three steps, each verified against live traffic before the next:
Observe
Allow everything, log everything. Uncovered tool calls land in Discovered Tools; guardrail flag rules record matches without changing traffic. You learn your agent’s real shape.
Shadow
A real policy evaluates every call, but every enforcing verdict is downgraded to audit and logged [shadow] would …. You see exactly what would block, with nothing actually blocked.
Enforce
Shadow off. deny blocks, mask redacts, pending_approval holds. The autonomy goes from wide (balanced) to tight (tight), and your agent is governed.
2. Move one: observe (autonomy = permissive)
Start as wide as it goes. Apply the permissive autonomy level from
Firewall → Posture (Developer+), or call
POST /api/workspace/firewall/autonomy. It enables workspace observe
mode and leaves no enforcing policy, so every call is allowed and every
uncovered call is logged as a coverage gap.
- Firewall → Discovered Tools (Member) — every tool your agent calls, flagged covered or gap. This is the input for your rules: you’re about to write policy for traffic that actually happens, not hypotheticals.
- Guardrails → Matches (Member) — if you’ve added any
flag-action rules, every match they record, without touching the request.
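As a rough illustration of the API route, the sketch below builds (but does not send) the autonomy request. Only the path POST /api/workspace/firewall/autonomy comes from this page; the base URL, the bearer-token header, and the {"level": ...} body shape are assumptions made for the example, so check the API reference for the real contract.

```python
import json
import urllib.request

# From this page: the endpoint path and the three autonomy level names.
AUTONOMY_PATH = "/api/workspace/firewall/autonomy"
LEVELS = ("permissive", "balanced", "tight")


def build_autonomy_request(base_url: str, token: str, level: str) -> urllib.request.Request:
    """Build a POST that asks for the given autonomy level.

    Assumptions (not confirmed by this page): bearer-token auth and a
    JSON body of the form {"level": "<name>"}.
    """
    if level not in LEVELS:
        raise ValueError(f"unknown autonomy level: {level!r}")
    body = json.dumps({"level": level}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + AUTONOMY_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical host and token, for illustration only.
req = build_autonomy_request("https://orcarouter.example", "TOKEN", "permissive")
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
```

Validating the level name locally keeps a typo from reaching the workspace; the same builder covers the later balanced and tight steps.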
3. Move two — shadow (a real policy, zero blocking)
Now author the policy you actually want, with tool globs, argument clauses, egress lists, and a cap_cost ceiling, and turn on shadow_mode before
you attach it. (Build rules from firewall rules;
the full policy model is in the Firewall reference.)
With shadow_mode: true, the policy’s deny and cap_cost verdicts are both
downgraded to audit at evaluation time: the engine computes the
real verdict, logs it prefixed [shadow] would …, and lets the call
through. Attach the policy to the keys you’re rolling out (set
firewall_policy_id on the key) or make it the workspace default.
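The downgrade described above can be modeled in a few lines. This is a toy model of the behavior as this page describes it, not the engine's actual code; the function name and the exact set of enforcing verdicts are assumptions drawn from the verdict names used on this page.

```python
from typing import Optional, Tuple

# Verdicts this page describes as enforcing (assumed to be the full set).
ENFORCING = {"deny", "mask", "cap_cost", "pending_approval"}


def effective_verdict(real: str, shadow_mode: bool) -> Tuple[str, Optional[str]]:
    """Return (verdict applied to the call, shadow log line or None).

    With shadow_mode on, an enforcing verdict is still computed but is
    applied as "audit" and logged with the [shadow] prefix.
    """
    if shadow_mode and real in ENFORCING:
        return "audit", f"[shadow] would {real}"
    return real, None


print(effective_verdict("deny", shadow_mode=True))   # call goes through, log records the deny
print(effective_verdict("deny", shadow_mode=False))  # call is actually denied
```

Non-enforcing verdicts pass through unchanged in either mode, which is why shadow mode adds log lines without changing traffic.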
Then read Firewall → Events / Runs (Developer+) filtered to the
[shadow] prefix and confirm both sides:
It fires where you meant it to
Every shell.exec call shows [shadow] would deny. Every run that
crosses your cap shows [shadow] would cap_cost. The policy sees the
traffic you built it for.
It does NOT fire where you didn't
No legitimate tool shows up with a would-block verdict. This is the
false-positive check, the reason shadow exists. If a tool you need is
flagged, fix the rule and re-watch before you ever enforce.
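If you export the shadow events, the two-sided check can be scripted. The sketch below assumes each event is a dict with hypothetical "tool" and "message" fields; the real export shape may differ. Apart from shell.exec, the tool names are invented for the example.

```python
SHADOW_PREFIX = "[shadow] would "


def review_shadow_events(events, expected_blocked):
    """Split shadow hits into intended hits and false positives.

    Also returns the expected tools that never produced a shadow hit,
    which means the rule did not fire where you meant it to.
    """
    intended, false_positives = [], []
    for event in events:
        message = event["message"]
        if not message.startswith(SHADOW_PREFIX):
            continue  # not a shadow verdict; ignore
        verdict = message[len(SHADOW_PREFIX):]
        bucket = intended if event["tool"] in expected_blocked else false_positives
        bucket.append((event["tool"], verdict))
    fired = {tool for tool, _ in intended}
    missed = sorted(set(expected_blocked) - fired)
    return intended, false_positives, missed


events = [
    {"tool": "shell.exec", "message": "[shadow] would deny"},
    {"tool": "search.web", "message": "[shadow] would deny"},  # hypothetical legitimate tool
    {"tool": "fs.read", "message": "audit"},                   # hypothetical non-shadow event
]
intended, false_positives, missed = review_shadow_events(
    events, expected_blocked={"shell.exec", "payments.refund"}
)
```

In this sample, a non-empty false_positives list means a rule needs fixing before you enforce, and a non-empty missed list means a rule is not matching the traffic it was written for.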
4. Move three — enforce (autonomy balanced, then tight)
When the shadow log looks right, enforce in two stages, not one. Don’t jump straight to default-deny.
First, balanced. This is the recommended first enforcing posture:
the firewall default verdict is audit, but the most destructive actions
(like destructive shell) are denied, and the PII Shield guardrail runs
audit-only, flagging PII without masking it yet. You’re now blocking
the worst thing while still observing the rest.
Turn shadow_mode off on your own policy in the same move so its
deny / cap_cost verdicts go live alongside the baseline.
Its verdicts then log without the [shadow]
prefix. A denied tool call returns HTTP 400 firewall_blocked; it’s
skip-retry and costs no model tokens.
Then, tight. Once balanced is quiet, go to default-deny. The
tight level denies by default, denies destructive shell and SSRF
egress, and enforces PII Shield + Secrets Blocker — PII is masked on
the request before the model sees it, and secrets in your requests are
blocked. A blocked prompt returns HTTP 400 guardrail_blocked, costs
no quota, and is skip-retry.
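On the client side, one way to respect these responses is to exclude them from your own retry logic. The sketch below reads "skip-retry" as "resending the identical request will not help", which is an interpretation rather than something this page spells out. How the error code is extracted from the response body, and the choice to retry on 429 and 5xx, are also assumptions.

```python
from typing import Optional

# Error codes named on this page for policy blocks.
BLOCK_CODES = {"firewall_blocked", "guardrail_blocked"}


def should_retry(status: int, error_code: Optional[str]) -> bool:
    """Decide whether a failed call is worth resending unchanged."""
    if status == 400 and error_code in BLOCK_CODES:
        # A policy verdict, not a transient fault: the same request
        # would be blocked again.
        return False
    # Assumed generic policy: retry rate limits and server errors only.
    return status == 429 or 500 <= status < 600


print(should_retry(400, "firewall_blocked"))   # False
print(should_retry(503, None))                 # True
```

A blocked call is better surfaced to the agent or operator, who can change the request or the rule, than retried in a loop.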
| Stage | Firewall (actions) | Guardrails (text) | What you’re proving |
|---|---|---|---|
| permissive | Observe; nothing blocked | flag only | Real traffic shape |
| balanced | Default audit; destructive shell denied | PII flagged | Worst case is stopped |
| tight | Default-deny; shell + fetch-shaped tools (SSRF) denied | PII masked, secrets blocked | Full zero-trust |
Streaming caveat for PII. Under
tight, PII Shield masks PII on the
request before the model sees it — that’s live. Output-side masking
of a streaming response is not yet live; an output block is enforced
on streaming (the scanner cuts the stream). If you depend on redacting
model output, verify your stage/stream combination in the guardrail
Test tab first. See Guardrails.
5. The escape hatch: one-click undo
Every autonomy change is a single transaction that snapshots your prior posture, so you can roll straight back from Firewall → Posture (or POST /api/workspace/firewall/autonomy/undo/:audit_id). You can also just
re-apply a softer level at any time: drop tight back to balanced, or balanced
back to permissive.
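For scripted rollbacks, the undo route can be built the same way as any other request. Only the path pattern comes from this page; the base URL, the bearer-token header, and the sample audit id are placeholders, and where the audit id is obtained is not covered here.

```python
import urllib.request
from urllib.parse import quote

UNDO_PATH = "/api/workspace/firewall/autonomy/undo/"


def build_undo_request(base_url: str, token: str, audit_id: str) -> urllib.request.Request:
    """Build (without sending) the POST that rolls back one autonomy change."""
    # Percent-encode the id so it stays a single path segment.
    path = UNDO_PATH + quote(str(audit_id), safe="")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )


# Hypothetical host, token, and audit id.
undo = build_undo_request("https://orcarouter.example", "TOKEN", "aud_123")
```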
6. Where each move’s verdicts come from
The rollout never blocks something you didn’t ask for, because each posture maps to an explicit, observable mechanism:
| Posture | Mechanism | Outcome |
|---|---|---|
| Observe | Workspace firewall_observe_mode on + guardrail flag | Allow + log gaps / matches |
| Shadow | Per-policy shadow_mode | Real verdict computed, downgraded to audit, logged [shadow] would … |
| Enforce | shadow_mode off + tight/balanced autonomy | deny / mask / cap_cost go live |
The audit verdict, the flag action, and
shadow_mode are distinct switches, documented side by side in
Enforcement modes.
7. Next steps
Enforcement modes
The mechanism map behind observe, shadow, and enforce.
Secure Agents baseline
What each autonomy level sets, and how to simulate it first.
Tame an autonomous agent
The next step once you’ve enforced: cost caps, anomaly detection, and
approvals.
Agent Firewall
Policies, rules, verdicts, shadow mode, and the MCP gateway in full.
