Enforcement modes: observe, shadow & enforce

Before a rule blocks production traffic, you want to know it fires on the right things and nothing else. OrcaRouter gives you three postures — observe, shadow, and enforce — that let you roll out incrementally, with visibility at every step and no surprises. This page explains what each posture means mechanically, how to move through them, and how autonomy levels set all of it in one step.

1. The three postures at a glance

Posture	What happens to traffic	Mechanism	When to use
Observe	All traffic is allowed; calls with no policy are logged as coverage gaps	Workspace-level observe mode on; guardrail rules use `flag` action; firewall `default_verdict` is `audit`	Baseline discovery — understand what your agents actually do before you write a single rule
Shadow	Traffic is allowed; a policy evaluates and would-be blocks are logged as `[shadow] would …`	Per-policy `shadow_mode` flag on the firewall policy	Safe pre-production validation — confirm a policy fires correctly before it touches traffic
Enforce	Real verdicts apply — deny blocks, sanitize redacts, pending_approval holds	Shadow mode off; guardrail actions set to `block` / `mask`; firewall verdicts are live	Production enforcement after you’ve verified the policy in shadow

Role requirement. Any workspace member can read policies, settings, and the discovered-tools view; the firewall Events and Runs feeds require the Developer role. Changing settings, policy actions, or shadow_mode also requires Developer or higher.

2. Observe posture — measure before you rule

The observe posture is not a single switch. It is a combination of three independent mechanisms that together produce “allow everything, record everything”:

Firewall observe mode (workspace setting)

When a tool call resolves to no policy at all — no key attachment and no workspace default — the firewall’s workspace-level observe mode determines what happens:

Observe mode on: the call is allowed and logged as a coverage gap. The Discovered Tools view fills up from these gap events, showing exactly which tools your agents are running without any rule covering them.
Observe mode off: the call is allowed silently — byte-identical to a workspace that never enabled the feature.

Observe mode is the gap-detection surface. It only fires when no policy resolves. It is not the same as having a policy set to audit.
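The gap-only behavior described above can be sketched in a few lines. This is an illustration, not OrcaRouter's implementation: the names `GapOutcome`, `resolve_policy`, and `handle_uncovered_call` are hypothetical, and the precedence of a key attachment over the workspace default is an assumption the text above does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GapOutcome:
    allowed: bool               # uncovered calls are allowed in both modes
    logged_as: Optional[str]    # None means nothing is recorded


def resolve_policy(key_policy: Optional[str],
                   workspace_default: Optional[str]) -> Optional[str]:
    # Assumption: a key attachment takes precedence over the workspace default.
    return key_policy if key_policy is not None else workspace_default


def handle_uncovered_call(observe_mode: bool) -> GapOutcome:
    # Only relevant when resolve_policy(...) returned None.
    # Observe mode decides whether a gap event is recorded, not whether the call runs.
    return GapOutcome(allowed=True,
                      logged_as="coverage_gap" if observe_mode else None)
```

Note that observe mode never changes the allow/deny outcome here; it only changes what is recorded.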

Firewall `audit` verdict (per-policy default)

When a policy does resolve but no rule matches a tool call, the policy’s default_verdict applies. The default value for default_verdict is audit: allow the call and record it for review. A new policy with no rules and no configuration changes blocks nothing and silently allows nothing; it audits everything it sees.

audit is also a normal rule verdict. A rule that matches and produces audit lets the call through and records it, which makes it the firewall analogue of a guardrail running in audit-only mode.
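A minimal sketch of the fall-through to default_verdict, assuming first-match-wins ordering and exact tool-name matching. Both are simplifications chosen for illustration, and the `Rule` and `Policy` classes are hypothetical rather than the product's schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    tool: str      # hypothetical: exact tool-name match only
    verdict: str   # e.g. "audit", "deny", "sanitize", "pending_approval"


@dataclass
class Policy:
    rules: List[Rule] = field(default_factory=list)
    default_verdict: str = "audit"   # the documented default


def evaluate(policy: Policy, tool: str) -> str:
    # Assumption: the first matching rule decides the verdict.
    for rule in policy.rules:
        if rule.tool == tool:
            return rule.verdict
    # No rule matched, so the policy-level default applies.
    return policy.default_verdict
```

With these definitions, `evaluate(Policy(), "shell.exec")` yields `"audit"`: an empty policy audits everything it sees.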

Guardrail `flag` action (per-rule action)

On the guardrails side, the flag action is the observe equivalent: the rule fires, a match is recorded in the Matches feed, and the request continues unchanged. No block. No redaction. Use flag when you want to measure a rule — see how often it fires and on what — before committing to block or mask.
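As an illustration of the flag semantics, the sketch below records a match and returns the text untouched. The email regex, the `matches_feed` list, and the record fields are all hypothetical stand-ins; real guardrail rules and the Matches feed will look different.

```python
import re
from typing import Dict, List

# Deliberately simple email pattern, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def apply_flag_rule(text: str, matches_feed: List[Dict]) -> str:
    hits = EMAIL.findall(text)
    if hits:
        # A flag action records the match...
        matches_feed.append({"rule": "email", "action": "flag", "count": len(hits)})
    # ...and the request continues unchanged: no block, no redaction.
    return text
```

Comparing the feed against the returned text makes the point of flag concrete: the feed grows while the payload stays byte-identical.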

Together, these three produce the observe posture: observe mode catches uncovered tool calls; audit verdicts cover tool calls under a policy but not yet under a specific rule; flag actions cover guardrail checks you are not yet ready to enforce.

3. Shadow posture — validate before you enforce

Shadow mode is a per-policy flag (shadow_mode: true) on a firewall policy. When it is on:

The policy evaluates every tool call exactly as it would in production — rules are matched, verdicts are computed, argument predicates are tested.
Every enforcing verdict (deny, sanitize, pending_approval) is downgraded to audit before it reaches the tool.
The logged reason is prefixed [shadow] would … so you can see in the events feed exactly what would have been blocked, sanitized, or held.

Shadow mode is your safe-rollout switch. Write a policy, turn shadow on, point real traffic at it, watch the events and runs views for a few hours or days, confirm the policy fires on the right tools and nothing unexpected, then turn shadow off to start enforcing.
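The downgrade step can be pictured as a small function applied after the verdict is computed. This is a sketch under assumptions: the function name is hypothetical, and the text that follows the `[shadow] would` prefix is invented here because the exact reason format is not specified above.

```python
from typing import Tuple

ENFORCING = {"deny", "sanitize", "pending_approval"}


def apply_shadow(verdict: str, reason: str, shadow_mode: bool) -> Tuple[str, str]:
    if shadow_mode and verdict in ENFORCING:
        # Downgrade to audit, but keep what would have happened in the reason.
        # Assumption: the wording after the prefix is illustrative only.
        return "audit", f"[shadow] would {verdict}: {reason}"
    # Non-enforcing verdicts, or shadow mode off: pass through untouched.
    return verdict, reason
```

The key property is that evaluation is identical in both modes; only the final verdict handed to the tool differs.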

Guardrails have no shadow_mode equivalent at the policy level — use the flag action per-rule to observe individual guardrail checks before switching to block or mask.

4. Enforce posture — real verdicts, real consequences

In the enforce posture, nothing is downgraded:

Firewall deny → the agent sees a tool error (MCP) or HTTP 400 firewall_blocked (inbound surface). The error names the tool and the reason, and is marked skip-retry.
Firewall sanitize → matched substrings are redacted from the tool arguments and the cleaned call is forwarded.
Firewall pending_approval → the call is held; the agent receives HTTP 400 firewall_approval_pending and an approval id to poll.
Guardrail block → HTTP 400 guardrail_blocked, naming the guardrail and the rule that fired. A blocked request costs no quota.
Guardrail mask → the match is redacted (e.g. jane@acme.com → [EMAIL]) and the request continues with the sanitized text.

To reach the enforce posture: turn off shadow_mode on the firewall policy, and change guardrail rule actions from flag to block or mask as appropriate.
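On the client side, an agent has to tell these outcomes apart. The sketch below shows one way to branch on them. It assumes a response body shaped like `{"error": {"code": ..., "approval_id": ...}}`, which is a hypothetical shape, and it treats guardrail_blocked as non-retryable, which is also an assumption (only firewall_blocked is described above as skip-retry).

```python
from typing import Optional, Tuple

# Assumption: retrying an identical blocked request would be blocked again.
NO_RETRY = {"firewall_blocked", "guardrail_blocked"}


def next_step(status: int, body: dict) -> Tuple[str, Optional[str]]:
    error = body.get("error") or {}   # hypothetical body shape
    code = error.get("code")
    if status == 400 and code == "firewall_approval_pending":
        # The call is held; poll using the approval id from the response.
        return "poll", error.get("approval_id")
    if status == 400 and code in NO_RETRY:
        return "stop", None
    return "proceed", None
```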

5. Recommended rollout

Observe — discover what your agents do

Turn on workspace observe mode (PUT /api/workspace/firewall/settings, firewall_observe_mode: true). Leave the firewall with no policy (or a policy whose default_verdict is audit). Add flag actions to any guardrail rules you want to measure.

Watch the Discovered Tools view fill up with every tool call your agents make, flagged covered or gap. Use this as the input for writing your first policy rules: you are writing rules for real traffic, not hypothetical traffic.

Let this run until the Discovered Tools view stabilizes and you have enough data to write intentional rules.
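For reference, the settings call in the first step might be built like this with Python's standard library. The path and the `firewall_observe_mode` field come from the text above; the base URL, the bearer-token header, and the assumption that the field sits at the top level of a JSON body are placeholders to check against your deployment.

```python
import json
import urllib.request


def build_observe_request(base_url: str, token: str,
                          enabled: bool = True) -> urllib.request.Request:
    # Assumption: the setting is sent as a top-level field of a JSON body.
    payload = json.dumps({"firewall_observe_mode": enabled}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/workspace/firewall/settings",
        data=payload,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",   # hypothetical auth scheme
            "Content-Type": "application/json",
        },
    )


# Placeholder host and token; nothing is sent until urlopen is called.
req = build_observe_request("https://orcarouter.example", "YOUR_TOKEN")
# urllib.request.urlopen(req)  # uncomment to actually send the request
```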

Shadow — validate before enforcement

Author a firewall policy with shadow_mode: true. Attach it to the keys you want to govern (or set it as the workspace default). For guardrails, keep rule actions as flag at this stage.

The policy now evaluates every real tool call and logs what it would do. Open the Events and Runs views and filter by [shadow] prefix. Confirm:

It fires on the tools and argument patterns you intended.
It does not fire on anything you want to allow (false positives).

Tune rules, re-observe, repeat. When the shadow log looks right, move on.
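If you export events for offline review, the same check can be scripted. The sketch below tallies would-be enforcement per tool; the event shape (dicts with `tool` and `reason` keys) is hypothetical and not taken from the product's API.

```python
from collections import Counter
from typing import Iterable


def shadow_hits_by_tool(events: Iterable[dict]) -> Counter:
    # Keep only events whose reason carries the shadow prefix, grouped by tool.
    return Counter(
        e["tool"] for e in events
        if e.get("reason", "").startswith("[shadow]")
    )


# Hypothetical exported events.
sample_events = [
    {"tool": "shell.exec", "reason": "[shadow] would deny: destructive command"},
    {"tool": "shell.exec", "reason": "[shadow] would deny: destructive command"},
    {"tool": "http.get", "reason": "default verdict"},
]
hits = shadow_hits_by_tool(sample_events)
```

A tool that shows up in the tally but should be allowed is a false positive to tune out before leaving shadow.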

Enforce — flip the switch

Set shadow_mode: false on the policy. For any guardrail rules you were observing with flag, change the action to block or mask as appropriate.

Monitor the Events feed for unexpected blocks in the first hour. The Undo action on the autonomy audit log lets you restore the prior state in one click if you need to roll back.

6. Autonomy levels — set all of it at once

Tuning policies rule-by-rule is the precise path. Autonomy levels are the fast one — a single control that atomically sets your workspace’s Firewall and Guardrails posture in one transaction, with one-click undo:

Level	Posture produced
`permissive`	Observe posture: no enforcing policy, no guardrails, workspace observe mode on — you see everything, nothing is blocked. Maps to the Observe step above.
`balanced`	Default verdict `audit`, but destructive shell is denied; PII Shield runs in audit-only mode (flags PII); observe mode off. The recommended starting posture once you know your traffic shape.
`tight`	Full enforce: default-deny, with destructive shell and SSRF egress denied; PII Shield + Secrets Blocker guardrails enforced (screen requests for PII and secrets); observe mode off.

Apply via POST /api/workspace/firewall/autonomy (Developer+). The Simulate endpoint (GET /api/workspace/firewall/simulate?level=) previews what a level change would do before you apply it.

Autonomy levels are a convenience layer over the same mechanisms described above — they set default_verdict, observe mode, the firewall rules, and guardrail rule actions. They do not toggle shadow_mode; that stays a manual per-policy control. You can always override individual settings after applying a level.
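To make the level table concrete, it can be written down as data and diffed locally. This is only an illustration of what a level change alters. It is not the Simulate endpoint, the field names are hypothetical, and `None` for the permissive default verdict is one reading of "no enforcing policy".

```python
# Hypothetical field names; values paraphrase the level table above.
LEVELS = {
    "permissive": {
        "observe_mode": True,
        "default_verdict": None,   # read as: no enforcing policy
        "denied": (),
        "guardrails": {},
    },
    "balanced": {
        "observe_mode": False,
        "default_verdict": "audit",
        "denied": ("destructive_shell",),
        "guardrails": {"pii_shield": "flag"},
    },
    "tight": {
        "observe_mode": False,
        "default_verdict": "deny",
        "denied": ("destructive_shell", "ssrf_egress"),
        "guardrails": {"pii_shield": "enforce", "secrets_blocker": "enforce"},
    },
}


def preview_change(current: str, target: str) -> dict:
    """Return {field: (old, new)} for each field the level change would alter."""
    before, after = LEVELS[current], LEVELS[target]
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}
```

None of the levels carries a shadow_mode entry, mirroring the point that shadow mode stays a manual per-policy control.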

7. Mechanism map — which setting does what

This table is the authoritative reference. The four terms are distinct — do not conflate them:

Term	Kind	What it controls
Observe mode	Workspace setting	Behavior when a tool call resolves to no policy. On → log as gap (Discovered Tools). Off → silent allow.
`audit` verdict	Policy / rule verdict	Behavior for a tool call under a policy when a matching rule returns `audit`, or when no rule matches and `default_verdict` is `audit`. Allow + record. `audit` is the default value of `default_verdict`.
`flag` action	Guardrail rule action	Guardrail check allows traffic and records a match. The observe-without-enforce action for guardrails.
`shadow_mode`	Per-firewall-policy flag	Downgrade all enforcing verdicts (deny/sanitize/pending_approval) to `audit` and prefix reason with `[shadow] would …`.

Secure Agents Baseline

The recommended starting posture and five-minute setup for zero-trust agent security.

Agent Firewall

Full reference for policies, rules, verdicts, shadow mode, and the MCP gateway.

Enforcement modes are not a binary on/off. Move through observe → shadow → enforce and your rules are verified on real traffic before they ever block it.
