1. The three postures at a glance
| Posture | What happens to traffic | Mechanism | When to use |
|---|---|---|---|
| Observe | All traffic is allowed; calls with no policy are logged as coverage gaps | Workspace-level observe mode on; guardrail rules use flag action; firewall default_verdict is audit | Baseline discovery — understand what your agents actually do before you write a single rule |
| Shadow | Traffic is allowed; the policy evaluates every call, and verdicts that would deny, sanitize, or hold are logged as [shadow] would … | Per-policy shadow_mode flag on the firewall policy | Safe pre-production validation: confirm a policy fires correctly before it touches traffic |
| Enforce | Real verdicts apply — deny blocks, sanitize redacts, pending_approval holds | Shadow mode off; guardrail actions set to block / mask; firewall verdicts are live | Production enforcement after you’ve verified the policy in shadow |
Role requirement. Any workspace member can read policies, settings,
and the Discovered Tools view; the firewall Events and Runs feeds
require the Developer role or higher. Changing settings, policy actions,
or shadow_mode also requires Developer or higher.
2. Observe posture — measure before you rule
The observe posture is not a single switch. It combines three independent mechanisms that together produce “allow everything, record everything”:
Firewall observe mode (workspace setting)
When a tool call resolves to no policy at all (no policy attached to the calling key and no workspace default), the firewall’s workspace-level observe mode determines what happens:
- Observe mode on: the call is allowed and logged as a coverage gap. The Discovered Tools view fills up from these gap events, showing exactly which tools your agents are running without any rule covering them.
- Observe mode off: the call is allowed silently, byte-identical to a workspace that never enabled the feature.
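A minimal Python sketch of that branch, for illustration only: the function name, the GapLog class, and the tool names are hypothetical, and real gap events carry more context than a bare tool name.

```python
from dataclasses import dataclass, field


@dataclass
class GapLog:
    """Hypothetical stand-in for the gap events that feed Discovered Tools."""
    gaps: list[str] = field(default_factory=list)


def handle_uncovered_call(tool: str, observe_mode: bool, log: GapLog) -> str:
    """Decide what happens to a tool call that resolves to no policy.

    The call is allowed either way; observe mode only controls whether
    it is recorded as a coverage gap.
    """
    if observe_mode:
        log.gaps.append(tool)  # would surface as a gap in Discovered Tools
    return "allow"


log = GapLog()
handle_uncovered_call("shell.exec", observe_mode=True, log=log)
handle_uncovered_call("http.get", observe_mode=False, log=log)
print(log.gaps)  # ['shell.exec']
```

Note that the return value never changes: observe mode is a recording switch, not an enforcement switch.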
Firewall audit verdict (per-policy default)
When a policy does resolve but no rule matches a tool call, the
policy’s default_verdict applies. The default value for
default_verdict is audit — allow the call and record it for
review. A new policy with no rules and no configuration changes blocks
nothing and silently allows nothing: it audits everything it sees.
audit is also a normal rule verdict. A rule that matches and produces
audit lets the call through and records it, making it the firewall
counterpart of a guardrail rule set to flag.
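The fall-through to default_verdict can be sketched as follows. This is a simplified model, not the firewall's implementation: the Rule and Policy classes are hypothetical, rules match on an exact tool name plus an optional argument predicate, and the sketch assumes the first matching rule wins, which this page does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Rule:
    tool: str
    verdict: str  # "deny", "sanitize", "pending_approval", or "audit"
    # Hypothetical argument predicate; None means "any arguments".
    matches_args: Optional[Callable[[dict], bool]] = None


@dataclass
class Policy:
    rules: list[Rule] = field(default_factory=list)
    default_verdict: str = "audit"  # the documented default


def evaluate(policy: Policy, tool: str, args: dict) -> str:
    """Return the verdict for one tool call under a resolved policy."""
    for rule in policy.rules:  # assumption: first match wins
        if rule.tool == tool and (rule.matches_args is None or rule.matches_args(args)):
            return rule.verdict
    return policy.default_verdict  # no rule matched


# A brand-new policy with no rules audits everything it sees.
print(evaluate(Policy(), "shell.exec", {"cmd": "ls"}))  # audit

deny_rm = Rule("shell.exec", "deny", lambda a: "rm -rf" in a.get("cmd", ""))
policy = Policy(rules=[deny_rm])
print(evaluate(policy, "shell.exec", {"cmd": "rm -rf /tmp/x"}))  # deny
print(evaluate(policy, "shell.exec", {"cmd": "ls"}))  # audit
```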
Guardrail flag action (per-rule action)
On the guardrails side, the flag action is the observe equivalent: the
rule fires, a match is recorded in the Matches feed, and the request
continues unchanged. No block. No redaction. Use flag when you want to
measure a rule — see how often it fires and on what — before committing
to block or mask.
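To make the difference between flag, block, and mask concrete, here is a toy guardrail rule in Python. Everything in it is illustrative: the email regex is deliberately simple, GuardrailBlocked is a hypothetical stand-in for the guardrail_blocked rejection, and recording matches for block and mask (not only flag) is an assumption of the sketch. The [EMAIL] placeholder follows the masking example given under the enforce posture.

```python
import re
from dataclasses import dataclass, field

# Deliberately simple email pattern; real PII detection is far more thorough.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class GuardrailBlocked(Exception):
    """Hypothetical stand-in for an HTTP 400 guardrail_blocked response."""


@dataclass
class MatchesFeed:
    records: list[str] = field(default_factory=list)


def apply_rule(text: str, action: str, feed: MatchesFeed) -> str:
    """Run one email rule with the given action and return the text to forward."""
    found = EMAIL.findall(text)
    if not found:
        return text  # rule did not fire
    feed.records.extend(found)  # assumption: every action records its matches
    if action == "flag":
        return text  # observe only: request continues unchanged
    if action == "block":
        raise GuardrailBlocked("email rule fired")
    if action == "mask":
        return EMAIL.sub("[EMAIL]", text)  # redact and continue
    raise ValueError(f"unknown action: {action}")


feed = MatchesFeed()
print(apply_rule("ping jane@acme.com", "flag", feed))  # ping jane@acme.com
print(apply_rule("ping jane@acme.com", "mask", feed))  # ping [EMAIL]
print(feed.records)  # ['jane@acme.com', 'jane@acme.com']
```

Running the same rule under flag first, then switching the action, is exactly the measure-then-commit progression described above.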
Together, these three produce the observe posture: observe mode catches uncovered tool calls;
audit verdicts cover tool calls under a policy
but not yet under a specific rule; flag actions cover guardrail checks
you are not yet ready to enforce.
3. Shadow posture — validate before you enforce
Shadow mode is a per-policy flag (shadow_mode: true) on a firewall
policy. When it is on:
- The policy evaluates every tool call exactly as it would in production — rules are matched, verdicts are computed, argument predicates are tested.
- Every enforcing verdict (deny, sanitize, pending_approval) is downgraded to audit before it reaches the tool.
- The logged reason is prefixed [shadow] would … so you can see in the events feed exactly what would have been blocked, sanitized, or held.
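The downgrade rule fits in a few lines of Python. This is an illustrative model, not product code: the function name is hypothetical, and because this page abbreviates the logged reason as [shadow] would …, the text after "would" in the sketch (verdict name, colon, original reason) is an assumed format.

```python
ENFORCING = {"deny", "sanitize", "pending_approval"}


def apply_shadow(verdict: str, reason: str, shadow_mode: bool) -> tuple[str, str]:
    """Return the (effective verdict, logged reason) for one evaluated call."""
    if shadow_mode and verdict in ENFORCING:
        # Assumed wording; only the "[shadow] would" prefix comes from the docs.
        return "audit", f"[shadow] would {verdict}: {reason}"
    return verdict, reason  # audit verdicts, or shadow off: unchanged


print(apply_shadow("deny", "destructive shell", shadow_mode=True))
# ('audit', '[shadow] would deny: destructive shell')
print(apply_shadow("deny", "destructive shell", shadow_mode=False))
# ('deny', 'destructive shell')
```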
Guardrails have no shadow_mode equivalent at the policy level. Instead,
use the flag action on individual rules to observe guardrail checks
before switching them to block or mask.
4. Enforce posture — real verdicts, real consequences
In the enforce posture, nothing is downgraded:
- Firewall deny → the agent sees a tool error (MCP) or HTTP 400 firewall_blocked (inbound surface). The error names the tool and the reason, and is marked skip-retry.
- Firewall sanitize → matched substrings are redacted from the tool arguments and the cleaned call is forwarded.
- Firewall pending_approval → the call is held; the agent receives HTTP 400 firewall_approval_pending and an approval id to poll.
- Guardrail block → HTTP 400 guardrail_blocked, naming the guardrail and the rule that fired. The blocked request costs no quota.
- Guardrail mask → the match is redacted (e.g. jane@acme.com → [EMAIL]) and the request continues with the sanitized text.
To move to enforce, turn off shadow_mode on the firewall
policy, and change guardrail rule actions from flag to block or
mask as appropriate.
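On the client side, a wrapper that calls the inbound surface may want to react differently to each HTTP 400 rejection listed above (over MCP, a deny arrives as a tool error instead). The Python sketch below assumes the response body is JSON with the error code in an "error" field, which this page does not document, and its returned action names are hypothetical. It encodes only what the list states: firewall_blocked is marked skip-retry, and firewall_approval_pending comes with an approval id to poll.

```python
import json


def next_step(status: int, body: str) -> str:
    """Map a firewall/guardrail rejection to a client action (hypothetical names)."""
    if status != 400:
        return "not_a_policy_rejection"
    try:
        code = json.loads(body).get("error")  # assumed body shape
    except (json.JSONDecodeError, AttributeError):
        return "unknown"  # not JSON, or JSON that is not an object
    if code == "firewall_blocked":
        return "stop"  # marked skip-retry: retrying the same call won't help
    if code == "firewall_approval_pending":
        return "poll_approval"  # call is held; poll the returned approval id
    if code == "guardrail_blocked":
        return "stop"  # assumption: resending the same content trips the rule again
    return "unknown"


print(next_step(400, '{"error": "firewall_approval_pending"}'))  # poll_approval
```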
5. Recommended rollout
Observe — discover what your agents do
Turn on workspace observe mode (PUT /api/workspace/firewall/settings,
firewall_observe_mode: true). Leave the firewall with no policy (or a
policy whose default_verdict is audit). Add flag actions to any
guardrail rules you want to measure.
Watch the Discovered Tools view fill up with every tool call your
agents make, each marked covered or gap. Use this as the input
for writing your first policy rules: you are writing rules for real
traffic, not hypothetical traffic.
Let this run until the Discovered Tools view stabilizes and you have
enough data to write intentional rules.
Shadow — validate before enforcement
Author a firewall policy with shadow_mode: true. Attach it to the
keys you want to govern (or set it as the workspace default). For
guardrails, keep rule actions as flag at this stage.
The policy now evaluates every real tool call and logs what it would
do. Open the Events and Runs views and filter by the [shadow]
prefix. Confirm:
- It fires on the tools and argument patterns you intended.
- It does not fire on anything you want to allow (false positives).
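If you export events for offline review, the two checks above reduce to a set comparison. In this Python sketch the event shape (dicts with tool and reason keys), the sample events, and the idea of listing the tools you intend the policy to catch are all assumptions; adapt them to whatever your export actually contains.

```python
from collections import Counter


def shadow_hits(events: list[dict]) -> Counter:
    """Count would-be enforcements per tool (events whose reason starts with [shadow])."""
    return Counter(
        e["tool"] for e in events if e.get("reason", "").startswith("[shadow]")
    )


def review(events: list[dict], intended: set[str]) -> tuple[set[str], set[str]]:
    """Return (possible false positives, intended tools that never fired)."""
    fired = set(shadow_hits(events))
    return fired - intended, intended - fired


# Illustrative sample events, not real output.
events = [
    {"tool": "shell.exec", "reason": "[shadow] would deny (example)"},
    {"tool": "http.get", "reason": "[shadow] would deny (example)"},
    {"tool": "fs.read", "reason": "default audit"},
]
print(review(events, intended={"shell.exec", "db.drop"}))
# ({'http.get'}, {'db.drop'})
```

An empty first set and an empty second set together correspond to the two confirmations above.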
Enforce — flip the switch
Set shadow_mode: false on the policy. For any guardrail rules you
were observing with flag, change the action to block or mask
as appropriate.
Monitor the Events feed for unexpected blocks in the first
hour. The Undo action on the autonomy audit log lets you
restore the prior state in one click if you need to roll back.
6. Autonomy levels — set all of it at once
Tuning policies rule by rule is the precise path. Autonomy levels are the fast one: a single control that atomically sets your workspace’s Firewall and Guardrails posture in one transaction, with one-click undo.
| Level | Posture produced |
|---|---|
| permissive | Observe posture: no enforcing policy, no guardrails, workspace observe mode on. You see everything; nothing is blocked. Maps to the Observe step above. |
| balanced | Default verdict audit, but destructive shell is denied; PII Shield runs in audit-only mode (flags PII); observe mode off. The recommended starting posture once you know your traffic shape. |
| tight | Full enforce: default-deny, with destructive shell and SSRF egress denied; PII Shield and Secrets Blocker guardrails enforced (screening requests for PII and secrets); observe mode off. |
Apply a level with POST /api/workspace/firewall/autonomy (Developer+).
The Simulate endpoint (GET /api/workspace/firewall/simulate?level=)
previews what a level change would do before you apply it.
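For scripting, here is a hedged Python sketch that builds (but does not send) the requests mentioned on this page, using only the standard library. The host, the bearer-token header, and the POST body shape {"level": ...} are assumptions; the paths, the level= query parameter, and the firewall_observe_mode field come from this page. The POST and PUT calls need a Developer-or-higher credential.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://example.invalid"  # hypothetical host
TOKEN = "YOUR_API_TOKEN"  # hypothetical credential; the auth scheme is assumed


def _headers() -> dict[str, str]:
    return {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}


def simulate_level(level: str) -> Request:
    """Preview a level change (GET .../simulate?level=...)."""
    query = urlencode({"level": level})
    return Request(f"{BASE_URL}/api/workspace/firewall/simulate?{query}",
                   headers=_headers(), method="GET")


def apply_level(level: str) -> Request:
    """Apply a level (POST .../autonomy). The body shape is an assumption."""
    body = json.dumps({"level": level}).encode()
    return Request(f"{BASE_URL}/api/workspace/firewall/autonomy",
                   data=body, headers=_headers(), method="POST")


def set_observe_mode(enabled: bool) -> Request:
    """Toggle workspace observe mode (PUT .../settings)."""
    body = json.dumps({"firewall_observe_mode": enabled}).encode()
    return Request(f"{BASE_URL}/api/workspace/firewall/settings",
                   data=body, headers=_headers(), method="PUT")


req = simulate_level("balanced")
print(req.get_method(), req.full_url)
# GET https://example.invalid/api/workspace/firewall/simulate?level=balanced
```

To send a request, pass it to urllib.request.urlopen. Calling simulate_level before apply_level mirrors the preview-then-apply flow; whether the settings endpoint accepts a partial body like the one above is not stated here.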
Autonomy levels are a convenience layer over the same mechanisms
described above: they set default_verdict, observe mode, the firewall
rules, and guardrail rule actions. They do not toggle shadow_mode;
that stays a manual per-policy control. You can always override
individual settings after applying a level.
7. Mechanism map — which setting does what
This table is the authoritative reference. The four terms are distinct; do not conflate them.
| Term | Kind | What it controls |
|---|---|---|
| Observe mode | Workspace setting | Behavior when a tool call resolves to no policy. On → log as gap (Discovered Tools). Off → silent allow. |
| audit verdict | Policy / rule verdict | Behavior for a tool call under a policy, either from a matching rule or as the default_verdict fallback. Allow + record. The default value of default_verdict. |
| flag action | Guardrail rule action | Guardrail check allows traffic and records a match. The observe-without-enforce action for guardrails. |
| shadow_mode | Per-firewall-policy flag | Downgrades all enforcing verdicts (deny/sanitize/pending_approval) to audit and prefixes the reason with [shadow] would …. |
