Combine firewall and guardrails — the Secure Agents baseline

Most teams reach for agent security too late and one plane at a time — a regex on prompts here, a tool deny-list there. The result is a posture with holes: text screening that never sees a shell.exec, or a tool firewall that never notices a credit-card number leaving in a prompt. The fastest way to a complete agent security baseline is to set both planes at once. OrcaRouter’s autonomy control — the Secure Agents baseline — does exactly that: a single workspace-level switch that writes a firewall policy and a guardrail together, in one transaction, with one-click undo. You don’t author a rule to get protected; you pick a level and tune later.

The two planes are complementary, not redundant. Guardrails screen the prompt/response text (PII, secrets, jailbreak and injection intent); the firewall governs the actions an agent takes (which tools, MCP calls, and hosts). Either alone leaves a gap the other closes — see Guardrails vs. Firewall.

1. Why one baseline beats two half-measures

A real agent run crosses both planes in a single request. The model reads a prompt (text), decides to call db.query (action), and the tool’s result feeds back into the next turn (text again). Securing only one plane leaves the other unguarded:

Firewall only

You deny destructive shell, but a prompt still carries a customer’s SSN straight to the model — and a tool argument still leaks an API key.

Guardrails only

You mask PII in prompts, but the agent still calls rm -rf, reaches a cloud-metadata endpoint, or loops on a runaway tool.

The autonomy control removes the choice. One level sets a coherent posture across both planes, so there’s no window where one is configured and the other isn’t.
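To make the gap concrete, here is a minimal sketch of the three-step run described above, with each step tagged by the plane it belongs to. The step labels and the `COVERS` mapping are assumptions made for illustration, not OrcaRouter data structures.

```python
# Illustrative only: step names and plane labels are invented for this sketch.
RUN = [
    ("text", "user prompt reaches the model"),
    ("action", "model calls db.query"),
    ("text", "tool result feeds the next turn"),
]

# Which plane each control inspects, per the description above.
COVERS = {"guardrails": "text", "firewall": "action"}


def unguarded_steps(enabled_controls):
    """Return the steps of RUN that no enabled control inspects."""
    covered_planes = {COVERS[name] for name in enabled_controls}
    return [step for plane, step in RUN if plane not in covered_planes]


print(unguarded_steps(["firewall"]))                # both text steps
print(unguarded_steps(["guardrails"]))              # the db.query call
print(unguarded_steps(["firewall", "guardrails"]))  # nothing left over
```

Only the combination of both controls leaves the list empty, which is the case the autonomy control sets up in one step.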

2. The agent security baseline: three levels

Each level covers the same two planes. Pick one; it is your floor, and you add precision with rules later.

Level	Firewall	Guardrails	Observe mode
`tight`	Default-deny; destructive shell + fetch-shaped tools denied	PII Shield + Secrets Blocker enforced	Off
`balanced`	Default-audit; destructive shell denied	PII Shield in audit-only (flags PII)	Off
`permissive`	No enforcing policy	None	On — logs every call as a gap
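The same table can be written down as data, which is handy if you want to reason about a level in scripts or reviews. This is a sketch only: the `Baseline` class, its field names, and the class labels `destructive_shell` and `fetch_shaped` are hypothetical and do not reflect OrcaRouter's actual policy schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Baseline:
    firewall_default: Optional[str]  # default verdict; None means no enforcing policy
    denied_tool_classes: tuple       # classes of tool names denied by rule
    guardrails: dict                 # guardrail name -> "enforce" or "audit"
    observe_mode: bool


# Values transcribed from the table above; the structure itself is assumed.
LEVELS = {
    "tight": Baseline(
        "deny",
        ("destructive_shell", "fetch_shaped"),
        {"PII Shield": "enforce", "Secrets Blocker": "enforce"},
        False,
    ),
    "balanced": Baseline("audit", ("destructive_shell",), {"PII Shield": "audit"}, False),
    "permissive": Baseline(None, (), {}, True),
}

for name, level in LEVELS.items():
    print(name, level.firewall_default, sorted(level.guardrails))
```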

A few specifics worth pinning down, since they shape what each level actually catches:

What `tight` denies on the action plane

tight sets the firewall policy’s default verdict to deny, then layers deny rules on two groups of tool names: the shell/exec names that carry destructive commands (shell.*, bash, cmd, powershell, exec) and the fetch-shaped names that carry SSRF risk (http_fetch, web_search, fetch_url, request, plus their <server>.* MCP-namespaced variants). These rules match tool names only; tight does not ship a CIDR or cloud-metadata egress rule. To deny 169.254.169.254 or RFC 1918 ranges by destination, author your own egress rule. See Egress control.
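Name-based matching of this kind can be sketched with glob patterns. The page does not say how the firewall matches names internally, so treat the following as an illustration under two assumptions: that patterns behave like shell globs, and that a namespaced variant means `<server>.<tool>` applied to the fetch-shaped names.

```python
from fnmatch import fnmatchcase

# Tool names listed above for the tight level.
SHELL_PATTERNS = ["shell.*", "bash", "cmd", "powershell", "exec"]
FETCH_NAMES = ["http_fetch", "web_search", "fetch_url", "request"]
# Assumed reading of the MCP-namespaced variants: "<server>.<tool>".
FETCH_PATTERNS = FETCH_NAMES + ["*." + name for name in FETCH_NAMES]


def tight_verdict_source(tool_name):
    """Say which part of the tight baseline would account for a deny."""
    if any(fnmatchcase(tool_name, p) for p in SHELL_PATTERNS):
        return "destructive-shell rule"
    if any(fnmatchcase(tool_name, p) for p in FETCH_PATTERNS):
        return "fetch-shaped rule"
    return "default verdict"


for name in ["shell.exec", "github.http_fetch", "db.query"]:
    print(name, "->", tight_verdict_source(name))
```

The check never looks at where a call goes, only at what the tool is called, which is why destination-based protection needs a separate egress rule.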

What `tight` enforces on the content plane

Both the PII Shield and Secrets Blocker guardrails are active and enforcing. PII Shield masks PII on the request before it reaches the model; Secrets Blocker catches credentials in the request. Secrets in tool arguments are caught by this guardrail on the request — the firewall does not strip them by default.

Why `balanced` is the recommended start

balanced audits everything (default verdict audit) so you see your agent’s real behavior, while still denying the single most destructive class — destructive shell. PII Shield runs in audit-only mode (flags PII, doesn’t block). You get a full trail with almost no risk of an unexpected block, then tighten from visibility rather than guesswork.
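The difference between an enforcing and an audit-only guardrail is easiest to see side by side. The sketch below uses a toy US SSN regex as a stand-in detector; the function name, the `[REDACTED]` placeholder, and the mode strings are assumptions and not the PII Shield implementation.

```python
import re

# Toy detector for illustration; real PII detection is far broader.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def pii_shield(prompt, mode):
    """Return (text forwarded to the model, list of flagged matches)."""
    hits = SSN.findall(prompt)
    if mode == "enforce":
        # tight: mask on the request before it reaches the model.
        return SSN.sub("[REDACTED]", prompt), hits
    # audit-only (balanced): record the hit, forward the text unchanged.
    return prompt, hits


text = "Customer SSN 123-45-6789 is on file"
print(pii_shield(text, "audit"))
print(pii_shield(text, "enforce"))
```

In both modes the hit is recorded, so the Matches feed fills up under balanced even though nothing is altered or blocked.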

permissive enforces nothing — it exists to observe a brand-new agent with zero risk of accidental blocks. Observe mode stays on, so every tool call is still logged as a coverage gap (visible in Discovered Tools). Use it to learn an agent’s shape, then move to balanced or tight.

3. One concrete example: apply `balanced`, watch both feeds

Applying a level is a single console action (Firewall → Posture) or one API call. The route runs under your session and requires Developer+.

# Configure in the console, or POST under your session token (Developer+):
POST /api/workspace/firewall/autonomy
Content-Type: application/json

{ "level": "balanced" }

The response carries an audit_id — keep it; it’s what you pass to undo. Once applied, the baseline is live on the next tool call. No redeploy, no agent-code change. Now you watch both planes at once:

Firewall → Events: every tool-call verdict (audited calls, plus the denied destructive-shell calls). See Events log.
Guardrails → Matches: every content-policy hit (the PII Shield flags).

Because balanced writes a real, editable firewall policy and a real guardrail (each named for the level), you can open either afterward and tune it — the baseline is a starting point, not a locked preset.
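If you prefer to script the apply call from this section, a minimal sketch with the Python standard library might look like the following. The host in `BASE_URL` and the way the session is carried (shown here as a cookie header) are assumptions; only the path, method, and JSON body come from this page.

```python
import json
import urllib.request

# Placeholder host: replace with your own console origin.
BASE_URL = "https://console.example.invalid"
APPLY_PATH = "/api/workspace/firewall/autonomy"
LEVELS = ("tight", "balanced", "permissive")


def build_apply_request(level, session_headers):
    """Build (but do not send) the POST that applies an autonomy level."""
    if level not in LEVELS:
        raise ValueError("unknown level: " + repr(level))
    body = json.dumps({"level": level}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    headers.update(session_headers)  # assumed: session auth travels in headers
    return urllib.request.Request(
        BASE_URL + APPLY_PATH, data=body, headers=headers, method="POST"
    )


req = build_apply_request("balanced", {"Cookie": "session=PLACEHOLDER"})
print(req.get_method(), req.full_url)
# Sending is left out on purpose; with a real host and session you would
# call urllib.request.urlopen(req), parse the JSON, and store audit_id.
```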

Preview before you commit. GET /api/workspace/firewall/simulate?level=tight (Member, read-only) shows exactly what tight would change against your current state — nothing is applied. Run it after a day or two on balanced to confirm tight won’t deny calls that are part of your normal traffic.

4. Undo is one call

Every autonomy change is reversible from its audit snapshot, restoring the exact prior state — policies, rules, guardrails, and settings — not a generic reset.

# Developer+; :audit_id is the value returned when you applied the level.
POST /api/workspace/firewall/autonomy/undo/:audit_id

For a very large workspace whose snapshot exceeds the audit-log size limit, the apply still succeeds but one-click undo is unavailable for that change — you re-apply the level you want instead. This is rare, but worth knowing before you tighten a busy production workspace.
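The two ways back can be captured in one small helper. This is a sketch under assumptions: the page does not say how an unavailable undo is signalled, so the `undo_available` flag is hypothetical and you would set it from whatever your apply response or console reports.

```python
from urllib.parse import quote

UNDO_PREFIX = "/api/workspace/firewall/autonomy/undo/"
APPLY_PATH = "/api/workspace/firewall/autonomy"


def rollback_plan(audit_id, fallback_level, undo_available=True):
    """Return (method, path, json_body) for reverting an autonomy change."""
    if undo_available and audit_id:
        # Normal case: restore the exact prior state from the snapshot.
        return ("POST", UNDO_PREFIX + quote(str(audit_id), safe=""), None)
    # Oversized snapshot: no one-click undo, so re-apply the level you want.
    return ("POST", APPLY_PATH, {"level": fallback_level})


print(rollback_plan("abc123", "balanced"))
print(rollback_plan("abc123", "balanced", undo_available=False))
```

The fallback path writes a level's baseline rather than replaying a snapshot, so it is worth recording which level you were on before tightening.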

5. The recommended path

Start broad, watch, then tighten from a position of visibility:

Apply balanced

Full audit trail; only destructive shell is denied; PII is flagged. Run your agents normally for a day or two.

Simulate tight

GET /api/workspace/firewall/simulate?level=tight and compare its denies against what the Events feed actually showed. If fetch-shaped or destructive-shell calls are part of your normal flow, fix the agent first.

Apply tight

Once simulate holds no surprises, switch to tight. Undo is one call away if production breaks.

Tune with rules

The baseline is your floor. Carve exceptions or add controls it doesn’t cover with firewall rules and named guardrails. Attach a specific policy or guardrail to an individual key for finer scope.
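The comparison in the simulate step comes down to intersecting two lists: the tools your agents actually called, and the tools the simulated level would deny. The response shape of the simulate route is not documented on this page, so the sketch below takes plain lists of tool names; the sample data is invented.

```python
from collections import Counter


def surprises(observed_tool_calls, would_deny):
    """Count observed calls, per tool, that the simulated level would deny."""
    denied = set(would_deny)
    counts = Counter(name for name in observed_tool_calls if name in denied)
    return dict(counts)


# Invented sample data standing in for the Events feed and a simulate result.
observed = ["db.query", "http_fetch", "db.query", "http_fetch", "web_search"]
would_deny = ["http_fetch", "web_search", "fetch_url", "request"]

print(surprises(observed, would_deny))
```

An empty result means the stricter level would not have touched anything your agents did during the observation window; any non-empty entry is a call to fix in the agent or cover with an explicit rule before switching.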

6. Roles for the combined baseline

The autonomy control spans both planes, but every action is role-gated.

Action	Minimum role
Simulate a level / view guardrail Matches / view Discovered Tools	Member
View firewall Events & Runs	Developer+
Apply an autonomy level	Developer+
Undo an autonomy change	Developer+
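The role table reads as a minimum-role lookup. The sketch below encodes it that way; the action keys are hypothetical names for the rows above, and the `admin` entry in the ordering is an assumption, since this page only names Member and Developer+.

```python
# Lowest to highest. "admin" is assumed to illustrate the "+" in Developer+.
ROLE_ORDER = ["member", "developer", "admin"]

MINIMUM_ROLE = {
    "simulate_level": "member",
    "view_guardrail_matches": "member",
    "view_discovered_tools": "member",
    "view_firewall_events_and_runs": "developer",
    "apply_autonomy_level": "developer",
    "undo_autonomy_change": "developer",
}


def allowed(role, action):
    """True if role meets or exceeds the minimum role for the action."""
    return ROLE_ORDER.index(role) >= ROLE_ORDER.index(MINIMUM_ROLE[action])


print(allowed("member", "simulate_level"))        # members can preview
print(allowed("member", "apply_autonomy_level"))  # but cannot apply
```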

All configuration runs in the console under your session (/api/workspace/firewall/* and /api/guardrail/*). Only /v1/* relay calls use an sk-orca-… key; the gateway-key routes are a separate scope. See Scope: keys, policies, workspaces.

7. After the baseline: where to tune each plane

The baseline gets you protected in the first 30 minutes. From there, each plane has its own reference for precision work:

Firewall overview

Verdicts, surfaces, argument predicates, approvals — the action plane.

Guardrails

Keyword, regex, PII, llm_judge, and grounding rules — the content plane.

Shadow mode

Roll a tightened firewall policy out in audit-only before enforcing.

Secure Agents baseline

The concept page for the autonomy control and its undo semantics.

The baseline is the floor that closes both planes at once; rules are how you raise the ceiling. See Securing AI agents and the control stack for how the layers compose, and Excessive agency for the threat this baseline most directly answers.
