shell.exec, or a tool
firewall that never notices a credit-card number leaving in a prompt.
The fastest way to a complete agent security baseline is to set both
planes at once. OrcaRouter’s autonomy control — the Secure Agents
baseline — does exactly that: a single workspace-level switch that writes
a firewall policy and a
guardrail together, in one transaction, with
one-click undo. You don’t author a rule to get protected; you pick a level
and tune later.
The two planes are complementary, not redundant. Guardrails screen the
prompt/response text (PII, secrets, jailbreak and injection intent); the
firewall governs the actions an agent takes (which tools, MCP calls,
and hosts). Either alone leaves a gap the other closes — see
Guardrails vs. Firewall.
1. Why one baseline beats two half-measures
A real agent run crosses both planes in a single request. The model reads a prompt (text), decides to call db.query (action), and the tool’s
result feeds back into the next turn (text again). Securing only one plane
leaves the other unguarded:
Firewall only
You deny destructive shell, but a prompt still carries a customer’s SSN
straight to the model — and a tool argument still leaks an API key.
Guardrails only
You mask PII in prompts, but the agent still calls
rm -rf, reaches a
cloud-metadata endpoint, or loops on a runaway tool.
2. The agent security baseline: three levels
Each level covers the same two planes. Pick one; it is your floor, and you add precision with rules later.

| Level | Firewall | Guardrails | Observe mode |
|---|---|---|---|
| tight | Default-deny; destructive shell + fetch-shaped tools denied | PII Shield + Secrets Blocker enforced | Off |
| balanced | Default-audit; destructive shell denied | PII Shield in audit-only (flags PII) | Off |
| permissive | No enforcing policy | None | On (logs every call as a gap) |
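One way to read the table is as a small data structure. The sketch below is illustrative only: the field names (`firewall_default`, `denied_tool_classes`, `guardrails`, `observe_mode`) and the mode strings are assumptions made for this example, not OrcaRouter's stored policy format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class BaselineLevel:
    # Hypothetical field names; they mirror the table columns only.
    firewall_default: Optional[str]        # "deny", "audit", or None (no enforcing policy)
    denied_tool_classes: Tuple[str, ...]   # tool classes denied outright
    guardrails: Dict[str, str] = field(default_factory=dict)  # name -> "enforce" | "audit"
    observe_mode: bool = False


BASELINE_LEVELS = {
    "tight": BaselineLevel(
        firewall_default="deny",
        denied_tool_classes=("destructive_shell", "fetch_shaped"),
        guardrails={"PII Shield": "enforce", "Secrets Blocker": "enforce"},
    ),
    "balanced": BaselineLevel(
        firewall_default="audit",
        denied_tool_classes=("destructive_shell",),
        guardrails={"PII Shield": "audit"},
    ),
    "permissive": BaselineLevel(
        firewall_default=None,
        denied_tool_classes=(),
        observe_mode=True,
    ),
}


def enforced_guardrails(level: str) -> List[str]:
    """Return the guardrails that block or mask (not just flag) at this level."""
    config = BASELINE_LEVELS[level]
    return sorted(name for name, mode in config.guardrails.items() if mode == "enforce")
```

Laid out this way, the difference between `tight` and `balanced` on the content plane comes down to the guardrail mode, which is why moving between them is a change of level rather than a rewrite.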
What `tight` denies on the action plane
tight stamps the firewall policy’s default verdict to deny, then
layers deny rules for the shell/exec tool names that carry
destructive commands — shell.*, bash, cmd, powershell, exec
— and for the fetch-shaped tool names that carry SSRF —
http_fetch, web_search, fetch_url, request (and their
<server>.* MCP-namespaced variants). It denies these tool names; it
does not ship a CIDR or cloud-metadata egress rule. If you want to
deny 169.254.169.254 or RFC-1918 ranges by destination, author your
own egress rule (see Egress control).
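The distinction between denying by tool name and denying by destination can be sketched in a few lines. This is not OrcaRouter's matcher: the glob semantics, the assumption that an MCP-namespaced variant looks like `<server>.<tool>`, and both function names are hypothetical, chosen only to illustrate the two kinds of check.

```python
import fnmatch
import ipaddress

# Tool-name patterns listed for `tight` in the text above.
DENIED_TOOL_PATTERNS = (
    "shell.*", "bash", "cmd", "powershell", "exec",
    "http_fetch", "web_search", "fetch_url", "request",
)

# Destinations named in the text: the cloud-metadata address and RFC-1918 ranges.
BLOCKED_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in ("169.254.169.254/32", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)


def is_denied_tool_name(tool_name: str) -> bool:
    """Name-based check: what a tool-name deny rule can see."""
    candidates = [tool_name]
    # Assumption: a namespaced tool is "<server>.<tool>", so also test the
    # part after the first dot.
    if "." in tool_name:
        candidates.append(tool_name.split(".", 1)[1])
    return any(
        fnmatch.fnmatchcase(candidate, pattern)
        for candidate in candidates
        for pattern in DENIED_TOOL_PATTERNS
    )


def is_blocked_destination(host: str) -> bool:
    """Destination-based check: what a custom egress rule would add."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. Resolving hostnames is out of scope for this sketch.
        return False
    return any(address in network for network in BLOCKED_NETWORKS)
```

A tool with an unlisted name that fetches `169.254.169.254` passes the first check and is caught only by the second, which is the gap a custom egress rule closes.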
What `tight` enforces on the content plane
Both the PII Shield and Secrets Blocker guardrails are active
and enforcing. PII Shield masks PII on the request before it reaches the
model; Secrets Blocker catches credentials in the request. Secrets in
tool arguments are caught by this guardrail on the request — the firewall
does not strip them by default.
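The difference between an enforcing guardrail and an audit-only one can be illustrated with a toy screen for a single PII type. This is a sketch under stated assumptions: the US SSN regex, the `[SSN]` placeholder, and the `screen_request` function are hypothetical and far simpler than a real PII detector.

```python
import re
from typing import Tuple

# Toy pattern for US Social Security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def screen_request(text: str, mode: str) -> Tuple[str, bool]:
    """Return (text to forward, whether PII was flagged).

    mode "enforce": mask matches before the text is forwarded.
    mode "audit":   forward the text unchanged and only record the flag.
    """
    flagged = SSN_PATTERN.search(text) is not None
    if mode == "enforce":
        return SSN_PATTERN.sub("[SSN]", text), flagged
    if mode == "audit":
        return text, flagged
    raise ValueError(f"unknown mode: {mode!r}")
```

In both modes the match is recorded; only the enforcing mode changes what is forwarded.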
Why `balanced` is the recommended start
balanced audits everything (default verdict audit) so you see your
agent’s real behavior, while still denying the single most destructive
class — destructive shell. PII Shield runs in audit-only mode (flags PII,
doesn’t block). You get a full trail with almost no risk of an unexpected
block, then tighten from visibility rather than guesswork.
3. One concrete example: apply balanced, watch both feeds
Applying a level is a single console action (Firewall → Posture) or one
API call. The route runs under your session and requires Developer+.
The response includes an audit_id. Keep it; it’s what you pass to
undo. Once applied, the baseline is live on the next tool call, with no
redeploy and no agent-code change. Now you watch both planes at once:
- Firewall → Events: every tool-call verdict (audit, plus the denied destructive-shell calls). See Events log.
- Guardrails → Matches: every content-policy hit (PII Shield flags).
Because balanced writes a real, editable firewall policy and a real
guardrail (each named for the level), you can open either afterward and
tune it. The baseline is a starting point, not a locked preset.
4. Undo is one call
Every autonomy change is reversible from its audit snapshot, restoring the exact prior state (policies, rules, guardrails, and settings), not a generic reset.
5. The recommended path
Start broad, watch, then tighten from a position of visibility:
Apply balanced
Full audit trail; only destructive shell is denied; PII is flagged. Run
your agents normally for a day or two.
Simulate tight
GET /api/workspace/firewall/simulate?level=tight and compare its denies
against what the Events feed actually showed. If fetch-shaped or
destructive-shell calls are part of your normal flow, fix the agent first.
Apply tight
Once simulate holds no surprises, switch to tight. Undo is one call
away if production breaks.
Tune with rules
The baseline is your floor. Carve exceptions or add controls it doesn’t
cover with firewall rules and named
guardrails. Attach a specific policy or guardrail
to an individual key for finer scope.
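The compare step in "Simulate tight" can be sketched as follows. Only the path `/api/workspace/firewall/simulate` and the `level` query parameter come from the text above; the base URL is a placeholder, and because the simulate response format is not described here, the sketch takes the would-be-denied tool names as a plain set. Both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set
from urllib.parse import urlencode, urljoin

SIMULATE_PATH = "/api/workspace/firewall/simulate"


def simulate_url(base_url: str, level: str) -> str:
    """Build the simulate URL for a level (base_url is a placeholder host)."""
    return urljoin(base_url, SIMULATE_PATH) + "?" + urlencode({"level": level})


def find_surprises(observed_tools: Iterable[str], would_deny: Set[str]) -> Dict[str, int]:
    """Count observed tool calls that the simulated level would deny.

    observed_tools: tool names taken from the Events feed while on `balanced`.
    would_deny:     tool names the simulation reports as denied (assumed shape).
    """
    return dict(Counter(tool for tool in observed_tools if tool in would_deny))
```

An empty result is the signal that switching to `tight` should not break the current flow; any entry names a call to fix in the agent first.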
6. Roles for the combined baseline
The autonomy control spans both planes, but every action is role-gated.

| Action | Minimum role |
|---|---|
| Simulate a level / view guardrail Matches / view Discovered Tools | Member |
| View firewall Events & Runs | Developer+ |
| Apply an autonomy level | Developer+ |
| Undo an autonomy change | Developer+ |
These routes run under your session (/api/workspace/firewall/* and /api/guardrail/*). Only /v1/* relay
calls use an sk-orca-… key; the gateway-key routes are a separate scope.
See Scope: keys, policies, workspaces.
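The role table reads naturally as a minimum-role lookup. In this sketch the action keys are made-up labels for the table rows, and the `ADMIN` role is a hypothetical stand-in for whatever sits above Developer (the text only says "Developer+"); it is not a statement of OrcaRouter's actual role model.

```python
from enum import IntEnum


class Role(IntEnum):
    MEMBER = 1
    DEVELOPER = 2
    ADMIN = 3  # hypothetical: stands in for any role above Developer


# Minimum role per action, transcribed from the table above.
MINIMUM_ROLE = {
    "simulate_level": Role.MEMBER,
    "view_guardrail_matches": Role.MEMBER,
    "view_discovered_tools": Role.MEMBER,
    "view_firewall_events_and_runs": Role.DEVELOPER,
    "apply_level": Role.DEVELOPER,
    "undo_change": Role.DEVELOPER,
}


def is_allowed(role: Role, action: str) -> bool:
    """True when the role meets or exceeds the action's minimum role."""
    return role >= MINIMUM_ROLE[action]
```

Under this reading a Member can simulate a level to preview it, but applying or undoing it takes Developer or above.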
7. After the baseline: where to tune each plane
The baseline gets you protected in the first 30 minutes. From there, each plane has its own reference for precision work:
Firewall overview
Verdicts, surfaces, argument predicates, approvals — the action plane.
Guardrails
Keyword, regex, PII, llm_judge, and grounding rules — the content plane.
Shadow mode
Roll a tightened firewall policy out in audit-only before enforcing.
Secure Agents baseline
The concept page for the autonomy control and its undo semantics.
