Guardrails vs. Agent Firewall — when to use which

The short answer: Guardrails govern text; the Firewall governs actions. They are complementary — a single request flows through both — and the fastest way to configure them together is an autonomy level. The rest of this page is for the cases where you need to know which layer owns a specific threat.

Role required. Any workspace member can read policies and the guardrail Matches feed; the firewall Events feed requires the Developer role. Creating or editing guardrails or firewall policies also requires Developer or above.

1. The one-line distinction

Layer	Governs	Sees
Guardrails	Text — what the model reads and writes	Prompt content, response content
Agent Firewall	Actions — what the agent does	Tool calls, MCP dispatches, outbound network destinations

Guardrails fire twice per request: before the upstream call (on the prompt) and after it (on the response). The Firewall fires on every tool call that the model emits or the agent issues, regardless of which model or provider served the turn.

2. Side-by-side comparison

Dimension	Guardrails	Agent Firewall
Governs	Prompt text and model response text	Tool calls, MCP dispatches, egress destinations, agent cost
Sees	The user message, system prompt, and the model’s reply	Tool name, call arguments, the tool calls the model emits, outbound host/IP
Attaches via	`guardrail_id` on the API key	`firewall_policy_id` on the API key
Rule types	`keyword`, `regex`, `pii`, `max_chars`, `external`, `llm_judge`, `grounding`	Tool-name glob + argument clauses + egress scope + skill ownership
Example threats	PII in prompts, API secrets in responses, jailbreaks, off-topic output, oversized context	Dangerous tool call, SSRF, data exfiltration, runaway agent cost loop, unapproved MCP server
Verdicts / actions	`block` (HTTP 400 `guardrail_blocked`), `mask`, `flag`, `annotate`, `spotlight`	`allow`, `audit`, `deny` (HTTP 400 `firewall_blocked`), `sanitize`, `pending_approval`, `cap_cost`
When it fires	Input stage: before the model call; output stage: after the model replies	On every tool call the model emits or the agent issues
Shadow / observe mode	No — guardrails fire or they don’t	Yes — shadow mode downgrades enforcing verdicts to `audit` for safe rollout
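The shadow-mode row can be illustrated with a minimal sketch. It assumes that `deny`, `sanitize`, `pending_approval`, and `cap_cost` are the enforcing verdicts; this page does not list them explicitly, and the function name `effective_verdict` is hypothetical.

```python
# Assumption: this page does not say which verdicts count as "enforcing".
# Here everything except `allow` and `audit` is treated as enforcing.
ENFORCING_VERDICTS = {"deny", "sanitize", "pending_approval", "cap_cost"}


def effective_verdict(verdict: str, shadow_mode: bool) -> str:
    """Return the verdict actually applied to a tool call.

    In shadow mode an enforcing verdict is downgraded to `audit`, so the
    call is recorded but not blocked. Other verdicts pass through unchanged.
    """
    if shadow_mode and verdict in ENFORCING_VERDICTS:
        return "audit"
    return verdict


if __name__ == "__main__":
    for verdict in ("allow", "audit", "deny", "cap_cost"):
        print(verdict, "->", effective_verdict(verdict, shadow_mode=True))
```

Rolling out a new policy in shadow mode first shows which calls would have been denied without interrupting running agents.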

3. Threat → which layer

Use this table to route a new security requirement to the right control:

Threat	Reach for
PII in a user message	Guardrails — input `pii` rule (`mask` / `block`)
Secret in the model’s response	Guardrails — output secrets rule
Dangerous tool call (`shell.exec rm -rf /`)	Firewall — `deny` on tool glob + argument clause
SSRF / data exfiltration via outbound URL	Firewall — egress allow/deny list
Prompt injection from untrusted content	Both — input guardrail + firewall allow-list
Secret in a tool argument	Firewall `sanitize` + Guardrails secrets rule
Jailbreak / policy bypass	Guardrails — `llm_judge` / keyword / regex
Oversized prompt or token cost	Guardrails — `max_chars` rule
Runaway agent spend (cost loop)	Firewall — `cap_cost` verdict
Unapproved MCP server	Firewall — MCP surface deny / `pending_approval`
Sensitive data from a tool result	Guardrails — output rule on the response

The deep “why” for each pairing lives on the Threats deep-dive pages.
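To make the "tool glob + argument clause" pairing concrete, here is a rough sketch of how such a rule could be evaluated. The rule shape (`tool_glob`, `arg_name`, `arg_pattern`) and the `evaluate` function are hypothetical and are not the product's policy schema; the argument pattern is deliberately narrow and only catches the literal `rm -rf /` form.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical rule shape, for illustration only.
RULE = {
    "tool_glob": "shell.*",           # matches shell.exec, shell.run, ...
    "arg_name": "command",            # which argument the clause inspects
    "arg_pattern": r"\brm\s+-rf\s+/",  # narrow pattern for `rm -rf /`
    "verdict": "deny",
}


def evaluate(tool_name: str, arguments: dict, rule: dict = RULE,
             default: str = "allow") -> str:
    """Return the rule's verdict when both the tool glob and the
    argument clause match; otherwise return the default verdict."""
    if not fnmatchcase(tool_name, rule["tool_glob"]):
        return default
    value = str(arguments.get(rule["arg_name"], ""))
    if re.search(rule["arg_pattern"], value):
        return rule["verdict"]
    return default


if __name__ == "__main__":
    print(evaluate("shell.exec", {"command": "rm -rf /"}))
    print(evaluate("shell.exec", {"command": "ls -la"}))
    print(evaluate("search.web", {"query": "rm -rf /"}))
```

The glob narrows the rule to a family of tools and the argument clause narrows it to the dangerous invocation, so benign calls to the same tool are not affected.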

4. Use both — autonomy levels set them together

Guardrails and the Firewall are designed to compose, not compete. A single request passes through both planes:

1. Input guardrail runs: prompt text is screened and optionally masked.
2. Model call: the (possibly sanitized) prompt reaches the upstream model.
3. Firewall runs: every tool call the model emits is evaluated.
4. Output guardrail runs: the model’s response text is screened.
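The four stages above can be sketched as a toy pipeline. Everything here is a stand-in: `fake_model`, the e-mail and secret patterns, and the one-line firewall check are hypothetical and exist only to show the order in which the two planes see a request.

```python
import re
from dataclasses import dataclass, field

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SECRET = re.compile(r"sk-[A-Za-z0-9]+")


@dataclass
class Turn:
    text: str
    tool_calls: list = field(default_factory=list)


def input_guardrail(prompt: str) -> str:
    # Stand-in for an input `pii` rule with the `mask` action.
    return EMAIL.sub("[EMAIL]", prompt)


def fake_model(prompt: str) -> Turn:
    # Stand-in for the upstream model: echoes the prompt, leaks a fake
    # secret, and emits two tool calls.
    return Turn(
        text=f"Summary of: {prompt} (key sk-abc123)",
        tool_calls=[
            {"tool": "search.web", "args": {"query": "weather"}},
            {"tool": "shell.exec", "args": {"command": "rm -rf /"}},
        ],
    )


def firewall(call: dict) -> str:
    # Stand-in policy: deny every shell tool, allow the rest.
    return "deny" if call["tool"].startswith("shell.") else "allow"


def output_guardrail(text: str) -> str:
    # Stand-in for an output secrets rule with the `mask` action.
    return SECRET.sub("[SECRET]", text)


def handle(prompt: str) -> dict:
    trace = []
    safe_prompt = input_guardrail(prompt)
    trace.append("input_guardrail")
    turn = fake_model(safe_prompt)
    trace.append("model_call")
    verdicts = [(call["tool"], firewall(call)) for call in turn.tool_calls]
    trace.append("firewall")
    reply = output_guardrail(turn.text)
    trace.append("output_guardrail")
    return {"trace": trace, "verdicts": verdicts, "reply": reply}


if __name__ == "__main__":
    print(handle("Contact alice@example.com"))
```

Note that the model only ever sees the masked prompt, and the firewall verdicts are decided per tool call, independently of what either guardrail did to the text.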

The fastest way to configure both at once is an autonomy level — a single setting that atomically writes a Firewall policy and a Guardrails policy for the whole workspace, with one-click undo:

Autonomy level	Firewall posture	Guardrails posture
`tight`	Default-deny; block destructive shell + SSRF egress	PII Shield + Secrets Blocker on
`balanced`	Default audit; deny destructive shell	PII Shield audit-only (flags PII)
`permissive`	No enforcing rules; observe mode on	No enforcement

Apply an autonomy level from the Firewall console (`POST /api/workspace/firewall/autonomy`, Developer or above), then tune each plane independently from there.
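As a rough illustration, the request could be built like this. Only the method and path come from this page; the base URL, the bearer-token header, and the `level` body field are assumptions, so check the API reference for the actual request shape. The sketch builds the request without sending it.

```python
import json
from urllib.request import Request

AUTONOMY_LEVELS = {"tight", "balanced", "permissive"}


def build_autonomy_request(base_url: str, api_token: str, level: str) -> Request:
    """Build (but do not send) a request that sets the autonomy level.

    Assumptions, not confirmed by this page: bearer-token auth and a JSON
    body of the form {"level": "<name>"}.
    """
    if level not in AUTONOMY_LEVELS:
        raise ValueError(f"unknown autonomy level: {level!r}")
    body = json.dumps({"level": level}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/api/workspace/firewall/autonomy",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # gateway.example.com is a placeholder host.
    req = build_autonomy_request("https://gateway.example.com", "TOKEN", "balanced")
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req)
```

Validating the level name client-side keeps typos from reaching the API, where the error format is not documented on this page.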

5. Summary

Guardrails own the text; the Firewall owns the actions — run both, let the autonomy level wire them together, and tighten each plane independently once you can see your agents’ real traffic.

Guardrails

Rule types, PII detection, LLM judge, eval harness, and API reference.

Agent Firewall

Verdicts, surfaces, autonomy levels, HITL approval, and API reference.
