The short answer: Guardrails govern text; the Firewall governs actions. They are complementary — a single request flows through both — and the fastest way to configure them together is an autonomy level. The rest of this page is for the cases where you need to know which layer owns a specific threat.
Role required. Any workspace member can read policies and the guardrail Matches feed; the firewall Events feed requires the Developer role. Creating or editing guardrails or firewall policies also requires Developer or above.
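
The role note above amounts to a small permission table. A minimal sketch, assuming a simple linear ranking of roles: only the workspace member and Developer roles are named in the text, so the `admin` entry, the action names, and the `is_allowed` helper are hypothetical.

```python
# Hypothetical ranking. Only "member" and "developer" come from the text above;
# "admin" stands in for whatever "Developer or above" includes.
ROLE_RANK = {"member": 0, "developer": 1, "admin": 2}

# Minimum role per action, transcribed from the "Role required" note.
# The action names themselves are illustrative, not product identifiers.
MIN_ROLE = {
    "read_policies": "member",
    "read_guardrail_matches": "member",
    "read_firewall_events": "developer",
    "edit_guardrails": "developer",
    "edit_firewall_policies": "developer",
}


def is_allowed(role: str, action: str) -> bool:
    """Return True when `role` ranks at or above the minimum for `action`."""
    return ROLE_RANK[role] >= ROLE_RANK[MIN_ROLE[action]]
```

Under these assumptions, `is_allowed("member", "read_firewall_events")` is false, while the same member can still read the guardrail Matches feed.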

1. The one-line distinction

| Layer | Governs | Sees |
| --- | --- | --- |
| Guardrails | Text: what the model reads and writes | Prompt content, response content |
| Agent Firewall | Actions: what the agent does | Tool calls, MCP dispatches, outbound network destinations |

Guardrails fire before the upstream call (on the prompt) and after it (on the response). The Firewall fires on every tool call, whether the model emits it or the agent issues it, regardless of which model or provider served the turn.

2. Side-by-side comparison

| Dimension | Guardrails | Agent Firewall |
| --- | --- | --- |
| Governs | Prompt text and model response text | Tool calls, MCP dispatches, egress destinations, agent cost |
| Sees | The user message, system prompt, and the model’s reply | Tool name, call arguments, the tool calls the model emits, outbound host/IP |
| Attaches via | guardrail_id on the API key | firewall_policy_id on the API key |
| Rule types | keyword, regex, pii, max_chars, external, llm_judge, grounding | Tool-name glob + argument clauses + egress scope + skill ownership |
| Example threats | PII in prompts, API secrets in responses, jailbreaks, off-topic output, oversized context | Dangerous tool call, SSRF, data exfiltration, runaway agent cost loop, unapproved MCP server |
| Verdicts / actions | block (HTTP 400 guardrail_blocked), mask, flag, annotate, spotlight | allow, audit, deny (HTTP 400 firewall_blocked), sanitize, pending_approval, cap_cost |
| When it fires | Input stage: before the model call; output stage: after the model replies | On every tool call the model emits or the agent issues |
| Shadow / observe mode | No: guardrails fire or they don’t | Yes: shadow mode downgrades enforcing verdicts to audit for safe rollout |
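
The shadow-mode row is easiest to read as a function over verdicts. A minimal sketch, assuming every verdict other than `allow` and `audit` counts as enforcing (the comparison does not say which ones do); `effective_verdict` and `http_error_for` are hypothetical helper names, not part of the product API.

```python
# Assumption: everything except "allow" and "audit" is an enforcing verdict.
ENFORCING = {"deny", "sanitize", "pending_approval", "cap_cost"}


def effective_verdict(verdict: str, shadow: bool) -> str:
    """In shadow mode, downgrade an enforcing verdict to "audit"."""
    if shadow and verdict in ENFORCING:
        return "audit"
    return verdict


def http_error_for(verdict: str):
    """Map a firewall verdict to the documented error, or None when nothing is blocked."""
    if verdict == "deny":
        return (400, "firewall_blocked")
    return None
```

With shadow mode on, a `deny` becomes `audit`, so `http_error_for(effective_verdict("deny", shadow=True))` returns `None` and the call proceeds while the event is still recorded.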

3. Threat → which layer

Use this table to route a new security requirement to the right control:

| Threat | Reach for |
| --- | --- |
| PII in a user message | Guardrails: input pii rule (mask / block) |
| Secret in the model’s response | Guardrails: output secrets rule |
| Dangerous tool call (shell.exec rm -rf /) | Firewall: deny on tool glob + argument clause |
| SSRF / data exfiltration via outbound URL | Firewall: egress allow/deny list |
| Prompt injection from untrusted content | Both: input guardrail + firewall allow-list |
| Secret in a tool argument | Firewall sanitize + Guardrails secrets rule |
| Jailbreak / policy bypass | Guardrails: llm_judge / keyword / regex |
| Oversized prompt or token cost | Guardrails: max_chars rule |
| Runaway agent spend (cost loop) | Firewall: cap_cost verdict |
| Unapproved MCP server | Firewall: MCP surface deny / pending_approval |
| Sensitive data from a tool result | Guardrails: output rule on the response |

The deep “why” for each pairing lives on the Threats deep-dive pages.
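
To make the "dangerous tool call" row concrete, here is a minimal sketch of a deny rule built from a tool-name glob plus one argument clause. The `Rule` shape, the `command` argument name, and the `evaluate` function are assumptions for illustration; the real policy schema is not shown on this page.

```python
import fnmatch
import re
from dataclasses import dataclass


@dataclass
class Rule:
    tool_glob: str    # glob over the tool name, e.g. "shell.*"
    arg_name: str     # which call argument the clause inspects (hypothetical)
    arg_pattern: str  # regex the argument value must match
    verdict: str      # verdict returned when both parts match


def evaluate(rules, tool: str, args: dict, default: str = "allow") -> str:
    """Return the verdict of the first rule whose glob and argument clause both match."""
    for rule in rules:
        value = str(args.get(rule.arg_name, ""))
        if fnmatch.fnmatchcase(tool, rule.tool_glob) and re.search(rule.arg_pattern, value):
            return rule.verdict
    return default


# Deny any shell.* tool whose command contains a recursive delete from the root.
RULES = [Rule("shell.*", "command", r"\brm\s+-rf\s+/", "deny")]
```

Here `evaluate(RULES, "shell.exec", {"command": "rm -rf /"})` yields `"deny"`, while a harmless `ls -la` on the same tool falls through to the default. The `allow` default is chosen only to keep the sketch short; a default-deny posture would pass `default="deny"`.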

4. Use both — autonomy levels set them together

Guardrails and the Firewall are designed to compose, not compete. A single request passes through both planes:
  1. Input guardrail runs — prompt text is screened and optionally masked.
  2. Model call — the (possibly sanitized) prompt reaches the upstream model.
  3. Firewall — every tool call the model emits is evaluated.
  4. Output guardrail runs — the model’s response text is screened.
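
The four steps above can be sketched as one request handler. This is an illustration of the ordering only: the function names, the email-masking rule, the `shell.` prefix check, and the reply shape are all stand-ins, not the product's implementation.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def input_guardrail(prompt: str) -> str:
    """Step 1: screen the prompt; this stand-in masks anything that looks like an email."""
    return EMAIL.sub("[MASKED]", prompt)


def firewall(tool_call: dict) -> str:
    """Step 3: return a verdict for one tool call (stand-in policy)."""
    return "deny" if tool_call["name"].startswith("shell.") else "allow"


def output_guardrail(text: str) -> str:
    """Step 4: screen the response text; this stand-in blocks on one keyword."""
    if "BEGIN PRIVATE KEY" in text:
        raise ValueError("guardrail_blocked")
    return text


def handle(prompt: str, call_model) -> dict:
    """Run one request through both planes in the documented order."""
    trace = ["input_guardrail"]
    safe_prompt = input_guardrail(prompt)
    reply = call_model(safe_prompt)  # Step 2: the model sees the masked prompt
    trace.append("model")
    allowed = []
    for call in reply.get("tool_calls", []):
        trace.append("firewall")
        if firewall(call) == "allow":
            allowed.append(call)
    text = output_guardrail(reply["text"])
    trace.append("output_guardrail")
    return {"text": text, "tool_calls": allowed, "trace": trace}


def fake_model(prompt: str) -> dict:
    """Hypothetical upstream model used only to exercise the handler."""
    return {
        "text": f"echo: {prompt}",
        "tool_calls": [{"name": "shell.exec"}, {"name": "search.web"}],
    }
```

Calling `handle("mail bob@example.com", fake_model)` shows the model receiving the masked prompt, the firewall evaluating each of the two tool calls separately, and the output guardrail running last.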
The fastest way to configure both at once is an autonomy level: a single setting that atomically writes a Firewall policy and a Guardrails policy for the whole workspace, with one-click undo.

| Autonomy level | Firewall posture | Guardrails posture |
| --- | --- | --- |
| tight | Default-deny; block destructive shell + SSRF egress | PII Shield + Secrets Blocker on |
| balanced | Default audit; deny destructive shell | PII Shield audit-only (flags PII) |
| permissive | No enforcing rules; observe mode on | No enforcement |

Apply an autonomy level from the Firewall console (POST /api/workspace/firewall/autonomy, Developer+), then tune each plane independently from there.
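
For scripted setups, the same call can be prepared outside the console. A minimal sketch using only the Python standard library: the endpoint path and the three level names come from this page, but the JSON body shape (`{"level": ...}`), the bearer-token header, and the base URL are assumptions, so check the API reference before relying on them.

```python
import json
import urllib.request

LEVELS = {"tight", "balanced", "permissive"}


def build_autonomy_request(base_url: str, token: str, level: str) -> urllib.request.Request:
    """Build (but do not send) the POST that applies an autonomy level."""
    if level not in LEVELS:
        raise ValueError(f"unknown autonomy level: {level!r}")
    # Assumed body shape; the real field name may differ.
    body = json.dumps({"level": level}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/workspace/firewall/autonomy",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building the request separately keeps the level validation testable without network access.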

5. Summary

Guardrails own the text; the Firewall owns the actions — run both, let the autonomy level wire them together, and tighten each plane independently once you can see your agents’ real traffic.

Guardrails

Rule types, PII detection, LLM judge, eval harness, and API reference.

Agent Firewall

Verdicts, surfaces, autonomy levels, HITL approval, and API reference.