1. Layer 1 — Scoped API key
The key is the first gate. Before any content is inspected or any model is called, the gateway resolves the calling key and decides whether the request is even permitted.

What the key carries:

- model_limits: the set of models the key may call. A request for a model outside the list is rejected immediately.
- allow_ips: optional IP allowlist. A request from an unlisted source is rejected.
- credit_limit_usd: a hard spend cap. A request that would push the key over the cap is rejected.
- expiry: a hard expiry date. Expired keys are rejected.
- environment: a tag (production, staging, dev, …) to organize and identify the key by deployment environment.
- guardrail_id: the guardrail policy that binds to this key (see Layer 2 and Layer 5).
- firewall_policy_id: the firewall policy that binds to this key (see Layer 4).
- is_firewall_gateway: flags the key as a firewall-gateway-scoped token; required for the evaluate and MCP gateway routes.

is_firewall_gateway and plaintext key reads require Admin.
For the full key model see Scope, keys, policies, and workspaces.
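
The key checks described in this layer can be sketched as a single gate function. This is a minimal illustration, not the gateway's implementation: the field names come from the list above, but the types, the `spent_usd` running total, the `estimated_cost_usd` argument, the rejection reason strings, and the order of the checks are all assumptions. It also assumes an empty `allow_ips` means "no IP restriction".

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class ScopedKey:
    # Field names mirror the documented key fields; the types are assumptions.
    model_limits: set[str]
    allow_ips: list[str] = field(default_factory=list)  # empty = no IP restriction (assumed)
    credit_limit_usd: Optional[float] = None
    spent_usd: float = 0.0  # hypothetical running total of spend on this key
    expiry: Optional[datetime] = None


def check_key(key, model, source_ip, estimated_cost_usd, now=None):
    """Return None if the request may proceed, else a (hypothetical) rejection reason."""
    now = now or datetime.now(timezone.utc)
    if key.expiry is not None and now >= key.expiry:
        return "key_expired"
    if model not in key.model_limits:
        return "model_not_allowed"
    if key.allow_ips and not any(
        ip_address(source_ip) in ip_network(net, strict=False)
        for net in key.allow_ips
    ):
        return "ip_not_allowed"
    if (
        key.credit_limit_usd is not None
        and key.spent_usd + estimated_cost_usd > key.credit_limit_usd
    ):
        return "credit_limit_exceeded"
    return None
```

Every branch returns before any content inspection or model call would happen, which mirrors the point that a rejection at this layer costs nothing downstream.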
2. Layer 2 — Input guardrails
Once the key is validated, the gateway runs the bound guardrail’s input-stage rules against the request text, before any upstream model call.

What it sees: the caller’s messages as submitted. (A prompt injected from a registry prompt is appended later, in the routing stage; input rules do not see it.)

Rule types available: keyword, regex, pii, max_chars, external, llm_judge, grounding.

Actions a rule can produce:
| Action | What happens |
|---|---|
| block | Request is rejected with HTTP 400 guardrail_blocked. No quota is charged. Marked skip-retry. |
| mask | Match is redacted (e.g. jane@acme.com → [EMAIL]). The sanitized text continues to the model. |
| flag | Match is recorded; traffic is unchanged. |

block at this stage means the model is never called. Cost: zero. The caller sees a structured error naming the guardrail and the rule that fired.
Where to configure: Console → Guardrails, or the guardrail API. Requires Developer+ to create or modify. See Guardrails for the full rule reference.
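
To make the three actions concrete, here is a small sketch of an input-stage pass. It is illustrative only: the `Rule` shape, the `GuardrailBlocked` exception (a stand-in for the HTTP 400 guardrail_blocked response), and the choice to apply rules in list order are assumptions, and a plain regex stands in for the richer rule types.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    pattern: str  # a regex; stands in for the keyword/regex/pii rule types
    action: str   # "block", "mask", or "flag"
    replacement: str = "[REDACTED]"


class GuardrailBlocked(Exception):
    """Hypothetical stand-in for the HTTP 400 guardrail_blocked response."""

    def __init__(self, rule_name):
        super().__init__(f"guardrail_blocked: {rule_name}")
        self.rule_name = rule_name


def run_input_stage(text, rules):
    """Apply rules in order; return (possibly masked text, names of flagged rules)."""
    flags = []
    for rule in rules:
        if not re.search(rule.pattern, text):
            continue
        if rule.action == "block":
            # The model is never called; the error names the rule that fired.
            raise GuardrailBlocked(rule.name)
        if rule.action == "mask":
            # Redact the match; the sanitized text continues onward.
            text = re.sub(rule.pattern, rule.replacement, text)
        elif rule.action == "flag":
            # Record the match; the text is unchanged.
            flags.append(rule.name)
    return text, flags
```

With a mask rule for email addresses, a message containing jane@acme.com would reach the model as [EMAIL], while a matching block rule would raise before any upstream call is made.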
3. Layer 3 — Model runs
If the key is valid and input guardrails pass, the gateway forwards the request to the upstream model. This is the only layer with no configurable enforcement; it is simply the model doing its job. The firewall operates on the actions the model produces (Layer 3 → Layer 4 below), not on the model itself. Routing, fallbacks, and load balancing happen here transparently.

4. Layer 4 — Agent Firewall (tool calls and egress)
After the model responds (or inline, as tool calls are emitted), the Agent Firewall judges every action the model requests. Four enforcement surfaces:

| Surface | What the firewall sees |
|---|---|
| inbound | Tool definitions the agent advertises to the model. Block a dangerous tool before the model can choose it. |
| response | tool_calls the model emits in its reply. |
| mcp | A tools/call dispatched through the Firewall MCP gateway or the evaluate hook. |
| egress | An outbound network destination (host / IP / CIDR) reported by a tool: the SSRF and data-exfiltration surface. |

| Verdict | What it does |
|---|---|
| allow | Call proceeds. Logged. |
| audit | Call proceeds; recorded for review. The default default_verdict. |
| deny | Call blocked: HTTP 400 firewall_blocked on the inbound surface; a tool error on mcp. |
| sanitize | Matched substrings are redacted from tool arguments; the cleaned call proceeds. On inbound (no arguments yet), escalates to deny. |
| pending_approval | Call is held; an out-of-band reviewer approves or rejects; the agent re-submits with a single-use approval token. |
| cap_cost | Deny once the agent run’s accumulated spend exceeds a per-rule cents cap. |

deny on the inbound surface costs no model tokens — the block fires before the upstream call. A pending_approval hold returns HTTP 400 firewall_approval_pending with an approval id the client polls.
Where to configure: Console → Firewall, or the firewall API. Requires Developer+ to create or modify policies and rules. See Firewall and Firewall rules for the full rule language.
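
The verdict table can be read as a small decision function. The sketch below is a hedged illustration rather than the firewall's rule engine: the `FirewallRule` shape, first-match-wins ordering, and the behavior of a cap_cost rule while the run is still under its cap (here it falls through to later rules) are assumptions. Only the sanitize-to-deny escalation on inbound, the cap_cost denial, and the audit default come from the text above.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class FirewallRule:
    tool_glob: str  # hypothetical glob over tool names, e.g. "shell.*"
    verdict: str    # allow, audit, deny, sanitize, pending_approval, cap_cost
    cap_cents: Optional[int] = None  # only meaningful for cap_cost


def evaluate(surface, tool_name, rules, run_spend_cents=0, default_verdict="audit"):
    """Return the effective verdict for one tool call (first match wins, assumed)."""
    for rule in rules:
        if not fnmatchcase(tool_name, rule.tool_glob):
            continue
        if rule.verdict == "sanitize" and surface == "inbound":
            # No arguments exist yet on inbound, so there is nothing to redact.
            return "deny"
        if rule.verdict == "cap_cost":
            if rule.cap_cents is not None and run_spend_cents > rule.cap_cents:
                return "deny"
            continue  # under the cap: keep evaluating later rules (assumed)
        return rule.verdict
    return default_verdict
```

For example, a sanitize rule on a database tool yields sanitize on the response surface but deny on inbound, and a call that matches no rule falls back to audit.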
5. Layer 5 — Output guardrails
After the model responds (and after any tool-call cycle completes), the gateway runs the bound guardrail’s output-stage rules against the response text before it reaches the caller. The same rule types and actions apply as in Layer 2. An output block returns HTTP 400 guardrail_blocked and refunds the pre-consumed quota, so the caller pays nothing.

Streaming and output masking. A block action is enforced on both streaming and non-streaming responses; on a stream, the scanner cuts mid-flight and emits a replacement. A mask action on output currently applies only to non-streaming responses; on a streaming response the original chunk passes through unmasked. Verify your stage/stream combination in the guardrail sandbox before depending on it.

6. Layer 6 — Audit
Every match, verdict, and approval decision is written to the audit trail, correlated to the agent run and session that caused it. This is not a separate enforcement step (it runs in parallel with Layers 2–5), but it is the layer that makes the others accountable.

What is logged:

- Guardrail matches: rule type, action, stage, detail string, and (if Log raw content is enabled) the matched substring.
- Firewall events: surface, tool name, verdict, matched rule, reason code, risk factors, and the run/session the call belongs to.
- Approval decisions: who approved or rejected, when, and whether the underlying rule changed between hold and decision.
- Policy changes: every create, update, delete, and autonomy-level change writes a versioned audit row.
7. Summary table
| Layer | What it controls | What it sees | Outcome on a hit | Where to configure |
|---|---|---|---|---|
| 1. Scoped key | Identity, model access, spend, IP, expiry | The request’s auth token | HTTP 4xx before anything runs; no metering | Console → API Keys (Developer+) |
| 2. Input guardrails | Request text content | Caller’s messages | Block (HTTP 400 guardrail_blocked, no charge), mask, or flag | Console → Guardrails (Developer+) |
| 3. Model | — | — | — | Routing / channel config |
| 4. Agent Firewall | Tool calls, MCP dispatch, egress | Tool name, arguments, destination | allow / audit / deny / sanitize / pending_approval / cap_cost | Console → Firewall (Developer+) |
| 5. Output guardrails | Response text content | Model’s reply | Block (HTTP 400, quota refunded), mask, or flag | Console → Guardrails (Developer+) |
| 6. Audit | Attribution and trail | All of the above | Immutable log entry | Console → Matches (Member) / Events & Runs (Developer+) |
8. Policy resolution order
For any request, the active guardrail and firewall policy are resolved independently:

- Key attachment: if the key carries an explicit guardrail_id or firewall_policy_id, that policy binds (when it exists and is enabled).
- Workspace default: if the key has no attachment, the workspace’s enabled is_default guardrail or policy applies.
- Neither: no enforcement. The request is byte-identical to a workspace that never enabled the feature.
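
The resolution order above can be sketched as one function, called once for the guardrail and once for the firewall policy. This is an illustration under stated assumptions: the `Policy` shape and the `resolve_policy` name are hypothetical, and the text does not say what happens when a key is attached to a missing or disabled policy, so this sketch treats that case as no enforcement rather than falling back to the workspace default.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    id: str
    enabled: bool = True
    is_default: bool = False


def resolve_policy(attached_id, workspace_policies):
    """Resolve the active policy for one feature (guardrail or firewall)."""
    if attached_id is not None:
        # Key attachment: binds only when the policy exists and is enabled.
        for policy in workspace_policies:
            if policy.id == attached_id and policy.enabled:
                return policy
        # Attached but missing or disabled: no fallback here (an assumption).
        return None
    # Workspace default: the enabled policy marked is_default, if any.
    for policy in workspace_policies:
        if policy.is_default and policy.enabled:
            return policy
    # Neither: no enforcement for this feature.
    return None
```

Because the two features resolve independently, a key can carry an explicit guardrail attachment while its firewall policy comes from the workspace default, or the reverse.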
9. Fail-open and fail-closed
Two behaviors, applied to different cases.

Fail-open (transient errors): If policy resolution hits a transient error (a database hiccup, a network blip on the way to an external vendor), the gateway degrades to no enforcement rather than taking traffic down. Safety degrades; availability is preserved. Configure fail_open: false on external or llm_judge rules when a missed check is unacceptable for your policy.

Fail-closed (ambiguous cases): Where not enforcing would defeat the rule, the engine fails closed: an egress report with an unresolvable destination is denied; an unreachable approval store holds the call rather than passing it; a skill whose ownership cannot be resolved is blocked. Availability is preserved on the happy path; safety is not silently skipped on the edge cases that matter.

10. Deep dives
Guardrails
Full rule reference — types, actions, PII entities, LLM judge, grounding, and the test sandbox.
Firewall
Policy model, verdicts, surfaces, shadow mode, HITL approval, and anomaly detection.
Firewall rules
The rule matching language — tool globs, argument clauses, egress lists, and sanitizers.
Guardrails vs. Firewall
Which layer catches which threat — and when you need both.
Scope, keys, and policies
The full key model: what a key carries and how policies resolve.
Enforcement modes
Fail-open vs. fail-closed — the full decision tree.
