deny, sanitize, [EMAIL]. This page is the lookup
table for those words: what each one means, what it does to the call, and where
to go for the full mechanics. Keep it open while you author rules or triage the
events feed.
Two control planes produce two vocabularies. The Firewall
governs tool actions and emits a verdict. Guardrails
screen prompt and response text and emit an action plus, on a mask, a
typed masking tag. They never share a word — a guardrail never says deny,
a firewall never says mask.
This is a reference index, not a how-to. For the use-case behind each control
see Guardrails vs Firewall; for the
HTTP bodies see Security error codes.
1. The firewall verdict glossary
A firewall rule (or the policy’s default_verdict) resolves every tool call to
exactly one of these six verdicts. The engine walks the rules in priority order,
first match wins, and falls back to the default if nothing matches.
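The resolution order described above can be sketched in a few lines. This is a minimal illustration, not the engine's actual implementation: the `Rule` class, the `resolve_verdict` function, and the convention that a lower `priority` number is evaluated first are all assumptions made for the example. Only `tool_name_glob`, `default_verdict`, and the verdict names come from this page.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Rule:
    # Hypothetical rule shape; assumes a lower number means "evaluated first".
    priority: int
    tool_name_glob: str
    verdict: str


def resolve_verdict(tool_name, rules, default_verdict="audit"):
    """Walk rules in priority order; the first matching rule decides."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if fnmatchcase(tool_name, rule.tool_name_glob):
            return rule.verdict
    # No rule matched: fall back to the policy default.
    return default_verdict


rules = [
    Rule(priority=20, tool_name_glob="db.*", verdict="audit"),
    Rule(priority=10, tool_name_glob="db.drop_*", verdict="deny"),
]

print(resolve_verdict("db.drop_table", rules))  # matched by the priority-10 rule
print(resolve_verdict("db.query", rules))       # matched by the priority-20 rule
print(resolve_verdict("fs.read", rules))        # no match, default applies
```

Because the narrower `db.drop_*` rule has the earlier priority, it wins over the broader `db.*` rule even though both match.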
allow — let the call through
The call proceeds to the tool. Still logged as a firewall event so it shows
up in Runs and the events feed. This is what you want for tools an agent is
explicitly trusted to use.
audit — allow, but record it for review
Identical traffic to
allow, but flagged as something you wanted to watch.
It is the recommended default_verdict: observe everything, block nothing,
until your rules are tuned. The balanced autonomy level ships the PII
Shield guardrail as flag-only (audit), so PII is recorded without holding
the call.
deny — block the call
The call never reaches the tool. On the
inbound surface this returns
HTTP 400 firewall_blocked; through the MCP gateway it comes back as a
tool error (firewall deny: <reason>) so the model can react instead of
crashing. Marked skip-retry. Costs no model tokens.
sanitize — redact the arguments, forward the cleaned call
Replaces matched substrings (secrets, PII) in the tool-call arguments
with a
[redacted:<preset>] token, then forwards the call with the cleaned
arguments. It redacts arguments only — never the content a tool returns.
On the inbound surface, where
there are no call-time arguments yet, sanitize escalates to a deny.
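As a rough sketch of what argument redaction looks like, the snippet below replaces matches in string arguments with a `[redacted:<preset>]` token and leaves everything else untouched. The preset names (`email`, `aws_access_key`), their regular expressions, and the `sanitize_args` helper are hypothetical and chosen only for illustration; the shipped presets and their patterns may differ.

```python
import re

# Hypothetical presets: names and patterns are assumptions for this sketch.
PRESETS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def sanitize_args(args):
    """Return a copy of the tool-call arguments with matches redacted.

    Only top-level string values are scanned; other values pass through.
    """
    cleaned = {}
    for key, value in args.items():
        if isinstance(value, str):
            for preset, pattern in PRESETS.items():
                value = pattern.sub(f"[redacted:{preset}]", value)
        cleaned[key] = value
    return cleaned


call_args = {
    "sql": "SELECT * FROM users WHERE email = 'a@example.com'",
    "limit": 10,
}
print(sanitize_args(call_args))
```

The sketch only touches the arguments of the outgoing call, which mirrors the rule stated above: whatever the tool returns is not redacted by this verdict.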
pending_approval — hold for a human
The call is enqueued for review and the agent gets a held response carrying
an approval id (HTTP 400
firewall_approval_pending). A reviewer
resolves it in the console or via an HMAC webhook callback; the agent polls
the id and re-submits once with a single-use approval header. See
Human approval.
cap_cost — deny once the run overspends
Authored as a rule with a per-rule cents ceiling. It resolves to
allow
while the agent run is under budget and to deny once accumulated spend
crosses the cap — so an event shows allow or deny, not the literal word
cap_cost. A circuit-breaker for runaway loops.
Default verdict
default_verdict accepts only the three non-interactive verdicts:
| Value | Meaning when no rule matches |
|---|---|
| allow | Permit uncovered tool calls silently. |
| audit | Permit but record (the default). |
| deny | Block anything no rule explicitly allows (default-deny posture). |
The tight autonomy level sets default_verdict: deny; balanced and the
shipped default use audit.
2. Guardrail actions
A guardrail rule fires one of five actions. They are the text-plane equivalent of verdicts, and a guardrail rule never produces a firewall verdict.
| Action | What it does | Quota |
|---|---|---|
| block | Reject the whole request with HTTP 400 guardrail_blocked. | None: input blocks fire before metering; output blocks refund. |
| mask | Redact each match to a typed tag (see §3) and forward the cleaned text. | Normal: the call proceeds. |
| flag | Log only. Records a match; changes nothing about the traffic. | Normal. |
| annotate | Non-blocking. Attaches a human-readable note to the request (injected upstream as a security notice) without masking or blocking the text. | Normal. |
| spotlight | Non-blocking. Wraps the matched (untrusted) text in delimiters and tells the model to treat the delimited region as data, never instructions (the prompt-injection “spotlighting” defense). | Normal. |
A pii rule can apply different actions to different entities with
entity_actions: mask emails and phones, but block on credit_card and
ssn, from one rule. Each key must be an entity enabled on the rule; each value
must be one of block / mask / flag / annotate.
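The two constraints on entity_actions (keys must be enabled entities, values must come from the four listed actions) can be checked with a small validator. This is a sketch under assumptions: the `validate_entity_actions` function and the error message wording are invented for the example and are not part of the product's API.

```python
# The four values this page lists as valid for entity_actions.
ALLOWED_ENTITY_ACTIONS = {"block", "mask", "flag", "annotate"}


def validate_entity_actions(enabled_entities, entity_actions):
    """Return a list of problems; an empty list means the mapping is valid."""
    errors = []
    for entity, action in entity_actions.items():
        if entity not in enabled_entities:
            errors.append(f"{entity}: entity is not enabled on this rule")
        if action not in ALLOWED_ENTITY_ACTIONS:
            errors.append(f"{entity}: unsupported action {action!r}")
    return errors


enabled = {"email", "phone", "credit_card", "ssn"}
entity_actions = {
    "email": "mask",
    "phone": "mask",
    "credit_card": "block",
    "ssn": "block",
}
print(validate_entity_actions(enabled, entity_actions))
```

With this mapping, one rule masks the lower-risk entities and rejects the request outright when a card number or SSN appears.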
3. The masking tag glossary
On a mask action, every matched entity is replaced inline with a typed tag,
[<UPPERCASE_ENTITY_NAME>], so the model (input stage) or the caller (output
stage) sees the shape of the data without the value. Masking runs on both
stages, including streamed responses: a token-aware stream scanner masks
matches that straddle chunk boundaries before they reach the client.
| Entity | Tag |
|---|---|
| email | [EMAIL] |
| phone | [PHONE] |
| credit_card | [CREDIT_CARD] |
| ssn | [SSN] |
| ip | [IP] |
| iban | [IBAN] |
| mac_address | [MAC_ADDRESS] |
| jwt | [JWT] |
| aws_access_key | [AWS_ACCESS_KEY] |
| api_key_openai | [API_KEY_OPENAI] |
| bitcoin_address | [BITCOIN_ADDRESS] |

| Entity | Tag | Region |
|---|---|---|
| jp_mynumber | [JP_MYNUMBER] | Japan |
| kr_rrn | [KR_RRN] | South Korea |
| cn_resident_id | [CN_RESIDENT_ID] | China |
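The tag convention in the tables above is mechanical: the entity name, upper-cased, in square brackets. The sketch below derives tags that way and applies them to a piece of text. The `tag_for` and `mask_text` helpers and the email pattern are assumptions for illustration; the real detectors are not described on this page.

```python
import re

# Illustrative pattern only; the product's email detector may differ.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def tag_for(entity):
    """Derive the masking tag from an entity name: upper-case it, add brackets."""
    return f"[{entity.upper()}]"


def mask_text(text, patterns):
    """Replace every match of each entity pattern with that entity's tag."""
    for entity, pattern in patterns.items():
        text = pattern.sub(tag_for(entity), text)
    return text


print(tag_for("credit_card"))
print(mask_text("Contact jane@example.com today", {"email": EMAIL}))
```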
Custom entities follow the same convention. A custom entity named
employee_id masks to [EMPLOYEE_ID] unless you set an explicit mask_with
replacement. Up to 25 custom entities per rule, each an RE2 regex with an
optional luhn checksum. See PII detection.
4. One worked example
A single db.query tool call, read top to bottom, touches both vocabularies:
sanitize cleaned the tool arguments; the guardrail mask
cleaned the prompt text; the [EMAIL] tag is what the model sees in place of
the address. Same request, three different layers, three words from this
glossary.
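To make the three layers concrete, here is a small illustrative sketch of one request passing through both planes. Everything in it is an assumption for demonstration: the helper names, the email pattern, the `email` preset name, and the sample prompt and SQL are invented and do not reproduce real product output.

```python
import re

# Illustrative pattern; stands in for both the firewall preset and the PII detector.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def guardrail_mask(prompt):
    """Text plane: the mask action swaps each match for a typed tag."""
    return EMAIL.sub("[EMAIL]", prompt)


def firewall_sanitize(args):
    """Action plane: the sanitize verdict redacts string arguments."""
    return {
        key: EMAIL.sub("[redacted:email]", value) if isinstance(value, str) else value
        for key, value in args.items()
    }


prompt = "Look up the account for jane@example.com"
tool_args = {"sql": "SELECT * FROM accounts WHERE email = 'jane@example.com'"}

print(guardrail_mask(prompt))        # what the model sees
print(firewall_sanitize(tool_args))  # what the db.query tool receives
```

The same address produces two different replacements because two different planes handled it: a masking tag in the prompt text and a redaction token in the tool arguments.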
5. Posture words you’ll see alongside verdicts
These aren’t verdicts or actions, but they decide whether a verdict is enforced at all, so they show up in the same events and settings views.
| Word | Plane | Meaning |
|---|---|---|
| Shadow mode | Firewall | Per-policy flag. Downgrades every enforcing verdict to audit, prefixes the reason [shadow] would …. |
| Observe mode | Firewall | Workspace setting. When no policy resolves, allows the call but logs it as a coverage gap (Discovered tools). |
| Enforce | Firewall | Shadow off + a policy attached: verdicts take effect. |
| Fail-open | Guardrails | Default for advanced rules (llm_judge, grounding, external) — a timeout is observed, the request continues. Flip to fail-closed per rule. |
| Log raw content | Guardrails | Off by default. When off, a match records that a rule fired but not the matched substring. |
6. Where each word is defined
| Surface | Vocabulary | Home page |
|---|---|---|
| Firewall policy | allow audit deny sanitize pending_approval cap_cost | Firewall |
| Firewall rule matching | tool_name_glob, args_match, egress, sequence | Firewall rules |
| Guardrail rule | block mask flag annotate spotlight | Guardrails |
| Guardrail PII | entity names + masking tags | Guardrails |
| MCP & skills | skill risk bands, quarantine / block modes | Firewall MCP, Firewall skills |
| HTTP error bodies | guardrail_blocked, firewall_blocked, firewall_approval_pending | Error codes |
7. Related reading
Why was this blocked?
Trace a single denied call back to the exact rule and verdict that stopped
it.
Enforcement modes
How audit, shadow, observe, and enforce relate — and how to roll out safely.
Guardrails vs Firewall
Which plane owns which decision, and why a request can pass through both.
Dangerous tool calls
The threat the
deny and sanitize verdicts exist to stop.