Guardrail actions: block, mask, flag

Every guardrail rule answers three questions: what to look for (a type), where to look (a stage), and what to do about it (an action). This page is about that third choice. A rule’s action is the single most consequential field on it: it decides whether a match stops the request, quietly rewrites it, or just leaves a breadcrumb.

The rule builder surfaces five actions in all: block, mask, flag, annotate, and spotlight. This page covers the three enforcement choices you reach for first: block, mask, and flag. Pick one per rule, or, for a PII rule, route different entities to different actions (see §6). The other two are prompt-shaping, non-blocking actions: annotate injects a security note upstream (see code security), and spotlight wraps matched untrusted input in delimiters so the model treats it as data, not instructions.

The full roster of actions lives in the Guardrails overview. For the wider engine (rule types, stages, attaching a policy to a key), start there or at the full Guardrails reference.

1. The guardrail block mask flag decision in one line

block

Reject the call with HTTP 400 guardrail_blocked. The model never runs (input stage) or its answer never returns (output stage).

mask

Redact each match — e.g. jane@acme.com → [EMAIL] — and let the sanitized text through. The request continues.

flag

Change nothing about the traffic. Record a match in the feed and move on. Observe-only.

These are the three enforcement actions. Whichever you set is honored everywhere the rule runs — the console rule builder, the Test sandbox, and the live /v1/* relay path all read the same block / mask / flag value.
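To make the three outcomes concrete, here is a minimal Python sketch of what each action does to one matched substring. It is an illustration under stated assumptions, not the gateway's code: the names `Outcome` and `apply_action` and the `tag` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    allowed: bool  # False means the call is rejected (block)
    text: str      # the text that continues downstream, if any


def apply_action(action: str, text: str, match: str, tag: str = "[REDACTED]") -> Outcome:
    """Illustrative only: apply one enforcement action to one matched substring."""
    if match not in text:
        # No match: the rule has nothing to do and the text passes unchanged.
        return Outcome(allowed=True, text=text)
    if action == "block":
        # Reject the call; nothing continues downstream.
        return Outcome(allowed=False, text="")
    if action == "mask":
        # Redact every occurrence and let the sanitized text through.
        return Outcome(allowed=True, text=text.replace(match, tag))
    if action == "flag":
        # Observe-only: the text is untouched (a real engine would record the match here).
        return Outcome(allowed=True, text=text)
    raise ValueError(f"unknown action: {action}")
```

For example, `apply_action("mask", "mail jane@acme.com", "jane@acme.com", "[EMAIL]")` yields the text `mail [EMAIL]`, while the same call with `"flag"` returns the input unchanged.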

2. One concrete example — three rules, three actions

Here’s a single guardrail whose three rules each pick a different action. You author it in the console (/console/guardrails) under your console session; the sk-orca-... relay key is only for /v1/* calls, never for editing policy. Creating or editing a guardrail requires the Developer+ role.

{
  "rules": [
    { "type": "keyword", "stage": "input",  "action": "block",
      "keywords": ["internal-only", "do-not-share"] },
    { "type": "pii",     "stage": "input",  "action": "mask",
      "entities": ["email", "phone"] },
    { "type": "regex",   "stage": "output", "action": "flag",
      "pattern": "(?i)acme\\s+confidential" }
  ]
}

What each rule does on a request:

The block rule rejects any prompt containing one of those literal terms — HTTP 400, the model never runs.
The mask rule rewrites emails and phone numbers to [EMAIL] / [PHONE] in the prompt before the model sees it.
The flag rule watches the model’s output for a confidential marker and records a match without altering the response — so you can measure how often it appears before deciding to enforce.

The engine runs every applicable rule and folds the results into one verdict. If any rule blocks, the request is blocked.
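The folding step can be sketched roughly as follows. Only the "any block wins" rule comes from the text above; ranking mask above flag, the `"allow"` label for the no-match case, and the function name `fold_verdict` are assumptions made for illustration.

```python
from typing import Iterable

# Assumed precedence, strongest first. Only "block wins" is stated on this page.
PRECEDENCE = ("block", "mask", "flag")


def fold_verdict(matched_actions: Iterable[str]) -> str:
    """Fold the actions of all rules that matched into a single verdict."""
    seen = set(matched_actions)
    for action in PRECEDENCE:
        if action in seen:
            return action
    # Hypothetical label for "no rule matched".
    return "allow"
```

With the three-rule guardrail above, a prompt that trips both the keyword rule and the PII rule folds to `block`, because one blocking rule is enough to reject the request.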

3. block — reject with HTTP 400

A block action rejects the whole call. The caller gets HTTP 400 with error code guardrail_blocked and a message naming the guardrail and the rule that fired.

No quota is charged

An input-stage block fires before metering, so nothing is consumed. An output-stage block refunds the pre-consumed quota after rejecting the answer. Either way the caller pays nothing for a blocked call.

It's marked skip-retry

A guardrail_blocked result is skip-retry — re-running the same prompt against another channel would just block again, so the gateway won’t waste a retry. See the guardrail_blocked error.
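A client can mirror the same skip-retry logic on its side. The sketch below is a minimal illustration, assuming the error arrives as a JSON body shaped like `{"error": {"code": "guardrail_blocked", "message": "..."}}`; that shape, the helper names, and the 429/5xx retry policy are assumptions, so check the guardrail_blocked error page for the actual format.

```python
def is_guardrail_block(status: int, body: dict) -> bool:
    """True if the response looks like a guardrail block (assumed body shape)."""
    error = body.get("error") or {}
    return (
        status == 400
        and isinstance(error, dict)
        and error.get("code") == "guardrail_blocked"
    )


def should_retry(status: int, body: dict) -> bool:
    """Client-side retry policy (hypothetical): never retry a guardrail block."""
    if is_guardrail_block(status, body):
        # The same prompt would be blocked again, so a retry only wastes time.
        return False
    # Assumed policy for everything else: retry rate limits and server errors.
    return status == 429 or status >= 500
```

The point of the design is that a guardrail block is deterministic for a given prompt and policy, so the only useful response is to change the prompt or the rule, not to resend.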

It's enforced on streaming too

On a non-streaming response the answer is screened before it returns. On a streaming response a scanner cuts the stream mid-flight and emits a replacement message before any blocked content reaches the client. See streaming coverage.

Reach for block when a match means the request must not proceed — secrets in a prompt, a jailbreak attempt, a hard compliance line.

4. mask — redact and continue

A mask action redacts each match and lets the request through with the sanitized text. The upstream model never sees the original. On a PII rule, each match is replaced with a typed tag derived from the entity — an email becomes [EMAIL], an SSN becomes [SSN], a credit card [CREDIT_CARD], and so on. (You can override the replacement string per custom entity; see masking formats.)
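As a rough illustration of typed-tag masking, the sketch below derives the tag from the entity name and substitutes it for each match. The regular expressions are deliberately simplistic stand-ins, not the gateway's detectors, and `PATTERNS`, `typed_tag`, and `mask_pii` are hypothetical names.

```python
import re
from typing import List

# Simplified, illustrative patterns; real PII detection is more involved.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def typed_tag(entity: str) -> str:
    """Derive the replacement tag from the entity name, e.g. credit_card -> [CREDIT_CARD]."""
    return f"[{entity.upper()}]"


def mask_pii(text: str, entities: List[str]) -> str:
    """Replace every match of each requested entity with its typed tag."""
    for entity in entities:
        text = PATTERNS[entity].sub(typed_tag(entity), text)
    return text
```

Under these assumptions, `mask_pii("Contact jane@acme.com, SSN 123-45-6789", ["email", "ssn"])` returns `Contact [EMAIL], SSN [SSN]`, which is the text the upstream model would see.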

Input-stage masking always applies: it rewrites the request before the model runs, whether or not the response streams. Output-stage masking applies to non-streaming responses only, where the masked text is forwarded after the full answer is screened. On a streaming response the gateway computes the mask but does not yet forward the redacted text, so a mask rule does not redact a streaming reply today; in-stream output masking is on the roadmap. (An output block still cuts a stream mid-flight; see §3.) Prove your exact stage/stream combination in the sandbox first. See streaming coverage.
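The coverage described so far for block and mask can be written down as a small lookup. This sketch encodes only the combinations stated on this page; the function name `is_enforced` is hypothetical, and the streaming coverage page remains the authority.

```python
def is_enforced(action: str, stage: str, streaming: bool) -> bool:
    """Whether the action takes effect on the traffic, per the combinations on this page."""
    if stage not in ("input", "output"):
        raise ValueError(f"unknown stage: {stage}")
    if action == "block":
        # Input blocks stop the call before the model runs; output blocks
        # apply to streaming and non-streaming responses alike.
        return True
    if action == "mask":
        if stage == "input":
            # The request is rewritten before the model runs, streaming or not.
            return True
        # Output masking is forwarded only on non-streaming responses today.
        return not streaming
    raise ValueError(f"combination not covered by this sketch: {action}")
```

The one `False` in the matrix is `is_enforced("mask", "output", streaming=True)`, which is the case to test in the sandbox before relying on an output mask rule.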

Reach for mask when the content is fine but a substring shouldn’t reach the model — PII redaction is the canonical case. The turnkey starting point is the PII Shield preset; see PII Shield.

5. flag — log only, change nothing

A flag action is observe-only: the request is byte-identical to one with no rule at all, except a match is recorded in the Matches feed. Nothing is blocked, nothing is redacted.

flag is how you measure a rule before you enforce it. Ship a new keyword or regex as flag, watch the Matches feed for a few days to see its true-vs-false-positive rate on real traffic, then promote it to mask or block once you trust it. Tuning a noisy pattern with flag on beats discovering the false positives in production with block on. See tune false positives.

A flagged match records the rule type, action, stage, and a detail string — and the matched substring only if Log raw content is on for that guardrail (off by default, the privacy-conservative posture). See logging & privacy.
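The shape of such a record, and the way the raw-content setting gates the matched substring, might look like the sketch below. The field names, `MatchRecord`, and `build_record` are hypothetical; only the list of recorded items and the off-by-default behavior come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchRecord:
    rule_type: str                       # e.g. "regex"
    action: str                          # e.g. "flag"
    stage: str                           # "input" or "output"
    detail: str                          # human-readable description of the match
    matched_text: Optional[str] = None   # present only when raw logging is on


def build_record(rule_type: str, action: str, stage: str, detail: str,
                 matched: str, log_raw_content: bool = False) -> MatchRecord:
    """Build a match record, dropping the matched substring unless raw logging is enabled."""
    return MatchRecord(
        rule_type=rule_type,
        action=action,
        stage=stage,
        detail=detail,
        matched_text=matched if log_raw_content else None,
    )
```

Keeping the default at `False` means a record is safe to store and share even when the match itself is sensitive, which is the privacy-conservative posture described above.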

6. Per-entity action overrides

A single PII rule can route different entities to different actions via entity_actions, instead of stacking overlapping rules. Each override value must be one of block / mask / flag / annotate, and must reference an entity the rule already declares — the validator rejects anything else.

{
  "type": "pii",
  "stage": "input",
  "action": "mask",
  "entities": ["email", "phone", "ip", "credit_card", "ssn"],
  "entity_actions": {
    "credit_card": "block",
    "ssn": "block"
  }
}

This one rule masks emails, phones, and IPs but blocks the request outright on a card number or SSN. See custom PII entities for layering your own detectors under the same override model.
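The two constraints on entity_actions, and the way an override falls back to the rule's own action, can be sketched as follows. This is an illustration of the rules as described here, not the gateway's validator; `validate_entity_actions` and `action_for` are hypothetical names.

```python
from typing import List

ALLOWED_OVERRIDES = {"block", "mask", "flag", "annotate"}


def validate_entity_actions(rule: dict) -> List[str]:
    """Return a list of problems with the rule's entity_actions (empty if valid)."""
    errors = []
    declared = set(rule.get("entities", []))
    for entity, action in rule.get("entity_actions", {}).items():
        if entity not in declared:
            errors.append(f"{entity}: not declared in entities")
        if action not in ALLOWED_OVERRIDES:
            errors.append(f"{entity}: invalid action {action!r}")
    return errors


def action_for(rule: dict, entity: str) -> str:
    """Per-entity override if present, otherwise the rule's default action."""
    return rule.get("entity_actions", {}).get(entity, rule["action"])


rule = {
    "type": "pii", "stage": "input", "action": "mask",
    "entities": ["email", "phone", "ip", "credit_card", "ssn"],
    "entity_actions": {"credit_card": "block", "ssn": "block"},
}
```

For the rule above, `validate_entity_actions(rule)` returns an empty list, `action_for(rule, "email")` is `mask`, and `action_for(rule, "ssn")` is `block`.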

7. Picking the right action

If you want to…	Use	Effect
Stop the request entirely	`block`	HTTP 400, no quota, skip-retry
Strip a substring, keep the call	`mask`	Redacted text forwarded
Watch without touching traffic	`flag`	Match recorded only

Actions compose with stages. The same action behaves slightly differently on input vs output — an input block saves quota up front; an output block refunds it; output masking applies to non-streaming responses only, while an output block cuts streaming and non-streaming responses alike. Read input stage and output stage alongside this page.

8. Where to go next

The guardrail_blocked error

What a 400 looks like, why it costs no quota, and how skip-retry works.

Masking formats

Typed tags, custom replacement strings, and what a masked prompt reads like to the model.

Streaming coverage

Exactly which action × stage × stream combinations are enforced today.

Enforcement modes

How block / mask / flag map onto the gateway’s broader enforcement model, including the firewall’s audit verdict.

The firewall has its own verdict vocabulary (allow, audit, deny, sanitize, and more) for tool policy — distinct from these content actions. See guardrails vs. firewall.
