block, mask,
flag, annotate, and spotlight. This page covers the three
enforcement choices you reach for first: block, mask, and flag. Pick
one per rule (or, for a PII rule, route different entities to different
actions; see §6). The other two are
prompt-shaping, non-blocking actions: annotate injects a security note
upstream (see code security), and
spotlight wraps matched untrusted input in delimiters so the model
treats it as data, not instructions. The full roster lives in the
Guardrails overview.
For the wider engine — rule types, stages, attaching a policy to a key —
start at the Guardrails overview or the
full Guardrails reference.
1. The guardrail block mask flag decision in one line
block
Reject the call with HTTP 400 guardrail_blocked. The model never
runs (input stage) or its answer never returns (output stage).
mask
Redact each match (e.g. jane@acme.com → [EMAIL]) and let the
sanitized text through. The request continues.
flag
Change nothing about the traffic. Record a match in the feed and
move on. Observe-only.
These are the three enforcement actions. Whichever you set is
honored everywhere the rule runs: the console rule builder, the
Test sandbox, and the live /v1/* relay path all read the same
block / mask / flag value.
2. One concrete example: three rules, three actions
Here’s a single guardrail whose three rules each pick a different action. You author this in the console (/console/guardrails) on your session —
the sk-orca-... relay key is only for /v1/* calls, never for editing
policy. Creating or editing a guardrail requires the Developer+ role.
- The block rule rejects any prompt containing one of those literal terms: HTTP 400, the model never runs.
- The mask rule rewrites emails and phone numbers to [EMAIL]/[PHONE] in the prompt before the model sees it.
- The flag rule watches the model’s output for a confidential marker and records a match without altering the response, so you can measure how often it appears before deciding to enforce.
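As a rough illustration of how three such rules behave side by side, here is a minimal Python sketch. Everything in it is an assumption made for the example: the rule fields (`stage`, `action`, `terms`, `patterns`, `pattern`), the blocked terms, the regexes, and the `GuardrailBlocked` exception are hypothetical and are not the gateway's real schema or implementation.

```python
import re

# Hypothetical rule shapes for illustration; not the gateway's real schema.
RULES = [
    {"stage": "input", "action": "block",
     "terms": ["project-aurora", "internal-only"]},
    {"stage": "input", "action": "mask",
     "patterns": {
         "EMAIL": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
         "PHONE": r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b",
     }},
    {"stage": "output", "action": "flag",
     "pattern": r"(?i)\bconfidential\b"},
]


class GuardrailBlocked(Exception):
    """Stands in for the HTTP 400 guardrail_blocked response."""


def apply_rules(text, stage, matches):
    """Run every rule attached to `stage` and return the resulting text.

    Block raises, mask rewrites the text, flag only appends to `matches`.
    """
    for rule in RULES:
        if rule["stage"] != stage:
            continue
        if rule["action"] == "block":
            lowered = text.lower()
            if any(term in lowered for term in rule["terms"]):
                raise GuardrailBlocked("guardrail_blocked")
        elif rule["action"] == "mask":
            for tag, pattern in rule["patterns"].items():
                text = re.sub(pattern, f"[{tag}]", text)
        elif rule["action"] == "flag":
            if re.search(rule["pattern"], text):
                matches.append({"action": "flag", "stage": stage})
    return text
```

Calling `apply_rules("Mail jane@acme.com", "input", [])` returns the masked prompt, while an output containing the marker comes back unchanged with one entry added to the `matches` list.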
3. block — reject with HTTP 400
A block action rejects the whole call. The caller gets HTTP 400 with
error code guardrail_blocked and a message naming the guardrail and the
rule that fired.
No quota is charged
An input-stage block fires before metering, so nothing is
consumed. An output-stage block refunds the pre-consumed quota
after rejecting the answer. Either way the caller pays nothing for a
blocked call.
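The ordering described here can be pictured with a toy ledger. This is only a sketch of the sequence (input block before metering, output block followed by a refund); the `Ledger` class and the `relay` function are hypothetical and say nothing about how the gateway actually meters quota.

```python
class Ledger:
    """Toy quota ledger; hypothetical, for illustration only."""

    def __init__(self, balance):
        self.balance = balance

    def pre_consume(self, amount):
        self.balance -= amount

    def refund(self, amount):
        self.balance += amount


def relay(ledger, cost, input_blocked=False, output_blocked=False):
    """Simulate one call and return its HTTP status."""
    if input_blocked:
        # Rejected before metering: nothing is consumed.
        return 400
    ledger.pre_consume(cost)
    if output_blocked:
        # Answer rejected: the pre-consumed quota is returned.
        ledger.refund(cost)
        return 400
    return 200
```

In both blocked paths the balance ends where it started; only a call that returns 200 keeps its quota consumed.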
It's marked skip-retry
A guardrail_blocked result is skip-retry: re-running the same
prompt against another channel would just block again, so the gateway
won’t waste a retry. See
the guardrail_blocked error.
It's enforced on streaming too
On a non-streaming response the answer is screened before it
returns. On a streaming response a scanner cuts the stream
mid-flight and emits a replacement message before any blocked content
reaches the client. See
streaming coverage.
Use block when a match means the request must not proceed:
secrets in a prompt, a jailbreak attempt, a hard compliance line.
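On the caller's side, the same skip-retry logic applies: retrying a blocked prompt unchanged will not help. Below is a minimal sketch of a response classifier. The JSON body shape (`{"error": {"code": ..., "message": ...}}`) and the choice of which other statuses count as retryable are assumptions for the example; check the guardrail_blocked error page for the actual payload.

```python
import json


def classify_response(status, body_text):
    """Return 'ok', 'blocked', 'retryable', or 'failed' for a relay response.

    Assumes (hypothetically) an error body like
    {"error": {"code": "guardrail_blocked", "message": "..."}}.
    """
    if status == 200:
        return "ok"
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        payload = {}
    error = payload.get("error") if isinstance(payload, dict) else None
    code = error.get("code") if isinstance(error, dict) else None
    if status == 400 and code == "guardrail_blocked":
        # Same prompt would block again; surface it instead of retrying.
        return "blocked"
    if status == 429 or status >= 500:
        return "retryable"
    return "failed"
```

A caller would typically show the error message to the user, or change the prompt, when the result is `"blocked"`, and reserve backoff-and-retry for `"retryable"`.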
4. mask — redact and continue
A mask action redacts each match and lets the request through with
the sanitized text. The upstream model never sees the original. On a
PII rule, each match is replaced with a typed tag derived from the
entity — an email becomes [EMAIL], an SSN becomes [SSN], a credit card
[CREDIT_CARD], and so on. (You can override the replacement string per
custom entity; see masking formats.)
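To make the typed-tag replacement concrete, here is a small Python sketch. The regexes are deliberately simplistic stand-ins rather than the gateway's detectors, and the `overrides` parameter is a hypothetical simplification of the per-entity replacement string described above.

```python
import re

# Illustrative detectors only; real PII detection is far more involved.
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def mask(text, overrides=None):
    """Replace each detected entity with its typed tag, e.g. [EMAIL].

    `overrides` maps an entity name to a custom replacement string
    (hypothetical parameter, for illustration).
    """
    overrides = overrides or {}
    for entity, pattern in DETECTORS.items():
        replacement = overrides.get(entity, f"[{entity}]")
        # A function replacement keeps the string literal (no backslash escapes).
        text = pattern.sub(lambda _m, r=replacement: r, text)
    return text
```

For instance, `mask("jane@acme.com")` yields `[EMAIL]`, and passing `{"EMAIL": "<redacted>"}` as `overrides` swaps in the custom string instead.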
Input-stage masking is live on every stream. It rewrites the request
before the model runs, streaming or not. Output-stage masking applies to
non-streaming responses only — the masked text is forwarded after the
full answer is screened. On a streaming response the gateway computes
the mask but does not yet forward the redacted text, so a mask rule does
not redact a streaming reply today; in-stream output masking is on the
roadmap. (An output block still cuts a stream mid-flight — see §3.)
Prove your exact stage/stream combination in the sandbox first. See
streaming coverage.
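The stage and stream combinations for block and mask, as stated on this page, can be summarized in a small lookup. This is a reading aid built from the prose above, not an API; the function name and return strings are invented for the example, and the streaming coverage page remains the authority.

```python
def enforcement(action, stage, streaming):
    """Summarize what this page says for one action x stage x stream case.

    Covers only block and mask; other actions are out of scope here.
    """
    if stage not in ("input", "output"):
        raise ValueError(f"unknown stage: {stage!r}")
    if action == "block":
        if stage == "output" and streaming:
            return "stream cut mid-flight, replacement message emitted"
        if stage == "output":
            return "answer screened and rejected before it returns"
        return "request rejected before the model runs"
    if action == "mask":
        if stage == "input":
            return "request rewritten before the model runs"
        if streaming:
            return "mask computed, redacted text not yet forwarded"
        return "masked text forwarded after the full answer is screened"
    raise ValueError(f"not covered by this sketch: {action!r}")
```

The one combination to watch is `enforcement("mask", "output", True)`: the reply is not redacted today, so an output block is the only way to stop content on a stream.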
Use mask when the content is fine but a substring shouldn’t
reach the model; PII redaction is the canonical case. The turnkey
starting point is the PII Shield preset; see
PII Shield.
5. flag — log only, change nothing
A flag action is observe-only: the request is byte-identical to one
with no rule at all, except a match is recorded in the
Matches feed. Nothing is blocked,
nothing is redacted.
A flagged match records the rule type, action, stage, and a detail string
— and the matched substring only if Log raw content is on for that
guardrail (off by default, the privacy-conservative posture). See
logging & privacy.
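A sketch of what such a record might hold, assuming hypothetical field names (`rule_type`, `action`, `stage`, `detail`, `matched`) and a `log_raw_content` switch that mirrors the Log raw content setting. The real feed format may differ.

```python
def build_match_record(rule_type, action, stage, detail,
                       matched_text, log_raw_content=False):
    """Build a feed entry; the matched substring is kept only when opted in."""
    record = {
        "rule_type": rule_type,
        "action": action,
        "stage": stage,
        "detail": detail,
    }
    if log_raw_content:
        # Off by default: the raw match never enters the record otherwise.
        record["matched"] = matched_text
    return record
```

With the default setting the record describes what fired without carrying the sensitive text itself.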
6. Per-entity action overrides
A single PII rule can route different entities to different actions via entity_actions, instead of stacking overlapping rules. Each override
value must be one of block / mask / flag / annotate, and must
reference an entity the rule already declares — the validator rejects
anything else.
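The two validation constraints can be sketched as follows. Only the `entity_actions` name and the four allowed values come from this page; the function names, the entity identifiers, and the error wording are hypothetical, and the real validator may report problems differently.

```python
ALLOWED_ACTIONS = {"block", "mask", "flag", "annotate"}


def validate_entity_actions(declared_entities, entity_actions):
    """Return a list of problems; an empty list means the overrides are valid."""
    errors = []
    for entity, action in entity_actions.items():
        if entity not in declared_entities:
            errors.append(f"{entity}: not declared on this rule")
        if action not in ALLOWED_ACTIONS:
            errors.append(f"{entity}: unsupported action {action!r}")
    return errors


def action_for(entity, default_action, entity_actions):
    """Pick the override for an entity, falling back to the rule's action."""
    return entity_actions.get(entity, default_action)
```

For a rule that declares EMAIL and SSN with a default of mask, an override of `{"SSN": "block"}` passes validation, and `action_for("EMAIL", "mask", {"SSN": "block"})` still resolves to mask.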
7. Picking the right action
| If you want to… | Use | Effect |
|---|---|---|
| Stop the request entirely | block | HTTP 400, no quota, skip-retry |
| Strip a substring, keep the call | mask | Redacted text forwarded |
| Watch without touching traffic | flag | Match recorded only |
Actions compose with stages. The same action behaves slightly
differently on input vs output — an input block saves quota up front; an
output block refunds it; output masking applies to non-streaming
responses only, while an output block cuts streaming and non-streaming
responses alike. Read
input stage and
output stage alongside this page.
8. Where to go next
The guardrail_blocked error
What a 400 looks like, why it costs no quota, and how skip-retry works.
Masking formats
Typed tags, custom replacement strings, and what a masked prompt reads
like to the model.
Streaming coverage
Exactly which action × stage × stream combinations are enforced today.
Enforcement modes
How block / mask / flag map onto the gateway’s broader enforcement
model, including the firewall’s audit verdict.
