output. The gateway runs it after the upstream model
responds and before a single byte reaches your client.
This page covers the output stage specifically: how a completion is
screened, what a block costs, and how block and mask each behave on
streaming responses. For the full engine — every rule type, field, and
route — see Guardrails.
1. Why LLM teams reach for output guardrails
The model is the untrusted part of the loop. It can echo a secret from the prompt, pull a customer’s email out of RAG context, or hallucinate a claim your sources never made. None of that is visible at the input stage, because none of it exists until the model has answered. An output-stage guardrail is the screen on the completion itself. A rule runs at the output stage when its stage is output (or both).
The gateway evaluates the model’s response text against the policy, records
any match, and then either lets it through, redacts it, or rejects it —
exactly the same block / mask / flag actions you use on input, just
applied to the reply.
Output rules are a superset concern, not a replacement. Most policies
screen input to keep data out of the prompt and output to catch what
the model returns. Stage both attaches one rule to both ends.
2. One concrete example: block a secret in the reply
Create a guardrail in the console (/console/guardrails), add one rule,
and attach it to a key:
- Type: Secrets / regex detector
- Stage: output
- Action: block
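The guardrail now screens every completion on that key’s /v1/* traffic.
A reply that contains a secret is rejected with guardrail_blocked, so the
client never sees the leaked content. If it’s clean, the response passes
through untouched.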
3. What an output block costs
Unlike an input block, which fires before the request is metered, an output block happens after the upstream model has already run. The gateway handles the accounting for you:
- A blocked completion still returns HTTP 400 guardrail_blocked with a message naming the guardrail and the rule that fired.
- No quota is charged. The output block refunds the pre-consumed quota after the response is rejected, so the failed call is free to you even though the model produced tokens.
- The request is marked skip-retry: re-running the same prompt would just block again, so the gateway won’t burn a retry on another channel.
This is the key difference from the input stage. An input block is free
because metering hasn’t started; an output block is free because the
pre-consumed quota is refunded once the reply is rejected. Either way
the caller pays nothing. See
the guardrail_blocked error.
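On the client side, the behavior above suggests treating a guardrail block as a final answer rather than a transient failure. The following is a minimal sketch, not the gateway's SDK: the JSON error shape (`error.code`, `error.message`) and the `GatewayResult` type are assumptions made for illustration; only the HTTP 400 status and the `guardrail_blocked` name come from this page.

```python
import json
from dataclasses import dataclass


@dataclass
class GatewayResult:
    ok: bool
    blocked: bool      # True when a guardrail rejected the reply
    retryable: bool    # True only for failures worth retrying
    message: str


def classify_response(status: int, body: str) -> GatewayResult:
    """Classify a gateway reply. The error body layout is an assumption."""
    if status == 200:
        return GatewayResult(ok=True, blocked=False, retryable=False, message="")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    error = payload.get("error", {}) if isinstance(payload, dict) else {}
    if not isinstance(error, dict):
        error = {}
    code = error.get("code", "")
    message = error.get("message", "")
    if status == 400 and code == "guardrail_blocked":
        # Re-sending the same prompt would block again, so do not retry.
        return GatewayResult(ok=False, blocked=True, retryable=False, message=message)
    # Assumed retry policy for everything else: rate limits and server errors.
    retryable = status == 429 or status >= 500
    return GatewayResult(ok=False, blocked=False, retryable=retryable, message=message)
```

A caller would typically surface `message` to an operator and return a generic refusal to the end user, since the message names the guardrail and rule that fired.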
4. Streaming — block vs. mask
Block is enforced on streaming responses; output mask is not yet. Here is how each behaves:
block — enforced on streaming AND non-streaming
On a non-streaming response, the completion is screened in full
before it returns. On a streaming response, a scanner watches the
deltas as they flow; when a block rule fires mid-stream it cuts the
stream — the scanner seals, emits a short replacement notice in place
of the rest, and the SSE channel closes before any blocked content
reaches the client.
Already-flushed bytes can’t be retracted, so a block is best-effort on
what has already streamed but reliably stops everything after the
match. For a hard guarantee that no offending byte is ever sent, use a
non-streaming request.
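To make the best-effort caveat concrete, here is a toy scanner that watches deltas and cuts the stream on a match. It is an illustration of the idea only, not the gateway's implementation: the `sk-` pattern, the notice text, and the function name are all hypothetical.

```python
import re
from typing import Iterable, Iterator

# Hypothetical secret pattern and replacement notice, chosen for the example.
SECRET = re.compile(r"sk-[A-Za-z0-9]{8,}")
NOTICE = "[response blocked by guardrail]"


def scan_stream(deltas: Iterable[str]) -> Iterator[str]:
    """Forward deltas until the accumulated text matches, then cut the stream."""
    seen = ""
    for delta in deltas:
        seen += delta
        if SECRET.search(seen):
            # Emit the notice in place of the rest and stop forwarding.
            yield NOTICE
            return
        # Once yielded, a delta is with the client and cannot be retracted.
        yield delta


if __name__ == "__main__":
    # The secret is split across two deltas, so its prefix is already flushed.
    print(list(scan_stream(["key is ", "sk-abcd", "efgh1234", " tail"])))
```

In the demo the first half of the secret (`sk-abcd`) reaches the client before the rule can match, while everything from the matching delta onward is withheld. A non-streaming request avoids that partial leak because the whole completion is screened before anything is returned.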
mask — non-streaming only (in-stream masking on the roadmap)
On a non-streaming response, a mask rule rewrites the completion —
e.g. an email in the reply becomes
[EMAIL] — and the sanitized text
is what your client receives.
On a streaming response, an output mask rule does not redact the
reply today. The scanner still evaluates each delta and will act on a
block decision, but the masked text it computes is not forwarded — the
raw deltas flow through unchanged. In-band streaming output rewriting is
on the roadmap. Until it ships, send the request non-streaming if
you need an output mask to actually redact the reply.
| Action on output | Non-streaming | Streaming |
|---|---|---|
| block | rejects the reply | cuts the stream |
| mask | redacts the reply | not redacted yet (roadmap) |
| flag | records only | records only |
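As a rough picture of what a non-streaming mask does, and of the client-side choice the table implies, consider the sketch below. The email regex is a simplified stand-in, not the gateway's detector, and both function names are hypothetical.

```python
import re

# Simplified email pattern for illustration; real detectors are stricter.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str) -> str:
    """Rewrite every email-like substring to the [EMAIL] placeholder."""
    return EMAIL.sub("[EMAIL]", text)


def should_stream(wants_stream: bool, needs_output_mask: bool) -> bool:
    """Fall back to non-streaming whenever the reply must be redacted."""
    return wants_stream and not needs_output_mask
```

For example, `mask_emails("Contact jane.doe@example.com today")` yields `"Contact [EMAIL] today"`, and `should_stream(True, True)` is `False`, matching the guidance to send the request non-streaming when an output mask has to take effect.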
5. Grounding — an output-stage faithfulness check
One advanced rule is output-shaped by nature: contextual grounding. A grounding rule scores the model’s answer against the sources retrieved
on the request (your RAG context) and fires when faithfulness falls below
a threshold (default 0.7). Pair it with block to refuse unfaithful
answers, or flag to measure drift before you enforce. It bills as a judge
sub-line, like any model-backed rule. Full fields live in
Guardrails.
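The threshold logic itself is simple once a faithfulness score exists. The sketch below assumes the score is a float in [0, 1] produced elsewhere by the judge; the function name and the "pass" return value are hypothetical, and restricting the action to block or flag is a choice made for this example.

```python
def grounding_decision(score: float, action: str = "flag", threshold: float = 0.7) -> str:
    """Return the action to apply, or "pass" when the answer is faithful enough."""
    if action not in ("block", "flag"):
        raise ValueError("this sketch only models the block and flag actions")
    # The rule fires only when faithfulness falls below the threshold.
    if score < threshold:
        return action
    return "pass"
```

Running with `action="flag"` first lets you collect scores and see how often answers would fall below 0.7 before switching the same rule to `block`.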
6. PII Shield at the output stage
The PII Shield preset is a single pii rule, action mask, stage
both. At the input stage it’s fully live — it rewrites the request
before the model, on streaming and non-streaming alike. At the output
stage it masks non-streaming completions, as in
§4; on a streaming response the output mask
does not redact the reply today (in-stream output masking is on the roadmap).
So at the output stage, call non-streaming if you need PII Shield to
actually redact the reply. See
PII Shield and
masking formats.
7. Seeing what fired
Every output rule that fires records a match — its rule type, action, stage (output), and a detail string — in the workspace Matches feed
(GET /api/guardrail/match, open to any Member).
The matched substring is recorded only when the guardrail’s Log raw
content toggle is on; it’s off by default (the privacy-conservative
posture), so by default you see that an output rule fired, not the
sensitive text it caught. A false positive is marked with
POST /api/guardrail/match/:id/mark-fp (Admin) — treat it as a tuning
signal, not a reason to disable the rule.
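The two routes above can be called from any HTTP client. The sketch below only builds the requests with the Python standard library; the host name and the bearer-token header are assumptions, since this page does not specify the base URL or the authentication scheme.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://gateway.example.com"  # hypothetical host


def list_matches_request(token: str) -> urllib.request.Request:
    """Build the GET request for the workspace Matches feed."""
    return urllib.request.Request(
        f"{BASE_URL}/api/guardrail/match",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="GET",
    )


def mark_false_positive_request(token: str, match_id: str) -> urllib.request.Request:
    """Build the POST request that marks one match as a false positive."""
    safe_id = urllib.parse.quote(str(match_id), safe="")
    return urllib.request.Request(
        f"{BASE_URL}/api/guardrail/match/{safe_id}/mark-fp",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="POST",
    )
```

Passing either object to `urllib.request.urlopen` would send it. The mark-fp call needs a credential with the Admin role, while the feed is readable by any Member.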
8. Where to go next
Input stage
The mirror image — screen the request before the model sees it. Input
masking is fully live, including streaming.
Actions
block, mask, and flag in depth — when each one is the right call.
Streaming coverage
The full matrix of what’s enforced on streaming vs. non-streaming.
guardrail_blocked error
The HTTP 400, the quota refund, and skip-retry behavior.
Full engine reference
Guardrails — every rule type, field, and route,
including grounding and the LLM judge.
