1. The streaming LLM content filter problem
An output-stage guardrail screens the model’s reply. On a non-streaming request that’s straightforward: the gateway has the full completion before a single byte returns, so it can block, mask, or pass it cleanly. Streaming inverts that. The reply arrives as a sequence of SSE deltas, each forwarded to your client as soon as it lands, so a filter that waits for the end filters nothing.
OrcaRouter’s answer is a stream scanner: as output deltas flow, the scanner runs your output-stage rules against the accumulating text and acts the instant a rule fires, not after the stream completes. The action you author decides what “acts” means: a block cuts the stream and a flag lets it through. A mask redacts non-streaming output, but in-band stream rewriting is still on the roadmap: on a stream today the scanner computes the mask but acts only on the block decision, so a mask rule does not yet redact a streamed reply.
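To make the scanner idea concrete, here is a minimal sketch of scanning accumulated text as deltas arrive. It is not OrcaRouter’s implementation: the `Rule` class, `scan_stream`, and the `NOTICE` placeholder are hypothetical names, and the regex is a common approximation of an AWS access key ID.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical rule shape for illustration; not OrcaRouter's internal model.
@dataclass
class Rule:
    name: str
    pattern: re.Pattern
    action: str  # "block", "mask", or "flag"

# Placeholder text; the real notice is produced by the gateway.
NOTICE = "[response truncated by guardrail]"

def scan_stream(deltas: Iterable[str], rules: list[Rule]) -> Iterator[str]:
    """Forward deltas until a block rule matches the accumulated text."""
    seen = ""
    for delta in deltas:
        seen += delta  # match on the whole text so far, not on one delta
        for rule in rules:
            if rule.action == "block" and rule.pattern.search(seen):
                yield NOTICE  # final delta, then the stream ends
                return
        # In this sketch, mask and flag rules leave streamed deltas untouched.
        yield delta

rules = [Rule("aws-key", re.compile(r"AKIA[0-9A-Z]{16}"), "block")]
out = list(scan_stream(["The key is AK", "IAABCDEFGHIJKLMNOP", " ok?"], rules))
```

Here the match spans two deltas, so the first delta has already been forwarded when the rule fires. Scanning the accumulated text is what lets the sketch catch a split match at all, and it also shows why a streaming block can only stop what comes after the hit.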
The mask caveat only matters for output-stage rules on streaming requests. Input-stage rules screen the request before the model runs, so they’re fully live, including masking. Any output rule on a non-streaming request sees the whole reply and behaves normally, including mask.
2. What’s stream-safe today
block: stream-safe (cuts the stream mid-flight)
A block rule is enforced on streaming and non-streaming output. On a stream, the scanner watches the deltas; when a block rule fires it cuts the stream: it seals the scanner, emits a short replacement notice ([response truncated by guardrail: … policy violation]) as a final delta, and closes the SSE channel before any further blocked content reaches the client. Because the HTTP response status is already committed to 200 by the time the first delta is flushed, a mid-stream block can’t re-issue a status; it terminates the open stream gracefully. The HTTP 400 guardrail_blocked body is the non-streaming output-block shape.
Bytes already flushed to the client can’t be retracted, so a streaming block is best-effort on what has already streamed but reliably stops everything after the match. For a hard guarantee that no offending byte is ever sent, and for the 400 guardrail_blocked body, send the request non-streaming.
mask: non-streaming output only (in-band stream rewriting is on the roadmap)
A mask rule rewrites the match (e.g. an email in the reply becomes [EMAIL]) on non-streaming output, where the gateway holds the whole completion and forwards the redacted form to your client.
On a streaming output today the scanner computes the mask but does not forward the masked text; it acts only on the block decision, so a mask rule does not redact a streamed reply. In-band streaming output rewriting is on the roadmap. Until it ships, if you need a streamed reply to never expose the matched text, author the rule as block (it ends the response on a hit) or send the request non-streaming so the mask rewrites the full reply.
flag: observe-only, never alters traffic
A flag rule never changes the traffic; it lets the bytes through. On non-streaming output it records a match in the Matches feed, so you can measure a rule’s hit rate before you promote it to block. On a streaming response it stays observe-only and passes the deltas through untouched; the structured match record is written on the non-streaming output path. Either way it never blocks or rewrites, so it’s always safe to leave on.
| Action on output | Non-streaming | Streaming |
|---|---|---|
| block | rejects the reply | cuts the stream |
| mask | redacts the reply | not yet; use block instead (roadmap) |
| flag | records a match | passes through (observe-only) |
3. One concrete example — a stream-safe secret filter
Say your model can surface a credential from RAG context, and your app streams. You want the gateway to kill the stream the moment a secret-shaped match appears, rather than masking it: a leaked secret should end the response, not be partially redacted. Author it in the console (policy editing is a management action on your session, gated to Developer+; the relay key only sends /v1/* traffic):
- Open /console/guardrails, New guardrail, name it stream-safe-out.
- Add one rule:
  - Type: regex (or a pii rule with secret entities like aws_access_key / api_key_openai / jwt)
  - Stage: output
  - Action: block ← ends the response on a secret hit; mask would redact it instead and let the rest of the reply continue (and does not yet redact a streamed reply)
- Save, then attach it on /console/token via the key’s Guardrail dropdown.
Your client then sends requests with stream: true, exactly as before.
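On the client side, a cut stream still arrives as a normal 200 response, so it can help to detect the replacement notice explicitly. The sketch below parses already-received SSE lines and reports whether the reply was truncated. It assumes an OpenAI-style chunk layout (`choices[0].delta.content` and a `[DONE]` sentinel), which this guide does not confirm; `read_stream` and `NOTICE_PREFIX` are hypothetical names.

```python
import json
from typing import Iterable, Tuple

# Prefix of the replacement notice quoted in this guide. The rest of the
# notice is elided in the guide, so only the prefix is matched.
NOTICE_PREFIX = "[response truncated by guardrail:"

def read_stream(sse_lines: Iterable[str]) -> Tuple[str, bool]:
    """Collect delta text and report whether a guardrail cut the stream."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # Assumed OpenAI-style chunk layout; adjust to the real response shape.
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    text = "".join(parts)
    return text, NOTICE_PREFIX in text

# Hand-written sample lines, not captured gateway output.
demo = [
    'data: {"choices": [{"delta": {"content": "Your key is "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "[response truncated by guardrail: ...]"}}]}',
    "data: [DONE]",
]
text, truncated = read_stream(demo)
```

When `truncated` is true, the application can show its own message instead of the partial reply, since the text before the notice was already delivered.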
4. PII Shield on a stream
The PII Shield preset is a single pii rule, action mask, stage both. At the input stage it’s fully live: it rewrites the request before the model sees it, streaming or not. At the output stage, masking redacts non-streaming replies, where the gateway holds the whole completion before it returns.
On a streaming output the mask does not redact yet: the scanner computes the mask but acts only on the block decision, so a streamed reply is passed through, not rewritten. In-band streaming output rewriting is on the roadmap. So if your goal is that PII is never observable in a streamed reply, either:
- author the output rule as block, accepting that a hit ends the response rather than redacting it, or
- send the request non-streaming so the mask rewrites the full reply with the whole completion in hand.
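One way to apply the second option is to decide per request whether streaming is safe for the attached policy. The sketch below is a client-side helper under stated assumptions: `RuleSpec` is a hypothetical local mirror of a rule’s type, stage, and action, not an OrcaRouter API object.

```python
from dataclasses import dataclass

# Hypothetical client-side mirror of a guardrail rule; field names are
# illustrative and not taken from an OrcaRouter schema.
@dataclass(frozen=True)
class RuleSpec:
    type: str
    stage: str   # "input", "output", or "both"
    action: str  # "block", "mask", or "flag"

def can_stream_safely(rules) -> bool:
    """False if any mask rule covers output, since a streamed reply is not redacted yet."""
    return not any(
        r.action == "mask" and r.stage in ("output", "both") for r in rules
    )

pii_shield = [RuleSpec("pii", "both", "mask")]
use_stream = can_stream_safely(pii_shield)
```

With the PII Shield shape described above, `use_stream` comes out false, so the caller would send the request non-streaming and let the mask rewrite the full reply.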
5. Prove it before you ship
Don’t guess which stage/action combination holds; verify it.
Test tab
Each guardrail editor has a Test tab: paste a sample, pick the output stage, and run the current policy with no upstream call and no quota. See the verdict and, for mask rules, the rendered text. Running the sandbox is a Developer+ action (it can fire paid judge / external rules).
Eval tab
The Eval tab scores a guardrail against bundled or custom JSONL corpora, which is useful to confirm a block rule catches a known leak across a corpus before you attach it to a key.
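Before pasting a regex into a rule, it can be worth sanity-checking it locally against a few labelled samples. The sketch below does that with a tiny JSONL corpus. The field names `text` and `expect` are hypothetical and are not the Eval tab’s corpus format, and the pattern is a common approximation of an AWS access key ID rather than a rule shipped with the product.

```python
import json
import re

# Common approximation of an AWS access key ID; illustrative only.
SECRET = re.compile(r"AKIA[0-9A-Z]{16}")

# Hypothetical corpus layout: one JSON object per line.
corpus_jsonl = "\n".join([
    json.dumps({"text": "key: AKIAABCDEFGHIJKLMNOP", "expect": "block"}),
    json.dumps({"text": "no secrets here", "expect": "pass"}),
])

def score(jsonl: str, pattern: re.Pattern):
    """Return (correct, total) for a block-on-match rule over the corpus."""
    correct = total = 0
    for line in jsonl.splitlines():
        if not line.strip():
            continue
        case = json.loads(line)
        verdict = "block" if pattern.search(case["text"]) else "pass"
        correct += int(verdict == case["expect"])
        total += 1
    return correct, total

result = score(corpus_jsonl, SECRET)
```

This only checks the pattern itself; the Test and Eval tabs remain the way to verify how the full policy behaves at the output stage.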
6. What a streaming block costs
A streaming block carries the same accounting as any output block. The upstream model has already run, so the gateway handles the refund for you:
- The stream is terminated with a graceful truncation delta (status is already 200); the non-streaming output block returns the HTTP 400 guardrail_blocked body naming the guardrail and the rule that fired.
- No quota is charged. When the output block rejects the response, the gateway refunds the pre-consumed quota, so a blocked call is free to you even though the model produced tokens.
- The request is marked skip-retry: re-running the same prompt would just block again, so the gateway won’t burn a retry on another channel.
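A client can mirror the skip-retry behaviour for non-streaming calls, so its own retry loop does not resend a prompt that will block again. The sketch below is an assumption-heavy illustration: the exact shape of the guardrail_blocked body is not shown in this guide, so it only looks for that identifier in the body text, and the choice to retry on 429 and 5xx is a common client convention rather than documented gateway behaviour.

```python
def should_retry(status: int, body: str) -> bool:
    """Decide whether a failed non-streaming call is worth retrying."""
    # A guardrail block is deterministic for the same prompt, so never retry it.
    if status == 400 and "guardrail_blocked" in body:
        return False
    # Assumed convention: retry rate limits and transient server errors only.
    return status == 429 or status >= 500

blocked = should_retry(400, '{"error": "guardrail_blocked"}')
transient = should_retry(503, "")
```

The sample body passed to `should_retry` is hand-written for the example and is not a captured gateway response.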
Matches are listed in the Matches feed (GET /api/guardrail/match, open to any Member); the matched substring is captured only when the guardrail’s Log raw content toggle is on (off by default). Full detail lives in the guardrail_blocked error and the matches feed.
7. Where to go next
Output stage
The full output stage — screening the model’s reply, block vs. mask, and
grounding.
Streaming coverage
The complete matrix of what’s enforced on streaming vs. non-streaming
across every stage and action.
Actions
block, mask, and flag in depth — when each is the right call.
Input stage
The mirror image — masking is fully live here, including on streaming.
Related concepts
Threats this addresses
Full engine reference
Guardrails — every rule type, field, and route,
including grounding and the LLM judge.
