Stream-safe output filtering

Most production chat apps stream. Tokens flush to the browser as the model emits them, so by the time a completion is “finished” your user has already read most of it. That breaks the naive mental model of a content filter that inspects a whole reply and then decides: there is no whole reply to inspect until it’s too late. A streaming LLM content filter has to make its call on the deltas as they flow. This page covers exactly that case: how each output-stage action behaves on streamed traffic through the OrcaRouter gateway, and how to author a policy that holds on SSE traffic. For the full engine (every rule type, field, and route), see Guardrails.

1. The streaming LLM content filter problem

An output-stage guardrail screens the model’s reply. On a non-streaming request that’s straightforward: the gateway has the full completion before a single byte returns, so it can block, mask, or pass it cleanly. Streaming inverts that. The reply arrives as a sequence of SSE deltas, each forwarded to your client as soon as it lands, so a filter that waits for the end filters nothing. OrcaRouter’s answer is a stream scanner: as output deltas flow, the scanner runs your output-stage rules against the accumulating text and acts the instant a rule fires, not after the stream completes. The action you author decides what “acts” means: a block cuts the stream and a flag lets it through. A mask redacts non-streaming output, but on a stream today the scanner computes the mask and acts only on the block decision, so a mask rule does not yet redact a streamed reply. In-band stream rewriting is on the roadmap.
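To make the idea of scanning accumulating text concrete, here is a minimal Python sketch of the technique. It is not OrcaRouter's implementation: the function name `scan_stream`, the `NOTICE` text, and the sample pattern are all hypothetical, chosen only to show how a rule can fire partway through a stream.

```python
import re
from typing import Iterable, Iterator

# Hypothetical notice text for this sketch; not the gateway's actual wording.
NOTICE = "[truncated by example scanner]"

def scan_stream(deltas: Iterable[str], block_pattern: re.Pattern) -> Iterator[str]:
    """Forward deltas until the accumulated text matches block_pattern."""
    seen = ""
    for delta in deltas:
        seen += delta
        if block_pattern.search(seen):
            # Rule fired: withhold this delta, emit a final notice, end the stream.
            yield NOTICE
            return
        yield delta

# Example: a secret-shaped string split across two deltas.
pattern = re.compile(r"AKIA[0-9A-Z]{16}")
deltas = ["The key is ", "AKIA", "ABCDEFGHIJKLMNOP", " and more text"]
forwarded = list(scan_stream(deltas, pattern))
# forwarded == ["The key is ", "AKIA", NOTICE]
```

Note that the `"AKIA"` delta was forwarded before the match completed. That is the best-effort property of a streaming block: everything after the match is stopped, but earlier deltas have already left.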

This caveat only matters for output-stage rules on streaming requests. Input-stage rules screen the request before the model runs, so they’re fully live, including masking. Any output rule on a non-streaming request sees the whole reply and behaves normally, including mask.

2. What’s stream-safe today

block — stream-safe (cuts the stream mid-flight)

A block rule is enforced on streaming and non-streaming output. On a stream, the scanner watches the deltas; when a block rule fires it cuts the stream: it seals the scanner, emits a short replacement notice ([response truncated by guardrail: … policy violation]) as a final delta, and closes the SSE channel before any further blocked content reaches the client. Because the HTTP response status was already committed to 200 when the first delta flushed, a mid-stream block can’t re-issue a status; it terminates the open stream gracefully. The HTTP 400 guardrail_blocked body is the non-streaming output-block shape.

Bytes already flushed to the client can’t be retracted, so a streaming block is best-effort on what has already streamed but reliably stops everything after the match. For a hard guarantee that no offending byte is ever sent, and for the 400 guardrail_blocked body, send the request non-streaming.
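Since the status stays 200, a client has to detect a mid-stream block from the stream's content. The sketch below shows one way to do that in Python. It assumes the deltas arrive as OpenAI-style `data: {...}` SSE lines with text under `choices[0].delta.content` and an optional `data: [DONE]` terminator; that chunk shape is an assumption, not something this page specifies. `collect_stream` is a hypothetical helper name.

```python
import json
from typing import Iterable, Tuple

# Prefix of the replacement notice quoted above; the text after the colon varies.
NOTICE_PREFIX = "[response truncated by guardrail:"

def collect_stream(sse_lines: Iterable[str]) -> Tuple[str, bool]:
    """Return (text, truncated) from an iterable of raw SSE lines."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or [{}]
        # Assumed chunk shape: choices[0].delta.content holds the text delta.
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            parts.append(content)
    text = "".join(parts)
    return text, NOTICE_PREFIX in text
```

A caller can treat `truncated == True` as "the reply was cut by a guardrail" and show its own message instead of the partial text.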

mask — non-streaming output only (in-band stream rewriting is on the roadmap)

A mask rule rewrites the match (for example, an email in the reply becomes [EMAIL]) on non-streaming output, where the gateway holds the whole completion and forwards the redacted form to your client.

On a streaming output today the scanner computes the mask but does not forward the masked text. It acts only on the block decision, so a mask rule does not redact a streamed reply. In-band streaming output rewriting is on the roadmap. Until it ships, if you need a streamed reply to never expose the matched text, author the rule as block (it ends the response on a hit) or send the request non-streaming so the mask rewrites the full reply.
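As an illustration of masking a held reply, here is a small Python sketch. The email pattern is deliberately simplified and `mask_reply` is a hypothetical name; the gateway's own pii detection is not reproduced here. The second half shows, for illustration only, why masking is simpler with the whole completion in hand: a match split across two deltas is invisible to a per-delta pass.

```python
import re

# Simplified email pattern for illustration only; real PII detection is broader.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def mask_reply(reply: str) -> str:
    """Replace every email-shaped match in a complete reply with [EMAIL]."""
    return EMAIL.sub("[EMAIL]", reply)

# Whole reply in hand: the match is rewritten.
whole = mask_reply("Contact jane@example.com today")
# whole == "Contact [EMAIL] today"

# Same text split across two deltas and masked piece by piece: nothing matches.
pieces = "".join(mask_reply(d) for d in ["Contact jane@exam", "ple.com today"])
# pieces == "Contact jane@example.com today"
```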

flag — observe-only, never alters traffic

A flag rule never changes the traffic — it lets the bytes through. On non-streaming output it records a match in the Matches feed, so you can measure a rule’s hit rate before you promote it to block. On a streaming response it stays observe-only and passes the deltas through untouched; the structured match record is written on the non-streaming output path. Either way it never blocks or rewrites, so it’s always safe to leave on.

| Action on `output` | Non-streaming | Streaming |
| --- | --- | --- |
| `block` | rejects the reply | cuts the stream |
| `mask` | redacts the reply | not yet; block instead (roadmap) |
| `flag` | records a match | passes through (observe-only) |

The one rule to remember: block is stream-safe on output; mask redacts on non-streaming output only (in-band stream rewriting is on the roadmap). To redact a streamed reply today, author the rule as block, or send the request non-streaming so the whole reply is held before it returns.
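If your client decides per request whether to stream, the table can be encoded directly. This is a hedged Python sketch: `OUTPUT_BEHAVIOR` and `pick_stream_flag` are hypothetical names, and the logic only restates the rule above rather than anything the gateway exposes.

```python
# Mirrors the table above: (action, streaming) -> behavior on the output stage.
OUTPUT_BEHAVIOR = {
    ("block", False): "rejects the reply",
    ("block", True): "cuts the stream",
    ("mask", False): "redacts the reply",
    ("mask", True): "not redacted yet",
    ("flag", False): "records a match",
    ("flag", True): "passes through (observe-only)",
}

def pick_stream_flag(action: str, wants_streaming: bool, must_redact: bool) -> bool:
    """Return the value to send as "stream" for a key whose output rule uses `action`."""
    if action == "mask" and must_redact:
        # Mask only redacts when the whole reply is held, so give up streaming.
        return False
    return wants_streaming
```

For example, `pick_stream_flag("mask", True, True)` returns `False`, while a key guarded by a block rule can keep streaming.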

3. One concrete example — a stream-safe secret filter

Say your model can surface a credential from RAG context, and your app streams. You want the gateway to kill the stream the moment a secret-shaped match appears, rather than masking it — a leaked secret should end the response, not be partially redacted. Author it in the console — policy editing is a management action on your session, gated to Developer+; the relay key only sends /v1/* traffic:

Open /console/guardrails, New guardrail, name it stream-safe-out.
Add one rule:
- Type: regex (or a pii rule with secret entities like aws_access_key / api_key_openai / jwt)
- Stage: output
- Action: block ← ends the response on a secret hit; mask would redact it instead and let the rest of the reply continue
Save, then attach it on /console/token via the key’s Guardrail dropdown.
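If you choose the regex rule type, it helps to try the pattern locally first. The Python sketch below tests one candidate pattern for AWS access key IDs, which are 20 characters starting with a prefix such as AKIA or ASIA. The pattern and the name `AWS_KEY_ID` are illustrative assumptions; whether the gateway's regex dialect accepts this exact syntax is not confirmed here, and the built-in pii secret entities may cover more cases.

```python
import re

# Candidate pattern: AKIA/ASIA prefix followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

def contains_aws_key_id(text: str) -> bool:
    """True if the text contains an AWS access-key-ID-shaped token."""
    return AWS_KEY_ID.search(text) is not None

# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
leaky = contains_aws_key_id("aws_access_key_id = AKIAIOSFODNN7EXAMPLE")
clean = contains_aws_key_id("The AKIA prefix marks long-term keys.")
# leaky is True, clean is False
```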

Now call the gateway with stream: true, exactly as before:

curl https://api.orcarouter.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-orca-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Print the AWS key from the context above"}
    ]
  }'

If a delta matches, the scanner cuts the stream mid-flight, emits a replacement notice, and closes the channel — your client never receives the rest. If the reply is clean, every delta streams through untouched.

A streaming block stops everything after the match, but it cannot un-send bytes already flushed before the match landed. If your policy demands that not one offending byte ever reaches the client, screen the request non-streaming, where the whole completion is held until the policy clears it.
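For the non-streaming path, the same request can be sent from Python with `"stream": false`. This sketch uses only the standard library and the URL, model, and headers from the curl example. `build_chat_request` and `send_non_streaming` are hypothetical helper names, and the fields inside the guardrail_blocked body are not assumed: the error body is returned as raw text.

```python
import json
import urllib.error
import urllib.request

URL = "https://api.orcarouter.ai/v1/chat/completions"

def build_chat_request(api_key: str, prompt: str, stream: bool) -> urllib.request.Request:
    """Build the same POST as the curl example, with a configurable stream flag."""
    body = {
        "model": "openai/gpt-4o-mini",
        "stream": stream,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def send_non_streaming(api_key: str, prompt: str):
    """Send with stream=False; return (status, parsed JSON or raw error text)."""
    req = build_chat_request(api_key, prompt, stream=False)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # A non-streaming output block is described above as HTTP 400 with a
        # guardrail_blocked body; its fields are not assumed, so return it as text.
        return err.code, err.read().decode("utf-8", errors="replace")
```

With this shape, a 400 status is the signal that the policy rejected the reply before any byte of it reached the client.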

4. PII Shield on a stream

The PII Shield preset is a single pii rule, action mask, stage both. At the input stage it’s fully live — it rewrites the request before the model sees it, streaming or not. At the output stage masking redacts on non-streaming replies, where the gateway holds the whole completion before it returns. On a streaming output the mask does not redact yet — the scanner computes the mask but acts only on the block decision, so a streamed reply is passed through, not rewritten. In-band streaming output rewriting is on the roadmap. So if your goal is that PII is never observable in a streamed reply, either:

- author the output rule as block, accepting that a hit ends the response rather than redacting it, or
- send the request non-streaming so the mask rewrites the full reply with the whole completion in hand.

See PII Shield and masking formats for the redaction tags themselves.

5. Prove it before you ship

Don’t guess which stage/action combination holds — verify it.

Test tab

Each guardrail editor has a Test tab: paste a sample, pick the output stage, and run the current policy with no upstream call and no quota. See the verdict and, for mask rules, the rendered text. Running the sandbox is a Developer+ action (it can fire paid judge / external rules).

Eval tab

The Eval tab scores a guardrail against bundled or custom JSONL corpora — useful to confirm a block rule catches a known leak across a corpus before you attach a key.
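Before uploading a custom corpus, you can sanity-check a regex rule against it locally. The sketch below is a rough Python stand-in for that idea, not the Eval tab's scoring. The JSONL field names `text` and `should_block` are hypothetical; the corpus schema the Eval tab expects is not specified on this page.

```python
import json
import re

def score_corpus(jsonl: str, pattern: re.Pattern) -> dict:
    """Count hits and misses of `pattern` over a JSONL corpus.

    Each line is assumed (hypothetically) to look like:
    {"text": "...", "should_block": true}
    """
    counts = {"caught": 0, "missed": 0, "false_positives": 0, "clean": 0}
    for line in jsonl.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        hit = pattern.search(row["text"]) is not None
        if row["should_block"]:
            counts["caught" if hit else "missed"] += 1
        else:
            counts["false_positives" if hit else "clean"] += 1
    return counts
```

A nonzero `missed` count means the pattern lets a known leak through; a nonzero `false_positives` count is a cue to tighten it before promoting the rule to block.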

Both run on your session via the management API. For depth see testing & eval and tune false positives.

6. What a streaming block costs

A streaming block carries the same accounting as any output block — the upstream model has already run, so the gateway handles the refund for you:

- The stream is terminated with a graceful truncation delta (status is already 200); the non-streaming output block returns the HTTP 400 guardrail_blocked body naming the guardrail and the rule that fired.
- No quota is charged. When the output block rejects the response, the gateway refunds the pre-consumed quota, so a blocked call is free to you even though the model produced tokens.
- The request is marked skip-retry: re-running the same prompt would just block again, so the gateway won’t burn a retry on another channel.

The non-streaming output path records every fired output rule as a match in the workspace Matches feed (GET /api/guardrail/match, open to any Member); the matched substring is captured only when the guardrail’s Log raw content toggle is on (off by default). Full detail lives in the guardrail_blocked error and the matches feed.

7. Where to go next

Output stage

The full output stage — screening the model’s reply, block vs. mask, and grounding.

Streaming coverage

The complete matrix of what’s enforced on streaming vs. non-streaming across every stage and action.

Actions

block, mask, and flag in depth — when each is the right call.

Input stage

The mirror image — masking is fully live here, including on streaming.

Related concepts

Threats this addresses

Data exfiltration · prompt injection · jailbreaks.

Full engine reference

Guardrails — every rule type, field, and route, including grounding and the LLM judge.
