response stage strips or rewrites it
before your agent acts on it. The enforcement decision is identical on
every provider — same rules, same verdicts, same events. What differs is
the wire shape your client sees once the firewall has acted on a
streamed tool call, because OpenAI chat, the OpenAI Responses API, and
native Claude /v1/messages each frame tool calls differently.
This page is the focused note on those customer-observable differences. It
does not re-document the rule language — see
Firewall rules — or the stage model, covered in
Stages. For the internal hold-and-reassemble
mechanism shared by all three, read
Streaming internals.
1. Why firewall provider streaming differs by wire
On a non-streamed response the firewall sees the whole reply at once and decides. On a stream, the model’s tool call arrives as a sequence of fragments: a name in one frame, argument JSON dribbled across many more. A verdict needs the complete call (name and full arguments), and a tool-call fragment, once forwarded, can’t be retracted.
So on every provider the gateway does the same thing: it lets ordinary content stream through live, and holds the tool-call frames until the call is fully assembled. At end-of-stream it evaluates each assembled call and emits only the survivors, in that provider’s own event shape.
Your text never stalls. Only tool-call frames are held. Assistant
content, reasoning, and role frames stream live and unchanged. The hold
applies from the first tool-call fragment to the end of that turn — so a
chat-only response streams exactly as if no firewall were attached.
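The hold described above can be sketched in a few lines. This is an illustrative model only, not the gateway's actual code: the frame shape (a dict with a `kind` of `content`, `tool_call`, or `end`), the `gate_stream` name, and the `evaluate` callback are all assumptions made for the example, and each held frame stands in for whatever fragments the real wire uses.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical frame shape for illustration:
# {"kind": "content" | "tool_call" | "end", ...}
Frame = dict


def gate_stream(
    frames: Iterable[Frame],
    evaluate: Callable[[list], list],
) -> Iterator[Frame]:
    """Stream non-tool frames live; hold tool-call frames until end of turn.

    `evaluate` stands in for the rule engine: it receives the held
    tool-call frames and returns only the ones that survive.
    """
    held = []
    for frame in frames:
        kind = frame["kind"]
        if kind == "tool_call":
            # Never forwarded early: a fragment cannot be retracted.
            held.append(frame)
        elif kind == "end":
            # The calls are complete now, so a verdict is possible.
            yield from evaluate(held)
            held = []
            yield frame
        else:
            # Content, reasoning, and role frames pass through unchanged.
            yield frame
    if held:
        # Stream closed without an explicit end frame; still evaluate.
        yield from evaluate(held)
```

With no tool-call frames in the input, `gate_stream` yields every frame in its original order, which mirrors the chat-only case above.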
2. OpenAI chat completions
On /v1/chat/completions, tool calls stream as delta.tool_calls
fragments keyed by index. The gate holds those (and the legacy
delta.function_call shape) and, at the closing frame, emits the surviving
calls re-indexed from zero, followed by a finish frame:
| Outcome | What your client receives |
|---|---|
| allow | The original held frames, byte-for-byte — true passthrough. |
| sanitize | One tool_calls delta with rewritten arguments, then finish_reason: "tool_calls". |
| deny (some calls) | Only the surviving calls, then finish_reason: "tool_calls". |
| deny (all calls) | No tool call, then finish_reason: "stop" — the turn looks like the model chose to answer in text. |
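On the client side, every outcome in the table can be read with the same accumulation loop. The sketch below assumes chunks have already been parsed into plain dicts in the chat-completions streaming shape; the helper name `collect_chat_tool_calls` is made up for this example and is not part of any SDK.

```python
def collect_chat_tool_calls(chunks):
    """Assemble streamed tool calls and capture the final finish_reason.

    Returns (calls, finish_reason), where calls is a list of
    {"id", "name", "arguments"} dicts ordered by their wire index.
    """
    calls = {}
    finish_reason = None
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        choice = choices[0]
        delta = choice.get("delta") or {}
        for frag in delta.get("tool_calls") or []:
            slot = calls.setdefault(
                frag["index"], {"id": None, "name": "", "arguments": ""}
            )
            if frag.get("id"):
                slot["id"] = frag["id"]
            fn = frag.get("function") or {}
            if fn.get("name"):
                slot["name"] = fn["name"]
            # Argument JSON may arrive whole or split across fragments.
            slot["arguments"] += fn.get("arguments") or ""
        if choice.get("finish_reason"):
            finish_reason = choice["finish_reason"]
    ordered = [calls[i] for i in sorted(calls)]
    return ordered, finish_reason
```

A client written this way needs no firewall-specific branch: a fully denied turn simply yields an empty list with a finish reason of "stop", and it proceeds as it would for any text-only answer.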
3. OpenAI Responses API
The native /v1/responses stream has its own event model: a tool call is
a function_call item that opens with response.output_item.added,
streams response.function_call_arguments.delta fragments, and completes at
response.output_item.done. The firewall evaluates at done, the first
point the call is whole:
allow → buffered events flushed verbatim
The item’s added / argument-delta / done events are emitted unchanged once the call clears.
sanitize → item shell + rewritten done
The added shell streams, then a done whose arguments are the redacted version. The original argument-delta fragments are dropped so the unredacted value never reaches you.
deny → item removed everywhere
The buffered events are dropped, and the denied item is also filtered out of the terminal response.completed object your client builds its final state from, leaving no dangling reference to a call that never ran.
4. Native Claude /v1/messages
A native Anthropic stream is a different beast: content arrives as
indexed blocks — content_block_start → content_block_delta
(input_json_delta fragments) → content_block_stop — closed by a
message_delta carrying stop_reason. The firewall holds from the first
tool_use block, evaluates each, and reconstructs the surviving blocks
with contiguous indices so a stripped block leaves no index gap.
The Claude-specific tell is stop_reason. If every tool_use block is
denied, a stop_reason of tool_use would promise your client a tool call
that never arrives, so the gateway rewrites it to end_turn. When only some
blocks are denied, it emits the surviving tool_use blocks, re-indexed
contiguously, and leaves stop_reason: "tool_use" intact.
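The re-indexing and stop_reason rewrite can be illustrated with a small function. This is a sketch under stated assumptions, not the gateway's implementation: it works on a complete list of already-parsed event dicts rather than a live stream, and `rebuild_claude_events` and its `denied` parameter (the original indices of denied blocks) are hypothetical names.

```python
BLOCK_EVENTS = {"content_block_start", "content_block_delta", "content_block_stop"}


def rebuild_claude_events(events, denied):
    """Drop denied blocks, close index gaps, and fix up stop_reason.

    events: parsed /v1/messages stream events, in wire order.
    denied: set of original block indices whose tool_use was denied.
    The input list and its dicts are left unmodified.
    """
    new_index = {}       # original block index -> contiguous index
    tool_use_kept = 0    # surviving tool_use blocks seen so far
    out = []
    for ev in events:
        kind = ev["type"]
        if kind in BLOCK_EVENTS:
            idx = ev["index"]
            if idx in denied:
                continue  # every event of a denied block is dropped
            if kind == "content_block_start" and ev["content_block"]["type"] == "tool_use":
                tool_use_kept += 1
            if idx not in new_index:
                new_index[idx] = len(new_index)
            ev = {**ev, "index": new_index[idx]}
        elif kind == "message_delta":
            delta = dict(ev["delta"])
            # A stop_reason of tool_use with no surviving call would
            # promise the client a tool call that never arrives.
            if delta.get("stop_reason") == "tool_use" and tool_use_kept == 0:
                delta["stop_reason"] = "end_turn"
            ev = {**ev, "delta": delta}
        out.append(ev)
    return out
```

Because message_delta arrives after every content block has closed, the count of surviving tool_use blocks is final by the time stop_reason is inspected.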
5. One concrete example
The same rule produces the same decision on every provider; only the wire shape your client reads differs. Author it once, on the response
stage, for example a rule that blocks rm -rf, and the firewall strips that
call every time. What your client observes:
| Wire | Terminal signal after a full strip |
|---|---|
| OpenAI chat | finish_reason: "stop" |
| OpenAI Responses | item absent from response.completed |
| Native Claude | stop_reason: "end_turn" |
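A client that talks to more than one provider can fold the three terminal signals in the table into a single check. The sketch below is one possible shape, assuming the terminal frame or event has already been parsed into a dict; the wire labels (`openai_chat`, `openai_responses`, `claude`) and the function name are invented for the example.

```python
def has_pending_tool_calls(wire: str, terminal: dict) -> bool:
    """Report whether a finished turn still carries tool calls to run.

    terminal is the wire's closing signal:
      openai_chat      -> the chunk carrying finish_reason
      openai_responses -> the response.completed event
      claude           -> the message_delta event
    """
    if wire == "openai_chat":
        return terminal["choices"][0].get("finish_reason") == "tool_calls"
    if wire == "openai_responses":
        output = terminal["response"].get("output") or []
        return any(item.get("type") == "function_call" for item in output)
    if wire == "claude":
        return terminal["delta"].get("stop_reason") == "tool_use"
    raise ValueError(f"unknown wire: {wire}")
```

After a full strip all three branches return False, so the calling code treats the turn as a plain text answer regardless of provider.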
6. What stays constant across providers
The wire differs; the contract doesn’t:
- Verdicts and rules are wire-agnostic. allow / audit / deny / sanitize mean the same thing on every provider. See Verdicts.
- Sanitize touches tool-call arguments only, never the content a tool returns, on every wire. See Sanitize responses.
- Allow is true passthrough. When the firewall takes no action, the held frames are replayed as the exact upstream bytes: no re-batching, no lost provider-specific fields.
- Shadow mode applies everywhere. Turn it on and the held tool calls always survive (downgraded to audit) so you can measure a policy’s impact across providers before it changes traffic. See Shadow mode.
7. Where this fits
Streaming internals
The hold-assemble-reassemble mechanism every provider shares.
Stages
Why streamed tool-call enforcement lives on the response surface.
Verdicts
The provider-agnostic decisions a streamed call resolves to.
Response filtering
Gating the tool calls a model emits, stream or not.
