You attach a firewall policy to a key, the model streams a tool call back, and the response stage strips or rewrites it before your agent acts on it. The enforcement decision is identical on every provider — same rules, same verdicts, same events. What differs is the wire shape your client sees once the firewall has acted on a streamed tool call, because OpenAI chat, the OpenAI Responses API, and native Claude /v1/messages each frame tool calls differently. This page is the focused note on those customer-observable differences. It does not re-document the rule language — see Firewall rules — or the stage model, covered in Stages. For the internal hold-and-reassemble mechanism shared by all three, read Streaming internals.

1. Why firewall provider streaming differs by wire

On a non-streamed response the firewall sees the whole reply at once and decides. On a stream, the model’s tool call arrives as a sequence of fragments — a name in one frame, argument JSON dribbled across many more. A verdict needs the complete call (name and full arguments), and a tool-call fragment, once forwarded, can’t be retracted. So on every provider the gateway does the same thing: it lets ordinary content stream through live, and holds the tool-call frames until the call is fully assembled. At end-of-stream it evaluates each assembled call and emits only the survivors — in that provider’s own event shape.
Your text never stalls. Only tool-call frames are held. Assistant content, reasoning, and role frames stream live and unchanged. The hold applies from the first tool-call fragment to the end of that turn — so a chat-only response streams exactly as if no firewall were attached.
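The hold described above can be sketched as a small generator. This is an illustrative model, not the gateway's implementation: the frame shape (`kind`, `index`, `name`, `arguments`), the `gate_stream` name, and the `verdict` callback are all hypothetical, and the sketch re-emits assembled calls rather than replaying the original frames as a real allow would.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical frame shape: {"kind": "text" | "tool_call" | "finish", ...}
Frame = dict


def gate_stream(
    frames: Iterable[Frame],
    verdict: Callable[[dict], str],
) -> Iterator[Frame]:
    """Forward ordinary frames live, hold tool-call fragments, emit survivors at the end."""
    held: dict[int, dict] = {}  # call index -> assembled {"name", "arguments"}
    for frame in frames:
        if frame["kind"] == "tool_call":
            # Held: a forwarded fragment could not be retracted later.
            call = held.setdefault(frame["index"], {"name": "", "arguments": ""})
            call["name"] += frame.get("name", "")
            call["arguments"] += frame.get("arguments", "")
        elif frame["kind"] == "finish":
            # Each call is whole only now, so this is where a verdict is possible.
            survivors = [c for _, c in sorted(held.items()) if verdict(c) != "deny"]
            for new_index, call in enumerate(survivors):
                yield {"kind": "tool_call", "index": new_index, **call}
            yield {"kind": "finish", "had_tool_calls": bool(survivors)}
        else:
            yield frame  # text, reasoning, and role frames are never held


upstream = [
    {"kind": "text", "text": "Cleaning up. "},
    {"kind": "tool_call", "index": 0, "name": "shell.exec"},
    {"kind": "tool_call", "index": 0, "arguments": '{"command": "rm '},
    {"kind": "tool_call", "index": 0, "arguments": '-rf /"}'},
    {"kind": "finish"},
]
deny_rm = lambda call: "deny" if "rm -rf" in call["arguments"] else "allow"
client_view = list(gate_stream(upstream, deny_rm))
```

In this run the text frame passes through untouched and the only tool call is stripped, so `client_view` holds the text frame followed by a finish frame with `had_tool_calls` set to `False`.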

2. OpenAI chat completions

On /v1/chat/completions, tool calls stream as delta.tool_calls fragments keyed by index. The gate holds those (and the legacy delta.function_call shape) and, at the closing frame, emits the surviving calls re-indexed from zero, followed by a finish frame:
| Outcome | What your client receives |
| --- | --- |
| allow | The original held frames, byte-for-byte (true passthrough). |
| sanitize | One tool_calls delta with rewritten arguments, then finish_reason: "tool_calls". |
| deny (some calls) | Only the surviving calls, then finish_reason: "tool_calls". |
| deny (all calls) | No tool call, then finish_reason: "stop"; the turn looks like the model chose to answer in text. |
That last row is the tell to test against: when a firewall strips every tool call from an OpenAI chat turn, your agent sees a clean finish_reason: "stop", not an error frame. Build your loop to treat “no tool call this turn” as a valid outcome.
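A client loop that tolerates a fully stripped turn might look like the sketch below. It works on plain dicts shaped like chat-completions stream chunks (`choices[].delta.tool_calls[]` keyed by `index`); the function names `collect_chat_turn` and `next_action` are hypothetical, and the legacy `delta.function_call` shape is left out for brevity.

```python
from typing import Iterable


def collect_chat_turn(chunks: Iterable[dict]) -> dict:
    """Fold chat-completions stream chunks into text, tool calls, and finish_reason."""
    text_parts: list[str] = []
    calls: dict[int, dict] = {}
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
            for frag in delta.get("tool_calls") or []:
                # Fragments for one call share an index; name and arguments accumulate.
                call = calls.setdefault(frag["index"], {"id": None, "name": "", "arguments": ""})
                call["id"] = frag.get("id") or call["id"]
                fn = frag.get("function") or {}
                call["name"] += fn.get("name") or ""
                call["arguments"] += fn.get("arguments") or ""
            finish_reason = choice.get("finish_reason") or finish_reason
    return {
        "text": "".join(text_parts),
        "tool_calls": [calls[i] for i in sorted(calls)],
        "finish_reason": finish_reason,
    }


def next_action(turn: dict) -> str:
    """Pick the loop's next step; a turn without tool calls is a normal outcome."""
    if turn["finish_reason"] == "tool_calls" and turn["tool_calls"]:
        return "run_tools"
    return "answer"
```

With this shape, a turn whose calls were all denied takes the same `"answer"` branch as a turn where the model simply replied in text, which is the behavior the last table row calls for.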

3. OpenAI Responses API

The native /v1/responses stream has its own event model — a tool call is a function_call item that opens with response.output_item.added, streams response.function_call_arguments.delta fragments, and completes at response.output_item.done. The firewall evaluates at done, the first point the call is whole:
  • allow: The item’s added / argument-delta / done events are emitted unchanged once the call clears.
  • sanitize: The added shell streams, then a done whose arguments are the redacted version. The original argument-delta fragments are dropped so the unredacted value never reaches you.
  • deny: The buffered events are dropped, and the denied item is also filtered out of the terminal response.completed object your client builds its final state from, so there is no dangling reference to a call that never ran.
Text and reasoning deltas stream live throughout, exactly as on chat completions.
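Because argument-delta fragments may be dropped on sanitize and whole items may vanish on deny, a client is safest reading function calls from completed items rather than from accumulated deltas. The sketch below assumes events arrive as plain dicts with the Responses API event types named above; `collect_responses_turn` is a hypothetical helper name.

```python
from typing import Iterable


def collect_responses_turn(events: Iterable[dict]) -> dict:
    """Build final state from done items and the terminal response.completed object."""
    text_parts: list[str] = []
    done_calls: dict[str, dict] = {}
    completed_output = None
    for event in events:
        kind = event.get("type")
        if kind == "response.output_text.delta":
            text_parts.append(event["delta"])  # text streams live
        elif kind == "response.output_item.done":
            item = event["item"]
            if item.get("type") == "function_call":
                # The done item carries the arguments the firewall let through.
                done_calls[item["id"]] = item
        elif kind == "response.completed":
            completed_output = event["response"].get("output", [])
    if completed_output is not None:
        # Prefer the terminal object: denied items are already filtered out of it.
        calls = [i for i in completed_output if i.get("type") == "function_call"]
    else:
        calls = list(done_calls.values())
    return {"text": "".join(text_parts), "function_calls": calls}
```

Argument deltas are deliberately ignored here: on a sanitized call they never arrive, and on an allowed call the done item already contains the full arguments.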

4. Native Claude /v1/messages

A native Anthropic stream is a different beast: content arrives as indexed blocks (content_block_start → content_block_delta with input_json_delta fragments → content_block_stop), closed by a message_delta carrying stop_reason. The firewall holds from the first tool_use block, evaluates each, and reconstructs the surviving blocks with contiguous indices so a stripped block leaves no index gap. The Claude-specific tell is stop_reason. If every tool_use block is denied, a stop_reason of tool_use would promise your client a tool call that never arrives, so the gateway rewrites it to end_turn:
upstream:  content_block_start (tool_use) … message_delta {stop_reason: "tool_use"}
            ↓ firewall denies the only tool_use
client:    (no tool_use block)            … message_delta {stop_reason: "end_turn"}
A partial strip keeps the surviving tool_use blocks, re-indexed contiguously, and leaves stop_reason: "tool_use" intact.
This applies to native Claude streams. A Claude model called through OpenAI-format endpoints is enforced on the OpenAI chat wire instead (§2), so it shows finish_reason: "stop", not stop_reason: "end_turn". Match your end-of-turn handling to the wire format you called, not the underlying model.
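On the client side, a native /v1/messages stream can be folded into blocks and a stop_reason as sketched below. The events are plain dicts in the Anthropic streaming shape; `collect_claude_turn` and `should_run_tools` are hypothetical helper names, and error and ping events are ignored for brevity.

```python
import json
from typing import Iterable


def collect_claude_turn(events: Iterable[dict]) -> dict:
    """Fold native Messages stream events into text, tool_use blocks, and stop_reason."""
    blocks: dict[int, dict] = {}
    partial_json: dict[int, str] = {}
    stop_reason = None
    for event in events:
        kind = event.get("type")
        if kind == "content_block_start":
            blocks[event["index"]] = dict(event["content_block"])
            partial_json[event["index"]] = ""
        elif kind == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "text_delta":
                block = blocks[event["index"]]
                block["text"] = block.get("text", "") + delta["text"]
            elif delta.get("type") == "input_json_delta":
                partial_json[event["index"]] += delta["partial_json"]
        elif kind == "content_block_stop":
            block = blocks[event["index"]]
            if block.get("type") == "tool_use":
                # Arguments are only parseable once the block has closed.
                block["input"] = json.loads(partial_json[event["index"]] or "{}")
        elif kind == "message_delta":
            stop_reason = event["delta"].get("stop_reason") or stop_reason
    ordered = [blocks[i] for i in sorted(blocks)]
    return {
        "text": "".join(b.get("text", "") for b in ordered if b.get("type") == "text"),
        "tool_uses": [b for b in ordered if b.get("type") == "tool_use"],
        "stop_reason": stop_reason,
    }


def should_run_tools(turn: dict) -> bool:
    """Run tools only when stop_reason promises them and blocks actually arrived."""
    return turn["stop_reason"] == "tool_use" and bool(turn["tool_uses"])
```

Keying blocks by the index the client receives means a partial strip needs no special handling: the surviving blocks arrive already contiguous.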

5. One concrete example

The same rule produces the same decision on every provider — only the wire shape your client reads differs. Author it once, on the response stage:
{
  "stage": "response",
  "tool_name_glob": "shell.exec",
  "verdict": "deny",
  "args_match_json": "{\"clauses\":[{\"path\":\"$.command\",\"op\":\"regex\",\"value\":\"rm -rf|mkfs\"}]}"
}
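To sanity-check which arguments a rule like this would catch before attaching it, a rough local approximation can help. Everything about the matching semantics here is an assumption rather than the documented rule language: the sketch treats tool_name_glob as a shell-style glob, supports only top-level `$.field` paths, treats the `regex` op as an unanchored search, and requires every clause to match. See Firewall rules for the actual behavior.

```python
import fnmatch
import json
import re

RULE = {
    "stage": "response",
    "tool_name_glob": "shell.exec",
    "verdict": "deny",
    "args_match_json": '{"clauses":[{"path":"$.command","op":"regex","value":"rm -rf|mkfs"}]}',
}


def rule_matches(rule: dict, tool_name: str, arguments: dict) -> bool:
    """Approximate matcher (assumed semantics, not the gateway's implementation)."""
    if not fnmatch.fnmatchcase(tool_name, rule["tool_name_glob"]):
        return False
    clauses = json.loads(rule["args_match_json"])["clauses"]
    for clause in clauses:
        if clause["op"] != "regex" or not clause["path"].startswith("$."):
            raise ValueError(f"unsupported clause in this sketch: {clause}")
        value = arguments.get(clause["path"][2:])  # "$.command" -> "command"
        if not isinstance(value, str) or re.search(clause["value"], value) is None:
            return False
    return True
```

Under those assumptions, `rule_matches(RULE, "shell.exec", {"command": "rm -rf /tmp/x"})` is true, while the same call with `"ls -la"` or a different tool name is not.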
Stream the same prompt three ways and the firewall denies the rm -rf call every time. What your client observes:
| Wire | Terminal signal after a full strip |
| --- | --- |
| OpenAI chat | finish_reason: "stop" |
| OpenAI Responses | item absent from response.completed |
| Native Claude | stop_reason: "end_turn" |
The matched-and-denied call shows up identically in firewall events regardless of wire, so your observability is provider-agnostic even though the stream isn’t.
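If one agent loop talks to more than one wire, the three terminal signals can be normalized into a single question: does this turn carry tool calls? The sketch below is one way to do that; the wire labels (`openai_chat`, `openai_responses`, `claude_messages`) and the `terminal` dict layout are hypothetical conventions chosen for the example.

```python
def turn_has_tool_calls(wire: str, terminal: dict) -> bool:
    """Map each wire's end-of-turn signal to one boolean."""
    if wire == "openai_chat":
        # terminal: the final choice, e.g. {"finish_reason": "stop"}
        return terminal.get("finish_reason") == "tool_calls"
    if wire == "openai_responses":
        # terminal: the response object from response.completed
        return any(item.get("type") == "function_call" for item in terminal.get("output", []))
    if wire == "claude_messages":
        # terminal: the message_delta's delta, e.g. {"stop_reason": "end_turn"}
        return terminal.get("stop_reason") == "tool_use"
    raise ValueError(f"unknown wire: {wire}")
```

Dispatching on the wire you called, rather than on the model name, keeps a Claude model served through an OpenAI-format endpoint on the `openai_chat` branch.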

6. What stays constant across providers

The wire differs; the contract doesn’t:
  • Verdicts and rules are wire-agnostic. allow / audit / deny / sanitize mean the same thing on every provider. See Verdicts.
  • Sanitize touches tool-call arguments only, never the content a tool returns — on every wire. See Sanitize responses.
  • Allow is true passthrough. When the firewall takes no action, the held frames are replayed as the exact upstream bytes — no re-batching, no lost provider-specific fields.
  • Shadow mode applies everywhere. Turn it on and the held tool calls always survive (downgraded to audit) so you can measure a policy’s impact across providers before it changes traffic. See Shadow mode.

7. Where this fits

  • Streaming internals: The hold-assemble-reassemble mechanism every provider shares.
  • Stages: Why streamed tool-call enforcement lives on the response surface.
  • Verdicts: The provider-agnostic decisions a streamed call resolves to.
  • Response filtering: Gating the tool calls a model emits, stream or not.
For the threats these streamed checks address, see Dangerous tool calls and Data exfiltration; for where stream enforcement sits on the request path, see Enforcement path latency.