Provider notes: OpenAI vs Claude native streams

You attach a firewall policy to a key, the model streams a tool call back, and the response stage strips or rewrites it before your agent acts on it. The enforcement decision is identical on every provider — same rules, same verdicts, same events. What differs is the wire shape your client sees once the firewall has acted on a streamed tool call, because OpenAI chat, the OpenAI Responses API, and native Claude /v1/messages each frame tool calls differently. This page is the focused note on those customer-observable differences. It does not re-document the rule language — see Firewall rules — or the stage model, covered in Stages. For the internal hold-and-reassemble mechanism shared by all three, read Streaming internals.

1. Why firewall provider streaming differs by wire

On a non-streamed response the firewall sees the whole reply at once and decides. On a stream, the model’s tool call arrives as a sequence of fragments — a name in one frame, argument JSON dribbled across many more. A verdict needs the complete call (name and full arguments), and a tool-call fragment, once forwarded, can’t be retracted. So on every provider the gateway does the same thing: it lets ordinary content stream through live, and holds the tool-call frames until the call is fully assembled. At end-of-stream it evaluates each assembled call and emits only the survivors — in that provider’s own event shape.

Your text never stalls. Only tool-call frames are held. Assistant content, reasoning, and role frames stream live and unchanged. The hold applies from the first tool-call fragment to the end of that turn — so a chat-only response streams exactly as if no firewall were attached.

2. OpenAI chat completions

On /v1/chat/completions, tool calls stream as delta.tool_calls fragments keyed by index. The gate holds those (and the legacy delta.function_call shape) and, at the closing frame, emits the surviving calls re-indexed from zero, followed by a finish frame:

Outcome	What your client receives
allow	The original held frames, byte-for-byte — true passthrough.
sanitize	One `tool_calls` delta with rewritten arguments, then `finish_reason: "tool_calls"`.
deny (some calls)	Only the surviving calls, then `finish_reason: "tool_calls"`.
deny (all calls)	No tool call, then `finish_reason: "stop"` — the turn looks like the model chose to answer in text.

That last row is the tell to test against: when a firewall strips every tool call from an OpenAI chat turn, your agent sees a clean finish_reason: "stop", not an error frame. Build your loop to treat “no tool call this turn” as a valid outcome.

3. OpenAI Responses API

The native /v1/responses stream has its own event model — a tool call is a function_call item that opens with response.output_item.added, streams response.function_call_arguments.delta fragments, and completes at response.output_item.done. The firewall evaluates at done, the first point the call is whole:

allow → buffered events flushed verbatim

The item’s added / argument-delta / done events are emitted unchanged once the call clears.

sanitize → item shell + rewritten done

The added shell streams, then a done whose arguments are the redacted version — the original argument-delta fragments are dropped so the unredacted value never reaches you.

deny → item removed everywhere

The buffered events are dropped, and the denied item is also filtered out of the terminal response.completed object your client builds its final state from — no dangling reference to a call that never ran.

Text and reasoning deltas stream live throughout, exactly as on chat completions.

4. Native Claude `/v1/messages`

A native Anthropic stream is a different beast: content arrives as indexed blocks — content_block_start → content_block_delta (input_json_delta fragments) → content_block_stop — closed by a message_delta carrying stop_reason. The firewall holds from the first tool_use block, evaluates each, and reconstructs the surviving blocks with contiguous indices so a stripped block leaves no index gap. The Claude-specific tell is stop_reason. If every tool_use block is denied, a stop_reason of tool_use would promise your client a tool call that never arrives — so the gateway rewrites it to end_turn:

upstream:  content_block_start (tool_use) … message_delta {stop_reason: "tool_use"}
            ↓ firewall denies the only tool_use
client:    (no tool_use block)            … message_delta {stop_reason: "end_turn"}

A partial strip keeps the surviving tool_use blocks, re-indexed contiguously, and leaves stop_reason: "tool_use" intact.

This applies to native Claude streams. A Claude model called through OpenAI-format endpoints is enforced on the OpenAI chat wire instead (§2), so it shows finish_reason: "stop", not stop_reason: "end_turn". Match your end-of-turn handling to the wire format you called, not the underlying model.

5. One concrete example

The same rule produces the same decision on every provider — only the wire shape your client reads differs. Author it once, on the response stage:

{
  "stage": "response",
  "tool_name_glob": "shell.exec",
  "verdict": "deny",
  "args_match_json": "{\"clauses\":[{\"path\":\"$.command\",\"op\":\"regex\",\"value\":\"rm -rf|mkfs\"}]}"
}

Stream the same prompt three ways and the firewall denies the rm -rf call every time. What your client observes:

Wire	Terminal signal after a full strip
OpenAI chat	`finish_reason: "stop"`
OpenAI Responses	item absent from `response.completed`
Native Claude	`stop_reason: "end_turn"`

The matched-and-denied call shows up identically in firewall events regardless of wire, so your observability is provider-agnostic even though the stream isn’t.

6. What stays constant across providers

The wire differs; the contract doesn’t:

Verdicts and rules are wire-agnostic. allow / audit / deny / sanitize mean the same thing on every provider. See Verdicts.
Sanitize touches tool-call arguments only, never the content a tool returns — on every wire. See Sanitize responses.
Allow is true passthrough. When the firewall takes no action, the held frames are replayed as the exact upstream bytes — no re-batching, no lost provider-specific fields.
Shadow mode applies everywhere. Turn it on and the held tool calls always survive (downgraded to audit) so you can measure a policy’s impact across providers before it changes traffic. See Shadow mode.

7. Where this fits

Streaming internals

The hold-assemble-reassemble mechanism every provider shares.

Stages

Why streamed tool-call enforcement lives on the response surface.

Verdicts

The provider-agnostic decisions a streamed call resolves to.

Response filtering

Gating the tool calls a model emits, stream or not.

For the threats these streamed checks address, see Dangerous tool calls and Data exfiltration; for where stream enforcement sits on the request path, see Enforcement path latency.

​1. Why firewall provider streaming differs by wire

​2. OpenAI chat completions

​3. OpenAI Responses API

​4. Native Claude /v1/messages

​5. One concrete example

​6. What stays constant across providers

​7. Where this fits

Streaming internals

Stages

Verdicts

Response filtering

1. Why firewall provider streaming differs by wire

2. OpenAI chat completions

3. OpenAI Responses API

4. Native Claude `/v1/messages`

5. One concrete example

6. What stays constant across providers

7. Where this fits