tool_calls:
concrete invocations with real, model-chosen arguments. The response
surface of the agent firewall inspects
exactly those, the moment they leave the model and before they reach your
agent loop. It’s the surface where you filter what the model actually
decided to do, with the arguments it actually picked.
This page covers the use-case for filtering on the response surface — when
to reach for it over inbound, the one
verdict twist it adds, and how it behaves on a stream. For the full rule
vocabulary and verdict semantics, see
Rule schema and
Verdicts.
1. Filter the LLM’s tool calls on the response, with arguments in scope
The inbound stage sees the tools you
advertise — names only, no call-time arguments yet. The response
stage sees the tool_calls the model emits, which carry the arguments
the model chose. That’s the whole reason to filter here: it’s the only
surface that sees the actual call + arguments for a client-executed
(non-MCP) tool, so argument clauses,
sequences, and run-state rules all land on response.
The distinction in one line:
| Stage | Sees | Use it to |
|---|---|---|
| inbound | Advertised tool definitions | Make a tool invisible to the model |
| response | Emitted tool_calls + arguments | Filter the call the model actually made |
inbound answers which tools exist; response answers what the
model did with one. Reach for response (or leave stage empty to cover
both) when a tool legitimately appears in some requests but a particular
call of it is dangerous — or when you only control the agent loop, not
the advertised tool set.
A rule with no
stage runs on every surface, including response. Pin
to response only when a rule should only inspect emitted calls — for
example an argument clause that has nothing to match on the inbound surface
anyway. See Stages.
2. One concrete example
Allow shell.exec in general, but strip it from the model’s reply the
moment its command argument looks destructive. In your workspace console,
open a policy (or create one) and add a
rule pinned to the response surface:
The argument clause goes in args_match_json: a JSON string holding
{"clauses":[…]}, each clause a path / op / value triple (op is one
of eq, contains, regex, gt, lt). The console form builds it for you;
the raw shape is shown here so the field name is unambiguous.
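A minimal sketch of that shape, and of how a clause list could be evaluated against a call's arguments. Only the field name args_match_json, the clauses key, and the path / op / value triple come from this page. The regex, the dotted-path lookup, and the rule that every clause must match (AND) are assumptions made for illustration, not the engine's documented behavior.

```python
import json
import re

# Hypothetical value for args_match_json; the regex is illustrative only.
args_match_json = json.dumps({
    "clauses": [
        {"path": "command", "op": "regex", "value": r"\brm\s+-rf\b"},
    ]
})


def lookup(args, path):
    """Resolve a dotted path such as 'options.force' (path syntax is assumed)."""
    current = args
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def clause_matches(clause, args):
    """Evaluate one path / op / value clause against the call's arguments."""
    actual = lookup(args, clause["path"])
    if actual is None:
        return False
    op, expected = clause["op"], clause["value"]
    if op == "eq":
        return actual == expected
    if op == "contains":
        return isinstance(actual, str) and str(expected) in actual
    if op == "regex":
        return isinstance(actual, str) and re.search(expected, actual) is not None
    if op in ("gt", "lt"):
        if not (is_number(actual) and is_number(expected)):
            return False
        return actual > expected if op == "gt" else actual < expected
    return False  # unknown op: treated as no match in this sketch


def args_match(raw, args):
    """Assumes all clauses must match; the page does not say how clauses combine."""
    clauses = json.loads(raw)["clauses"]
    return all(clause_matches(clause, args) for clause in clauses)
```

With this sketch, args_match(args_match_json, {"command": "rm -rf /tmp/x"}) matches, while the same check against {"command": "ls -la"} does not.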
The tool stays advertised — the model can still propose shell.exec — but
when the emitted call carries a destructive command, the firewall removes
that tool_call from the reply before your agent ever sees it. A benign
shell.exec (say, ls -la) sails through untouched. This is the
“allow the tool, gate the call” pattern that the response surface exists
for; the argument clause is what makes it possible.
3. What a verdict does on the response surface
The response surface inspects each emitted tool_call and rewrites the
reply in place. Kept calls are forwarded byte-for-byte; only a matched call
changes:
deny → the call is stripped from the reply
The matched
tool_call is removed from the model’s response before it
reaches your agent. Unlike an inbound deny — which returns HTTP
400 with code firewall_blocked — a response-surface deny doesn’t
fail the request; the rest of the reply (other tool calls, any text)
flows through with the offending call simply absent.
sanitize → arguments are redacted, the call forwarded
The matched substrings in the call’s arguments are replaced with the
engine’s redacted arguments and the cleaned call is forwarded — useful
when the tool is fine but the model put a secret or PII value in an
argument. Sanitize redacts tool-call arguments only; it never
touches the content a tool returns. If the engine has nothing to
substitute, the call is stripped (fail-closed). See
Sanitize responses.
pending_approval → the call is stripped here, not held
Human-in-the-loop holds open on the inbound surface, not response.
A
pending_approval rule that first matches on an emitted call is
therefore stripped (fail-closed) — keeping it would forward an
un-reviewed call with no human decision. Author HITL holds to fire
inbound; see Approvals.allow / audit → the call passes, logged
allow / audit → the call passes, logged
allow and audit both forward the call; audit is the usual
default_verdict — record everything, block nothing, until you’re
ready to enforce.
4. Streaming: held, then filtered
On a non-stream reply the whole response is parsed and filtered at once. On a stream, a model’s tool_calls arrive as per-index deltas across many
SSE frames — and once a delta is forwarded, your agent already has it and it
can’t be retracted. So the gateway holds tool-call frames: they are
never forwarded mid-stream. At end of stream the firewall assembles each
call (name + complete arguments), evaluates it, and emits only the
surviving calls.
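The hold-then-filter behavior described above can be sketched as follows. This is an illustration, not the gateway's implementation: the frame and delta shapes (simplified dicts with index, id, and function.name / function.arguments fragments, loosely modeled on OpenAI-style streaming chunks) and the is_allowed callback are assumptions. It also assumes the assembled arguments form valid JSON.

```python
import json


def assemble_tool_calls(deltas):
    """Merge per-index deltas into complete calls, returned in index order."""
    calls = {}
    for delta in deltas:
        call = calls.setdefault(
            delta["index"], {"id": None, "name": "", "arguments": ""}
        )
        if delta.get("id"):
            call["id"] = delta["id"]
        fn = delta.get("function") or {}
        # Name and arguments arrive as string fragments; concatenate them.
        call["name"] += fn.get("name") or ""
        call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


def filter_stream(frames, is_allowed):
    """Forward content frames as they arrive; hold tool-call deltas to the end."""
    held = []
    for frame in frames:
        if frame.get("tool_calls"):
            held.extend(frame["tool_calls"])  # never forwarded mid-stream
        if frame.get("content"):
            yield {"content": frame["content"]}
    # End of stream: each call is now complete and can be evaluated.
    survivors = [
        call
        for call in assemble_tool_calls(held)
        if is_allowed(call["name"], json.loads(call["arguments"] or "{}"))
    ]
    if survivors:
        yield {"tool_calls": survivors}
```

Evaluating only at end of stream is what lets a clause see the complete arguments string; a partial fragment such as {"comm cannot be parsed, let alone matched.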
Content frames still stream normally — text tokens reach your client as they
arrive. Only
tool_call frames are held back for evaluation, so a denied or
sanitized call is filtered out before the assembled tool call is delivered.
The verdict and rule semantics are identical to the non-stream surface.
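Because the semantics are shared, one function can describe what each verdict does to an emitted call on either path. The sketch below restates the verdict behavior from section 3; the function names, the dict shape of a call, and the decide callback are hypothetical, and treating an unrecognized verdict as a strip is an assumption rather than documented behavior.

```python
def apply_verdict(call, verdict, redacted_arguments=None):
    """Return the call to forward, or None when the call is stripped."""
    if verdict in ("allow", "audit"):
        return call  # forwarded unchanged
    if verdict == "sanitize":
        if redacted_arguments is None:
            return None  # nothing to substitute: stripped (fail-closed)
        return {**call, "arguments": redacted_arguments}
    # deny and pending_approval both strip on the response surface.
    # Unrecognized verdicts are also stripped here (an assumption).
    return None


def filter_reply(calls, decide):
    """Apply decide(call) -> (verdict, redacted_arguments) to every emitted call."""
    kept = []
    for call in calls:
        verdict, redacted = decide(call)
        result = apply_verdict(call, verdict, redacted)
        if result is not None:
            kept.append(result)
    return kept
```

Note that a stripped call never fails the request: filter_reply simply returns a shorter list, and any text in the reply is untouched.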
For the frame-level details, see
Streaming internals and
Provider streaming.
5. Roll it out safely
Dry-run the rule first
The console Test tab runs a policy against a sample tool call and
returns the verdict, the matched rule, and the reason — nothing is
dispatched, nothing is persisted. Confirm your glob and argument clause
match the call you meant before attaching a key. See
Test rules.
Shadow mode for a live measurement
Turn on shadow mode and every
enforcing verdict — including a response-surface deny — is downgraded to
audit, reason prefixed [shadow] would …. You measure exactly what
the rule would strip against real traffic before it changes a single
reply.
Confirm filtering in the events log
Every evaluation writes a firewall event with the verdict, surface,
tool, and matched rule. Filter the
events log by surface
response to see
your rule firing on the emitted calls you expected. See
Analytics.
6. Attach the policy and who can configure it
A policy does nothing until a key resolves to it. Attach in the console by setting firewall_policy_id on the
key, or make the policy the workspace
default. Resolution: the key’s attached policy (when it exists and is
enabled), else the workspace default — and a disabled attached policy
falls back to the workspace default. See
Manage policies.
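The resolution order can be sketched as a small function. Only firewall_policy_id comes from this page; the Policy class, the policies lookup table, and the function name are hypothetical stand-ins for whatever the gateway actually stores.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Policy:
    id: str
    enabled: bool = True


def resolve_policy(
    firewall_policy_id: Optional[str],
    policies: Dict[str, Policy],
    workspace_default: Optional[Policy],
) -> Optional[Policy]:
    """Pick the key's attached policy if it exists and is enabled,
    otherwise fall back to the workspace default."""
    attached = policies.get(firewall_policy_id) if firewall_policy_id else None
    if attached is not None and attached.enabled:
        return attached
    return workspace_default
```

A key whose attached policy is missing or disabled therefore behaves exactly like a key with no policy attached.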
All configuration runs in the console under your session
(/api/workspace/firewall/*):
| Action | Role |
|---|---|
| Read policies, presets, discovered tools, Simulate | Member |
| Dry-run (Test), read the events log and run aggregates | Developer+ |
| Create / edit / delete rules and policies | Developer+ |
Related
Validate arguments
The argument clauses that make response-surface filtering precise.
Sanitize responses
Redact secrets from a call’s arguments instead of stripping it.
Firewall stages
How
response compares to inbound, mcp, and egress.
Block tools
The inbound counterpart: deny a tool before the model is offered it.
Dangerous tool calls
The threat response filtering addresses.
Firewall reference
The full rule + matching reference.
