Response-stage filtering

When a model replies, it can return more than text: it can emit tool_calls, concrete invocations with real, model-chosen arguments. The response surface of the agent firewall inspects exactly those, after they leave the model and before they reach your agent loop. It’s the surface where you filter what the model actually decided to do, with the arguments it actually picked. This page covers the use case for filtering on the response surface: when to reach for it over inbound, the one verdict twist it adds, and how it behaves on a stream. For the full rule vocabulary and verdict semantics, see Rule schema and Verdicts.

1. Filter LLM tool calls in the response, with arguments in scope

The inbound stage sees the tools you advertise — names only, no call-time arguments yet. The response stage sees the tool_calls the model emits, which carry the arguments the model chose. That’s the whole reason to filter here: it’s the only surface that sees the actual call + arguments for a client-executed (non-MCP) tool, so argument clauses, sequences, and run-state rules all land on response. The distinction in one line:

Stage	Sees	Use it to
`inbound`	Advertised tool definitions	Make a tool invisible to the model
`response`	Emitted `tool_calls` + arguments	Filter the call the model actually made

So inbound answers which tools exist; response answers what the model did with one. Reach for response (or leave stage empty to cover both) when a tool legitimately appears in some requests but a particular call of it is dangerous — or when you only control the agent loop, not the advertised tool set.

A rule with no stage runs on every surface, including response. Pin a rule to response when it should inspect emitted calls and nothing else, for example an argument clause that has nothing to match on the inbound surface anyway. See Stages.
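The stage rule above can be sketched in a few lines. This is an illustrative model, not the firewall's implementation: the function name `rule_runs_on` is hypothetical, and the only behavior it encodes is the one stated here (an empty stage runs everywhere, a pinned stage runs on one surface).

```python
from typing import Optional

# Stage names as they appear on this page.
SURFACES = ("inbound", "response", "mcp", "egress")


def rule_runs_on(rule_stage: Optional[str], surface: str) -> bool:
    """Return True if a rule with the given stage is evaluated on `surface`."""
    if not rule_stage:  # None or "" means the rule is not pinned
        return True
    return rule_stage == surface


# An unpinned rule covers both inbound and response.
unpinned = [s for s in SURFACES if rule_runs_on(None, s)]
# A rule pinned to response is skipped everywhere else.
pinned = [s for s in SURFACES if rule_runs_on("response", s)]
```

Here `unpinned` holds all four surfaces and `pinned` holds only `response`.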

2. One concrete example

Allow shell.exec in general, but strip it from the model’s reply the moment its command argument looks destructive. In your workspace console, open a policy (or create one) and add a rule pinned to the response surface:

{
  "label": "block destructive shell calls",
  "stage": "response",
  "tool_name_glob": "shell.exec",
  "verdict": "deny",
  "args_match_json": "{\"clauses\":[{\"path\":\"$.command\",\"op\":\"regex\",\"value\":\"rm -rf|mkfs|dd if=\"}]}"
}

The argument matcher lives in args_match_json — a JSON string holding {"clauses":[…]}, each clause a path / op / value triple (op is one of eq, contains, regex, gt, lt). The console form builds it for you; the raw shape is shown here so the field name is unambiguous. The tool stays advertised — the model can still propose shell.exec — but when the emitted call carries a destructive command, the firewall removes that tool_call from the reply before your agent ever sees it. A benign shell.exec (say, ls -la) sails through untouched. This is the “allow the tool, gate the call” pattern that the response surface exists for; the argument clause is what makes it possible.
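To make the clause shape concrete, here is a minimal sketch of how a matcher over `args_match_json` could behave. It is an approximation under stated assumptions, not the engine's code: paths are limited to simple dotted lookups such as `$.command`, `regex` uses search (unanchored) semantics, multiple clauses are ANDed, and the helper names are hypothetical.

```python
import json
import re
from typing import Any

_MISSING = object()  # sentinel for "path not present in the arguments"


def resolve_path(args: dict, path: str) -> Any:
    """Resolve a simple dotted path like "$.command" (assumed subset of JSONPath)."""
    trimmed = path[1:] if path.startswith("$") else path
    node: Any = args
    for key in trimmed.strip(".").split("."):
        if not key:
            continue
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def clause_matches(clause: dict, args: dict) -> bool:
    actual = resolve_path(args, clause["path"])
    if actual is _MISSING:
        return False
    op, expected = clause["op"], clause["value"]
    if op == "eq":
        return actual == expected
    if op == "contains":
        return str(expected) in str(actual)
    if op == "regex":
        return re.search(expected, str(actual)) is not None
    if op in ("gt", "lt"):
        try:
            a, e = float(actual), float(expected)
        except (TypeError, ValueError):
            return False
        return a > e if op == "gt" else a < e
    return False  # unknown operator: treat as no match


def args_match(args_match_json: str, args: dict) -> bool:
    """Assumption: every clause must match (AND)."""
    clauses = json.loads(args_match_json)["clauses"]
    return all(clause_matches(c, args) for c in clauses)


# The same matcher string as the rule above, written as a Python literal.
RULE_ARGS = '{"clauses":[{"path":"$.command","op":"regex","value":"rm -rf|mkfs|dd if="}]}'
```

With this sketch, `args_match(RULE_ARGS, {"command": "rm -rf /tmp/build"})` is true and `args_match(RULE_ARGS, {"command": "ls -la"})` is false, which mirrors the destructive and benign cases described above.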

Rules evaluate in priority order, first match wins. Put a narrow allow exception at a lower priority number than a broad deny so the exception runs first. See Rule priority.
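The ordering can be illustrated with a small first-match evaluator. This is a sketch under assumptions: the `priority` field name, the use of `fnmatch` for glob matching, the `args_predicate` callable (a stand-in for `args_match_json`), and the fallback to `default_verdict` when nothing matches are all illustrative choices rather than documented behavior.

```python
from fnmatch import fnmatchcase


def evaluate(rules, tool_name, args, default_verdict="audit"):
    """Return (verdict, label) of the first matching rule, lowest priority number first."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if not fnmatchcase(tool_name, rule["tool_name_glob"]):
            continue
        predicate = rule.get("args_predicate")  # hypothetical stand-in for args_match_json
        if predicate is not None and not predicate(args):
            continue
        return rule["verdict"], rule["label"]
    return default_verdict, None


rules = [
    # Broad deny at a higher priority number, so it is checked second.
    {"priority": 20, "label": "deny all shell", "tool_name_glob": "shell.*",
     "verdict": "deny"},
    # Narrow allow exception at a lower number, so it is checked first.
    {"priority": 10, "label": "allow ls", "tool_name_glob": "shell.exec",
     "verdict": "allow",
     "args_predicate": lambda a: a.get("command", "").startswith("ls ")},
]

listing = evaluate(rules, "shell.exec", {"command": "ls -la"})
wipe = evaluate(rules, "shell.exec", {"command": "rm -rf /"})
```

The `ls -la` call hits the narrow exception and is allowed; the `rm -rf /` call falls past it and is caught by the broad deny.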

3. What a verdict does on the response surface

The response surface inspects each emitted tool_call and rewrites the reply in place. Kept calls are forwarded byte-for-byte; only a matched call changes:

deny → the call is stripped from the reply

The matched tool_call is removed from the model’s response before it reaches your agent. Unlike an inbound deny — which returns HTTP 400 with code firewall_blocked — a response-surface deny doesn’t fail the request; the rest of the reply (other tool calls, any text) flows through with the offending call simply absent.

sanitize → arguments are redacted, the call forwarded

The engine replaces the matched substrings in the call’s arguments with redacted values and forwards the cleaned call. This is useful when the tool is fine but the model put a secret or PII value in an argument. Sanitize redacts tool-call arguments only; it never touches the content a tool returns. If the engine has nothing to substitute, the call is stripped (fail-closed). See Sanitize responses.

pending_approval → the call is stripped here, not held

Human-in-the-loop holds open on the inbound surface, not response. When a pending_approval rule first matches on an emitted call, that call is therefore stripped (fail-closed), because keeping it would forward an un-reviewed call with no human decision. Author HITL holds to fire inbound; see Approvals.

allow / audit → the call passes, logged

allow and audit both forward the call; audit is the usual default_verdict — record everything, block nothing, until you’re ready to enforce.

cap_cost and pending_approval are inbound concepts, so neither enforces as written on this surface. cap_cost is inert on response (the model already ran), and pending_approval resolves to a strip rather than a hold. Put cost caps and HITL holds on the inbound surface; see Cap cost and Approvals.
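The verdict behaviors in this section can be summarized as one pass over the emitted calls. The sketch below is a simplified model, not the gateway's code: calls are plain dicts with `arguments` already parsed, and `decide` and `redact` are hypothetical callables standing in for rule evaluation and the redaction engine.

```python
from typing import Callable, Optional

# Both verdicts remove the call on the response surface.
STRIP = {"deny", "pending_approval"}


def filter_tool_calls(
    tool_calls: list,
    decide: Callable[[dict], str],
    redact: Callable[[dict], Optional[dict]],
) -> list:
    """Return the calls that survive; untouched calls are forwarded as-is."""
    kept = []
    for call in tool_calls:
        verdict = decide(call)
        if verdict in STRIP:
            continue
        if verdict == "sanitize":
            cleaned = redact(call["arguments"])
            if cleaned is None:  # nothing to substitute: fail closed
                continue
            kept.append({**call, "arguments": cleaned})
            continue
        kept.append(call)  # allow, audit, and cap_cost (inert here) pass through
    return kept


calls = [
    {"id": "a", "name": "shell.exec", "arguments": {"command": "rm -rf /"}},
    {"id": "b", "name": "http.post", "arguments": {"token": "sk-123"}},
    {"id": "c", "name": "fs.read", "arguments": {"path": "README.md"}},
    {"id": "d", "name": "deploy.prod", "arguments": {}},
]
verdicts = {"shell.exec": "deny", "http.post": "sanitize",
            "fs.read": "audit", "deploy.prod": "pending_approval"}


def redact_token(args: dict) -> Optional[dict]:
    # Illustrative placeholder text; the real redaction format is not specified here.
    return {**args, "token": "[REDACTED]"} if "token" in args else None


survivors = filter_tool_calls(calls, lambda c: verdicts[c["name"]], redact_token)
```

Only calls `b` (with its token redacted) and `c` (unchanged) remain; the denied call and the pending_approval call are both absent from the reply.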

4. Streaming: held, then filtered

On a non-stream reply the whole response is parsed and filtered at once. On a stream, a model’s tool_calls arrive as per-index deltas across many SSE frames — and once a delta is forwarded, your agent already has it and it can’t be retracted. So the gateway holds tool-call frames: they are never forwarded mid-stream. At end of stream the firewall assembles each call (name + complete arguments), evaluates it, and emits only the surviving calls.

Content frames still stream normally: text tokens reach your client as they arrive. Only tool_call frames are held back for evaluation, so a denied call is removed and a sanitized call is redacted before the assembled tool calls are delivered. The verdict and rule semantics are identical to the non-stream surface. For the frame-level details, see Streaming internals and Provider streaming.
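The hold-then-filter behavior can be sketched as a generator. The frame shape here is a hypothetical simplification (a dict with `content` and/or a list of `tool_calls` deltas keyed by `index`), not any provider's wire format, and stripping a call whose assembled arguments fail to parse is an assumption made for the example.

```python
import json
from typing import Callable, Iterable, Iterator


def filter_stream(
    frames: Iterable[dict],
    keep: Callable[[str, dict], bool],
) -> Iterator[dict]:
    """Forward content immediately; hold tool-call deltas until end of stream."""
    pending: dict = {}  # index -> {"name": str, "arguments": str}
    for frame in frames:
        if "content" in frame:
            # Forward only the text, never the tool-call part of a mixed frame.
            yield {"content": frame["content"]}
        for delta in frame.get("tool_calls", []):
            slot = pending.setdefault(delta["index"], {"name": "", "arguments": ""})
            slot["name"] += delta.get("name", "")
            slot["arguments"] += delta.get("arguments", "")

    # End of stream: each call is now complete and can be evaluated.
    survivors = []
    for index in sorted(pending):
        call = pending[index]
        try:
            args = json.loads(call["arguments"] or "{}")
        except json.JSONDecodeError:
            continue  # assumption: unparseable arguments are stripped
        if keep(call["name"], args):
            survivors.append({"index": index, **call})
    if survivors:
        yield {"tool_calls": survivors}


frames = [
    {"content": "Cleaning up. "},
    {"tool_calls": [{"index": 0, "name": "shell.exec", "arguments": '{"command": "rm'}]},
    {"tool_calls": [{"index": 0, "arguments": ' -rf /"}'}]},
    {"tool_calls": [{"index": 1, "name": "shell.exec",
                     "arguments": '{"command": "ls -la"}'}]},
    {"content": "Done."},
]
out = list(filter_stream(frames, lambda name, args: "rm -rf" not in args.get("command", "")))
```

Both content frames come out in order as they arrive, followed by a single frame carrying only the `ls -la` call. The destructive call, split across two deltas, is never forwarded in part or in whole.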

5. Roll it out safely

Dry-run the rule first

The console Test tab runs a policy against a sample tool call and returns the verdict, the matched rule, and the reason — nothing is dispatched, nothing is persisted. Confirm your glob and argument clause match the call you meant before attaching a key. See Test rules.

Shadow mode for a live measurement

Turn on shadow mode and every enforcing verdict — including a response-surface deny — is downgraded to audit, reason prefixed [shadow] would …. You measure exactly what the rule would strip against real traffic before it changes a single reply.

Confirm filtering in the events log

Every evaluation writes a firewall event with the verdict, surface, tool, and matched rule. Filter the events log by surface response to see your rule firing on the emitted calls you expected. See Analytics.

6. Attach the policy and who can configure it

A policy does nothing until a key resolves to it. Attach in the console by setting firewall_policy_id on the key, or make the policy the workspace default. Resolution: the key’s attached policy (when it exists and is enabled), else the workspace default — and a disabled attached policy falls back to the workspace default. See Manage policies. All configuration runs in the console under your session (/api/workspace/firewall/*):
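The resolution order reads naturally as a short function. This is a sketch under assumptions: policies are modeled as dicts in a lookup table, the `enabled` field name is hypothetical, and only `firewall_policy_id` comes from the text above.

```python
from typing import Optional


def resolve_policy(
    key: dict,
    policies: dict,
    workspace_default_id: Optional[str],
) -> Optional[dict]:
    """Pick the key's attached policy if it exists and is enabled, else the workspace default."""
    attached = policies.get(key.get("firewall_policy_id"))
    if attached is not None and attached.get("enabled"):
        return attached
    return policies.get(workspace_default_id)


policies = {
    "pol_strict": {"name": "strict", "enabled": True},
    "pol_old": {"name": "old", "enabled": False},
    "pol_default": {"name": "default", "enabled": True},
}

strict = resolve_policy({"firewall_policy_id": "pol_strict"}, policies, "pol_default")
fallback = resolve_policy({"firewall_policy_id": "pol_old"}, policies, "pol_default")
unattached = resolve_policy({}, policies, "pol_default")
```

A key attached to an enabled policy resolves to it. A key attached to a disabled or missing policy, or with no attachment at all, resolves to the workspace default, and to nothing if no default is set.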

Action	Role
Read policies, presets, discovered tools, Simulate	Member
Dry-run (Test), read the events log and run aggregates	Developer+
Create / edit / delete rules and policies	Developer+

Related

Validate arguments

The argument clauses that make response-surface filtering precise.

Sanitize responses

Redact secrets from a call’s arguments instead of stripping it.

Firewall stages

How response compares to inbound, mcp, and egress.

Block tools

The inbound counterpart: deny a tool before the model is offered it.

Dangerous tool calls

The threat response filtering addresses.

Firewall reference

The full rule + matching reference.

