Sanitize tool-call arguments

When an agent calls a tool, the arguments it passes are as risky as the prompt that produced them — an sk-… key dropped into a command field, a customer SSN pasted into a body, an internal token in a request header. The firewall sanitize verdict catches that material in the tool-call arguments, replaces it with a typed redaction token, and forwards the cleaned call — so the action still runs, but the secret never leaves the gateway.

“Sanitize tool output” means the call arguments, not the tool result. People who search for “sanitize tool output” often expect the firewall to scrub what a tool returns. The sanitize verdict does not touch tool results: it redacts the arguments your agent sends into a tool call. If you need to screen the text a tool or model returns, that is a Guardrails output-stage job, not a firewall sanitize rule.

This is one verdict in the firewall’s matching language. For the full set see Verdicts and the rule reference; this page is the focused guide to authoring and reasoning about sanitize.

1. What sanitize does — and what it never touches

A rule with verdict: sanitize runs a redaction engine over the tool-call arguments before the call is dispatched. Each match is replaced with a canonical token and the call proceeds with the cleaned arguments — the tool still executes, just without the raw secret in it.

Redacts

The JSON arguments of a model-emitted tool_call or an MCP tools/call — command, body, headers, any string field where a secret or PII landed.

Never redacts

The content a tool returns, the prompt, and the model’s response text. Sanitize is an argument-only redactor. Text screening is a Guardrail concern.

The redactor replaces each match with a typed token: a preset match becomes [redacted:<preset>] (e.g. [redacted:openai_key]), and a custom-pattern match becomes [redacted:custom]. The shape of the argument is preserved — only the sensitive substring changes — so a tool that expects valid JSON still receives valid JSON.
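To make the shape-preserving behavior concrete, here is a minimal Python sketch of an argument redactor. It is not the gateway's implementation: the `DETECTORS` table, both regular expressions, and the function names are assumptions chosen for illustration. Only the `[redacted:<preset>]` token format comes from this page.

```python
import json
import re

# Hypothetical detector table: preset name -> compiled pattern. The patterns
# are illustrative approximations, not the gateway's real detectors.
DETECTORS = {
    "openai_key": re.compile(r"sk-[A-Za-z0-9]{32,}"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def redact_string(text: str) -> str:
    """Replace every detector match in one string with a typed token."""
    for name, pattern in DETECTORS.items():
        text = pattern.sub(f"[redacted:{name}]", text)
    return text


def redact_arguments(value):
    """Walk a decoded JSON value, rewriting strings and keeping the shape."""
    if isinstance(value, str):
        return redact_string(value)
    if isinstance(value, list):
        return [redact_arguments(item) for item in value]
    if isinstance(value, dict):
        return {key: redact_arguments(item) for key, item in value.items()}
    return value  # numbers, booleans, and null pass through untouched


args = {
    "url": "https://example.test/hook",
    "headers": {"X-Note": "contact jo@acme.com"},
    "body": "key=sk-" + "A" * 40,
    "retries": 3,
}
cleaned = redact_arguments(args)
print(json.dumps(cleaned, indent=2))
```

Because only string leaves are rewritten, the keys, nesting, and non-string values of the arguments are unchanged, and the result still serializes to valid JSON.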

2. The built-in detector presets

A sanitize rule names one or more presets (well-known secret/PII shapes) and/or custom regex patterns. The built-in presets:

Preset	Catches
`aws_access_key`	AWS access key id (`AKIA…` / `ASIA…` + 16 chars)
`aws_secret_key`	A 40-char AWS secret (boundary-aware)
`openai_key`	`sk-` + ≥32 chars
`anthropic_key`	`sk-ant-` + ≥40 chars
`bearer_token`	`Bearer` + a ≥16-char token (keyword kept)
`email`	An email address
`ssn_us`	A US SSN in `3-2-4` form
`credit_card`	A 13–19 digit run that passes a Luhn check

A sanitize rule must declare at least one preset or custom pattern — an empty sanitizer is rejected when you save the rule. A credit_card match is additionally Luhn-checked, so a same-length number that isn’t a valid card is left untouched rather than over-redacted.
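The Luhn step is easiest to see in code. The sketch below shows one way a 13–19 digit run could be checked before it is redacted. The `CARD_RUN` pattern and the function names are assumptions for illustration, not the gateway's detector; the Luhn algorithm itself is the standard checksum.

```python
import re

# Assumed shape: a run of 13 to 19 digits that is not part of a longer run.
CARD_RUN = re.compile(r"(?<!\d)\d{13,19}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Return True when the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def redact_cards(text: str) -> str:
    """Redact only the digit runs that are Luhn-valid; leave the rest as-is."""
    def replace(match: re.Match) -> str:
        run = match.group(0)
        return "[redacted:credit_card]" if luhn_valid(run) else run

    return CARD_RUN.sub(replace, text)


sample = "card 4111111111111111 order 4111111111111112"
print(redact_cards(sample))
```

In this sample the first number is a well-known Luhn-valid test value and is replaced, while the second differs by one digit, fails the checksum, and is left untouched.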

3. One concrete example

Author this in the console rule editor. The example redacts an OpenAI key and any email from the arguments of any http.* tool call your agent emits, then forwards the cleaned call:

{
  "label": "strip secrets from http tool args",
  "stage": "response",
  "tool_name_glob": "http.*",
  "verdict": "sanitize",
  "sanitize_json": "{\"presets\":[\"openai_key\",\"email\"],\"custom\":[]}"
}

If the model emits a call like:

{ "name": "http.post", "arguments": { "url": "…", "body": "key=sk-AAAA…BBBB user=jo@acme.com" } }

the gateway forwards it with the body rewritten to key=[redacted:openai_key] user=[redacted:email] — the request still runs, the secret and the address never leave the gateway.

Pin the rule to the response stage (model-emitted tool_calls) or leave the stage empty to also cover the mcp surface. Those are the surfaces that carry call-time arguments, which is what sanitize redacts.
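Note that sanitize_json in the example above is a JSON document encoded as a string, which is easy to get wrong when escaping by hand. If you generate rules from a script, a sketch like the following produces the same shape. The helper name `build_sanitize_rule` and the local emptiness check are assumptions for illustration; the console performs its own validation when you save the rule.

```python
import json


def build_sanitize_rule(label, tool_name_glob, presets, custom=(), stage="response"):
    """Assemble a rule dict shaped like the example on this page (hypothetical helper)."""
    sanitizer = {"presets": list(presets), "custom": list(custom)}
    # Local pre-check mirroring the documented rule: an empty sanitizer is rejected.
    if not sanitizer["presets"] and not sanitizer["custom"]:
        raise ValueError("declare at least one preset or custom pattern")
    return {
        "label": label,
        "stage": stage,
        "tool_name_glob": tool_name_glob,
        "verdict": "sanitize",
        # The field holds JSON encoded as a string, so serialize it separately.
        "sanitize_json": json.dumps(sanitizer, separators=(",", ":")),
    }


rule = build_sanitize_rule(
    "strip secrets from http tool args", "http.*", ["openai_key", "email"]
)
print(json.dumps(rule, indent=2))
```

Serializing the inner object with json.dumps and letting the outer json.dumps escape it avoids hand-written backslashes in the rule body.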

4. On the inbound surface, sanitize escalates to deny

The inbound surface evaluates the tools an agent advertises on a request — tool definitions, which carry no call-time arguments yet. There is nothing to redact there, so a sanitize verdict on the inbound surface escalates to a deny (fail-closed): the request is blocked with firewall_blocked rather than forwarded unredacted.

Don’t author a sanitize rule expecting it to clean an inbound tool advertisement — it will block it. If you want a tool definition gone from the request, use an explicit deny. Reserve sanitize for the response and mcp surfaces, where real arguments exist.
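The escalation can be summarized as a small decision function. This is a sketch of the behavior described in this section, not gateway code: the function name is hypothetical, and the surface strings inbound, response, and mcp are taken from this page.

```python
def effective_verdict(verdict: str, surface: str) -> str:
    """Return the verdict that is actually enforced on a given surface."""
    # Inbound carries tool definitions only, so there is nothing to redact:
    # sanitize fails closed and becomes a deny.
    if verdict == "sanitize" and surface == "inbound":
        return "deny"
    return verdict


for surface in ("inbound", "response", "mcp"):
    print(surface, "->", effective_verdict("sanitize", surface))
```

Only sanitize on the inbound surface changes; every other verdict and surface combination passes through as authored.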

5. Sanitize vs. the other ways to handle a secret

Three layers can act on a secret an agent is about to leak — pick by what and where:

Sanitize (firewall) — redact the tool-call arguments

Strips the secret out of a tool call’s arguments and still runs the call. Use it when the action is legitimate but the agent put something sensitive in a field. Argument-layer only.

Deny (firewall) — block the whole call

Stops the call entirely. Use it when the action itself is dangerous, not just an argument. This is also what sanitize becomes on the inbound surface. See block tools.

Guardrails — screen prompt / response text

The Secrets Blocker and PII guardrails screen the text of a request or response (including, on the output stage, model-generated content). That is the layer for “what a tool or model returns” — the thing sanitize does not do.

Test before you enforce. Sanitize rewrites a live call’s arguments on the response and mcp surfaces. Author your sanitize rules under shadow mode first and watch the events feed to confirm they match what you expect before any argument is actually rewritten.

6. Where sanitize shows up in your trail

Like every verdict, a sanitize evaluation is recorded as a firewall event — filterable by verdict, surface, tool, and run in the events log and rolled up in analytics. In shadow mode a sanitize verdict (like every enforcing verdict) is downgraded to audit and the reason is prefixed [shadow] would …, so you can measure impact before any argument is actually rewritten.

Where to go next

All verdicts

allow, audit, deny, sanitize, pending_approval, cap_cost.

Validate arguments

Match a call by what’s in its arguments — the JSONPath clause grammar.

Block tools

When the action itself is the problem, deny the whole call.

Firewall + Guardrails

Screen the text a tool or model returns — the layer sanitize doesn’t cover.

For the threats sanitize helps contain, see data exfiltration and dangerous tool calls. For the full rule grammar behind the verdict, see the firewall rule reference.
