shell.exec with rm -rf /, a
payment API with an outsized transfer amount, a database tool targeting the
production replica. This is agent tool abuse, and it is one of the
most consequential risks in agentic systems because tool calls have
real-world side effects that are often irreversible.
The Agent Firewall has three layered defenses. You can deploy them
independently or in combination.
1. Allow-listing: deny everything you didn’t explicitly permit
The strongest control is an allow-list. Instead of trying to enumerate every dangerous tool, you enumerate the tools this agent legitimately needs and deny everything else. This is the zero-trust baseline. A policy with default_verdict: deny and explicit allow rules for each
approved tool achieves this. Example: an agent that should only read from a
CRM:
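A minimal sketch of this allow-list behaviour, modelled in Python rather than in the firewall's own policy format. The field names priority, tool, and verdict, the tool names crm.read and crm.search, and the use of fnmatch-style globbing are all assumptions made for illustration; only default_verdict and the verdict values come from the text.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory model of a policy. The rule fields and the CRM tool
# names are illustrative assumptions, not the product's schema.
POLICY = {
    "default_verdict": "deny",
    "rules": [
        {"priority": 10, "tool": "crm.read", "verdict": "allow"},
        {"priority": 20, "tool": "crm.search", "verdict": "allow"},
        {"priority": 9999, "tool": "*", "verdict": "deny"},  # catch-all
    ],
}


def evaluate(policy: dict, tool_name: str) -> str:
    """Return the verdict of the first matching rule, walking ascending priority."""
    for rule in sorted(policy["rules"], key=lambda r: r["priority"]):
        if fnmatchcase(tool_name, rule["tool"]):
            return rule["verdict"]
    # No rule matched: fall back to the policy-wide default.
    return policy["default_verdict"]


if __name__ == "__main__":
    for name in ("crm.read", "shell.exec", "db.delete", "payment.transfer"):
        print(name, "->", evaluate(POLICY, name))
```

In this model the two CRM tools are allowed and every other tool name lands on the catch-all deny.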
Any call to shell.exec, db.delete, or payment.transfer, whether
issued intentionally or triggered by an injected instruction, hits the
* catch-all and returns an HTTP 400 firewall_blocked error. The agent
sees a structured tool error and cannot retry (the block is marked
skip-retry), so it cannot loop around the denial.
Set your policy’s default_verdict to deny for full allow-list
enforcement. With the default audit verdict, unmatched calls are allowed
and logged but not blocked, which is useful during rollout but not a security
control by itself.

| Pattern | What it covers |
|---|---|
| crm.* | All tools in the crm namespace |
| *.read | Any read-verb tool across all servers |
| db.query | Exactly this one tool |
| * | Everything (use for your catch-all deny) |

Place allow rules at low priority numbers and the catch-all
deny at a high one.
2. Argument validation: allow the tool, block the dangerous invocation
An allow-list on tool names is coarse: it blocks shell.exec entirely.
Sometimes you want to permit a tool but constrain how it can be called.
Argument clauses let you match on specific fields inside the tool-call
arguments, using JSONPath and a set of operators.
Example: allow shell.exec but block rm -rf
The rule fires only when shell.exec is called and the $.command
argument matches the destructive-command regex. A normal shell.exec call
with a safe command falls through to the next rule (or the default verdict).
Put this rule at a lower priority number than any general allow shell.exec
rule so it fires first.
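The matching step for such a rule might look like the sketch below. It is a Python model, not the firewall's implementation: the clause keys path, op, and value are assumed, the regex is one illustrative way to catch rm with both -r and -f flags, and Python's re module stands in for RE2.

```python
import json
import re

# Illustrative pattern: "rm" followed by a flag group containing both r and f.
DESTRUCTIVE = r"\brm\s+-(?:[a-zA-Z]*r[a-zA-Z]*f|[a-zA-Z]*f[a-zA-Z]*r)[a-zA-Z]*\b"

# Hypothetical rule shape. Only args_match_json and $.command come from the
# text; the clause keys are assumptions.
RULE = {
    "priority": 5,
    "tool": "shell.exec",
    "verdict": "deny",
    "args_match_json": json.dumps(
        [{"path": "$.command", "op": "regex", "value": DESTRUCTIVE}]
    ),
}


def rule_fires(rule: dict, tool_name: str, args: dict) -> bool:
    """True when the tool name matches and every regex clause matches."""
    if tool_name != rule["tool"]:
        return False
    for clause in json.loads(rule["args_match_json"]):
        key = clause["path"].split(".", 1)[1]  # top-level fields only, e.g. "command"
        value = args.get(key)
        if not isinstance(value, str) or not re.search(clause["value"], value):
            return False  # clause false: the rule does not fire
    return True
```

A call such as rule_fires(RULE, "shell.exec", {"command": "ls -la"}) returns False in this model, so evaluation would continue with the next rule.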
The full set of argument operators:
| Operator | Use it when |
|---|---|
| eq | Exact match on a scalar value (string or number) |
| contains | Substring match, e.g. $.query contains DROP TABLE |
| regex | RE2 pattern match; safe on the hot path, no backtracking |
| in | Value must be in a given array, e.g. allow only specific environments |
| cidr_match | IP address in a CIDR block; useful for egress destination checks |
| gt / lt | Numeric comparison, e.g. $.amount gt 10000 for payment caps |

The args_match_json field is a JSON-encoded string: its value is the
clause block serialized to JSON, not a nested object. All clauses inside it
are AND-ed. If a path doesn’t exist
in the call’s arguments, the clause evaluates to false and the rule does not
fire; the call falls through to the next rule or the default verdict.
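The operator semantics and the AND / missing-path behaviour described above can be sketched as follows. This is an illustrative Python model under stated assumptions: clauses are a list of objects with path, op, and value keys (an assumed shape), paths are limited to simple dotted forms such as $.a.b, and Python's re module stands in for RE2.

```python
import ipaddress
import re
from typing import Any

_MISSING = object()  # sentinel for "path not present in the arguments"


def resolve(path: str, args: dict) -> Any:
    """Resolve a simple dotted path such as "$.a.b"; return _MISSING if absent."""
    node: Any = args
    for key in path.removeprefix("$.").split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def _is_number(v: Any) -> bool:
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def _cidr(value: Any, block: str) -> bool:
    if not isinstance(value, str):
        return False
    try:
        return ipaddress.ip_address(value) in ipaddress.ip_network(block, strict=False)
    except ValueError:  # not a parseable address or network
        return False


OPS = {
    "eq": lambda v, x: v == x,
    "contains": lambda v, x: isinstance(v, str) and x in v,
    "regex": lambda v, x: isinstance(v, str) and re.search(x, v) is not None,
    "in": lambda v, x: v in x,
    "cidr_match": _cidr,
    "gt": lambda v, x: _is_number(v) and v > x,
    "lt": lambda v, x: _is_number(v) and v < x,
}


def clauses_match(clauses: list, args: dict) -> bool:
    """AND all clauses together; a clause on a missing path counts as false."""
    for clause in clauses:
        value = resolve(clause["path"], args)
        if value is _MISSING or not OPS[clause["op"]](value, clause["value"]):
            return False
    return True
```

Treating a missing path as false means a rule only fires when the call actually carries the field it constrains.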
Payment guard example — deny any payment tool call with an amount
exceeding a threshold:
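A hedged sketch of such a guard, again as a Python model rather than the firewall's policy syntax. The rule fields, the payment.* glob, and the clause keys are assumptions; the 10000 threshold mirrors the $.amount gt 10000 example in the operator table. Note that args_match_json is built with json.dumps, so it is a string rather than a nested object.

```python
import json
from fnmatch import fnmatchcase

# Assumed clause shape; the threshold mirrors "$.amount gt 10000" from the table.
clauses = [{"path": "$.amount", "op": "gt", "value": 10000}]

PAYMENT_GUARD = {
    "priority": 5,
    "tool": "payment.*",
    "verdict": "deny",
    # A JSON-encoded string, not a nested object.
    "args_match_json": json.dumps(clauses),
}


def payment_guard_fires(tool_name: str, args: dict) -> bool:
    """True when a payment tool is called with an amount above the threshold."""
    if not fnmatchcase(tool_name, PAYMENT_GUARD["tool"]):
        return False
    for clause in json.loads(PAYMENT_GUARD["args_match_json"]):
        amount = args.get(clause["path"].removeprefix("$."))
        # Missing or non-numeric amount: the clause is false, so no match.
        if not isinstance(amount, (int, float)) or isinstance(amount, bool):
            return False
        if not amount > clause["value"]:
            return False
    return True
```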
3. Human-in-the-loop: hold high-stakes calls for approval
For tools that are genuinely necessary but high-stakes (triggering a deployment, approving a refund, sending a bulk email), you can require a human sign-off before the call proceeds. The pending_approval verdict
holds the call and returns a firewall_approval_pending response to the
agent.
pending_approval is compatible with argument clauses — you can hold only
the invocations that match a threshold, and allow routine ones through:
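One way to picture the threshold case is the Python model below. Everything specific in it is hypothetical: the billing.refund tool name, the 500 threshold, and the amount_gt field, which stands in for an argument clause of the form $.amount gt 500.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: hold large refunds for approval, allow routine ones.
RULES = [
    {"priority": 10, "tool": "billing.refund", "amount_gt": 500,
     "verdict": "pending_approval"},
    {"priority": 20, "tool": "billing.refund", "amount_gt": None,
     "verdict": "allow"},
]


def verdict_for(tool_name: str, args: dict, default: str = "deny") -> str:
    """First-match-wins evaluation over RULES in ascending priority order."""
    for rule in sorted(RULES, key=lambda r: r["priority"]):
        if not fnmatchcase(tool_name, rule["tool"]):
            continue
        threshold = rule["amount_gt"]
        if threshold is not None:
            amount = args.get("amount")
            if not isinstance(amount, (int, float)) or amount <= threshold:
                continue  # clause false: fall through to the next rule
        return rule["verdict"]
    return default
```

In this model a refund of 1200 is held for approval, while a refund of 40 falls through to the general allow.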
4. What a blocked call looks like
A denied call on the inbound surface (tool advertised in the request) returns HTTP 400 with error code firewall_blocked. The response
includes structured metadata — the matched rule label, reason code, and
risk factors — and is marked skip-retry so a loop can’t hammer the same
denied call.
A call blocked on the response surface (the model already emitted
tool_calls) returns a tool error visible to the model, giving it a
chance to react — pick a different tool, ask the user, or stop — instead
of crashing.
5. First-match-wins ordering
Priority ordering matters. The engine walks rules in ascending priority order and stops at the first match. A common pattern:

| Priority | Rule | Verdict |
|---|---|---|
| 5 | shell.exec + destructive regex | deny |
| 10 | shell.* (general) | allow |
| 20 | crm.* | allow |
| 9999 | * (catch-all) | deny |

With this ordering, a shell.exec call with a destructive
command is denied even though there is a general allow for shell.*.
Without the deny at the lower priority number, the allow shell.* rule would match first.
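The ordering in the table can be traced with a small Python model of first-match-wins evaluation. The rule fields (including command_regex, which stands in for an argument clause on $.command), the simplified rm -rf pattern, and fnmatch-style globbing are assumptions for illustration.

```python
import re
from fnmatch import fnmatchcase

# The four rules from the table, deliberately listed out of order to show
# that evaluation sorts by priority number.
RULES = [
    {"priority": 9999, "tool": "*", "command_regex": None, "verdict": "deny"},
    {"priority": 20, "tool": "crm.*", "command_regex": None, "verdict": "allow"},
    {"priority": 10, "tool": "shell.*", "command_regex": None, "verdict": "allow"},
    {"priority": 5, "tool": "shell.exec", "command_regex": r"\brm\s+-rf\b",
     "verdict": "deny"},
]


def first_match(rules: list, tool_name: str, args: dict):
    """Walk rules in ascending priority and return the first one that matches."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if not fnmatchcase(tool_name, rule["tool"]):
            continue
        pattern = rule["command_regex"]
        if pattern is not None:
            command = args.get("command")
            if not isinstance(command, str) or not re.search(pattern, command):
                continue  # argument clause false: try the next rule
        return rule
    return None
```

Removing the priority 5 rule from RULES and re-running the destructive call shows the effect described above: the shell.* allow at priority 10 becomes the first match.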
6. Rolling out safely with shadow mode
Before switching a new policy to enforcing, turn on shadow mode. The policy evaluates every tool call and logs the verdict exactly as it would in production, but every enforcing verdict (deny, pending_approval,
sanitize) is downgraded to audit — nothing is blocked. The reason in
the event log is prefixed [shadow] would deny so you can measure impact
in the Events and Runs views.
Once you’ve confirmed the policy fires on what you expect — and nothing
you didn’t intend — turn shadow mode off. The next call is enforced.
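The downgrade step can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not the firewall's code: the function name is hypothetical, and the exact reason format beyond the [shadow] would deny prefix (the colon, and the wording for verdicts other than deny) is an assumption.

```python
# Verdicts that would change the outcome of a call when enforced.
ENFORCING = {"deny", "pending_approval", "sanitize"}


def apply_shadow(verdict: str, reason: str, shadow: bool) -> tuple[str, str]:
    """In shadow mode, downgrade enforcing verdicts to audit and tag the reason."""
    if shadow and verdict in ENFORCING:
        return "audit", f"[shadow] would {verdict}: {reason}"
    return verdict, reason
```

With shadow=False the verdict and reason pass through unchanged, which corresponds to the policy being enforced.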
The tight autonomy level applies the block_destructive_shell preset
automatically. If you need a quick posture without writing rules, apply
tight from the console and it ships a deny policy for destructive shell
calls in one step. You can then layer your own allow-list rules on top.
See Autonomy levels.
7. Related threats
Agent tool abuse rarely arrives in isolation. An unauthorized tool call is often the consequence of another attack vector:
- Prompt injection: an attacker embeds instructions in retrieved content that steer the agent toward tools it wasn’t meant to call.
- Excessive agency: the agent was granted more tool access than its task requires, making any injection or misconfiguration immediately dangerous.
- The threat model: how tool abuse fits in the full attack surface for agentic systems.
Firewall rules reference
The complete matching language — tool globs, argument clauses, all
operators, verdicts, and the API.
Firewall overview
Policies, surfaces, autonomy levels, HITL approval, and observability.
