An agent that has been prompt-injected, misconfigured, or simply given too much latitude can call tools it was never meant to touch — or call a legitimate tool with dangerous arguments: shell.exec with rm -rf /, a payment API with an outsized transfer amount, a database tool targeting the production replica. This is agent tool abuse, and it is one of the most consequential risks in agentic systems because tool calls have real-world side effects that are often irreversible. The Agent Firewall has three layered defenses. You can deploy them independently or in combination.

1. Allow-listing: deny everything you didn’t explicitly permit

The strongest control is an allow-list. Instead of trying to enumerate every dangerous tool, you enumerate the tools this agent legitimately needs — and deny everything else. This is the zero-trust baseline. A policy with default_verdict: deny and explicit allow rules for each approved tool achieves this. Example: an agent that should only read from a CRM:
[
  {
    "priority": 10,
    "label": "allow crm reads",
    "tool_name_glob": "crm.get*",
    "verdict": "allow"
  },
  {
    "priority": 20,
    "label": "allow crm search",
    "tool_name_glob": "crm.search",
    "verdict": "allow"
  },
  {
    "priority": 9999,
    "label": "deny everything else",
    "tool_name_glob": "*",
    "verdict": "deny"
  }
]
Any call to shell.exec, db.delete, or payment.transfer, whether issued intentionally or triggered by an injected instruction, hits the * catch-all and returns an HTTP 400 firewall_blocked error. The agent sees a structured tool error, and because the block is marked skip-retry it cannot retry, so it cannot loop around the denial.
Set your policy’s default_verdict to deny for full allow-list enforcement. With the default audit verdict, unmatched calls are allowed and logged but not blocked — useful during rollout but not a security control by itself.
Glob patterns let you allow entire tool families with one rule. The common patterns:
| Pattern | What it covers |
| --- | --- |
| crm.* | All tools in the crm namespace |
| *.read | Any read-verb tool across all servers |
| db.query | Exactly this one tool |
| * | Everything (use for your catch-all deny) |
Tool matching is first-match-wins in ascending priority order. Put your specific allow rules at low priority numbers and the catch-all deny at a high one.
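To see how first-match-wins plays out on the CRM policy above, here is a minimal Python sketch. It is not the firewall's implementation: the evaluate helper is hypothetical, and Python's fnmatchcase is only an assumed approximation of the firewall's glob semantics.

```python
from fnmatch import fnmatchcase

# Rules from the CRM example, deliberately listed out of order to show that
# priority, not position in the list, decides which rule is checked first.
RULES = [
    {"priority": 9999, "label": "deny everything else", "tool_name_glob": "*", "verdict": "deny"},
    {"priority": 10, "label": "allow crm reads", "tool_name_glob": "crm.get*", "verdict": "allow"},
    {"priority": 20, "label": "allow crm search", "tool_name_glob": "crm.search", "verdict": "allow"},
]


def evaluate(tool_name, rules, default_verdict="audit"):
    """Return (verdict, label) of the first matching rule in ascending priority."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        # Assumption: fnmatchcase stands in for the firewall's glob matcher.
        if fnmatchcase(tool_name, rule["tool_name_glob"]):
            return rule["verdict"], rule["label"]
    # No rule matched: fall back to the policy's default verdict.
    return default_verdict, None


for name in ("crm.get_contact", "crm.search", "shell.exec"):
    print(name, evaluate(name, RULES))
```

Dropping the catch-all rule from RULES makes shell.exec fall through to the default verdict, which is why the default matters when no catch-all is present.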

2. Argument validation: allow the tool, block the dangerous invocation

An allow-list on tool names is coarse: a tool such as shell.exec is either allowed or blocked entirely. Sometimes you want to permit a tool but constrain how it can be called. Argument clauses let you match on specific fields inside the tool-call arguments, using JSONPath and a set of operators. Example: allow shell.exec but block rm -rf:
{
  "priority": 10,
  "label": "block destructive rm",
  "tool_name_glob": "shell.exec",
  "args_match_json": "{\"clauses\":[{\"path\":\"$.command\",\"op\":\"regex\",\"value\":\"rm\\\\s+-[^\\\\s]*r[^\\\\s]*f|mkfs|dd\\\\s+if=|:\\\\(\\\\)\\\\{.*\\\\}\"}]}",
  "verdict": "deny"
}
This rule fires only when shell.exec is called and the $.command argument matches the destructive-command regex. A normal shell.exec call with a safe command falls through to the next rule (or the default verdict). Put this rule at a lower priority number than any general allow shell.exec rule so it fires first. The full set of argument operators:
| Operator | Use it when |
| --- | --- |
| eq | Exact match on a scalar value (string or number) |
| contains | Substring match, e.g. $.query contains DROP TABLE |
| regex | RE2 pattern match: safe on the hot path, no backtracking |
| in | Value must be in a given array, e.g. allow only specific environments |
| cidr_match | IP address in a CIDR block; useful for egress destination checks |
| gt / lt | Numeric comparison, e.g. $.amount gt 10000 for payment caps |
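Before relying on a regex clause, it helps to run the pattern against sample commands. The sketch below uses the destructive-command pattern from the shell.exec rule above, with both layers of JSON escaping removed. Python's re module is a stand-in here, not the gateway's RE2 engine; the two should agree on this pattern because it uses no backreferences or lookarounds, but that is worth confirming against your deployment. The is_destructive helper and the sample list are illustrative.

```python
import re

# Pattern from the "block destructive rm" rule, fully unescaped.
DESTRUCTIVE = re.compile(r"rm\s+-[^\s]*r[^\s]*f|mkfs|dd\s+if=|:\(\)\{.*\}")


def is_destructive(command):
    """True when the pattern occurs anywhere in the command (unanchored search)."""
    return DESTRUCTIVE.search(command) is not None


SAMPLES = [
    "rm -rf /",
    "mkfs.ext4 /dev/sdb1",
    "dd if=/dev/zero of=/dev/sda",
    ":(){ :|:& };:",
    "ls -la",
    "rm -fr /",      # flags in the other order
    "rm -r -f /",    # flags passed separately
]

for command in SAMPLES:
    print(f"{command!r:35} matches={is_destructive(command)}")
```

Under these assumptions the last two samples do not match, because the pattern requires an r followed by an f within a single flag group. A test pass like this is a cheap way to find such gaps before the rule goes live.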
The args_match_json field is a JSON-encoded string — its value is the clause block serialized to JSON, not a nested object. All clauses inside it are AND-ed. If a path doesn’t exist in the call’s arguments, the clause evaluates false and the rule does not fire — the call falls through to the next rule or the default. Payment guard example — deny any payment tool call with an amount exceeding a threshold:
{
  "priority": 5,
  "label": "cap payment amount",
  "tool_name_glob": "payment.*",
  "args_match_json": "{\"clauses\":[{\"path\":\"$.amount_cents\",\"op\":\"gt\",\"value\":100000}]}",
  "verdict": "deny"
}
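The clause semantics described above (AND-ed clauses, a missing path evaluating false) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the gateway's evaluator: resolve handles only simple dotted paths such as $.amount_cents rather than full JSONPath, and the operator implementations are guesses at reasonable behavior. It also shows json.dumps producing the string form that args_match_json expects, which avoids hand-escaping.

```python
import ipaddress
import json
import re

_MISSING = object()  # sentinel: distinguishes "path absent" from a null value


def resolve(path, args):
    """Resolve a simple dotted path like $.a.b; assumption: no full JSONPath."""
    if not path.startswith("$."):
        return _MISSING
    node = args
    for key in path[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _cidr_match(actual, block):
    try:
        return ipaddress.ip_address(actual) in ipaddress.ip_network(block, strict=False)
    except ValueError:
        return False


# Hypothetical operator table mirroring the operators listed above.
OPS = {
    "eq": lambda actual, value: actual == value,
    "contains": lambda actual, value: isinstance(actual, str) and value in actual,
    "regex": lambda actual, value: isinstance(actual, str) and re.search(value, actual) is not None,
    "in": lambda actual, value: actual in value,
    "cidr_match": _cidr_match,
    "gt": lambda actual, value: _is_number(actual) and actual > value,
    "lt": lambda actual, value: _is_number(actual) and actual < value,
}


def rule_fires(args_match_json, call_args):
    """All clauses must hold; a clause whose path is missing evaluates false."""
    for clause in json.loads(args_match_json)["clauses"]:
        actual = resolve(clause["path"], call_args)
        if actual is _MISSING or not OPS[clause["op"]](actual, clause["value"]):
            return False
    return True


# json.dumps serializes the clause block into the string args_match_json holds.
payment_cap = json.dumps({"clauses": [{"path": "$.amount_cents", "op": "gt", "value": 100000}]})

print(rule_fires(payment_cap, {"amount_cents": 250000}))  # over the cap
print(rule_fires(payment_cap, {"amount_cents": 5000}))    # under the cap
print(rule_fires(payment_cap, {"amount": 250000}))        # path missing
```

The third call is the one to watch: a deny rule does not fire when the field it inspects is absent, so the call falls through. Pairing argument-based deny rules with a catch-all deny or a deny default keeps that fall-through from becoming an allow.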

3. Human-in-the-loop: hold high-stakes calls for approval

For tools that are genuinely necessary but high-stakes — triggering a deployment, approving a refund, sending a bulk email — you can require a human sign-off before the call proceeds. The pending_approval verdict holds the call and returns a firewall_approval_pending response to the agent:
{
  "priority": 20,
  "label": "hold deployment calls for review",
  "tool_name_glob": "deploy.*",
  "verdict": "pending_approval"
}
The agent (or your framework) polls the approval id. A reviewer approves or rejects from the console or via a webhook callback. If approved, the agent re-submits the original call with the single-use approval token and the gateway lets it through once. pending_approval is compatible with argument clauses: you can hold only the invocations that match a condition, such as a production target, and allow routine ones through:
[
  {
    "priority": 10,
    "label": "hold production deploys",
    "tool_name_glob": "deploy.release",
    "args_match_json": "{\"clauses\":[{\"path\":\"$.environment\",\"op\":\"eq\",\"value\":\"production\"}]}",
    "verdict": "pending_approval"
  },
  {
    "priority": 20,
    "label": "allow staging deploys",
    "tool_name_glob": "deploy.*",
    "verdict": "allow"
  }
]
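The hold, review, and re-submit cycle can be modeled with a small in-memory simulation. Everything in this sketch is hypothetical: FakeGateway, its method names, and the response fields are invented for illustration and are not the gateway's API. Its only purpose is to show why a single-use token lets an approved call through exactly once.

```python
import secrets


class FakeGateway:
    """In-memory stand-in for the gateway; all names here are hypothetical."""

    def __init__(self):
        self._approvals = {}  # approval_id -> held call, review state, token
        self._tokens = {}     # unused approval token -> call it was issued for

    def submit(self, call, approval_token=None):
        if approval_token is not None:
            # pop() consumes the token, so a replay no longer finds it.
            approved_call = self._tokens.pop(approval_token, None)
            if approved_call == call:
                return {"status": "allowed"}
            return {"status": "denied"}
        approval_id = secrets.token_hex(8)
        self._approvals[approval_id] = {"call": call, "state": "pending", "token": None}
        return {"status": "firewall_approval_pending", "approval_id": approval_id}

    def review(self, approval_id, approve):
        """What a reviewer does from the console or a webhook callback."""
        entry = self._approvals[approval_id]
        if approve:
            entry["state"] = "approved"
            entry["token"] = secrets.token_hex(16)
            self._tokens[entry["token"]] = entry["call"]
        else:
            entry["state"] = "rejected"

    def poll(self, approval_id):
        entry = self._approvals[approval_id]
        return {"state": entry["state"], "approval_token": entry["token"]}


gateway = FakeGateway()
call = {"tool": "deploy.release", "args": {"environment": "production"}}

held = gateway.submit(call)                      # the call is held for review
gateway.review(held["approval_id"], approve=True)
status = gateway.poll(held["approval_id"])       # the agent polls the approval id

first = gateway.submit(call, approval_token=status["approval_token"])
second = gateway.submit(call, approval_token=status["approval_token"])
print(first["status"], second["status"])
```

In this model the token is also bound to the exact call that was held, so an approval for one deployment cannot be reused for a different one. Whether the real gateway binds tokens this way is an assumption to verify.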

4. What a blocked call looks like

A denied call on the inbound surface (tool advertised in the request) returns HTTP 400 with error code firewall_blocked. The response includes structured metadata — the matched rule label, reason code, and risk factors — and is marked skip-retry so a loop can’t hammer the same denied call. A call blocked on the response surface (the model already emitted tool_calls) returns a tool error visible to the model, giving it a chance to react — pick a different tool, ask the user, or stop — instead of crashing.
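On the client side, the practical consequence of skip-retry is that a retry wrapper should treat a firewall block as final. The following is a rough sketch only: the response shape, the skip_retry and rule_label field names, and the helper names are assumptions made for illustration, so check the actual error payload before adapting it.

```python
class FirewallBlocked(Exception):
    """Raised when the gateway denies a call; carries the matched rule label."""

    def __init__(self, rule_label):
        super().__init__(f"blocked by firewall rule: {rule_label}")
        self.rule_label = rule_label


def call_with_retry(send, call, max_attempts=3):
    """Retry transient failures, but stop immediately on a firewall block."""
    for _ in range(max_attempts):
        response = send(call)
        if response["status"] == 200:
            return response
        error = response.get("error", {})
        # Assumed field names: "code", "skip_retry", "rule_label".
        if error.get("code") == "firewall_blocked" or error.get("skip_retry"):
            # A denial is a policy decision; repeating the call cannot change it.
            raise FirewallBlocked(error.get("rule_label"))
    raise RuntimeError(f"gave up after {max_attempts} attempts")


def blocked_send(call):
    """Fake transport that always returns a firewall block."""
    blocked_send.calls += 1
    return {
        "status": 400,
        "error": {"code": "firewall_blocked", "skip_retry": True, "rule_label": "deny everything else"},
    }


blocked_send.calls = 0

try:
    call_with_retry(blocked_send, {"tool": "shell.exec"})
except FirewallBlocked as exc:
    print(exc, "| attempts:", blocked_send.calls)
```

Surfacing the rule label in the exception gives the agent framework something concrete to log or show the user when it decides to pick a different tool or stop.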

5. First-match-wins ordering

Priority ordering matters. The engine walks rules in ascending priority order and stops at the first match. A common pattern:
| Priority | Rule | Verdict |
| --- | --- | --- |
| 5 | shell.exec + destructive regex | deny |
| 10 | shell.* (general) | allow |
| 20 | crm.* | allow |
| 9999 | * (catch-all) | deny |
Priority 5 fires before priority 10, so shell.exec with a destructive command is denied even though there is a general allow for shell.*. Without the priority-5 deny, the allow shell.* rule would be the first match and the destructive call would go through.
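The effect of that ordering can be checked with a short Python model of the four rules in the table. As before this is a sketch under assumptions: fnmatchcase approximates the glob matcher, the destructive predicate is a shortened version of the regex clause, and the tuple layout and verdict helper are invented for the example.

```python
import re
from fnmatch import fnmatchcase

DESTRUCTIVE = re.compile(r"rm\s+-[^\s]*r[^\s]*f|mkfs|dd\s+if=")


def destructive(args):
    """Stand-in for the regex argument clause on $.command."""
    command = args.get("command")
    return isinstance(command, str) and DESTRUCTIVE.search(command) is not None


# (priority, tool glob, optional argument predicate, verdict)
RULES = [
    (5, "shell.exec", destructive, "deny"),
    (10, "shell.*", None, "allow"),
    (20, "crm.*", None, "allow"),
    (9999, "*", None, "deny"),
]


def verdict(tool, args, rules):
    """Walk rules in ascending priority and stop at the first full match."""
    for priority, glob, predicate, result in sorted(rules, key=lambda r: r[0]):
        if fnmatchcase(tool, glob) and (predicate is None or predicate(args)):
            return priority, result
    return None, "audit"


risky = {"command": "rm -rf /"}
print(verdict("shell.exec", risky, RULES))              # stopped by priority 5
print(verdict("shell.exec", {"command": "ls"}, RULES))  # falls through to priority 10
print(verdict("shell.exec", risky, RULES[1:]))          # priority-5 rule removed
```

With the priority-5 rule removed, the same destructive command lands on the general shell.* allow, which is the failure mode the ordering is meant to prevent.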

6. Rolling out safely with shadow mode

Before switching a new policy to enforcing, turn on shadow mode. The policy evaluates every tool call and logs the verdict exactly as it would in production, but every enforcing verdict (deny, pending_approval, sanitize) is downgraded to audit — nothing is blocked. The reason in the event log is prefixed [shadow] would deny so you can measure impact in the Events and Runs views. Once you’ve confirmed the policy fires on what you expect — and nothing you didn’t intend — turn shadow mode off. The next call is enforced.
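The downgrade that shadow mode performs is small enough to sketch directly. This is an illustrative model rather than the gateway's code: effective_verdict is a hypothetical helper, and the exact reason format beyond the [shadow] would deny prefix (including how other verdicts are worded) is an assumption.

```python
# Verdicts that would change the outcome of a call when enforced.
ENFORCING = {"deny", "pending_approval", "sanitize"}


def effective_verdict(verdict, reason, shadow):
    """Downgrade enforcing verdicts to audit when shadow mode is on."""
    if shadow and verdict in ENFORCING:
        # Keep the original verdict visible in the logged reason.
        return "audit", f"[shadow] would {verdict}: {reason}"
    return verdict, reason


print(effective_verdict("deny", "block destructive rm", shadow=True))
print(effective_verdict("allow", "allow crm reads", shadow=True))
print(effective_verdict("deny", "block destructive rm", shadow=False))
```

Counting the logged reasons that start with [shadow] would gives a direct measure of how many calls the policy would have stopped once enforced.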
The tight autonomy level applies the block_destructive_shell preset automatically. If you need a quick posture without writing rules, apply tight from the console and it ships a deny policy for destructive shell calls in one step. You can then layer your own allow-list rules on top. See Autonomy levels.
Agent tool abuse rarely arrives in isolation. An unauthorized tool call is often the consequence of another attack vector:
  • Prompt injection — an attacker embeds instructions in retrieved content that steer the agent toward tools it wasn’t meant to call.
  • Excessive agency — the agent was granted more tool access than its task requires, making any injection or misconfiguration immediately dangerous.
  • The threat model — how tool abuse fits in the full attack surface for agentic systems.
The Agent Firewall is the enforcement layer; the principle of least privilege (narrow tool allow-lists, scoped keys) is the design posture that makes it effective.

Firewall rules reference

The complete matching language — tool globs, argument clauses, all operators, verdicts, and the API.

Firewall overview

Policies, surfaces, autonomy levels, HITL approval, and observability.