Dangerous & unauthorized tool calls

An agent that has been prompt-injected, misconfigured, or simply given too much latitude can call tools it was never meant to touch — or call a legitimate tool with dangerous arguments: shell.exec with rm -rf /, a payment API with an outsized transfer amount, a database tool targeting the production replica. This is agent tool abuse, and it is one of the most consequential risks in agentic systems because tool calls have real-world side effects that are often irreversible. The Agent Firewall has three layered defenses. You can deploy them independently or in combination.

1. Allow-listing: deny everything you didn’t explicitly permit

The strongest control is an allow-list. Instead of trying to enumerate every dangerous tool, you enumerate the tools this agent legitimately needs — and deny everything else. This is the zero-trust baseline. A policy with default_verdict: deny and explicit allow rules for each approved tool achieves this. Example: an agent that should only read from a CRM:

[
  {
    "priority": 10,
    "label": "allow crm reads",
    "tool_name_glob": "crm.get*",
    "verdict": "allow"
  },
  {
    "priority": 20,
    "label": "allow crm search",
    "tool_name_glob": "crm.search",
    "verdict": "allow"
  },
  {
    "priority": 9999,
    "label": "deny everything else",
    "tool_name_glob": "*",
    "verdict": "deny"
  }
]

Any call to shell.exec, db.delete, or payment.transfer, whether issued intentionally or triggered by an injected instruction, hits the * catch-all and returns an HTTP 400 firewall_blocked error. The error is structured and marked skip-retry, so the agent cannot loop on the denied call.

Set your policy’s default_verdict to deny for full allow-list enforcement. With the default audit verdict, unmatched calls are allowed and logged but not blocked — useful during rollout but not a security control by itself.
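A minimal sketch of how the allow-list example resolves a call. It assumes the engine sorts rules by ascending priority, returns the first match, and falls back to default_verdict only when nothing matches. Python's fnmatch.fnmatchcase stands in for the firewall's glob matcher, and evaluate is a hypothetical helper, not part of any firewall API. The catch-all rule is left out so the default verdict decides unmatched calls:

```python
from fnmatch import fnmatchcase

# The two CRM allow rules from the example above, without the catch-all deny.
RULES = [
    {"priority": 10, "label": "allow crm reads", "tool_name_glob": "crm.get*", "verdict": "allow"},
    {"priority": 20, "label": "allow crm search", "tool_name_glob": "crm.search", "verdict": "allow"},
]

def evaluate(tool_name: str, rules: list[dict], default_verdict: str) -> tuple[str, str]:
    """Hypothetical evaluator: first matching rule in ascending priority wins."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if fnmatchcase(tool_name, rule["tool_name_glob"]):
            return rule["verdict"], rule["label"]
    # No rule matched: the policy-level default decides.
    return default_verdict, "default_verdict"

print(evaluate("crm.get_contact", RULES, "deny"))  # ('allow', 'allow crm reads')
print(evaluate("shell.exec", RULES, "deny"))       # ('deny', 'default_verdict')
print(evaluate("shell.exec", RULES, "audit"))      # ('audit', 'default_verdict')
```

With deny as the default, the unlisted shell.exec call is blocked; with audit, the same call passes and is only logged, which is why audit on its own is not a security control.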

Glob patterns let you allow entire tool families with one rule. The common patterns:

Pattern	What it covers
`crm.*`	All tools in the `crm` namespace
`*.read`	Any read-verb tool across all servers
`db.query`	Exactly this one tool
`*`	Everything (use for your catch-all deny)

Tool matching is first-match-wins in ascending priority order. Put your specific allow rules at low priority numbers and the catch-all deny at a high one.
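To check which tool names a pattern from the table would cover, here is a small sketch using Python's fnmatch.fnmatchcase as a stand-in for the firewall's glob matcher. That equivalence is an assumption, and the tool names other than those mentioned on this page are hypothetical. One behavior worth confirming against the real matcher: in fnmatch, * also matches dots, so crm.* would cover a nested name such as crm.admin.delete.

```python
from fnmatch import fnmatchcase

# Hypothetical tool inventory used to exercise the patterns from the table.
TOOLS = ["crm.get_contact", "crm.search", "db.query", "db.delete", "files.read", "shell.exec"]

def matches(pattern: str, tools: list[str]) -> list[str]:
    """Tool names a glob would cover (fnmatchcase: case-sensitive, '*' spans dots)."""
    return [t for t in tools if fnmatchcase(t, pattern)]

for pattern in ["crm.*", "*.read", "db.query", "*"]:
    print(f"{pattern:10} -> {matches(pattern, TOOLS)}")
```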

2. Argument validation: allow the tool, block the dangerous invocation

An allow-list on tool names is coarse: it blocks shell.exec entirely. Sometimes you want to permit a tool but constrain how it can be called. Argument clauses let you match on specific fields inside the tool-call arguments, using JSONPath and a set of operators. Example: allow shell.exec but block rm -rf:

{
  "priority": 10,
  "label": "block destructive rm",
  "tool_name_glob": "shell.exec",
  "args_match_json": "{\"clauses\":[{\"path\":\"$.command\",\"op\":\"regex\",\"value\":\"rm\\\\s+-[^\\\\s]*(r[^\\\\s]*f|f[^\\\\s]*r)|mkfs|dd\\\\s+if=|:\\\\(\\\\)\\\\{.*\\\\}\"}]}",
  "verdict": "deny"
}

This rule fires only when shell.exec is called and the $.command argument matches the destructive-command regex. A shell.exec call with a safe command falls through to the next rule (or the default verdict). Give this rule a lower priority number than any general rule that allows shell.exec, so that it is evaluated first. The full set of argument operators:

Operator	Use it when
`eq`	Exact match on a scalar value (string or number)
`contains`	Substring match — e.g. `$.query` `contains` `DROP TABLE`
`regex`	RE2 pattern match — safe on the hot path, no backtracking
`in`	Value must be in a given array — e.g. allow only specific environments
`cidr_match`	IP address in a CIDR block — useful for egress destination checks
`gt` / `lt`	Numeric comparison — e.g. `$.amount` `gt` `10000` for payment caps
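The operators in the table can be sketched as a single dispatch function. This is an illustration under assumptions, not the firewall's implementation: apply_op is a hypothetical name, and the edge-case behavior (contains and regex only on strings, unanchored regex search, non-IP values failing cidr_match, non-numbers failing gt/lt) is assumed rather than documented. Python's re stands in for RE2; for a yes/no match on RE2-compatible syntax the answer agrees, but Python's engine can backtrack while RE2 runs in linear time.

```python
import ipaddress
import re

def apply_op(op: str, actual, expected) -> bool:
    """Hypothetical clause operator: does the argument value satisfy op against expected?"""
    if op == "eq":
        return actual == expected
    if op == "contains":
        return isinstance(actual, str) and expected in actual
    if op == "regex":
        # Unanchored search, like a substring-style pattern match.
        return isinstance(actual, str) and re.search(expected, actual) is not None
    if op == "in":
        return actual in expected
    if op == "cidr_match":
        try:
            return ipaddress.ip_address(actual) in ipaddress.ip_network(expected)
        except ValueError:
            return False  # not an IP address: the clause is simply false
    if op in ("gt", "lt"):
        if isinstance(actual, bool) or not isinstance(actual, (int, float)):
            return False  # non-numeric values never satisfy a numeric comparison
        return actual > expected if op == "gt" else actual < expected
    raise ValueError(f"unknown operator: {op}")

# A destructive-command pattern that accepts both -rf and -fr flag orders.
DESTRUCTIVE = r"rm\s+-[^\s]*(r[^\s]*f|f[^\s]*r)|mkfs|dd\s+if=|:\(\)\{.*\}"
print(apply_op("regex", "rm -rf /", DESTRUCTIVE))       # True
print(apply_op("regex", "ls -la /tmp", DESTRUCTIVE))    # False
print(apply_op("gt", 250000, 100000))                    # True
print(apply_op("cidr_match", "10.1.2.3", "10.0.0.0/8"))  # True
```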

The args_match_json field is a JSON-encoded string: its value is the clause block serialized to JSON, not a nested object, so quotes and backslashes inside it are escaped a second time. All clauses inside it are AND-ed. If a path doesn’t exist in the call’s arguments, the clause evaluates false and the rule does not fire; the call falls through to the next rule or the default verdict.

Payment guard example: deny any payment tool call whose amount_cents exceeds 100000 (1,000.00 in major currency units):

{
  "priority": 5,
  "label": "cap payment amount",
  "tool_name_glob": "payment.*",
  "args_match_json": "{\"clauses\":[{\"path\":\"$.amount_cents\",\"op\":\"gt\",\"value\":100000}]}",
  "verdict": "deny"
}
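Because args_match_json is a string, hand-escaping it is error-prone. A minimal sketch, assuming you assemble policies in Python before sending them: json.dumps with compact separators reproduces the payment rule's string exactly, and clauses_match illustrates the AND and missing-path semantics described above. The resolve and clauses_match helpers are hypothetical, support only simple $.a.b paths, and implement just two operators.

```python
import json

# Build the field with json.dumps instead of escaping by hand.
clause_block = {"clauses": [{"path": "$.amount_cents", "op": "gt", "value": 100000}]}
args_match_json = json.dumps(clause_block, separators=(",", ":"))

def _gt(actual, expected) -> bool:
    return isinstance(actual, (int, float)) and not isinstance(actual, bool) and actual > expected

# Only two operators, enough for this example.
OPS = {"eq": lambda actual, expected: actual == expected, "gt": _gt}

def resolve(path: str, args: dict):
    """Follow a simple '$.a.b' path. Returns (found, value)."""
    node = args
    for key in path.removeprefix("$.").split("."):
        if not isinstance(node, dict) or key not in node:
            return False, None
        node = node[key]
    return True, node

def clauses_match(encoded: str, args: dict) -> bool:
    """AND over all clauses; a clause whose path is missing is false."""
    for clause in json.loads(encoded)["clauses"]:
        found, value = resolve(clause["path"], args)
        if not found or not OPS[clause["op"]](value, clause["value"]):
            return False
    return True

print(args_match_json)  # {"clauses":[{"path":"$.amount_cents","op":"gt","value":100000}]}
print(clauses_match(args_match_json, {"amount_cents": 250000}))  # True: rule fires, call denied
print(clauses_match(args_match_json, {"amount_cents": 5000}))    # False: falls through
print(clauses_match(args_match_json, {"amount": 250000}))        # False: path missing
```

Note the last case: a tool that names its field amount instead of amount_cents slips past the cap, so match the path to each tool's actual argument schema.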

3. Human-in-the-loop: hold high-stakes calls for approval

For tools that are genuinely necessary but high-stakes — triggering a deployment, approving a refund, sending a bulk email — you can require a human sign-off before the call proceeds. The pending_approval verdict holds the call and returns a firewall_approval_pending response to the agent:

{
  "priority": 20,
  "label": "hold deployment calls for review",
  "tool_name_glob": "deploy.*",
  "verdict": "pending_approval"
}

The agent (or your framework) polls using the approval id returned with the pending response. A reviewer approves or rejects the call from the console or via a webhook callback. If it is approved, the agent re-submits the original call with the single-use approval token, and the gateway lets it through once. pending_approval also combines with argument clauses, so you can hold only the invocations that match a condition and let routine ones through:

[
  {
    "priority": 10,
    "label": "hold production deploys",
    "tool_name_glob": "deploy.release",
    "args_match_json": "{\"clauses\":[{\"path\":\"$.environment\",\"op\":\"eq\",\"value\":\"production\"}]}",
    "verdict": "pending_approval"
  },
  {
    "priority": 20,
    "label": "allow other deploy calls",
    "tool_name_glob": "deploy.*",
    "verdict": "allow"
  }
]
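The approval round trip can be simulated end to end. The sketch below is an in-memory stand-in, not the gateway's API: FakeGateway, submit, review, poll, and the response fields are all hypothetical. It applies the two rules above (hold production releases, allow other deploy calls) and shows why the token is single-use: popping it on first use means a replay is held again.

```python
import secrets
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Optional

@dataclass
class FakeGateway:
    """In-memory stand-in for the approval flow. Every name here is hypothetical."""
    pending: dict = field(default_factory=dict)    # approval_id -> held call
    decisions: dict = field(default_factory=dict)  # approval_id -> token, or None if rejected
    tokens: dict = field(default_factory=dict)     # unused token -> call it unlocks

    def submit(self, tool: str, args: dict, approval_token: Optional[str] = None) -> dict:
        call = (tool, tuple(sorted(args.items())))
        # pop() makes the token single-use: a second submission is held again.
        if approval_token is not None and self.tokens.pop(approval_token, None) == call:
            return {"status": "allowed"}
        # Priority 10: hold production releases.
        if tool == "deploy.release" and args.get("environment") == "production":
            approval_id = secrets.token_hex(8)
            self.pending[approval_id] = call
            return {"status": "firewall_approval_pending", "approval_id": approval_id}
        # Priority 20: allow every other deploy.* call.
        if fnmatchcase(tool, "deploy.*"):
            return {"status": "allowed"}
        return {"status": "no_rule_matched"}

    def review(self, approval_id: str, approve: bool) -> None:
        """What a reviewer does in the console (or a webhook callback delivers)."""
        call = self.pending.pop(approval_id)
        token = secrets.token_hex(16) if approve else None
        if token is not None:
            self.tokens[token] = call
        self.decisions[approval_id] = token

    def poll(self, approval_id: str) -> dict:
        if approval_id in self.pending:
            return {"state": "pending"}
        token = self.decisions.get(approval_id)
        if token is None:
            return {"state": "rejected"}
        return {"state": "approved", "approval_token": token}

gw = FakeGateway()
call = ("deploy.release", {"environment": "production"})
held = gw.submit(*call)
gw.review(held["approval_id"], approve=True)
token = gw.poll(held["approval_id"])["approval_token"]
print(gw.submit(*call, approval_token=token))            # {'status': 'allowed'}
print(gw.submit(*call, approval_token=token)["status"])  # firewall_approval_pending
```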

4. What a blocked call looks like

A denied call on the inbound surface (tool advertised in the request) returns HTTP 400 with error code firewall_blocked. The response includes structured metadata — the matched rule label, reason code, and risk factors — and is marked skip-retry so a loop can’t hammer the same denied call. A call blocked on the response surface (the model already emitted tool_calls) returns a tool error visible to the model, giving it a chance to react — pick a different tool, ask the user, or stop — instead of crashing.
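On the client side, the practical consequence is that a firewall denial should never enter your retry loop. A minimal sketch, assuming a response shape of your own: the status, error.code, and error.skip_retry field names are hypothetical placeholders, so map them to the fields the gateway actually returns. send stands for whatever function issues the request in your framework.

```python
import time
from typing import Callable

def should_retry(response: dict) -> bool:
    """Decide whether a failed gateway call is worth retrying (hypothetical field names)."""
    error = response.get("error") or {}
    # Firewall denials are final: honor the skip-retry marker and never resend.
    if error.get("skip_retry") or error.get("code") == "firewall_blocked":
        return False
    # Otherwise retry only rate limits and server-side failures.
    status = response.get("status", 200)
    return status == 429 or status >= 500

def send_with_retry(send: Callable[[], dict], max_attempts: int = 3, base_delay_s: float = 0.5) -> dict:
    """Call send(), retrying with exponential backoff while should_retry() allows it."""
    response = send()
    for attempt in range(1, max_attempts):
        if not should_retry(response):
            break
        time.sleep(base_delay_s * 2 ** (attempt - 1))
        response = send()
    return response
```

A blocked response is then returned after a single attempt, and your code can surface the matched rule label to the user or let the model choose another path.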

5. First-match-wins ordering

Priority ordering matters. The engine walks rules in ascending priority order and stops at the first match. A common pattern:

Priority	Rule	Verdict
5	`shell.exec` + destructive regex	`deny`
10	`shell.*` (general)	`allow`
20	`crm.*`	`allow`
9999	`*` (catch-all)	`deny`

Priority 5 is evaluated before priority 10, so shell.exec with a destructive command is denied even though a general allow exists for shell.*. Without the priority-5 deny, the shell.* allow would match first and the destructive command would go through.
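The table can be replayed with a short first-match-wins loop. This is a sketch under assumptions: fnmatchcase stands in for the firewall's glob matcher, first_match is a hypothetical helper, and the priority-5 argument clause is abbreviated to an rm check that accepts r and f flags in either order. The last call shows what happens when the general allow is moved ahead of the deny.

```python
import re
from fnmatch import fnmatchcase
from typing import Optional

# Abbreviated destructive-command check for the priority-5 rule.
DESTRUCTIVE_RM = re.compile(r"rm\s+-[^\s]*(r[^\s]*f|f[^\s]*r)")

RULES = [
    {"priority": 5, "glob": "shell.exec", "command_regex": DESTRUCTIVE_RM, "verdict": "deny"},
    {"priority": 10, "glob": "shell.*", "verdict": "allow"},
    {"priority": 20, "glob": "crm.*", "verdict": "allow"},
    {"priority": 9999, "glob": "*", "verdict": "deny"},
]

def first_match(tool: str, args: dict, rules: list, default: str = "audit") -> tuple[str, Optional[int]]:
    """Walk rules in ascending priority; the first rule that fully matches decides."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if not fnmatchcase(tool, rule["glob"]):
            continue
        pattern = rule.get("command_regex")
        if pattern is not None:
            command = args.get("command")
            if not (isinstance(command, str) and pattern.search(command)):
                continue  # argument clause is false: fall through to the next rule
        return rule["verdict"], rule["priority"]
    return default, None

print(first_match("shell.exec", {"command": "rm -rf /"}, RULES))  # ('deny', 5)
print(first_match("shell.exec", {"command": "ls -la"}, RULES))    # ('allow', 10)
print(first_match("payment.transfer", {}, RULES))                 # ('deny', 9999)

# Move the general shell.* allow ahead of the deny and the destructive call gets through.
reordered = [dict(r, priority=1) if r["glob"] == "shell.*" else r for r in RULES]
print(first_match("shell.exec", {"command": "rm -rf /"}, reordered))  # ('allow', 1)
```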

6. Rolling out safely with shadow mode

Before switching a new policy to enforcing, turn on shadow mode. The policy evaluates every tool call and logs the verdict exactly as it would in production, but every enforcing verdict (deny, pending_approval, sanitize) is downgraded to audit, so nothing is blocked. The reason in the event log is prefixed “[shadow] would deny”, so you can measure impact in the Events and Runs views. Once you’ve confirmed the policy fires on what you expect, and on nothing you didn’t intend, turn shadow mode off. Enforcement begins with the next call.
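The downgrade rule is simple enough to sketch, which also makes it easy to count would-be blocks when you review shadow data offline. Only the “[shadow] would deny” prefix is documented; reusing the verdict name for pending_approval and sanitize, and the ": " separator before the original reason, are assumptions. shadow_verdict is a hypothetical helper.

```python
ENFORCING = {"deny", "pending_approval", "sanitize"}

def shadow_verdict(verdict: str, reason: str, shadow_mode: bool) -> tuple[str, str]:
    """Return the verdict actually applied and the reason written to the event log."""
    if not shadow_mode or verdict not in ENFORCING:
        return verdict, reason
    # Downgrade to audit: evaluated and logged, never blocked.
    return "audit", f"[shadow] would {verdict}: {reason}"

events = [("deny", "rule 'deny everything else'"), ("allow", "rule 'allow crm reads'")]
applied = [shadow_verdict(v, r, shadow_mode=True) for v, r in events]
would_block = sum(1 for _, reason in applied if reason.startswith("[shadow] would"))
print(applied)
print(would_block)  # 1
```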

The tight autonomy level applies the block_destructive_shell preset automatically. If you want a quick baseline without writing rules, apply tight from the console: it installs a deny policy for destructive shell calls in one step. You can then layer your own allow-list rules on top. See Autonomy levels.

7. Related threats

Agent tool abuse rarely arrives in isolation. An unauthorized tool call is often the consequence of another attack vector:

Prompt injection — an attacker embeds instructions in retrieved content that steer the agent toward tools it wasn’t meant to call.
Excessive agency — the agent was granted more tool access than its task requires, making any injection or misconfiguration immediately dangerous.
The threat model — how tool abuse fits in the full attack surface for agentic systems.

The Agent Firewall is the enforcement layer; the principle of least privilege (narrow tool allow-lists, scoped keys) is the design posture that makes it effective.

Firewall rules reference

The complete matching language — tool globs, argument clauses, all operators, verdicts, and the API.

Firewall overview

Policies, surfaces, autonomy levels, HITL approval, and observability.
