Excessive agency & confused deputy

An agent that has more capability than its task needs is a liability waiting to be exploited. If an attacker steals its key, tricks it with an injected instruction, or compromises one dependency, everything that key can do is in the attacker’s hands. This is the excessive agency problem, and it compounds with a closely related pattern called the confused deputy: the agent is not compromised directly but is tricked into using its legitimate authority on an attacker’s behalf. Both problems share one root cause: the key the agent holds is too powerful for the task it performs. The defense is least agency: give each agent exactly the capability its task requires, and nothing more.

This page is about the gateway controls that limit blast radius. The upstream threat-model context — why agents are high-value targets and how injection works — is in Threat model. For the matching control that governs dangerous individual tool calls, see Dangerous tool calls.

1. What makes an agent excessively capable

When every agent in a workspace shares one key, or when a key is issued once and never revisited, capability drifts upward:

Unrestricted models — the agent can call any model in the workspace, including expensive or highly capable ones it never needs.
No spend ceiling — a runaway loop, a triggered injection, or a billing attack can exhaust the workspace balance before you notice.
No expiry — a key issued during a sprint is still valid a year later, long after the agent it was minted for is retired.
No IP constraint — the credential works from anywhere, so a leaked key has no geographic limit.
No tool allow-list — the agent can invoke any tool, even ones unrelated to its function.

Any one of these alone widens the blast radius. Combined, they let a single compromised agent use everything the workspace offers: call the most powerful model, spend the full balance, reach every tool.
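The checklist above can be turned into a quick audit. The sketch below is illustrative only: it assumes a key is available as a plain dictionary using the field names described under Layer 1, and that an empty `model_limits` list means "no model restriction" (an assumption, not something this page states). The `audit_key` helper is hypothetical and not part of any OrcaRouter SDK.

```python
def audit_key(key: dict) -> list[str]:
    """Return one finding per field that is left at its widest setting."""
    findings = []
    if not key.get("model_limits"):          # assumed: empty list = any model
        findings.append("unrestricted models")
    if key.get("credit_limit_usd", 0) == 0:  # 0 means unlimited
        findings.append("no spend ceiling")
    if key.get("expired_time", -1) == -1:    # -1 means the key never expires
        findings.append("no expiry")
    if not key.get("allow_ips"):             # empty means no IP restriction
        findings.append("no IP constraint")
    if not key.get("firewall_policy_id"):    # falls back to the workspace default
        findings.append("no explicit tool allow-list")
    return findings


# A shared, never-revisited key trips every check.
shared_key = {
    "model_limits": [],
    "credit_limit_usd": 0,
    "expired_time": -1,
    "allow_ips": [],
}

for finding in audit_key(shared_key):
    print(finding)
```

Running a check like this over every key in a workspace is one way to spot capability drift before an attacker does.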

2. The confused deputy pattern

The confused deputy is a special case of excessive agency. The agent is not hijacked; it is convinced. A prompt-injection payload in a retrieved web page, document, or tool result tells the agent to take an action it is legitimately authorized to perform (move money, delete a record, send a message) on the attacker’s behalf. The agent acts. It was authorized to do exactly that, so the authorization check passes and the damage is done. Defense requires two things working together:

Narrow scope — the agent cannot be tricked into doing something its task never intended, because it is not authorized to do it at all.
Human approval for irreversible actions — even within authorized scope, a high-stakes call requires a human to confirm before it executes.

3. Defense in depth: the four layers

OrcaRouter enforces least agency across four independent controls that compose on a single API key. None requires a code change to your agent.

Layer 1 — Scoped key (identity + hard limits)

Every agent should have its own API key. The key carries hard limits that the gateway enforces regardless of what the agent requests:

| Field | What it restricts |
| --- | --- |
| `model_limits` | The exact set of models this key may call. A request for any other model is rejected before it leaves the gateway. |
| `allow_ips` | Requests from any address not on this list are rejected at the auth layer. Empty means no IP restriction. |
| `credit_limit_usd` | Lifetime spend cap in USD. `0` means unlimited. The gateway enforces this against cumulative spend on the key. |
| `expired_time` | Absolute expiry timestamp. `-1` means the key never expires. Set this to the agent’s deployment lifecycle. |
| `environment` | A label (`prod`, `staging`, `dev`) for organizing keys and filtering audit logs. |

These limits are enforced at the key level — before any policy, before any model call. They are the outermost blast-radius boundary.

Layer 2 — Firewall policy (tool allow-list)

Attach a firewall policy to the key via firewall_policy_id. The policy governs every tool call that key issues:

Write rules that allow the tool names the agent legitimately uses (tool-name globs are supported — e.g. db.query*).
Set the policy’s default_verdict to deny so anything not explicitly listed is blocked.
Add argument predicates to restrict even the allowed tools — e.g. allow db.query only when the database argument matches a specific schema.

A key with no firewall attachment falls back to the workspace default policy. For agents with narrow tool needs, an explicit attachment with a tight policy is always preferable to relying on the workspace default. See Firewall rules for the full matching language.
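The three rule-writing steps above can be sketched as a small evaluator. This is an illustration under assumptions, not the real policy schema: the dictionary layout, the `rules`, `tool`, `verdict`, and `args` keys, and the `evaluate` function are hypothetical, and argument predicates are simplified to exact-match comparisons. See Firewall rules for the actual matching language.

```python
from fnmatch import fnmatchcase

# Hypothetical policy shape: allow two tool namespaces, deny everything else.
POLICY = {
    "default_verdict": "deny",
    "rules": [
        # Allowed only when the `database` argument names the read replica.
        {"tool": "db.query*", "verdict": "allow", "args": {"database": "tickets_ro"}},
        {"tool": "ticket.read*", "verdict": "allow"},
    ],
}


def evaluate(policy: dict, tool: str, args: dict) -> str:
    """Return the verdict of the first rule whose glob and predicates match."""
    for rule in policy["rules"]:
        if not fnmatchcase(tool, rule["tool"]):
            continue
        predicates = rule.get("args", {})
        if all(args.get(name) == value for name, value in predicates.items()):
            return rule["verdict"]
    return policy["default_verdict"]


print(evaluate(POLICY, "db.query_rows", {"database": "tickets_ro"}))  # allow
print(evaluate(POLICY, "db.query_rows", {"database": "billing"}))     # deny
print(evaluate(POLICY, "email.send", {"to": "attacker@example.com"})) # deny
```

The third call is the confused-deputy case in miniature: an injected instruction asks for a tool the agent never needed, and the default verdict blocks it without any rule having to anticipate it.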

Layer 3 — Human approval for high-stakes actions (`pending_approval`)

For irreversible or high-value tool calls — a payment dispatch, a record delete, an email send — add a pending_approval rule. The flow:

1. The agent issues the tool call. The firewall holds it and returns a “held” response carrying an approval id. The call does not reach the tool.
2. A reviewer approves or rejects out-of-band, either from the console (Developer+) or via an HMAC-signed webhook to your own approval system.
3. Your agent polls the approval id. Once approved, it re-submits the original call with a single-use X-OrcaRouter-Firewall-Approval header. The gateway passes it through exactly once.

The confused deputy is stopped here even when scope is valid: a human confirms the action is intentional before it executes.
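The hold, review, and re-submit sequence can be modeled in memory to show the single-use property. The `ApprovalGate` class below is a hypothetical stand-in, not the gateway's API: its `approval_id` argument plays the role of the X-OrcaRouter-Firewall-Approval header, and the status names are invented for the sketch.

```python
import secrets


class ApprovalGate:
    """In-memory model of the held-call flow (illustrative only)."""

    def __init__(self):
        self._status = {}  # approval id -> pending | approved | rejected | used
        self._calls = {}   # approval id -> the held (tool, args) pair

    def submit(self, tool, args, approval_id=None):
        if approval_id is None:
            # First attempt: hold the call and hand back an approval id.
            new_id = secrets.token_hex(8)
            self._status[new_id] = "pending"
            self._calls[new_id] = (tool, args)
            return {"status": "held", "approval_id": new_id}
        # Re-submission: must be approved and must match the held call.
        if self._status.get(approval_id) == "approved" and self._calls[approval_id] == (tool, args):
            self._status[approval_id] = "used"  # single use
            return {"status": "executed"}
        return {"status": "denied"}

    def review(self, approval_id, approve):
        if self._status.get(approval_id) == "pending":
            self._status[approval_id] = "approved" if approve else "rejected"

    def poll(self, approval_id):
        return self._status.get(approval_id, "unknown")


gate = ApprovalGate()
call = ("payments.dispatch", {"amount_usd": 120})
held = gate.submit(*call)
gate.review(held["approval_id"], approve=True)   # the human decision, out-of-band
print(gate.submit(*call, approval_id=held["approval_id"])["status"])  # executed
print(gate.submit(*call, approval_id=held["approval_id"])["status"])  # denied
```

Binding the approval to the exact held call, as this sketch does, is a design choice worth keeping in any real integration: an approval for one payment should not be replayable for a different one.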

Layer 4 — Per-run cost cap (`cap_cost`)

A cap_cost rule denies any tool call once the agent run’s accumulated spend exceeds a per-rule ceiling (in cents). This is the circuit-breaker for:

Runaway loops triggered by injection.
Billing attacks that drive spend before any human notices.
Accidental recursion in multi-step plans.

cap_cost operates at the run level, not the key lifetime level — so it resets per agent invocation, and a single misbehaving run cannot exhaust the key’s credit_limit_usd ceiling.
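A run-level cap behaves like a counter that starts at zero for each invocation. The sketch below assumes spend is tracked in integer cents and that "exceeds" means strictly greater than the ceiling; the `RunCostCap` class is hypothetical and only illustrates the behavior described above.

```python
class RunCostCap:
    """Tracks accumulated spend for one agent run against a ceiling in cents."""

    def __init__(self, ceiling_cents: int):
        self.ceiling_cents = ceiling_cents
        self.spent_cents = 0

    def record(self, cost_cents: int) -> None:
        """Add the cost of a completed step to the run's total."""
        self.spent_cents += cost_cents

    def verdict(self) -> str:
        """Deny further tool calls once the run's spend exceeds the ceiling."""
        return "deny" if self.spent_cents > self.ceiling_cents else "allow"


run = RunCostCap(ceiling_cents=50)
run.record(30)
print(run.verdict())  # allow
run.record(25)
print(run.verdict())  # deny

next_run = RunCostCap(ceiling_cents=50)  # a new invocation starts from zero
print(next_run.verdict())  # allow
```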

4. A well-scoped agent key — example

An agent that summarizes customer tickets using gpt-4o-mini and queries a read-only replica should look like this:

model_limits: ["openai/gpt-4o-mini"] — cannot escalate to a more capable or expensive model.
allow_ips: the worker pool’s egress CIDR — the key is inert everywhere else.
credit_limit_usd: a lifetime ceiling matching the task’s expected cost over the key’s validity period, with some headroom (e.g. 5.00).
expired_time: the end of the sprint or deployment period — the key self-expires without manual cleanup.
environment: "prod" — appears in log filters and anomaly views.
guardrail_id: a guardrail scoped to this agent’s data sensitivity (PII masking, no secrets in output).
firewall_policy_id: a policy that allow-lists db.query* and ticket.read* only, default verdict deny.

If this agent is tricked into exfiltrating data via an injected instruction, the blast radius is bounded: one model, one IP range, two read-only tool namespaces, one cost ceiling. The rest of the workspace is unaffected.
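Written out as data, the key above might look like the following. The field names come from this page, but everything else is a placeholder or an assumption: the CIDR is a documentation range, the identifiers `grd_EXAMPLE` and `fwp_EXAMPLE` are made up, the expiry date is arbitrary, and `expired_time` is assumed to be a Unix timestamp in seconds. How the payload is submitted (console or API) is not shown.

```python
import json
from datetime import datetime, timezone

# Placeholder expiry; in practice this would be the end of the deployment period.
expiry = datetime(2030, 1, 1, tzinfo=timezone.utc)

ticket_summarizer_key = {
    "model_limits": ["openai/gpt-4o-mini"],
    "allow_ips": ["203.0.113.0/24"],        # placeholder for the worker pool's egress CIDR
    "credit_limit_usd": 5.00,
    "expired_time": int(expiry.timestamp()),  # assumed unit: Unix seconds
    "environment": "prod",
    "guardrail_id": "grd_EXAMPLE",           # hypothetical identifier
    "firewall_policy_id": "fwp_EXAMPLE",     # hypothetical identifier
}

print(json.dumps(ticket_summarizer_key, indent=2))
```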

is_firewall_gateway marks a key as a gateway-scoped token for the MCP-dispatch and evaluate-hook routes. Only create these for agents that drive the firewall programmatically — never for general inference traffic. A gateway key on the inference path exposes routes that broad-purpose keys should never reach. Enabling is_firewall_gateway requires Admin+.

5. Roles required

| Action | Minimum role |
| --- | --- |
| Read a key, firewall policy, or guardrail match | Member |
| Read firewall events, runs, or traces | Developer |
| Create or edit keys, firewall policies, rules | Developer |
| Approve a held tool call from the console | Developer |
| Enable `is_firewall_gateway` on a key | Admin |
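The table reads as a minimum-role lookup over an ordered set of roles. The sketch below assumes the ordering Member < Developer < Admin and covers only the three roles named here; the action names and the `can` helper are hypothetical labels for the rows above.

```python
from enum import IntEnum


class Role(IntEnum):
    MEMBER = 1
    DEVELOPER = 2
    ADMIN = 3


# Hypothetical action names, one per row of the roles table.
MIN_ROLE = {
    "read_key_or_policy": Role.MEMBER,
    "read_firewall_events": Role.DEVELOPER,
    "edit_key_or_policy": Role.DEVELOPER,
    "approve_held_call": Role.DEVELOPER,
    "enable_firewall_gateway": Role.ADMIN,
}


def can(role: Role, action: str) -> bool:
    """True when the role meets or exceeds the minimum for the action."""
    return role >= MIN_ROLE[action]


print(can(Role.DEVELOPER, "approve_held_call"))        # True
print(can(Role.DEVELOPER, "enable_firewall_gateway"))  # False
```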

6. Relationship to other threats

Excessive agency is the enabler for almost every other agent threat:

Dangerous tool calls — a key with a tight tool allow-list cannot be forced to call a tool it does not list, even if injection succeeds.
Prompt injection — scope limits the damage injection can do; approval gates block the irreversible actions injection tries to trigger.
Threat model — the full attack surface map, showing where excessive agency sits relative to other vectors.

Least agency does not prevent injection. It shrinks what injection can achieve.

Scoped keys & policies

The full key-fields reference, resolution order, and the workspace boundary model.

Firewall

Policy authoring, verdicts, HITL approval flow, and the full API reference.

Least agency (one narrow key per agent, a tight tool allow-list, a spend cap, and human approval for irreversible actions) is the primary defense against excessive agency and the confused deputy pattern in LLM agents.