This page is about the gateway controls that limit blast radius. The upstream
threat-model context — why agents are high-value targets and how injection
works — is in Threat model. For the
matching control that governs dangerous individual tool calls, see
Dangerous tool calls.
1. What makes an agent excessively capable
When every agent in a workspace shares one key, or when a key is issued once and never revisited, capability drifts upward:
- Unrestricted models — the agent can call any model in the workspace, including expensive or highly capable ones it never needs.
- No spend ceiling — a runaway loop, a triggered injection, or a billing attack can exhaust the workspace balance before you notice.
- No expiry — a key issued during a sprint is still valid a year later, long after the agent it was minted for is retired.
- No IP constraint — the credential works from anywhere, so a leaked key has no geographic limit.
- No tool allow-list — the agent can invoke any tool, even ones unrelated to its function.
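The checklist above can be turned into a quick audit. The following is a minimal sketch, not an OrcaRouter API: it assumes a key is available as a plain dict using the field names from the Layer 1 table on this page, and it assumes that an empty `model_limits` list means "unrestricted", which this page does not state. The function `audit_key` is a hypothetical helper.

```python
from typing import Any


def audit_key(key: dict[str, Any]) -> list[str]:
    """Return the excessive-agency findings for one key record."""
    findings = []
    # Assumption: an empty or missing model_limits means no model restriction.
    if not key.get("model_limits"):
        findings.append("unrestricted models")
    # 0 means unlimited spend.
    if key.get("credit_limit_usd", 0) == 0:
        findings.append("no spend ceiling")
    # -1 means the key never expires.
    if key.get("expired_time", -1) == -1:
        findings.append("no expiry")
    # An empty allow_ips list means no IP restriction.
    if not key.get("allow_ips"):
        findings.append("no IP constraint")
    # No attached firewall policy means no tool allow-list.
    if not key.get("firewall_policy_id"):
        findings.append("no tool allow-list")
    return findings


shared_key = {
    "model_limits": [],
    "credit_limit_usd": 0,
    "expired_time": -1,
    "allow_ips": [],
}
for finding in audit_key(shared_key):
    print(finding)
```

A key that returns an empty list here has at least some limit set in every category; whether those limits are tight enough is still a judgment call.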
2. The confused deputy pattern
The confused deputy is a specialisation of excessive agency. The agent is not hijacked; it is convinced. A prompt-injection payload in a retrieved web page, document, or tool result tells the agent to take an action it is legitimately authorized to perform — move money, delete a record, send a message — on the attacker’s behalf. The agent acts. It was authorized to do exactly that. The authorization check passes. The damage is done.
Defense requires two things working together:
- Narrow scope — the agent cannot be tricked into doing something its task never intended, because it is not authorized to do it at all.
- Human approval for irreversible actions — even within authorized scope, a high-stakes call requires a human to confirm before it executes.
3. Defense in depth: the four layers
OrcaRouter enforces least agency across four independent controls that compose on a single API key. None requires a code change to your agent.
Layer 1 — Scoped key (identity + hard limits)
Every agent should have its own API key. The key carries hard limits that the gateway enforces regardless of what the agent requests:
| Field | What it restricts |
|---|---|
| model_limits | The exact set of models this key may call. A request for any other model is rejected before it leaves the gateway. |
| allow_ips | Requests from any address not on this list are rejected at the auth layer. Empty means no IP restriction. |
| credit_limit_usd | Lifetime spend cap in USD. 0 means unlimited. The gateway enforces this against cumulative spend on the key. |
| expired_time | Absolute expiry timestamp. -1 means the key never expires. Set this to the agent’s deployment lifecycle. |
| environment | A label (prod, staging, dev) for organizing keys and filtering audit logs. |
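To make the semantics of these fields concrete, here is a minimal local model of the checks, written as a sketch rather than as the gateway's actual implementation. The `ScopedKey` class and `check_request` function are hypothetical, the order of the checks is an assumption, and `expired_time` is assumed to be a Unix timestamp in seconds.

```python
import ipaddress
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScopedKey:
    model_limits: list[str]
    allow_ips: list[str] = field(default_factory=list)  # CIDRs; empty = no restriction
    credit_limit_usd: float = 0.0  # 0 = unlimited
    expired_time: int = -1  # assumed Unix seconds; -1 = never expires
    environment: str = "dev"


def check_request(key: ScopedKey, model: str, source_ip: str,
                  spent_usd: float, now: Optional[float] = None) -> str:
    """Apply the key's hard limits to one request and return a verdict string."""
    now = time.time() if now is None else now
    if key.expired_time != -1 and now >= key.expired_time:
        return "reject: key expired"
    if key.allow_ips:
        addr = ipaddress.ip_address(source_ip)
        if not any(addr in ipaddress.ip_network(cidr) for cidr in key.allow_ips):
            return "reject: source IP not allowed"
    if model not in key.model_limits:
        return "reject: model not in model_limits"
    if key.credit_limit_usd != 0 and spent_usd >= key.credit_limit_usd:
        return "reject: credit limit reached"
    return "allow"


key = ScopedKey(
    model_limits=["openai/gpt-4o-mini"],
    allow_ips=["10.0.0.0/24"],  # placeholder CIDR
    credit_limit_usd=5.0,
    expired_time=2000,
)
print(check_request(key, "openai/gpt-4o-mini", "10.0.0.7", 1.0, now=1000))
print(check_request(key, "openai/gpt-4o", "10.0.0.7", 1.0, now=1000))
```

The point of the sketch is that every check depends only on the key and the request metadata, never on anything the agent says in its prompt or output.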
Layer 2 — Firewall policy (tool allow-list)
Attach a firewall policy to the key via firewall_policy_id. The policy governs every tool call that key issues:
- Write rules that allow the tool names the agent legitimately uses (tool-name globs are supported — e.g. db.query*).
- Set the policy’s default_verdict to deny so anything not explicitly listed is blocked.
- Add argument predicates to restrict even the allowed tools — e.g. allow db.query only when the database argument matches a specific schema.
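The three rules above (glob allow-list, default deny, argument predicates) can be sketched as a small evaluator. This is an illustration only: the policy dict layout, the `evaluate` function, and the `tickets_replica` database name are hypothetical and do not reflect OrcaRouter's actual policy schema. Argument predicates are modeled here as simple equality checks, and glob matching uses Python's `fnmatch`.

```python
from fnmatch import fnmatchcase
from typing import Any

# Hypothetical policy shape; only default_verdict is a name taken from this page.
policy = {
    "default_verdict": "deny",
    "rules": [
        # Allow db.query* only against one specific database.
        {"tool": "db.query*", "verdict": "allow",
         "args": {"database": "tickets_replica"}},
        {"tool": "ticket.read*", "verdict": "allow"},
    ],
}


def evaluate(policy: dict[str, Any], tool_name: str, arguments: dict[str, Any]) -> str:
    """Return the verdict of the first rule whose glob and predicates both match."""
    for rule in policy["rules"]:
        if not fnmatchcase(tool_name, rule["tool"]):
            continue
        predicates = rule.get("args", {})
        if all(arguments.get(name) == value for name, value in predicates.items()):
            return rule["verdict"]
    # Nothing matched: fall back to the policy default.
    return policy["default_verdict"]


print(evaluate(policy, "db.query", {"database": "tickets_replica"}))
print(evaluate(policy, "db.query", {"database": "billing"}))
print(evaluate(policy, "email.send", {"to": "someone@example.com"}))
```

With `default_verdict` set to `deny`, a tool the policy author never thought about is blocked by construction rather than by a rule someone had to remember to write.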
Layer 3 — Human approval for high-stakes actions (pending_approval)
For irreversible or high-value tool calls — a payment dispatch, a record
delete, an email send — add a pending_approval rule. The flow:
- The agent issues the tool call. The firewall holds it and returns a “held” response carrying an approval id. The call does not reach the tool.
- A reviewer approves or rejects out-of-band — from the console (Developer+) or via an HMAC-signed webhook to your own approval system.
- Your agent polls the approval id. Once approved, it re-submits the original call with a single-use X-OrcaRouter-Firewall-Approval header. The gateway passes it through exactly once.
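The hold, review, and re-submit sequence can be simulated in memory. This sketch does not call the real gateway: `FakeGateway`, its methods, and the response fields are hypothetical stand-ins, and it assumes the approval id is what the agent sends as the value of the `X-OrcaRouter-Firewall-Approval` header, which this page does not specify.

```python
import uuid
from typing import Any, Callable, Optional

APPROVAL_HEADER = "X-OrcaRouter-Firewall-Approval"


class FakeGateway:
    """In-memory stand-in for the gateway; not the real API."""

    def __init__(self) -> None:
        # approval id -> "pending" | "approved" | "rejected" | "used"
        self.approvals: dict[str, str] = {}

    def call_tool(self, tool: str, args: dict[str, Any],
                  headers: Optional[dict[str, str]] = None) -> dict[str, Any]:
        approval_id = (headers or {}).get(APPROVAL_HEADER)
        if approval_id is not None:
            if self.approvals.get(approval_id) == "approved":
                self.approvals[approval_id] = "used"  # single use
                return {"status": "executed", "tool": tool}
            return {"status": "denied"}
        # No approval presented: hold the call and hand back an approval id.
        approval_id = uuid.uuid4().hex
        self.approvals[approval_id] = "pending"
        return {"status": "held", "approval_id": approval_id}

    def review(self, approval_id: str, approve: bool) -> None:
        self.approvals[approval_id] = "approved" if approve else "rejected"

    def approval_status(self, approval_id: str) -> str:
        return self.approvals[approval_id]


def run_with_approval(gateway: FakeGateway, tool: str, args: dict[str, Any],
                      wait: Callable[[], None] = lambda: None,
                      max_polls: int = 10) -> dict[str, Any]:
    """Issue a call, poll the approval id while held, re-submit once approved."""
    first = gateway.call_tool(tool, args)
    if first["status"] != "held":
        return first
    approval_id = first["approval_id"]
    for _ in range(max_polls):
        status = gateway.approval_status(approval_id)
        if status == "approved":
            return gateway.call_tool(tool, args, headers={APPROVAL_HEADER: approval_id})
        if status == "rejected":
            return {"status": "rejected"}
        wait()  # in a real agent this would sleep between polls
    return {"status": "timed_out"}
```

In this sketch the `wait` callback is where a real agent would sleep; a test can use it to play the reviewer's part, for example by calling `review` on the pending id.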
Layer 4 — Per-run cost cap (cap_cost)
A cap_cost rule denies any tool call once the agent run’s accumulated spend
exceeds a per-rule ceiling (in cents). This is the circuit-breaker for:
- Runaway loops triggered by injection.
- Billing attacks that drive spend before any human notices.
- Accidental recursion in multi-step plans.
cap_cost operates at the run level, not the key lifetime level — so it
resets per agent invocation, and a single misbehaving run cannot exhaust the
key’s credit_limit_usd ceiling.
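As an illustration of the circuit-breaker behaviour, the sketch below models a per-run ceiling in cents that denies further tool calls once accumulated spend exceeds it. The `RunCostCap` class and `simulate_run` function are hypothetical and only mirror the description above; how the gateway actually attributes cost to a run is not covered here.

```python
class RunCostCap:
    """Tracks one run's accumulated spend against a ceiling in cents."""

    def __init__(self, ceiling_cents: int) -> None:
        self.ceiling_cents = ceiling_cents
        self.spent_cents = 0

    def record_spend(self, cents: int) -> None:
        self.spent_cents += cents

    def verdict(self) -> str:
        # Deny once accumulated spend exceeds the ceiling.
        return "deny" if self.spent_cents > self.ceiling_cents else "allow"


def simulate_run(ceiling_cents: int, cost_per_call_cents: int,
                 attempted_calls: int) -> tuple[int, int]:
    """Return (calls executed, cents spent) for one run; each run starts at zero."""
    cap = RunCostCap(ceiling_cents)
    executed = 0
    for _ in range(attempted_calls):
        if cap.verdict() == "deny":
            break
        cap.record_spend(cost_per_call_cents)
        executed += 1
    return executed, cap.spent_cents


# A runaway loop attempting 1000 calls at 30 cents each, with a 100-cent ceiling.
print(simulate_run(100, 30, 1000))  # (4, 120)
```

Because the check fires only after spend has exceeded the ceiling, a run can overshoot by up to one call's cost, so it is worth setting the ceiling with that margin in mind.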
4. A well-scoped agent key — example
An agent that summarizes customer tickets using gpt-4o-mini and queries a read-only replica should look like this:
- model_limits: ["openai/gpt-4o-mini"] — cannot escalate to a more capable or expensive model.
- allow_ips: the worker pool’s egress CIDR — the key is inert everywhere else.
- credit_limit_usd: a lifetime ceiling matching the task’s expected total cost, with some headroom — e.g. 5.00.
- expired_time: the end of the sprint or deployment period — the key self-expires without manual cleanup.
- environment: "prod" — appears in log filters and anomaly views.
- guardrail_id: a guardrail scoped to this agent’s data sensitivity (PII masking, no secrets in output).
- firewall_policy_id: a policy that allow-lists db.query* and ticket.read* only, default verdict deny.
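Written out as data, the key above might look like the following. This is a sketch of the field values only, not a documented request body: the CIDR, the expiry date, and both ids are placeholders, and `expired_time` is assumed to be a Unix timestamp in seconds.

```python
import json
from datetime import datetime, timezone

ticket_summarizer_key = {
    "model_limits": ["openai/gpt-4o-mini"],
    # Placeholder CIDR from the documentation range; use your worker pool's egress range.
    "allow_ips": ["203.0.113.0/24"],
    "credit_limit_usd": 5.00,
    # Placeholder expiry date; assumed to be Unix seconds.
    "expired_time": int(datetime(2030, 1, 1, tzinfo=timezone.utc).timestamp()),
    "environment": "prod",
    "guardrail_id": "gr_ticket_pii",  # hypothetical id
    "firewall_policy_id": "fw_ticket_readonly",  # hypothetical id
}

print(json.dumps(ticket_summarizer_key, indent=2))
```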
is_firewall_gateway marks a key as a gateway-scoped token for the
MCP-dispatch and evaluate-hook routes. Only create these for agents that
drive the firewall programmatically — never for general inference traffic. A
gateway key on the inference path exposes routes that broad-purpose keys
should never reach. Enabling is_firewall_gateway requires Admin+.
5. Roles required
| Action | Minimum role |
|---|---|
| Read a key, firewall policy, or guardrail match | Member |
| Read firewall events, runs, or traces | Developer |
| Create or edit keys, firewall policies, rules | Developer |
| Approve a held tool call from the console | Developer |
| Enable is_firewall_gateway on a key | Admin |
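The table reads naturally as a minimum-role lookup. The sketch below assumes that roles are strictly ordered and that a higher role includes every permission of the roles below it, which the "Developer+" and "Admin+" wording on this page suggests but does not spell out. The action names and the `is_permitted` helper are hypothetical.

```python
from enum import IntEnum


class Role(IntEnum):
    # Assumed ordering: each role includes the permissions of the ones below it.
    MEMBER = 1
    DEVELOPER = 2
    ADMIN = 3


# Hypothetical action names mapped to the minimum roles from the table.
MINIMUM_ROLE = {
    "read_key_policy_or_guardrail_match": Role.MEMBER,
    "read_firewall_events_runs_or_traces": Role.DEVELOPER,
    "create_or_edit_keys_policies_rules": Role.DEVELOPER,
    "approve_held_tool_call": Role.DEVELOPER,
    "enable_is_firewall_gateway": Role.ADMIN,
}


def is_permitted(role: Role, action: str) -> bool:
    """True when the role meets or exceeds the minimum role for the action."""
    return role >= MINIMUM_ROLE[action]


print(is_permitted(Role.DEVELOPER, "approve_held_tool_call"))
print(is_permitted(Role.DEVELOPER, "enable_is_firewall_gateway"))
```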
6. Relationship to other threats
Excessive agency is the enabler for almost every other agent threat:
- Dangerous tool calls — a key with a tight tool allow-list cannot be forced to call a tool it does not list, even if injection succeeds.
- Prompt injection — scope limits the damage injection can do; approval gates block the irreversible actions injection tries to trigger.
- Threat model — the full attack surface map, showing where excessive agency sits relative to other vectors.
Scoped keys & policies
The full key-fields reference, resolution order, and the workspace
boundary model.
Firewall
Policy authoring, verdicts, HITL approval flow, and the full API
reference.
