A reasoning agent that gets stuck in a retry loop, fans out a thousand sub-tasks, or simply runs away mid-plan can spend real money before anyone notices. The cap_cost firewall verdict is the circuit breaker for that: you author a per-run cents ceiling once, and the gateway denies the next tool call the moment a run’s accumulated spend crosses it — before that call reaches the model or the tool. This is AI agent cost control enforced at the gateway, not bolted onto your agent loop. Like every firewall verdict, a cap_cost rule lives in a workspace policy, attaches to a key, and takes effect on the next call with no redeploy.

1. The per-run spend circuit breaker

cap_cost is a rule verdict you author with one extra field — cap_cost_cents, the run’s spend ceiling in USD cents. When the rule matches a tool call, the engine compares the agent run’s accumulated spend against that cap:
  • Under the cap → the call is allowed; evaluation continues.
  • Over the cap → the call is denied, with a reason naming the run’s total versus the cap. That is the terminal, circuit-breaker outcome: the run can’t issue another governed call, and only a fresh run starts again from a zero total.
The cap is keyed to the agent run, not a single request. A long run that has already burned most of its budget is denied on its next call even when that one call is cheap — the breaker trips on the running total, not the marginal cost.
Run scope, with a per-request fallback. When the request carries an agent-run id, the ceiling applies to the run’s accumulated spend. A call with no run association (e.g. a bare MCP dispatch with no forwarded session) falls back to a per-request comparison instead. Either way, the breaker trips before the over-budget call is dispatched.
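The comparison described above can be sketched in a few lines of Python. This is an illustration, not the gateway's implementation: the function name resolve_cap_cost, its parameters, and the choice to treat spend exactly equal to the cap as still allowed are assumptions. Only the run-versus-request scoping and the allow/deny outcomes come from the text.

```python
from typing import Optional, Tuple


def resolve_cap_cost(
    cap_cost_cents: int,
    run_spend_cents: Optional[int],
    request_cost_cents: int,
) -> Tuple[str, str]:
    """Resolve a matched cap_cost rule to a concrete verdict (hypothetical sketch).

    run_spend_cents is the run's accumulated spend, or None when the call
    carries no agent-run id, in which case the per-request fallback applies.
    """
    # Run scope when a run id is present; otherwise compare this request alone.
    spend = run_spend_cents if run_spend_cents is not None else request_cost_cents

    if spend > cap_cost_cents:  # assumption: exactly at the cap is still allowed
        reason = (
            f"cap_cost: estimated run cost ${spend / 100:.2f} "
            f"exceeds cap ${cap_cost_cents / 100:.2f}"
        )
        return "deny", reason

    # Under the cap: the call is allowed and rule evaluation would continue.
    return "allow", ""
```

Note that a run already at $5.40 against a $5.00 cap is denied even when the next call would cost a single cent, because the comparison uses the running total rather than the marginal cost.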

2. One concrete example

Cap any run on a key at $5.00 of accumulated spend. A single wildcard rule does it — cap_cost_cents is 500 (cents):
{
  "label": "run cost ceiling $5",
  "tool_name_glob": "*",
  "verdict": "cap_cost",
  "cap_cost_cents": 500
}
Author it in the console rule editor on a policy you’ve created (see Create & attach a policy). Writing a rule is a Developer+ action. Attach the policy to a key via firewall_policy_id, or make it the workspace default, and every run that key drives is now bounded. You can scope the cap tighter than “every tool”. Narrow the tool-name glob so only an expensive family of calls counts toward the breaker — e.g. cap_cost on *.search to bound web-search fan-out while leaving cheap local tools unmetered.
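To see what a narrowed glob would and would not meter, here is a small Python sketch. It assumes the firewall's tool-name globs behave like ordinary shell-style globs, which the text does not confirm (the Rule reference defines the actual matching language). The tool names and the helper counts_toward_cap are hypothetical.

```python
from fnmatch import fnmatchcase


def counts_toward_cap(tool_name: str, tool_name_glob: str) -> bool:
    """Return True if a tool call would match the rule's glob.

    Assumption: shell-style, case-sensitive glob matching.
    """
    return fnmatchcase(tool_name, tool_name_glob)


# Hypothetical tool names, for illustration only.
tools = ["web.search", "docs.search", "fs.read", "search.index"]
metered = [t for t in tools if counts_toward_cap(t, "*.search")]
print(metered)  # ['web.search', 'docs.search']
```

Under that assumption, only tools whose names end in .search match the rule, while fs.read and search.index never reach the cap comparison.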
Stack a cheaper warning tier with priorities. A lower-cap audit rule at a higher priority (lower number) lets you watch a run approach its budget in the events feed before the enforcing cap_cost rule trips. First match wins, so order the watcher first.

3. Where it fires — and where it can’t

cap_cost only makes sense before a call is dispatched — that’s the one point where stopping the call still prevents the spend. So it is live on the two pre-dispatch surfaces and rejected on the post-dispatch ones:
| Surface | cap_cost? |
| --- | --- |
| inbound (advertised tools) | Enforced. |
| mcp (tools/call dispatch) | Enforced. |
| response (model-emitted calls) | Rejected on save: nothing left to stop. |
| egress (outbound destination) | Rejected on save: nothing left to stop. |
A cap_cost rule pinned to response or egress is refused at save-time, so a rule can never appear live while being unable to deny anything. Leave the stage empty to cover both pre-dispatch surfaces, or pin it to inbound / mcp.
cap_cost_cents is required and non-negative for a cap_cost rule. The console and API reject a cap_cost rule with no cap, so a misconfigured ceiling can’t silently pass every call through.
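The save-time checks above could be mirrored in a pre-flight validator, for example in CI before rules are pushed. The sketch below is not the console's or the API's validation code. The "stage" key is a hypothetical field name (the text only says the stage can be left empty or pinned), and requiring an integer number of cents is likewise an assumption.

```python
from typing import Any, Dict, List

# The two surfaces where cap_cost is enforced, per the table above.
PRE_DISPATCH_STAGES = {"inbound", "mcp"}


def validate_cap_cost_rule(rule: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a cap_cost rule (empty if none found)."""
    errors: List[str] = []
    if rule.get("verdict") != "cap_cost":
        return errors  # other verdicts are out of scope for this sketch

    cap = rule.get("cap_cost_cents")
    if cap is None:
        errors.append("cap_cost_cents is required for a cap_cost rule")
    elif isinstance(cap, bool) or not isinstance(cap, int) or cap < 0:
        # Assumption: the cap is a whole number of cents.
        errors.append("cap_cost_cents must be a non-negative integer")

    # "stage" is a hypothetical key; empty means both pre-dispatch surfaces.
    stage = rule.get("stage")
    if stage and stage not in PRE_DISPATCH_STAGES:
        errors.append(f"cap_cost cannot be pinned to '{stage}'")

    return errors
```

For the $5.00 example rule shown earlier, this returns an empty list. Dropping cap_cost_cents or pinning the rule to response yields one error each.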

4. What the breaker looks like when it trips

An over-budget call is a normal firewall deny:
  • On inbound, the relay returns HTTP 400 with error code firewall_blocked. The block fires before the upstream model call, so it costs no model tokens, and it’s marked skip-retry — re-running the same call would just trip the breaker again.
  • On mcp, it comes back as a tool error so the model sees the rejection and can stop or ask the user, rather than crashing.
The deny reason names the figures, e.g. cap_cost: estimated run cost $5.40 exceeds cap $5.00, so an operator reading the events feed sees exactly why the breaker tripped.
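On the caller's side, the two useful reactions are to skip the retry and to surface the figures. The Python sketch below assumes the caller has already extracted the status code, error code, and reason string from the response (the body layout is not specified here), and that the reason follows the example format above exactly. Both helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches the example reason format, e.g.
# "cap_cost: estimated run cost $5.40 exceeds cap $5.00"
_CAP_REASON = re.compile(
    r"cap_cost: estimated run cost \$(\d+)\.(\d{2}) exceeds cap \$(\d+)\.(\d{2})"
)


def should_retry(status: int, error_code: str) -> bool:
    """A firewall block is not transient, so retrying it is pointless."""
    return not (status == 400 and error_code == "firewall_blocked")


def parse_cap_cost_reason(reason: str) -> Optional[Tuple[int, int]]:
    """Return (run_cost_cents, cap_cents), or None if the reason is not a cap trip."""
    match = _CAP_REASON.search(reason)
    if match is None:
        return None
    run_dollars, run_cents, cap_dollars, cap_cents = (int(g) for g in match.groups())
    # Integer arithmetic avoids float rounding on currency amounts.
    return run_dollars * 100 + run_cents, cap_dollars * 100 + cap_cents
```

Parsing the reason is best treated as a convenience for logs and alerts. If the wording changes, parse_cap_cost_reason returns None rather than guessing.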
Events never carry a literal cap_cost. You author the verdict as cap_cost, but the engine resolves it to a concrete allow or deny before the event is recorded. The feed shows the allow/deny the agent actually saw — the run-cost ceiling is the reason, not the verdict label. This mirrors how verdicts resolve.

5. Roll it out safely

Because a tripped breaker hard-stops a run, validate it before you enforce. Turn on shadow mode on the policy: the cap_cost rule still evaluates and a would-be deny is downgraded to audit, prefixed [shadow] would …. Watch the events feed to confirm the cap trips where you expect — and only where you expect — then flip shadow mode off to start enforcing. You can also dry-run a policy against a sample call in the Test tab (a Developer+ sandbox) to see the resolved verdict and the matched rule before anything depends on it.

6. How it fits the rest of the firewall

cap_cost is one verdict among six. It pairs naturally with the other controls on the same policy:

  • Verdicts: the full set (allow, audit, deny, sanitize, pending_approval), and how cap_cost resolves.
  • Block dangerous tools: deny destructive shell, deletes, and other high-risk calls outright.
  • Rule reference: the complete matching language (globs, argument clauses, sequences).
  • Anomaly detection: cost spikes flagged against a learned hour-of-week baseline.
A run-cost ceiling is a static, deterministic backstop; the firewall also learns each workspace’s normal cost shape and flags spikes against a 14-day hour-of-week baseline on a Member-readable anomaly feed. Use cap_cost for the hard stop, anomalies for the early signal.
Quota limits on the key itself (credit_limit_usd) bound total spend across all runs; cap_cost bounds a single run. They compose — a runaway loop trips the per-run breaker long before it can exhaust the key’s lifetime credit. See scope: keys, policies, workspaces.

Where to go next

  • Create & attach a policy: make a policy, pick a default verdict, bind it to a key.
  • Shadow mode: measure a cap before it changes traffic.
For the runaway-agent threats a spend ceiling backstops, see excessive agency and dangerous tool calls.