A long-running autonomous agent is the hardest thing to secure. It loops on its own for hours, picks its own tools, fetches its own URLs, and spends your money the whole time. The failure modes aren’t a single bad prompt — they’re a retry loop that burns $400 overnight, a tool call you never reviewed, a key you minted for a one-week experiment that still works six months later. This recipe wires four controls around exactly that shape of agent. You configure all of them in the console (or the REST API) — the agent keeps calling https://api.orcarouter.ai/v1/... exactly as before.
New here? Apply the balanced baseline first and watch what your agent does for a day. This page is the next step: turning observation into enforcement for an agent you can’t babysit.

1. The secure autonomous agent recipe

A secure autonomous agent needs four things a chatbot doesn’t:

A hard cost ceiling

A cap_cost rule denies the run once its accumulated spend crosses your cap — the circuit-breaker for a loop that won’t stop.

Spike detection

Anomaly detection learns the agent’s normal hour-of-week shape and flags rate and cost spikes that slip past static rules.

Approval on the dangerous calls

A pending_approval verdict holds destructive or irreversible tool calls for a human, instead of trusting the agent to be careful.

A key that expires

Scope the agent’s key to an expiry and a credit ceiling so a forgotten experiment can’t run — or spend — forever.
Each maps to one Firewall policy or key field. None touches your agent code.

2. Cap the cost of every run

The first thing a runaway loop blows is your budget. A cap_cost rule is a strict pre-check cost ceiling: when it matches, the gateway estimates the request’s cost and denies before dispatch once the run’s accumulated spend would exceed the cap — so an over-budget call never reaches the provider. The cap is run-scoped. The gateway sums the prior spend across the whole agent run, so a long run that has already burned most of its budget is denied even when the next individual call is cheap. That’s what makes it a circuit-breaker rather than a per-request limit. Add one wildcard rule to your firewall policy:
{
  "priority": 50,
  "tool_name_glob": "*",
  "verdict": "cap_cost",
  "cap_cost_cents": 1000
}
This caps the run at $10 (cap_cost_cents is in USD cents). The verdict resolves to allow while under budget and deny once the estimate would cross it. Most built-in firewall templates (Coding, Support, RAG, Data, DevOps, Browser) ship a per-run cost cap exactly like this — apply one and edit the cap.
Run-scoped accumulation needs request-log capture enabled for the workspace. With it off, the prior-spend rollup reads zero and the cap degrades to per-request only — still safe, but it won’t catch a slow 500-call drip. See denial-of-wallet.
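The run-scoped check can be pictured with a small sketch. This is an illustration of the semantics described above, not the gateway's implementation: the function name, the `CapDecision` type, and the strict "greater than" comparison at the boundary are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class CapDecision:
    verdict: str          # "allow" or "deny"
    projected_cents: int  # prior spend counted for the run, plus this request's estimate


def check_cap_cost(prior_spend_cents: int, estimate_cents: int,
                   cap_cost_cents: int, log_capture_enabled: bool = True) -> CapDecision:
    """Hypothetical pre-check: deny when the run's projected spend would exceed the cap."""
    # With request-log capture off, the prior-spend rollup reads zero,
    # so the check degrades to a per-request limit.
    prior = prior_spend_cents if log_capture_enabled else 0
    projected = prior + estimate_cents
    # Assumption: a request that lands exactly on the cap is still allowed.
    verdict = "deny" if projected > cap_cost_cents else "allow"
    return CapDecision(verdict, projected)
```

With a $10 cap, a run that has already spent $9.90 is denied on a 20-cent call (`check_cap_cost(990, 20, 1000)`), while the same call passes when capture is off and the rollup reads zero. That is the slow-drip gap the note above warns about.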

3. Detect spikes against a learned baseline

A cap stops the catastrophe; anomaly detection catches the weird before it becomes one. The Firewall learns each workspace’s normal tool-use shape — a 14-day rolling average bucketed by hour-of-week, so Tuesday-14:00 traffic is compared against Tuesday-14:00 history, not a flat daily mean — and surfaces deviations on a viewer-readable feed:
Rate: per-tool call volume scored against the learned baseline. “143 db.query calls in an hour against a baseline of 8” surfaces even when each individual call is allowed.
Burn: the same baseline, applied to spend instead of count. It flags a run that is suddenly burning far more than this hour usually does.
Retry: the signature of an autonomous agent stuck retrying the same broken call. See excessive-agency.
Novel: a tool-to-tool hop this workspace has never made, the shape of an agent going somewhere new.
The feed reports tool names, redacted token ids, and counts — never raw arguments. Reading it is open to any Member; a Developer+ can snooze the feed for up to 7 days while investigating. Pair the feed with a cap_cost rule so a spike that’s also over-budget is stopped, not just noticed.
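To make the hour-of-week comparison concrete, here is a minimal sketch of the bucketing idea. The 14-day window and the hour-of-week bucket come from the description above; the function names, the input shape, and the `SPIKE_FACTOR` threshold are assumptions, since the actual scoring is not documented on this page.

```python
from datetime import datetime, timedelta, timezone

BASELINE_DAYS = 14   # from the text: a 14-day rolling window
SPIKE_FACTOR = 5.0   # assumption: the real scoring threshold is not stated here


def hour_of_week(ts: datetime) -> int:
    """Bucket index 0..167: Monday 00:00 is 0, Tuesday 14:00 is 38."""
    return ts.weekday() * 24 + ts.hour


def baseline_for(history, now: datetime) -> float:
    """Mean hourly count in the same hour-of-week bucket over the last 14 days.

    `history` is an iterable of (hour_start, call_count) pairs for one tool.
    """
    cutoff = now - timedelta(days=BASELINE_DAYS)
    bucket = hour_of_week(now)
    samples = [count for ts, count in history
               if cutoff <= ts < now and hour_of_week(ts) == bucket]
    return sum(samples) / len(samples) if samples else 0.0


def is_rate_spike(current_count: int, baseline: float,
                  factor: float = SPIKE_FACTOR) -> bool:
    # Assumption: with no history for this bucket there is nothing to compare, so don't flag.
    return baseline > 0 and current_count > factor * baseline
```

Only samples from the same weekday and hour feed the average, so a busy Monday does not inflate the Tuesday-14:00 baseline. The same function would apply to spend by passing cents instead of call counts.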

4. Hold the dangerous calls for a human

You can’t review every call an autonomous agent makes — but you can make it stop and ask before the handful that matter. A pending_approval verdict holds a tool call out-of-band:
  1. The agent issues, say, a payments.transfer call. The rule matches and the engine returns HTTP 400 firewall_approval_pending with an approval id — the call never reaches the tool.
  2. A reviewer resolves it from the console (Developer+), or your own system resolves it via an HMAC-signed webhook callback to POST /api/v1/firewall/approvals/:id/callback.
  3. The agent polls GET /api/v1/firewall/approvals/:id; once approved it re-submits the original call with a single-use X-OrcaRouter-Firewall-Approval header, and the gateway lets it through that one time.
A rule that holds writes to a destructive surface:
{
  "priority": 20,
  "tool_name_glob": "payments.*",
  "verdict": "pending_approval"
}
Roll this out in shadow mode first — pending_approval downgrades to audit, so you see which calls would hold without actually blocking your agent. Flip shadow off when the feed looks right.
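From the agent's side, the hold, poll, re-submit flow in the numbered steps might look like the sketch below. The endpoint path and header name come from the steps; everything else is an assumption: the status values ("pending", "approved", "denied"), the value the approval header carries, and the `get_status` callable, which stands in for whatever HTTP client you use.

```python
import time
from typing import Callable

APPROVAL_HEADER = "X-OrcaRouter-Firewall-Approval"  # header name from the text


def wait_for_approval(approval_id: str,
                      get_status: Callable[[str], str],
                      poll_seconds: float = 5.0,
                      timeout_seconds: float = 900.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the approval resolves. True if approved, False if denied or timed out.

    `get_status` is a hypothetical wrapper around
    GET /api/v1/firewall/approvals/:id that returns a status string.
    """
    waited = 0.0
    while waited <= timeout_seconds:
        status = get_status(approval_id)
        if status == "approved":
            return True
        if status == "denied":
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    return False


def resubmit_headers(base_headers: dict, approval_token: str) -> dict:
    """Copy the original headers and attach the single-use approval header.

    Assumption: the header value is an opaque string obtained once the hold is approved.
    """
    headers = dict(base_headers)
    headers[APPROVAL_HEADER] = approval_token
    return headers
```

Because the header is single-use, build it only for the one re-submitted call and do not cache it on a shared session. A timeout matters here: an agent that polls forever on an unanswered approval is its own kind of runaway loop.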

5. Give the agent a key that expires

The control that outlives every policy is the key itself. An autonomous agent should get a scoped key, not your default one. Set these fields when you mint it (console → keys, or the token API):
| Field | Set it to | Why |
| --- | --- | --- |
| expired_time | a Unix timestamp | The experiment ends; the key dies with it. -1 means never, so don’t use that here. |
| credit_limit_usd | a dollar ceiling | A spend cap on the key independent of the run cap. 0 means unlimited. |
| firewall_policy_id | your policy above | Binds the cap_cost + approval rules to this key. |
| allow_ips | the agent’s egress IPs | A leaked key is useless from anywhere else. |
Set an environment tag too, so the key — and everything it does in Events and Matches — is attributable to this agent. An expiring, credit-capped, IP-pinned key is the last line: even if every policy were somehow bypassed, the blast radius is bounded by time and dollars.
Key configuration is a console / token-API action and is role-gated. Reading a firewall-gateway key’s plaintext requires Admin+.
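If you mint keys from a script, a small helper can compute the expiry and refuse the unsafe sentinels before anything is sent. The field names below are the ones in the table; the helper itself, the list form of `allow_ips`, and the string policy id are assumptions, and the sketch deliberately stops at building the fields rather than guessing at the token API's request shape.

```python
import ipaddress
import time
from typing import List, Optional


def scoped_key_fields(days_valid: int, credit_limit_usd: float,
                      firewall_policy_id: str, allow_ips: List[str],
                      now: Optional[float] = None) -> dict:
    """Build the fields for a short-lived agent key (hypothetical helper)."""
    if days_valid <= 0:
        raise ValueError("days_valid must be positive; a key that never expires defeats the point")
    if credit_limit_usd <= 0:
        raise ValueError("credit_limit_usd must be positive; 0 means unlimited")
    if not allow_ips:
        raise ValueError("pin the key to at least one egress IP")
    for ip in allow_ips:
        # Raises ValueError on a malformed address or CIDR range.
        ipaddress.ip_network(ip, strict=False)
    issued = time.time() if now is None else now
    return {
        "expired_time": int(issued) + days_valid * 86400,  # Unix timestamp, in seconds
        "credit_limit_usd": credit_limit_usd,
        "firewall_policy_id": firewall_policy_id,
        "allow_ips": list(allow_ips),  # assumption: the API accepts a list
    }
```

For a one-week experiment, `scoped_key_fields(7, 25.0, "pol_123", ["203.0.113.7"])` yields an expiry seven days out; the policy id and IP here are placeholders.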

6. Put it together

A hardened autonomous agent ends up with one firewall policy and one scoped key:
| Layer | Control | Catches |
| --- | --- | --- |
| Budget | cap_cost rule, run-scoped | Runaway loops, denial-of-wallet |
| Behavior | Anomaly feed (rate / burn / retry / novel) | The weird-but-allowed |
| Trust | pending_approval on destructive tools | Irreversible actions |
| Scope | Expiring, credit-capped, IP-pinned key | Forgotten or leaked keys |
Author the budget and approval rules together, set a per-run cap with firewall rules, and read the rest of the Firewall reference for surfaces, verdicts, and observability. For the related threats this recipe defends against, see excessive-agency, dangerous-tool-calls, and denial-of-wallet.
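As a way to reason about how the two rules from this recipe sit in one policy, the sketch below evaluates them locally. It rests on assumptions this page does not confirm: that lower priority numbers are evaluated first, that the first matching rule decides the verdict, and that `tool_name_glob` behaves like Python's `fnmatch` globs. Check the Firewall rules reference for the actual matching language.

```python
from fnmatch import fnmatchcase

# The two rules from this recipe, side by side.
RULES = [
    {"priority": 20, "tool_name_glob": "payments.*", "verdict": "pending_approval"},
    {"priority": 50, "tool_name_glob": "*", "verdict": "cap_cost", "cap_cost_cents": 1000},
]


def first_match(tool_name: str, rules=RULES, shadow: bool = False):
    """Return the verdict of the first matching rule, or None if nothing matches."""
    # Assumption: lower priority numbers are evaluated first and the first match wins.
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if fnmatchcase(tool_name, rule["tool_name_glob"]):
            verdict = rule["verdict"]
            # From the text: in shadow mode, pending_approval downgrades to audit.
            if shadow and verdict == "pending_approval":
                verdict = "audit"
            return verdict
    return None
```

Under these assumptions a payments.transfer call resolves to pending_approval (or audit in shadow mode), and every other tool falls through to the wildcard cap_cost rule. Whether a held call also counts against the run cap is not covered here.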

7. Next steps

Harden an MCP agent

Govern an agent that reaches tools through MCP servers.

Stop exfiltration

Egress rules for an agent that fetches its own URLs.

Enforcement modes

Observe → shadow → enforce, the safe rollout.

Firewall rules

The matching language behind every rule above.