https://api.orcarouter.ai/v1/... exactly as before.
New here? Apply the balanced baseline first and watch what your agent does for a day. This page is the next step: turning observation into enforcement for an agent you can’t babysit.
1. The secure autonomous agent recipe
A secure autonomous agent needs four things a chatbot doesn’t:
- A hard cost ceiling: a cap_cost rule denies the run once its accumulated spend crosses your cap. It is the circuit-breaker for a loop that won’t stop.
- Spike detection: anomaly detection learns the agent’s normal hour-of-week shape and flags rate and cost spikes that slip past static rules.
- Approval on the dangerous calls: a pending_approval verdict holds destructive or irreversible tool calls for a human, instead of trusting the agent to be careful.
- A key that expires: scope the agent’s key to an expiry and a credit ceiling so a forgotten experiment can’t run, or spend, forever.
2. Cap the cost of every run
The first thing a runaway loop blows is your budget. A cap_cost rule is
a strict pre-check cost ceiling: when it matches, the gateway estimates
the request’s cost and denies before dispatch once the run’s
accumulated spend would exceed the cap — so an over-budget call never
reaches the provider.
The cap is run-scoped. The gateway sums the prior spend across the
whole agent run, so a long run that has already burned most of its budget
is denied even when the next individual call is cheap. That’s what makes
it a circuit-breaker rather than a per-request limit.
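The run-scoped check described above can be sketched in a few lines. This is a minimal illustration, not the gateway's implementation: the `RunBudget` class, its `precheck` and `record` methods, and the choice to allow a call that lands exactly on the cap are all assumptions made for the example. Only `cap_cost_cents` and the allow/deny verdicts come from this page.

```python
from dataclasses import dataclass


@dataclass
class RunBudget:
    """Hypothetical tracker for one agent run's spend, in USD cents."""

    cap_cost_cents: int   # the per-run cap, in USD cents
    spent_cents: int = 0  # spend already accumulated by earlier calls in the run

    def precheck(self, estimated_cents: int) -> str:
        """Return "deny" if this call would push the run past the cap, else "allow"."""
        if self.spent_cents + estimated_cents > self.cap_cost_cents:
            return "deny"
        return "allow"

    def record(self, actual_cents: int) -> None:
        """Add the cost of a dispatched call to the run total."""
        self.spent_cents += actual_cents


budget = RunBudget(cap_cost_cents=500)  # a $5.00 cap for the whole run
budget.record(495)                      # earlier calls already burned $4.95

# A 10-cent call is cheap on its own, but the run as a whole would exceed the cap.
verdict_cheap = budget.precheck(10)
# A 5-cent call reaches the cap without exceeding it (assumed to be allowed here).
verdict_exact = budget.precheck(5)
```

The point of the sketch is that the comparison uses the run total, not the single call's estimate, which is why a cheap call late in an expensive run is still denied.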
Add one wildcard rule to your firewall policy:
The cap, cap_cost_cents, is in USD cents. The
verdict resolves to allow while under budget and deny once the
estimate would cross it. Most built-in firewall templates (Coding,
Support, RAG, Data, DevOps, Browser) ship a per-run cost cap exactly like
this — apply one and edit the cap.
3. Detect spikes against a learned baseline
A cap stops the catastrophe; anomaly detection catches the weird before it becomes one. The Firewall learns each workspace’s normal tool-use shape (a 14-day rolling average bucketed by hour-of-week, so Tuesday-14:00 traffic is compared against Tuesday-14:00 history, not a flat daily mean) and surfaces deviations on a viewer-readable feed:
- rate_spike: a tool firing far above its norm. Per-tool call volume scored against the learned baseline. “143 db.query calls in an hour against a baseline of 8” surfaces even when each individual call is allowed.
- burn_spike: cost climbing past the learned spend. The same baseline, applied to spend instead of count: a run that’s suddenly burning far more than this hour usually does.
- retry_loop: an agent hammering a failing tool. The signature of an autonomous agent stuck retrying the same broken call. See excessive-agency.
- novel_path: a tool transition never seen before. A tool-to-tool hop this workspace has never made, the shape of an agent going somewhere new.
Pair the feed with a cap_cost rule so a spike that’s also over budget is stopped, not just noticed.
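To make the hour-of-week idea concrete, here is a minimal sketch of a baseline and a rate-spike check. It is an illustration under stated assumptions, not the Firewall's scoring: the function names, the plain mean over the 14-day window, and the `factor=5.0` threshold are all hypothetical, and hours with no recorded calls are simply skipped rather than counted as zero.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def hour_of_week(ts: datetime) -> int:
    """Map a timestamp to one of 168 buckets: Monday 00:00 is 0, Sunday 23:00 is 167."""
    return ts.weekday() * 24 + ts.hour


def build_baseline(hourly_counts: dict, now: datetime, days: int = 14) -> dict:
    """Average the hourly call counts from the last `days` days per hour-of-week bucket."""
    cutoff = now - timedelta(days=days)
    samples = defaultdict(list)
    for hour_start, count in hourly_counts.items():
        if cutoff <= hour_start < now:
            samples[hour_of_week(hour_start)].append(count)
    return {bucket: sum(vals) / len(vals) for bucket, vals in samples.items()}


def is_rate_spike(count: int, baseline: dict, bucket: int, factor: float = 5.0) -> bool:
    """Flag a count that exceeds `factor` times the bucket's learned mean (assumed threshold)."""
    expected = baseline.get(bucket)
    if expected is None:
        return False  # no history for this bucket, so nothing to compare against
    return count > factor * expected


now = datetime(2024, 1, 16, 14, tzinfo=timezone.utc)  # a Tuesday, 14:00 UTC
history = {
    datetime(2024, 1, 2, 14, tzinfo=timezone.utc): 8,     # Tuesday 14:00, two weeks ago
    datetime(2024, 1, 9, 14, tzinfo=timezone.utc): 8,     # Tuesday 14:00, one week ago
    datetime(2024, 1, 13, 14, tzinfo=timezone.utc): 200,  # Saturday 14:00, a different bucket
}
baseline = build_baseline(history, now)
spike = is_rate_spike(143, baseline, hour_of_week(now))
```

Because the Saturday sample lives in its own bucket, it does not inflate the Tuesday-14:00 baseline of 8, and 143 calls in that hour is flagged.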
4. Hold the dangerous calls for a human
You can’t review every call an autonomous agent makes, but you can make it stop and ask before the handful that matter. A pending_approval verdict holds a tool call out-of-band:
- The agent issues, say, a payments.transfer call. The rule matches and the engine returns HTTP 400 firewall_approval_pending with an approval id; the call never reaches the tool.
- A reviewer resolves it from the console (Developer+), or your own system resolves it via an HMAC-signed webhook callback to POST /api/v1/firewall/approvals/:id/callback.
- The agent polls GET /api/v1/firewall/approvals/:id; once approved it re-submits the original call with a single-use X-OrcaRouter-Firewall-Approval header, and the gateway lets it through that one time.
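On the agent side, the poll-then-resubmit step might look like the sketch below. It is deliberately transport-agnostic: `fetch_status` stands in for the GET on the approvals endpoint, and the `"status"` field, its `"pending"` and `"approved"` values, the `"timeout"` result, the `appr_123` id, and what value goes into the approval header are all assumptions, since this page does not specify the response body.

```python
import time
from typing import Callable


def wait_for_approval(
    approval_id: str,
    fetch_status: Callable[[str], dict],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the approval leaves the (assumed) "pending" state or the timeout passes.

    `fetch_status` is a stand-in for GET /api/v1/firewall/approvals/:id and must
    return the decoded response body as a dict.
    """
    waited = 0.0
    while True:
        status = fetch_status(approval_id).get("status", "pending")
        if status != "pending":
            return status
        if waited >= timeout_seconds:
            return "timeout"
        sleep(poll_seconds)
        waited += poll_seconds


def approval_headers(approval_value: str) -> dict:
    """Header to attach when re-submitting the held call (the value's format is assumed)."""
    return {"X-OrcaRouter-Firewall-Approval": approval_value}


# Simulated responses: two polls still pending, then a reviewer approves.
_responses = iter([{"status": "pending"}, {"status": "pending"}, {"status": "approved"}])
final = wait_for_approval("appr_123", lambda _id: next(_responses), sleep=lambda _s: None)
headers = approval_headers("appr_123") if final == "approved" else {}
```

Injecting `fetch_status` and `sleep` keeps the loop testable without a network, and the explicit timeout means an agent that never gets an answer gives up instead of polling forever.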
5. Give the agent a key that expires
The control that outlives every policy is the key itself. An autonomous agent should get a scoped key, not your default one. Set these fields when you mint it (console → keys, or the token API):

| Field | Set it to | Why |
|---|---|---|
| expired_time | a Unix timestamp | The experiment ends; the key dies with it. -1 means never; don’t use that here. |
| credit_limit_usd | a dollar ceiling | A spend cap on the key independent of the run cap. 0 means unlimited. |
| firewall_policy_id | your policy above | Binds the cap_cost + approval rules to this key. |
| allow_ips | the agent’s egress IPs | A leaked key is useless from anywhere else. |
Give the key an environment tag too, so the key, and everything it does in Events and Matches, is attributable to this agent. An expiring, credit-capped, IP-pinned key is the last line of defense: even if every policy were somehow bypassed, the blast radius is bounded by time and dollars.
Key configuration is a console / token-API action and is role-gated.
Reading a firewall-gateway key’s plaintext requires Admin+.
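As a sketch of how the four fields might be assembled before minting a key, the helper below computes `expired_time` from a time-to-live and refuses the values the table warns against. The field names come from the table; the helper itself, the list form of `allow_ips`, the string policy id, and the placeholder values are assumptions, and the actual request to the token API is not shown because its shape is not documented here.

```python
import time
from typing import Optional


def scoped_key_fields(
    ttl_days: int,
    credit_limit_usd: float,
    firewall_policy_id: str,
    allow_ips: list,
    now: Optional[int] = None,
) -> dict:
    """Build the scoped-key fields, rejecting "never expires" and "unlimited" settings."""
    if ttl_days <= 0:
        raise ValueError("ttl_days must be positive; an agent key should expire")
    if credit_limit_usd <= 0:
        raise ValueError("credit_limit_usd must be positive; 0 means unlimited")
    if not allow_ips:
        raise ValueError("allow_ips must list at least one egress IP")
    issued_at = int(time.time()) if now is None else now
    return {
        "expired_time": issued_at + ttl_days * 86_400,  # Unix timestamp, seconds
        "credit_limit_usd": credit_limit_usd,
        "firewall_policy_id": firewall_policy_id,       # placeholder id format
        "allow_ips": list(allow_ips),                   # list form is an assumption
    }


fields = scoped_key_fields(
    ttl_days=7,
    credit_limit_usd=25.0,
    firewall_policy_id="policy-placeholder",
    allow_ips=["203.0.113.10"],  # documentation-range IP, not a real egress address
    now=1_700_000_000,           # fixed clock so the example is reproducible
)
```

Validating before the key is minted keeps the two unsafe sentinels (-1 for never expiring, 0 for unlimited credit) from slipping into an agent key by default.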
6. Put it together
A hardened autonomous agent ends up with one firewall policy and one scoped key:

| Layer | Control | Catches |
|---|---|---|
| Budget | cap_cost rule, run-scoped | Runaway loops, denial-of-wallet |
| Behavior | Anomaly feed (rate / burn / retry / novel) | The weird-but-allowed |
| Trust | pending_approval on destructive tools | Irreversible actions |
| Scope | Expiring, credit-capped, IP-pinned key | Forgotten or leaked keys |
7. Next steps
- Harden an MCP agent: govern an agent that reaches tools through MCP servers.
- Stop exfiltration: egress rules for an agent that fetches its own URLs.
- Enforcement modes: observe → shadow → enforce, the safe rollout.
- Firewall rules: the matching language behind every rule above.
