A key with no ceiling can drain your whole workspace balance if an agent loops. The single most effective way to bound the blast radius of a compromised or runaway agent is to give its key a spend cap. On the hosted gateway every key is either unlimited or bounded by an API key quota measured in US dollars, and the choice is one field in the key editor. This page explains the two modes, how the cap is enforced on the relay path, and when to pick which. For the full set of constraints a key carries (model allow-lists, IP allow-lists, policy attachments), see The token object.

1. The two modes

Every key resolves to exactly one of two states:

Unlimited

unlimited_quota = true. The key draws on the workspace balance with no per-key ceiling. No spend check runs at request time — the only limit is the workspace’s own balance.

Bounded

credit_limit_usd > 0. The key carries its own lifetime spend cap in USD. Once cumulative spend reaches the cap, the key stops working — the rest of the workspace is untouched.
You set this in the console Keys screen (/console/token). Creating or editing a key requires the Developer role or above.
credit_limit_usd = 0 means unlimited — zero is the sentinel for “no cap”, not “a zero-dollar cap”. To bound a key, give it a positive dollar amount.

2. How an API key quota is enforced

When you set credit_limit_usd to a positive number, the gateway converts it into an internal remain_quota balance for that key and flips unlimited_quota to false. From then on:
  • remain_quota is the key’s remaining spend headroom, drawn down as the key bills usage.
  • used_quota is the cumulative spend the key has already booked.
  • On every relay call, the gateway checks the key before it forwards the request. A bounded key whose remain_quota has reached zero is rejected as exhausted — the call never reaches the model.
An unlimited key (unlimited_quota = true) skips that balance check entirely; it is bounded only by the workspace balance and by any other key-level limits you set (model allow-list, IP allow-list, expiry).
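The check-then-bill flow described above can be sketched as follows. This is a minimal model, not the gateway's implementation: the field names come from this page, but the Key class, the admit and bill helpers, and the choice to hold balances in USD are assumptions made for illustration (the page does not state the internal unit of remain_quota).

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Key:
    # Field names follow this page; storing balances in USD is an assumption.
    unlimited_quota: bool
    remain_quota: Decimal = Decimal("0")
    used_quota: Decimal = Decimal("0")


def admit(key: Key) -> bool:
    """Pre-forward check: unlimited keys skip the per-key balance check."""
    if key.unlimited_quota:
        return True
    return key.remain_quota > 0


def bill(key: Key, cost_usd: Decimal) -> None:
    """Book the cost of a completed call (hypothetical helper)."""
    key.used_quota += cost_usd
    if not key.unlimited_quota:
        key.remain_quota = max(Decimal("0"), key.remain_quota - cost_usd)


# A $25 bounded key billed $5 per call, purely illustrative numbers.
key = Key(unlimited_quota=False, remain_quota=Decimal("25"))
calls = 0
while admit(key):
    bill(key, Decimal("5"))
    calls += 1

print(calls, key.remain_quota, key.used_quota)
```

In this model the loop stops on its own once remain_quota reaches zero, which is the behavior the cap is meant to guarantee for a runaway agent.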
The cap on a bounded key is a lifetime cap, not a rolling monthly budget: it counts total spend over the key’s life. For a budget that resets, issue a fresh bounded key on your own cadence (e.g. a new key per sprint) and revoke the old one. See Manage keys.

3. One concrete example

Say you’re deploying a scheduled summarization agent and you want to guarantee it can never spend more than $25 no matter what the model does. Set the cap when you create the key:
// Key settings as entered in the console Keys screen (/console/token, Developer role or above).
// The relay key (sk-orca-…) is never used to administer keys;
// it is only presented on /v1/* inference calls.
{
  "name": "nightly-summarizer",
  "credit_limit_usd": 25,        // bounded: $25 lifetime cap
  "model_limits_enabled": true,
  "model_limits": ["openai/gpt-4o-mini"],
  "expired_time": -1             // -1 = never expires
}
The gateway stores this as a bounded key: unlimited_quota = false and a remain_quota worth $25. The agent calls the model with the `sk-orca-…` relay key as usual. The moment cumulative spend hits $25, the key is exhausted and every further /v1/* call is rejected, without you watching a dashboard and without touching the rest of the workspace.
To make the same key unlimited later, edit it and flip the unlimited toggle: the console sets unlimited_quota = true and credit_limit_usd = 0 together, and the key can draw on the full workspace balance again.

4. Which mode to pick

Any key handed to an autonomous agent, a CI job, or a third-party integration should be bounded. A spend cap is the cheapest guarantee that a prompt-injection loop or a retry storm can’t run up an unbounded bill — the cap stops the key before the damage compounds. Pair it with a tight model limit and an IP allow-list.
For a key that exists only for a demo, a load test, or a single deployment, combine a small credit_limit_usd with an expired_time. The key self-retires on whichever limit it hits first. See Quota cap & expiry and Expiring keys.
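The "whichever limit it hits first" rule can be sketched as a single predicate. This is an illustration only: key_is_usable is a hypothetical helper, and treating expired_time as a Unix timestamp in seconds is an assumption (the page confirms only that -1 means "never expires").

```python
import time
from typing import Optional

NEVER_EXPIRES = -1  # sentinel taken from the example on this page


def key_is_usable(
    expired_time: int,
    unlimited_quota: bool,
    remain_quota: float,
    now: Optional[float] = None,
) -> bool:
    """Return True while neither the expiry nor the spend cap has been hit.

    Assumes expired_time is a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    expired = expired_time != NEVER_EXPIRES and now >= expired_time
    exhausted = (not unlimited_quota) and remain_quota <= 0
    return not (expired or exhausted)
```

Either condition alone retires the key, so a demo key with a small cap and a short expiry stops working at the earlier of the two events.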
A key used by a core production service you fully control, where a per-key cap would just cause spurious outages, can stay unlimited — the workspace balance is the backstop. Keep these keys few, name them clearly, and still scope them with model and IP limits.
A bounded key that exhausts mid-run starts rejecting calls immediately. That’s the point — but it means an unattended agent can stop partway through a job. Size the cap for the work you expect, and watch spend in the console’s usage views so you can raise the cap before it bites a legitimate run.
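One way to size a cap is to estimate expected lifetime spend and add a safety margin. The helper below is a hypothetical sketch: suggested_cap_usd, the 1.5x default headroom, and the per-run cost in the usage example are illustrative assumptions, not values the gateway prescribes.

```python
import math


def suggested_cap_usd(runs: int, cost_per_run_usd: float, headroom: float = 1.5) -> int:
    """Expected lifetime spend times a safety margin, rounded up to whole dollars."""
    if runs < 0 or cost_per_run_usd < 0 or headroom < 1:
        raise ValueError("runs and cost must be non-negative; headroom must be >= 1")
    estimate = runs * cost_per_run_usd * headroom
    # Round to cents first so floating-point noise does not push the ceiling up.
    return math.ceil(round(estimate, 2))


# 90 nightly runs at an assumed $0.12 each, with 50% headroom.
cap = suggested_cap_usd(90, 0.12)
```

The result is the value you would enter as credit_limit_usd. Revisit it against the spend shown in the console's usage views as real costs become known.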

5. How the cap fields relate

The three fields that govern this are a single switch with a derived balance — you set the dollar cap, the gateway derives the rest:
| Field | Meaning |
| --- | --- |
| credit_limit_usd | Your input. > 0 = bounded cap in USD; 0 = unlimited. |
| unlimited_quota | true when the key has no cap; set to false automatically when you give a positive credit_limit_usd. |
| remain_quota | Derived spend headroom for a bounded key; reaching zero exhausts the key. |
You only ever set credit_limit_usd (or unlimited_quota) in the editor. remain_quota and used_quota are maintained by the gateway as the key bills usage — they’re read-only telemetry, surfaced in the console’s usage views.
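The relationship between the input and the derived fields can be sketched for a newly created key. This is a minimal model under stated assumptions: derive_quota_state is a hypothetical function, remain_quota is expressed directly in USD (the gateway's internal unit is not documented here), and rejecting negative input is a guess at sensible validation.

```python
from typing import TypedDict


class KeyQuotaState(TypedDict):
    unlimited_quota: bool
    remain_quota: float
    used_quota: float


def derive_quota_state(credit_limit_usd: float) -> KeyQuotaState:
    """Derive the quota fields of a new key from the one value you set."""
    if credit_limit_usd < 0:
        # Assumption: negative caps are invalid input.
        raise ValueError("credit_limit_usd must be 0 (unlimited) or positive")
    if credit_limit_usd == 0:
        # Zero is the sentinel for "no cap", not a zero-dollar cap.
        return {"unlimited_quota": True, "remain_quota": 0.0, "used_quota": 0.0}
    return {
        "unlimited_quota": False,
        "remain_quota": float(credit_limit_usd),
        "used_quota": 0.0,
    }
```

For example, derive_quota_state(25) yields a bounded key with 25 of headroom, while derive_quota_state(0) yields an unlimited key.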

6. Where this sits in the control stack

A spend cap bounds how much a key can do; the rest of the key’s scope bounds what it can do. The two compose:

Quota cap & expiry

Combine a dollar cap with an absolute expiry so a key self-retires on whichever limit it hits first.

The token object

Every field a key carries — model limits, IP allow-list, policy attachments, environment label — in one reference.

Least-agency checklist

The full recipe for the narrowest possible key, one constraint at a time.

Scope, keys & policies

How the cap fits the workspace → policy → key hierarchy, and how bounding a key shrinks the blast radius.
The narrower each key’s spend cap, the smaller the bill any one compromised agent can run up — and the clearer your audit trail of what each key was authorized to spend.