1. The two modes
Every key resolves to exactly one of two states:
Unlimited
unlimited_quota = true. The key draws on the workspace balance with
no per-key ceiling. No per-key spend check runs at request time; the only
limit is the workspace’s own balance.
Bounded
credit_limit_usd > 0. The key carries its own lifetime spend cap in
USD. Once cumulative spend reaches the cap, the key stops working, and
the rest of the workspace is untouched.
You set the mode per key in the console (/console/token). Creating
or editing a key requires the Developer role or above.
credit_limit_usd = 0 means unlimited: zero is the sentinel for “no
cap”, not “a zero-dollar cap”. To bound a key, give it a positive dollar
amount.
2. How an API key quota is enforced
When you set credit_limit_usd to a positive number, the gateway converts
it into an internal remain_quota balance for that key and flips
unlimited_quota to false. From then on:
- remain_quota is the key’s remaining spend headroom, drawn down as the key bills usage.
- used_quota is the cumulative spend the key has already booked.
- On every relay call, the gateway checks the key before it forwards the
  request. A bounded key whose remain_quota has reached zero is rejected
  as exhausted, and the call never reaches the model.
An unlimited key (unlimited_quota = true) skips that balance check
entirely; it is bounded only by the workspace balance and by any other
key-level limits you set (model allow-list, IP allow-list, expiry).
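The enforcement rules above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the gateway's implementation: the field names come from this page, but the Key class, the function names, and the QUOTA_UNITS_PER_USD conversion factor are hypothetical, as are the choices to clamp the balance at zero and to track used_quota on unlimited keys.

```python
from dataclasses import dataclass

# Hypothetical conversion factor: this page only says the dollar cap is
# converted into an internal balance, not which unit that balance uses.
QUOTA_UNITS_PER_USD = 1_000_000


@dataclass
class Key:
    unlimited_quota: bool = True
    remain_quota: int = 0  # remaining headroom, in internal units
    used_quota: int = 0    # cumulative spend, in internal units


def apply_credit_limit(key: Key, credit_limit_usd: float) -> None:
    """Mirror the documented rule: > 0 bounds the key, 0 means unlimited."""
    if credit_limit_usd < 0:
        raise ValueError("credit_limit_usd must be >= 0")
    if credit_limit_usd == 0:
        key.unlimited_quota = True
        key.remain_quota = 0
    else:
        key.unlimited_quota = False
        key.remain_quota = round(credit_limit_usd * QUOTA_UNITS_PER_USD)


def precheck(key: Key) -> bool:
    """Return True if a relay call may be forwarded to the model."""
    if key.unlimited_quota:
        return True  # no per-key balance check for unlimited keys
    return key.remain_quota > 0


def bill(key: Key, cost_units: int) -> None:
    """Book usage after a call: cumulative spend up, headroom down."""
    key.used_quota += cost_units
    if not key.unlimited_quota:
        # Clamping at zero is an assumption of this sketch.
        key.remain_quota = max(0, key.remain_quota - cost_units)
```

In this model, a key passed through apply_credit_limit(key, 25) passes precheck until bill has drawn remain_quota down to zero, after which precheck returns False and the call would be rejected before reaching the model.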
3. One concrete example
Say you’re deploying a scheduled summarization agent and you want to guarantee it can never spend more than $25 no matter what the model does. Set the cap when you create the key: credit_limit_usd = 25. The gateway creates the key with unlimited_quota = false and a
remain_quota worth $25. Once cumulative spend reaches $25, the
key is exhausted and every further /v1/* call is rejected, without you
watching a dashboard and without touching the rest of the workspace.
To make the same key unlimited later, edit it and flip the unlimited
toggle — the console sets unlimited_quota = true and credit_limit_usd = 0 together, and the key can draw on the full workspace balance again.
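To make the $25 example concrete, here is a minimal simulation of a runaway agent hitting the cap. It is a sketch only: the function name, the $0.25 per-call price, and the order of operations (check before forwarding, bill afterward) are assumptions for illustration. How the gateway treats a final call that would cross the cap is not specified on this page, so the costs here are chosen to land exactly on it.

```python
from decimal import Decimal


def run_until_exhausted(cap_usd, call_costs_usd):
    """Simulate a bounded key; return (calls_forwarded, total_spent_usd)."""
    remain = Decimal(cap_usd)  # stands in for remain_quota, kept in USD here
    spent = Decimal("0")       # stands in for used_quota
    forwarded = 0
    for cost in call_costs_usd:
        if remain <= 0:
            break              # exhausted: rejected before reaching the model
        cost = Decimal(cost)
        forwarded += 1
        spent += cost
        remain -= cost
    return forwarded, spent


# A runaway agent attempts 1,000 calls at a made-up $0.25 each.
forwarded, spent = run_until_exhausted("25", ["0.25"] * 1000)
print(forwarded, spent)  # prints: 100 25.00
```

Only the first 100 of the 1,000 attempted calls are forwarded; the remaining 900 would be rejected as exhausted, and the workspace balance is never drawn on beyond the $25.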
4. Which mode to pick
Agent / automation keys → bounded
Any key handed to an autonomous agent, a CI job, or a third-party
integration should be bounded. A spend cap is the cheapest guarantee
that a prompt-injection loop or a retry storm can’t run up an unbounded
bill — the cap stops the key before the damage compounds. Pair it with
a tight model limit and an
IP allow-list.
Short-lived / experiment keys → bounded + expiry
For a key that exists only for a demo, a load test, or a single
deployment, combine a small
credit_limit_usd with an expired_time.
The key self-retires on whichever limit it hits first. See
Quota cap & expiry and
Expiring keys.
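The "whichever limit it hits first" behavior of a bounded, expiring key can be sketched as two independent checks. The field names remain_quota and expired_time come from this page; everything else is an assumption for illustration, including the ShortLivedKey class, the rejection_reason helper, treating expired_time as Unix seconds, and using None to mean "no expiry".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShortLivedKey:
    remain_quota: float          # remaining headroom, kept in USD for simplicity
    expired_time: Optional[int]  # assumed Unix seconds; None = no expiry (assumption)


def rejection_reason(key: ShortLivedKey, now: int) -> Optional[str]:
    """Return why the key is unusable at time `now`, or None if it is usable."""
    if key.expired_time is not None and now >= key.expired_time:
        return "expired"
    if key.remain_quota <= 0:
        return "exhausted"
    return None
```

Either condition alone is enough to retire the key, so a demo key with a small cap and a near expiry stops working as soon as the first of the two is reached.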
Trusted internal / high-volume keys → unlimited
A key used by a core production service you fully control, where a
per-key cap would just cause spurious outages, can stay unlimited —
the workspace balance is the backstop. Keep these keys few, name them
clearly, and still scope them with model and IP limits.
5. How the cap fields relate
The three fields that govern this act as a single switch with a derived balance. You set the dollar cap and the gateway derives the rest:

| Field | Meaning |
|---|---|
| credit_limit_usd | Your input. > 0 = bounded cap in USD; 0 = unlimited. |
| unlimited_quota | true when the key has no cap; set to false automatically when you give a positive credit_limit_usd. |
| remain_quota | Derived spend headroom for a bounded key; reaching zero exhausts the key. |

You only set credit_limit_usd (or unlimited_quota) in the editor.
remain_quota and used_quota are maintained by the gateway as the key
bills usage — they’re read-only telemetry, surfaced in the console’s usage
views.
6. Where this sits in the control stack
A spend cap bounds how much a key can do; the rest of the key’s scope bounds what it can do. The two compose:
Quota cap & expiry
Combine a dollar cap with an absolute expiry so a key self-retires on
whichever limit it hits first.
The token object
Every field a key carries — model limits, IP allow-list, policy
attachments, environment label — in one reference.
Least-agency checklist
The full recipe for the narrowest possible key, one constraint at a
time.
Scope, keys & policies
How the cap fits the workspace → policy → key hierarchy, and how
bounding a key shrinks the blast radius.
