A single API key can reach every model your workspace is entitled to. That’s convenient for a console session and dangerous for a long-lived agent: a prompt-injected agent holding an unrestricted key can quietly switch from gpt-4o-mini to the most expensive model you have access to, or to one whose data-handling you never approved. The fix is a per-key model allow-list. Each key carries a model_limits field (gated by model_limits_enabled). When it’s on, a request for any model not on the list is rejected at the gateway — before a channel is selected and before anything leaves for a provider.
This is one constraint on the key object. It composes with the key’s IP allow-list, spend cap, expiry, and attached guardrail / firewall policy — each narrows the key independently.

1. Why restrict model access per API key

Model choice is an agency lever. A key that can call any model can be steered into:
  • Cost blow-ups — switching to a premium model multiplies the bill per token.
  • Capability creep — a task scoped for a small model gets routed to a frontier model that can do far more than you intended.
  • Compliance drift — sending traffic to a model family you haven’t cleared for a given data class.
Restricting a key to the one or two models an agent actually needs closes all three at once. It’s the model-axis equivalent of the firewall allow-listing tools — the agent can only reach what you named, and nothing else.

2. The two fields

Model limits live on the key as a pair:
| Field | Type | Meaning |
|---|---|---|
| model_limits_enabled | bool | Master switch. When false, the key reaches every model the workspace allows. |
| model_limits | list | The allow-list of model names. Only meaningful when model_limits_enabled is true. |
The two fields are independent, and the combination matters: model_limits_enabled = true with an empty list means the key can reach no models — every request is rejected with “This token has no access to any models.” Turn the switch on only once you’ve named at least one model.
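The combinations of the two fields can be summarized in a short sketch. This is not the gateway's implementation: the function name `key_may_call` and the in-memory representation are assumptions made for illustration, and the separate workspace entitlement check is deliberately left out.

```python
def key_may_call(model_limits_enabled: bool, model_limits: list[str], model: str) -> bool:
    """Illustrative only: mirrors the field semantics described above."""
    if not model_limits_enabled:
        # Switch off: the list is ignored, whatever it contains.
        return True
    # Switch on: only listed models pass, so an empty list passes nothing.
    return model in model_limits


# Switch off, list ignored.
print(key_may_call(False, [], "openai/gpt-4o-mini"))
# Switch on with an empty list: the key is locked out of every model.
print(key_may_call(True, [], "openai/gpt-4o-mini"))
# Switch on with one entry: only that model passes.
print(key_may_call(True, ["openai/gpt-4o-mini"], "openai/gpt-4o-mini"))
```

The case worth noticing is the second one: enabling the switch before filling the list does not mean "no restriction yet", it means "no models at all".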

3. Set it on a key

Configure model limits in the console key editor (/console/token), the same place you set the key’s other constraints. Creating or editing a key requires the Developer role or above.
  1. Open the key (or Create key).
  2. Enable Model limits.
  3. Pick the models this key may call — type to filter the workspace’s available models.
  4. Save. The change takes effect on the key’s next request — no redeploy, no key rotation.
A scheduled summarizer that should only ever touch one cheap model ends up with an allow-list of exactly one entry:
model_limits_enabled: true
model_limits:         ["openai/gpt-4o-mini"]
From that point the key is pinned to gpt-4o-mini. Any other model name on a request from this key is rejected — there is no fallback to a default model and no silent downgrade.
Pair model limits with a credit_limit_usd cap on the same key. The model list bounds which model a runaway loop can reach; the spend cap bounds how much it can burn before the key stops working. Two independent ceilings, both enforced at the gateway. See Quota cap & expiry.

4. What a rejected request looks like

When model_limits_enabled is on and a request names a model outside the list, the gateway aborts the request with HTTP 403 and an OpenAI-shaped error body:
{
  "error": {
    "message": "This token has no access to model claude-opus-4-8 (request id: 2024...abc)",
    "type": "orcarouter_api_error",
    "code": ""
  }
}
Key properties of the rejection:
  • The check runs while the gateway is still choosing a channel. The request never reaches an upstream provider, so a forbidden model costs no model tokens.
  • With the switch on and an empty allow-list, the message is “This token has no access to any models” and every request is rejected. This is the difference between “restrict to a list” and “lock the key out of inference entirely.”
  • The request’s model name is normalized before the list is checked, so related variants (e.g. thinking variants) resolve to the same canonical name you allow-listed. List the base model name the console shows you.
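A client or agent harness may want to tell these two rejections apart, since one calls for fixing the request and the other for fixing the key. The sketch below is one way to do that, assuming the message wording shown above is stable; the function name `classify_rejection` and its return labels are hypothetical. Because the example body carries an empty `code`, the message text is the only discriminator available here, which makes this matching inherently brittle.

```python
import json

# Message fragments taken from the examples above; treating them as
# stable is an assumption.
NO_MODELS = "This token has no access to any models"
NO_MODEL_PREFIX = "This token has no access to model "


def classify_rejection(status: int, body: str) -> str:
    """Return 'key_locked_out', 'model_not_allowed', or 'other'."""
    if status != 403:
        return "other"
    try:
        message = json.loads(body)["error"]["message"]
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not the OpenAI-shaped error body.
        return "other"
    if not isinstance(message, str):
        return "other"
    if message.startswith(NO_MODELS):
        return "key_locked_out"      # switch on, empty allow-list
    if message.startswith(NO_MODEL_PREFIX):
        return "model_not_allowed"   # model is outside the allow-list
    return "other"
```

Neither outcome is worth retrying as-is: the request is rejected before any provider is contacted, so repeating it with the same model and key will fail the same way.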

5. Model limits vs. workspace entitlements

Two different things decide whether a key can call a model. Don’t confuse them:
| Layer | Scope | Question it answers |
|---|---|---|
| Workspace entitlement | Workspace | Is this model available to the workspace at all? |
| model_limits | Single key | Of the available models, which may THIS key use? |
model_limits only ever narrows. A key cannot use model limits to reach a model the workspace itself isn’t entitled to; it can only carve a smaller allow-list out of what’s already permitted. Use it when a key should get strictly less than the workspace, never more.
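One way to picture the two layers is as a set intersection. The sketch below is a mental model rather than the gateway's code: `effective_models` is a hypothetical helper, and `openai/gpt-4o` is an illustrative model name that does not appear elsewhere on this page.

```python
def effective_models(
    workspace_models: set[str],
    model_limits_enabled: bool,
    model_limits: list[str],
) -> set[str]:
    """Models a key can actually reach under both layers (illustrative)."""
    if not model_limits_enabled:
        # No per-key restriction: the workspace entitlement is the only bound.
        return set(workspace_models)
    # Per-key list can only narrow: entries outside the workspace set drop out.
    return workspace_models & set(model_limits)


workspace = {"openai/gpt-4o-mini", "openai/gpt-4o"}
# The second entry is not in the workspace set, so listing it grants nothing.
limits = ["openai/gpt-4o-mini", "claude-opus-4-8"]
print(effective_models(workspace, True, limits))
```

Listing a model the workspace lacks is harmless but useless: the intersection discards it.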

6. Where this fits the least-agency posture

Model limits are one line of the per-agent key recipe. The narrowest useful key for an autonomous agent pins all of its axes at once:
  • model_limits — the one or two models the agent needs (this page).
  • allow_ips — the agent’s egress range, see IP allow-list.
  • credit_limit_usd — a spend ceiling, see Quota cap & expiry.
  • expired_time — an automatic expiry, see Expiring keys.
  • guardrail_id / firewall_policy_id — content and tool-call policy, see Bind policies to a key.
When such a key is compromised via prompt injection, the blast radius is bounded on every axis — including which models the attacker can spend your budget on.
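The recipe above lends itself to a quick pre-flight check before a key is handed to an agent. The following is a minimal sketch: the field names come from the list above, but the `AgentKey` class, the field types, and the use of `None` or an empty list to mean "not set" are all assumptions, not the shape of the real key object.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentKey:
    # Field names follow the recipe above; types and defaults are assumed.
    model_limits_enabled: bool = False
    model_limits: list[str] = field(default_factory=list)
    allow_ips: list[str] = field(default_factory=list)
    credit_limit_usd: Optional[float] = None
    expired_time: Optional[int] = None
    guardrail_id: Optional[str] = None
    firewall_policy_id: Optional[str] = None


def unpinned_axes(key: AgentKey) -> list[str]:
    """List the axes of the recipe that this key leaves open."""
    missing = []
    # Flags both "switch off" and "switch on with an empty list"; the
    # latter locks the key out entirely and is almost certainly a mistake.
    if not (key.model_limits_enabled and key.model_limits):
        missing.append("model_limits")
    if not key.allow_ips:
        missing.append("allow_ips")
    if key.credit_limit_usd is None:
        missing.append("credit_limit_usd")
    if key.expired_time is None:
        missing.append("expired_time")
    if key.guardrail_id is None:
        missing.append("guardrail_id")
    if key.firewall_policy_id is None:
        missing.append("firewall_policy_id")
    return missing


key = AgentKey(
    model_limits_enabled=True,
    model_limits=["openai/gpt-4o-mini"],
    credit_limit_usd=5.0,
)
print(unpinned_axes(key))
```

An empty result means every axis in the recipe is set; it says nothing about whether the chosen values are tight enough.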
Model limits are an identity constraint on the key, not a content or action policy. They don’t inspect prompts (that’s Guardrails) or tool calls (that’s the Firewall) — they decide, up front, which model the key is even allowed to address.

7. Next steps

  • The key object: every field a key carries (model limits, IP list, caps, expiry, and policy attachments) in one reference.
  • Least-agency checklist: the full per-agent key recipe, scoping every axis to the minimum the agent needs.
  • Scope, keys & policies: how keys, guardrails, and firewall policies bind together into one agent identity.
  • Bind policies to a key: attach a guardrail and a firewall policy to the same key.

Restricting model access per API key is the cheapest agency control you can apply: one allow-list, enforced at the gateway, that no compromised agent can talk its way around.