gpt-4o-mini to the most expensive model you have access to,
or to one whose data-handling you never approved.
The fix is a per-key model allow-list. Each key carries a
model_limits field (gated by model_limits_enabled). When it’s on, a
request for any model not on the list is rejected at the gateway —
before a channel is selected and before anything leaves for a provider.
This is one constraint on the key object.
It composes with the key’s IP allow-list, spend cap, expiry, and attached
guardrail / firewall policy — each narrows the key independently.
1. Why restrict model access per API key
Model choice is an agency lever. A key that can call any model can be steered into:
- Cost blow-ups: switching to a premium model multiplies the bill per token.
- Capability creep: a task scoped for a small model gets routed to a frontier model that can do far more than you intended.
- Compliance drift: traffic goes to a model family you haven’t cleared for a given data class.
2. The two fields
Model limits live on the key as a pair:

| Field | Type | Meaning |
|---|---|---|
| model_limits_enabled | bool | Master switch. When false, the key reaches every model the workspace allows. |
| model_limits | list | The allow-list of model names. Only meaningful when model_limits_enabled is true. |
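The interaction between the switch and the list can be sketched in a few lines. This is a minimal illustration, not the gateway's implementation: the `KeyModelLimits` class and `passes_model_limits` function are hypothetical names, only the two field names come from the table above, and `some-other-model` is a placeholder model name.

```python
from dataclasses import dataclass, field


@dataclass
class KeyModelLimits:
    """Illustrative stand-in for the two model-limit fields on a key."""
    model_limits_enabled: bool = False
    model_limits: list[str] = field(default_factory=list)


def passes_model_limits(key: KeyModelLimits, model: str) -> bool:
    """Return True if the key's own allow-list permits `model`.

    Workspace entitlement is a separate check and is not modelled here.
    """
    if not key.model_limits_enabled:
        # Switch off: the list is ignored, whatever it contains.
        return True
    # Switch on: only listed names pass, so an empty list passes nothing.
    return model in key.model_limits


open_key = KeyModelLimits(model_limits_enabled=False, model_limits=["gpt-4o-mini"])
pinned_key = KeyModelLimits(model_limits_enabled=True, model_limits=["gpt-4o-mini"])
locked_key = KeyModelLimits(model_limits_enabled=True, model_limits=[])

print(passes_model_limits(open_key, "some-other-model"))    # switch off, list ignored
print(passes_model_limits(pinned_key, "some-other-model"))  # switch on, not listed
print(passes_model_limits(locked_key, "gpt-4o-mini"))       # switch on, empty list
```

Note that a populated list with the switch off has no effect, which is why the two fields are best read as a pair.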
3. Set it on a key
Configure model limits in the console key editor (/console/token),
the same place you set the key’s other constraints. Creating or editing a
key requires the Developer role or above.
- Open the key (or Create key).
- Enable Model limits.
- Pick the models this key may call — type to filter the workspace’s available models.
- Save. The change takes effect on the key’s next request — no redeploy, no key rotation.
gpt-4o-mini. Any other model name
on a request from this key is rejected — there is no fallback to a default
model and no silent downgrade.
4. What a rejected request looks like
When model_limits_enabled is on and a request names a model outside the
list, the gateway aborts the request with HTTP 403 and an
OpenAI-shaped error body.
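On the client side, a rejection like this is a configuration problem rather than a transient fault, so it is usually better surfaced than retried. The sketch below assumes only the general OpenAI error envelope (a top-level `error` object with a `message` string). The function name `forbidden_message` is hypothetical, and the sample message is a placeholder, not the gateway's actual wording. A 403 can have other causes, so the message still needs to be read.

```python
import json
from typing import Optional


def forbidden_message(status: int, body: str) -> Optional[str]:
    """Return the error message from a 403 response that uses the
    OpenAI-style error envelope, or None if it does not match."""
    if status != 403:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    error = payload.get("error")
    if isinstance(error, dict) and isinstance(error.get("message"), str):
        return error["message"]
    return None


# Hypothetical sample body: only the envelope shape is assumed; the
# message text is a placeholder, not the gateway's real output.
sample = json.dumps({"error": {"message": "model not permitted for this key"}})

message = forbidden_message(403, sample)
if message is not None:
    print(f"Request forbidden, do not retry: {message}")
```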
It happens before provider selection
The check runs while the gateway is still choosing a channel — the
request never reaches an upstream provider, so a forbidden model costs
no model tokens.
Empty list = no models
With the switch on and an empty allow-list, the message is “This
token has no access to any models” and every request is rejected.
This is the difference between “restrict to a list” and “lock the key
out of inference entirely.”
Matching is on the canonical model name
The request’s model name is normalized before the list is checked, so
related variants (e.g. thinking variants) resolve to the same
canonical name you allow-listed. List the base model name the console
shows you.
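To make the idea concrete, here is a sketch of normalize-then-match. The gateway's real normalization rules are not described on this page, so the `-thinking` suffix, the function names, and the `example-model` name are all assumptions chosen for illustration.

```python
# Hypothetical variant suffixes. The gateway's actual normalization
# rules are not documented here; this tuple is an assumption.
ASSUMED_VARIANT_SUFFIXES = ("-thinking",)


def canonical_name(model: str) -> str:
    """Strip an assumed variant suffix to get the base model name."""
    for suffix in ASSUMED_VARIANT_SUFFIXES:
        if model.endswith(suffix) and len(model) > len(suffix):
            return model[: -len(suffix)]
    return model


def allowed_after_normalization(model: str, allow_list: list[str]) -> bool:
    """Check the allow-list against the canonical name, not the raw one."""
    return canonical_name(model) in allow_list


allow_list = ["example-model"]
print(allowed_after_normalization("example-model-thinking", allow_list))
print(allowed_after_normalization("other-model-thinking", allow_list))
```

Under these assumptions, listing the variant name instead of the base name would not match, which is the reason to list the base name the console shows.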
5. Model limits vs. group entitlements
Two different things decide whether a key can call a model. Don’t confuse them:

| Layer | Scope | Question it answers |
|---|---|---|
| Workspace entitlement | Workspace | Is this model available to the workspace at all? |
| model_limits | Single key | Of the available models, which may THIS key use? |
model_limits only ever narrows. A key cannot use model limits to
reach a model the workspace itself isn’t entitled to; it can only carve a
smaller allow-list out of what’s already permitted. Use this field when a
key should get strictly less than the workspace, never more.
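One way to picture the two layers is as a set intersection: the models a key can actually reach are those present in both the workspace entitlement and the key's allow-list. The function name `effective_models` and the `model-a`/`model-b`/`model-c` names are hypothetical; this is a sketch of the rule as described, not the gateway's code.

```python
def effective_models(
    workspace_models: set[str],
    model_limits_enabled: bool,
    model_limits: list[str],
) -> set[str]:
    """Models a key can reach once both layers are applied."""
    if not model_limits_enabled:
        # No per-key narrowing: the key sees the whole workspace set.
        return set(workspace_models)
    # Intersection: a listed model outside the workspace set is dropped.
    return workspace_models & set(model_limits)


workspace = {"model-a", "model-b"}

# "model-c" is on the key's list but not in the workspace, so it is
# not granted; only "model-a" survives.
print(effective_models(workspace, True, ["model-a", "model-c"]))
```

Because the result is always a subset of the workspace set, adding names to the key's list can never widen access.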
6. Where this fits the least-agency posture
Model limits are one line of the per-agent key recipe. The narrowest useful key for an autonomous agent pins all of its axes at once:
- model_limits: the one or two models the agent needs (this page).
- allow_ips: the agent’s egress range, see IP allow-list.
- credit_limit_usd: a spend ceiling, see Quota cap & expiry.
- expired_time: an automatic expiry, see Expiring keys.
- guardrail_id / firewall_policy_id: content and tool-call policy, see Bind policies to a key.
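A quick way to audit a key against this recipe is to list which axes are still unset. The sketch below uses the field names from the list above, but the value types and sample values are assumptions (the IP range is a reserved documentation range), and `unpinned_axes` is a hypothetical helper, not part of any API.

```python
# Field names come from the recipe above; everything else is illustrative.
RECIPE_AXES = (
    "model_limits",
    "allow_ips",
    "credit_limit_usd",
    "expired_time",
    "guardrail_id",
    "firewall_policy_id",
)


def unpinned_axes(key: dict) -> list[str]:
    """Return recipe fields that are missing or empty on a key definition."""
    return [name for name in RECIPE_AXES if key.get(name) in (None, "", [])]


# Hypothetical key definition with assumed value types.
agent_key = {
    "model_limits": ["gpt-4o-mini"],
    "allow_ips": ["203.0.113.0/24"],  # reserved documentation range
    "credit_limit_usd": 25,
    "expired_time": None,             # no expiry set yet
    "guardrail_id": "",               # no guardrail attached yet
}

print(unpinned_axes(agent_key))
```

For this sample the helper reports the expiry and both policy attachments as still open, which are the axes left to pin before the key is handed to an agent.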
Model limits are an identity constraint on the key, not a content or
action policy. They don’t inspect prompts (that’s
Guardrails) or tool calls (that’s the
Firewall) — they decide, up front, which model the
key is even allowed to address.
7. Next steps
The key object
Every field a key carries — model limits, IP list, caps, expiry, and
policy attachments — in one reference.
Least-agency checklist
The full per-agent key recipe: scope every axis to the minimum the
agent needs.
Scope, keys & policies
How keys, guardrails, and firewall policies bind together into one
agent identity.
Bind policies to a key
Attach a guardrail and a firewall policy to the same key.
