429 Too Many Requests response with a
Retry-After header.
Limit layers
- Per API key. Default is set by your plan. Protects against runaway keys (e.g. a leaked key attempting a DDoS).
- Per model. Every model has a requests-per-minute ceiling that matches the upstream provider’s documented limit, scaled down conservatively. This protects the shared channels that route traffic.
- Per group (admin-configured). Your admin can carve users into groups with different model access and different per-minute ceilings.

The lowest of the three applies.
Response
When you’re rate-limited, the Retry-After header gives the wait in seconds. Retry after that delay;
retrying immediately will fail again.
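As a rough illustration of reading that header, here is a minimal Python sketch. The function name `retry_after_seconds`, the plain-dict `headers` argument, and the 1-second fallback for a missing or unparseable value are assumptions made for this example, not part of the API.

```python
from typing import Mapping


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Return the Retry-After delay in seconds.

    Falls back to `default` (an assumed value) when the header is
    missing or is not a plain number of seconds.
    """
    raw = headers.get("Retry-After")
    if raw is None:
        return default
    try:
        seconds = float(raw)
    except ValueError:
        return default
    # Never return a negative delay.
    return max(seconds, 0.0)
```

Most HTTP clients expose headers through a case-insensitive mapping; with a plain `dict`, the key must match `Retry-After` exactly.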
Recommended client behavior
- On 429, read Retry-After.
- Wait that many seconds.
- Retry the same request.
- If a second 429 occurs, increase the wait by 2× (exponential backoff) up to 60 seconds.
- If you see 429 repeatedly across models, consider using extra_body.models to split traffic across multiple providers; see Model Fallbacks.
Retry-After automatically
by default. You don’t need custom code unless you’ve disabled retries.
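If you have disabled retries and handle 429 yourself, the recommended behavior above could be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the `send` callable, its `(status, headers, body)` return shape, the name `retry_on_429`, and the `max_attempts` limit are all hypothetical. Using the larger of the doubled wait and a fresh Retry-After value on repeat 429s is also an assumption.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (HTTP status, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def retry_on_429(
    send: Callable[[], Response],
    max_attempts: int = 5,
    max_wait: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` and retry the same request while it returns 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    wait = 0.0
    for attempt in range(1, max_attempts + 1):
        response = send()
        status, headers, _body = response
        # Return on success, on any non-429 status, or when out of attempts.
        if status != 429 or attempt == max_attempts:
            return response

        # Retry-After is in seconds; assume 1 second if it is missing.
        retry_after = float(headers.get("Retry-After", "1"))
        if wait == 0.0:
            wait = retry_after  # first 429: wait exactly Retry-After
        else:
            wait = max(wait * 2, retry_after)  # repeat 429: double the wait
        wait = min(wait, max_wait)  # cap at 60 seconds by default
        sleep(wait)

    raise AssertionError("unreachable")
```

To use it, wrap the request in a zero-argument callable, for example `retry_on_429(lambda: my_request())`, where `my_request` is your own function returning the tuple above. The `sleep` parameter exists so the loop can be tested without real delays.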
Not yet exposed
Response headers showing remaining budget (X-RateLimit-Remaining,
X-RateLimit-Reset) are on the roadmap but not yet implemented. For now,
429 is a reactive signal, not a predictive one.