OrcaRouter applies rate limits at three levels. When any one of them fires, you get an HTTP 429 Too Many Requests response with a Retry-After header.

Limit layers

Per API key. Default is set by your plan. Protects against runaway keys (e.g. a leaked key attempting a DDoS).

Per model. Every model has a requests-per-minute ceiling that matches the upstream provider’s documented limit, scaled down conservatively. This protects the shared channels that route traffic.

Per group (admin-configured). Your admin can carve users into groups with different model access and different per-minute ceilings.

The lowest of the three ceilings applies.
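Since the strictest ceiling wins, the effective limit is just the minimum of the three. A minimal illustration (the function name and the example numbers are hypothetical, not part of OrcaRouter's API):

```python
def effective_rpm(key_rpm: int, model_rpm: int, group_rpm: int) -> int:
    """The lowest of the three per-minute ceilings is the one that applies."""
    return min(key_rpm, model_rpm, group_rpm)

# A key allowed 600 rpm, hitting a model capped at 300 rpm, from a group
# capped at 100 rpm, is effectively limited to 100 requests per minute.
print(effective_rpm(600, 300, 100))  # 100
```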

Response

When you’re rate-limited:
HTTP/1.1 429 Too Many Requests
Retry-After: 5
Content-Type: application/json

{
  "error": {
    "message": "Rate limit exceeded. Try again in 5 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
The Retry-After header is in seconds. Retry after that delay — immediately retrying will fail again.
  1. On 429, read Retry-After.
  2. Wait that many seconds.
  3. Retry the same request.
  4. If the retry also receives a 429, double the wait on each subsequent attempt (exponential backoff), up to a cap of 60 seconds.
  5. If you see 429 repeatedly across models, consider splitting traffic across extra_body.models to multiple providers — see Model Fallbacks.
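The backoff schedule in steps 1–4 can be sketched as a pure function (the helper name is illustrative, not part of any SDK):

```python
def backoff_delays(retry_after: float, max_delay: float = 60.0, attempts: int = 5):
    """Yield wait times in seconds: start at the Retry-After value,
    double after each subsequent 429, and cap at max_delay."""
    delay = retry_after
    for _ in range(attempts):
        yield min(delay, max_delay)
        delay *= 2

# With Retry-After: 5, the successive waits are 5, 10, 20, 40, then 60 (capped).
print(list(backoff_delays(5)))  # [5, 10, 20, 40, 60]
```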
The OpenAI Python and TypeScript SDKs handle Retry-After automatically by default. You don’t need custom code unless you’ve disabled retries.
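For reference, retry behavior in the OpenAI Python SDK is controlled by the `max_retries` client option; this configuration sketch assumes OrcaRouter exposes an OpenAI-compatible endpoint (the base URL shown is a placeholder):

```python
from openai import OpenAI

# Retries (including honoring Retry-After) are on by default; max_retries
# sets how many times the SDK retries a failed request before giving up.
client = OpenAI(
    base_url="https://your-orcarouter-host/v1",  # placeholder URL
    api_key="your-api-key",
    max_retries=5,
)

# Setting max_retries=0 for a single call disables automatic retries,
# in which case you must handle 429 and Retry-After yourself.
no_retry_client = client.with_options(max_retries=0)
```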

Not yet exposed

Response headers showing remaining budget (X-RateLimit-Remaining, X-RateLimit-Reset) are on the roadmap but not yet implemented. For now, 429 is a reactive signal, not a predictive one.