429 Too Many Requests response with a Retry-After header.
Why workspace-scoped
Workspaces are how OrcaRouter groups the keys, members, and billing that belong to a single team or individual. Shared limits inside a workspace make traffic predictable as your team grows: adding a new key (or a new member) doesn't multiply your shared budget. If you need a higher ceiling, upgrade the workspace's plan. OrcaRouter does not expose per-model rate limits to callers: the gateway behaves like a single logical provider from your application's view, consistent with provider opacity. Internal throttling toward upstream providers happens transparently and is not part of the public contract.
Response
A rate-limited request always returns HTTP status 429 with a Retry-After header.
When a body is present, it follows the OpenAI-compatible error envelope, with error.type set to orcarouter_api_error. The error.message may be localized (currently Chinese); see Errors for the envelope structure.
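Because the message text may be localized, client code should branch on error.type (or simply on the 429 status) rather than on error.message. A minimal sketch, assuming you already hold the raw response body as a string; the function name parse_error_envelope is hypothetical, and the sample message used in the usage note is made up:

```python
import json
from typing import Optional


def parse_error_envelope(body: str) -> Optional[dict]:
    """Return the "error" object from an OpenAI-compatible error body, or None.

    A 429 may arrive without a body, so callers should rely on the status
    code and Retry-After header first and treat this as extra detail.
    """
    if not body.strip():
        return None  # empty body: nothing to parse
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None  # not JSON (e.g. a proxy error page)
    error = payload.get("error") if isinstance(payload, dict) else None
    return error if isinstance(error, dict) else None
```

For example, parse_error_envelope('{"error": {"type": "orcarouter_api_error", "message": "..."}}')["type"] yields "orcarouter_api_error", whatever language the message is in.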
Retry-After is given in seconds and equals the rate-limit window duration. The value is conservative: waiting exactly that long is safe, because the next window starts with the full budget. Retrying immediately without waiting will fail again.
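Reading the header is simple, but HTTP header names are case-insensitive, and a defensive client should not crash if the value is missing or malformed. A minimal sketch, assuming headers arrive as a plain string-to-string mapping; the helper name retry_after_seconds and the 1-second fallback are illustrative assumptions, and the HTTP-date form of Retry-After is deliberately not handled because this section documents the value as seconds:

```python
from typing import Mapping


def retry_after_seconds(headers: Mapping[str, str], fallback: float = 1.0) -> float:
    """Return the Retry-After value in seconds, or `fallback` if unusable.

    Header names are matched case-insensitively. The fallback (an assumed
    default, not part of the API) covers a missing or non-numeric value.
    """
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(float(value), 0.0)  # clamp negatives to zero
            except (TypeError, ValueError):
                return fallback
    return fallback
```

With a 429 carrying "Retry-After: 30", this returns 30.0, and waiting that long lands in a fresh window with full budget.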
Recommended client behavior
- On 429, read Retry-After.
- Wait that many seconds.
- Retry the same request.
- If a second 429 occurs, increase the wait by 2× (exponential backoff), up to 60 seconds.
- If you see 429 repeatedly, consider splitting traffic across multiple models with extra_body.models (see Model Fallbacks).
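The steps above can be sketched as a small retry loop. This is a minimal sketch under stated assumptions: send is a hypothetical callable that re-sends the same request and returns (status_code, retry_after_seconds, body), and max_attempts=5 is an arbitrary choice. One interpretation is also ours: the doubled wait is capped at 60 seconds but never drops below Retry-After itself, since waiting less than Retry-After would just fail again.

```python
import time
from typing import Callable, Optional, Tuple

# Upper bound on the doubled wait, per the recommendation above.
MAX_BACKOFF_SECONDS = 60.0

# Hypothetical transport: re-sends the same request and returns
# (status_code, retry_after_seconds, body).
SendFn = Callable[[], Tuple[int, float, Optional[str]]]


def backoff_delay(retry_after: float, consecutive_429s: int) -> float:
    """Seconds to wait after the Nth consecutive 429 (N starts at 1).

    The first 429 waits exactly Retry-After; each further 429 doubles the
    wait, capped at 60 s, but never below Retry-After (our assumption).
    """
    doubled = retry_after * (2 ** (consecutive_429s - 1))
    return max(retry_after, min(doubled, MAX_BACKOFF_SECONDS))


def call_with_retries(
    send: SendFn,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Optional[str]]:
    """Send the request, backing off on 429 until success or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, retry_after, body = send()
        if status != 429:
            return status, body
        # Every attempt so far returned 429, so `attempt` is the
        # consecutive-429 count. Don't sleep after the final attempt.
        if attempt < max_attempts:
            sleep(backoff_delay(retry_after, attempt))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

With Retry-After: 2, two 429s followed by a success produce waits of 2 and 4 seconds. Passing sleep as a parameter keeps the loop testable; in production the default time.sleep applies.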
If your client SDK honors Retry-After automatically (the official OpenAI Python and Node.js SDKs retry 429 responses by default), you don't need custom code unless you've disabled retries.
Reactive, not predictive
OrcaRouter does not return X-RateLimit-Remaining or X-RateLimit-Reset headers, so you can't check ahead of time how much budget is left. Treat 429 as the signal: back off when you see it, then resume.
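Because the budget is shared across the workspace and there is no remaining-budget header, a 429 seen by one worker is also a signal for every other worker using the same workspace. A minimal sketch, assuming several threads in one Python process: a shared gate records when the current window ends, each worker calls wait_until_open() before sending, and any worker that receives a 429 calls on_rate_limited() with the Retry-After value. CooldownGate is a hypothetical helper, not part of any OrcaRouter SDK, and it does not coordinate across separate processes or machines.

```python
import threading
import time
from typing import Callable


class CooldownGate:
    """Shared pause point: after any 429, all callers wait until the window ends."""

    def __init__(
        self,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._resume_at = 0.0  # clock time at which sending may resume

    def on_rate_limited(self, retry_after: float) -> None:
        """Record a 429; only ever extend the pause, never shorten it."""
        with self._lock:
            self._resume_at = max(self._resume_at, self._clock() + retry_after)

    def wait_until_open(self) -> float:
        """Block until no pause is active; return the total seconds slept."""
        waited = 0.0
        while True:
            with self._lock:
                remaining = self._resume_at - self._clock()
            if remaining <= 0:
                return waited
            # Sleep outside the lock, then re-check in case another 429
            # extended the pause while we were waiting.
            self._sleep(remaining)
            waited += remaining
```

The gate stays reactive: it never predicts budget, it only makes every worker honor the most recent Retry-After it has seen.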