OrcaRouter can try multiple models in order until one succeeds. This is useful for resilience (when one provider is throttling or down) and for cost control (prefer a cheaper model and fall back to a stronger one only when needed).

How to use

Put an ordered list of model IDs in extra_body.models and set extra_body.route to "fallback". When the chain is present, OrcaRouter uses it as the attempt order and ignores the top-level model field. Still set model to your first-choice model: the SDK requires the field, and it keeps the request readable.
from openai import OpenAI

# OpenAI-compatible client configured for your OrcaRouter endpoint
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}],
    extra_body={
        "models": ["gpt-4o", "claude-3-5-sonnet-latest", "gemini-2.5-pro"],
        "route": "fallback",
    },
)

Rules

  • Maximum 5 models in the chain.
  • Models must be the same kind — a chat fallback chain can’t drop into an image-generation model. An image fallback chain must be all image models. We don’t translate across endpoint types.
  • If a model in the chain doesn’t resolve (wrong name, no channel), we silently skip it and try the next one. The request only fails if all chain entries fail.
  • Billing happens for the model that actually served the response, at that model’s rate.
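The length and same-kind rules above can be checked client-side before a request is sent. A minimal sketch, assuming a hypothetical validate_fallback_chain helper (not part of OrcaRouter) and an illustrative modality map you would replace with your own:

```python
# Illustrative model -> endpoint-type map; replace with your own catalog.
MODALITY = {
    "gpt-4o": "chat",
    "claude-3-5-sonnet-latest": "chat",
    "gemini-2.5-pro": "chat",
    "dall-e-3": "image",
}

def validate_fallback_chain(models: list[str]) -> None:
    """Hypothetical client-side check: at most 5 models, all the same kind."""
    if not models:
        raise ValueError("fallback chain is empty")
    if len(models) > 5:
        raise ValueError("fallback chain is limited to 5 models")
    kinds = {MODALITY.get(m, "unknown") for m in models}
    if len(kinds) > 1:
        raise ValueError(f"mixed endpoint types in chain: {sorted(kinds)}")

validate_fallback_chain(["gpt-4o", "claude-3-5-sonnet-latest"])  # passes
```

Catching a mixed chain locally saves a round trip; OrcaRouter would reject it anyway.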

How to tell which model served the response

Check the X-Orca-Fallback-Level and X-Orca-Fallback-Model response headers. See Response Headers.
response = client.chat.completions.with_raw_response.create(...)
served_by = response.headers.get("X-Orca-Fallback-Model", "primary")
# "primary" means level 0; otherwise the fallback model name
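The two headers can be read together. A small sketch, assuming a hypothetical fallback_info helper and that both headers are absent when the primary model (level 0) served the request:

```python
def fallback_info(headers) -> tuple[int, str]:
    """Hypothetical helper: interpret OrcaRouter's fallback headers.

    Assumes X-Orca-Fallback-Level is a numeric string, and that missing
    headers mean the primary model served the response (level 0).
    """
    level = int(headers.get("X-Orca-Fallback-Level", "0"))
    model = headers.get("X-Orca-Fallback-Model", "primary")
    return level, model

level, model = fallback_info({
    "X-Orca-Fallback-Level": "1",
    "X-Orca-Fallback-Model": "claude-3-5-sonnet-latest",
})
# level == 1, model == "claude-3-5-sonnet-latest"
```

The same function works on the headers mapping returned by with_raw_response, since it only relies on dict-style .get().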

When not to use this

If you want OrcaRouter to automatically pick the cheapest available model without writing a chain, use orcarouter/auto instead. Fallback chains are for cases where you want explicit control over the ordering.