OrcaRouter can try multiple models in order until one succeeds. This is useful for resilience (when one provider is throttling or down) and for cost control (prefer a cheaper model and fall back to a stronger one only when needed).

How to use

Put an ordered list of model IDs in extra_body.models and set extra_body.route to "fallback". When the chain is present, OrcaRouter uses it as the attempt order and ignores the top-level model field. Still set model to your first-choice model: the SDK requires the field, and it keeps the request readable.
from openai import OpenAI

# OpenAI-compatible client configured for your OrcaRouter endpoint
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}],
    extra_body={
        "models": ["gpt-4o", "claude-3-5-sonnet-latest", "gemini-2.5-pro"],
        "route": "fallback",
    },
)

Rules

  • Maximum 5 models in the chain.
  • Models must be the same kind — a chat fallback chain can’t drop into an image-generation model. An image fallback chain must be all image models. We don’t translate across endpoint types.
  • If a model in the chain doesn’t resolve (wrong name, no channel), we silently skip it and try the next one. The request only fails if all chain entries fail.
  • Billing happens for the model that actually served the response, at that model’s rate.
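The length and same-kind rules above can be checked client-side before a request is sent. A minimal sketch, assuming a hypothetical validate_fallback_chain helper (not part of OrcaRouter) and an illustrative modality map you would replace with your own:

```python
# Illustrative model -> endpoint-type map; replace with your own catalog.
MODALITY = {
    "gpt-4o": "chat",
    "claude-3-5-sonnet-latest": "chat",
    "gemini-2.5-pro": "chat",
    "dall-e-3": "image",
}

def validate_fallback_chain(models: list[str]) -> None:
    """Hypothetical client-side check: at most 5 models, all the same kind."""
    if not models:
        raise ValueError("fallback chain is empty")
    if len(models) > 5:
        raise ValueError("fallback chain is limited to 5 models")
    kinds = {MODALITY.get(m, "unknown") for m in models}
    if len(kinds) > 1:
        raise ValueError(f"mixed endpoint types in chain: {sorted(kinds)}")

validate_fallback_chain(["gpt-4o", "claude-3-5-sonnet-latest"])  # passes
```

Catching a mixed chain locally saves a round trip; OrcaRouter would reject it anyway.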

How to tell which model served the response

Check the X-Orca-Fallback-Level and X-Orca-Fallback-Model response headers. See Response Headers.
response = client.chat.completions.with_raw_response.create(...)
served_by = response.headers.get("X-Orca-Fallback-Model", "primary")
# "primary" means level 0; otherwise the fallback model name
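The two headers can be read together. A small sketch, assuming a hypothetical fallback_info helper and that both headers are absent when the primary model (level 0) served the request:

```python
def fallback_info(headers) -> tuple[int, str]:
    """Hypothetical helper: interpret OrcaRouter's fallback headers.

    Assumes X-Orca-Fallback-Level is a numeric string, and that missing
    headers mean the primary model served the response (level 0).
    """
    level = int(headers.get("X-Orca-Fallback-Level", "0"))
    model = headers.get("X-Orca-Fallback-Model", "primary")
    return level, model

level, model = fallback_info({
    "X-Orca-Fallback-Level": "1",
    "X-Orca-Fallback-Model": "claude-3-5-sonnet-latest",
})
# level == 1, model == "claude-3-5-sonnet-latest"
```

The same function works on the headers mapping returned by with_raw_response, since it only relies on dict-style .get().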

When not to use this

If you want OrcaRouter to automatically pick the cheapest available model without writing a chain, use orcarouter/auto instead. Fallback chains are for cases where you want explicit control over the ordering.