How to use
Put an ordered list of model IDs inextra_body.models and set
extra_body.route to "fallback". The primary model field still
matters — it’s the first attempt — but OrcaRouter ignores it in favor of
the chain if the chain is present.
Rules
- Maximum 5 models in the chain. Extras are silently truncated.
- Recommended: all models in a chain should be the same endpoint type
(all chat, or all image). Mixing a chat model with an image model won’t
crash the gateway, but the fallback that actually serves the request
needs to match the endpoint you called (e.g. if you call
/v1/chat/completions, only chat models in the chain are usable). - Fallback behavior:
- Unresolvable
orcarouter/{name}entries (bad name, disabled router) are silently skipped. - Models the calling key cannot access (model-allowlist mismatch) are silently skipped.
- When the primary model fails upstream (5xx / 429 / network error), the next chain entry is tried.
- The request fails only when every chain entry has been exhausted.
- Streaming caveat: once any byte of the response has been sent to the client, fallback can no longer kick in — if the upstream drops mid-stream, the client sees a truncated stream, not a transparent retry on the next model.
- Unresolvable
- Billing happens for the model that actually served the response, at that model’s rate — not the primary’s.
extra_body.routemust be exactly"fallback"for the chain to activate. Any other value (or missing) → the chain is ignored and only the top-levelmodelis used.
How to tell which model served the response
Check theX-Orca-Fallback-Level and X-Orca-Fallback-Model response
headers. See Response Headers.
When not to use this
If you want OrcaRouter to automatically pick the cheapest available model without writing a chain, useorcarouter/auto
instead. Fallback chains are for cases where you want explicit control
over the ordering.