OrcaRouter lets you save a routing strategy as a named router. Call it from your code as orcarouter/{name} and OrcaRouter resolves it to a concrete model at request time, based on the rules you configured. This is useful when you want to:
  • Swap routing behavior without redeploying your app (change the router in the dashboard; your code stays the same).
  • Let different teams or services choose their own routing policy independently of the application that calls the API.
  • Reference routing logic that’s too complex to inline in extra_body.

Using a router

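# Assumes `client` is an OpenAI-compatible SDK client already configured for OrcaRouter.
response = client.chat.completions.create(
    model="orcarouter/production-chat",  # a saved router, addressed as orcarouter/{name}
    messages=[...],  # your usual chat messages
)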
To find out which concrete model a router resolved to, read the X-Orca-Router and X-Orca-Resolved-Model response headers — see Response Headers. The model field in the response body itself reflects whatever the upstream returned (often the bare upstream name, e.g. gpt-4o-mini-2024-07-18).
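As a minimal sketch of reading those two headers: the helper name `resolved_route` is hypothetical, and it assumes only the header names given above and that the headers arrive as a mapping of strings. The sample values are made up for illustration.

```python
from typing import Mapping, Optional, Tuple


def resolved_route(headers: Mapping[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (router, resolved_model) from response headers, or None where absent."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get("x-orca-router"), lowered.get("x-orca-resolved-model")


# A plain dict stands in for real response headers here (illustrative values only).
router, model = resolved_route({
    "X-Orca-Router": "production-chat",
    "X-Orca-Resolved-Model": "openai/gpt-4o-mini",
})
print(router, model)
```

If your client is the OpenAI Python SDK (an assumption here), one way to reach the headers is `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers` alongside `.parse()` for the usual response object.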

Creating a router

Routers are created in the dashboard under Routing. Each router has:
  • Name — the {name} in orcarouter/{name}. Must be unique within your workspace; lowercase letters, digits, _, and - (1-50 chars). The name orcarouter is reserved.
  • Allowed models — one or more glob patterns (comma- or newline-separated, case-insensitive) limiting which models this router can pick. Examples: openai/* or openai/*, anthropic/claude-haiku-*. Empty matches every model your account has access to.
  • Strategy — how to pick among matching models. See Strategies below.
  • Mundane models / Hard models — additional model lists used only by the Adaptive · Gated strategy. See Adaptive below.
  • Default model — a safety-net model used if the pattern resolves to nothing.
  • Enabled — disable the router without deleting it.
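The Name and Allowed models rules above can be approximated client-side before you save a router. This is a sketch, not OrcaRouter's own validation: the function names are hypothetical, and it assumes the glob patterns behave like Python's `fnmatch` wildcards, which the dashboard may not match exactly.

```python
import re
from fnmatch import fnmatchcase
from typing import List

# Lowercase letters, digits, "_" and "-", 1 to 50 characters.
NAME_RE = re.compile(r"[a-z0-9_-]{1,50}")
RESERVED_NAMES = {"orcarouter"}


def is_valid_router_name(name: str) -> bool:
    """Check a router name against the documented character and length rules."""
    return NAME_RE.fullmatch(name) is not None and name not in RESERVED_NAMES


def parse_patterns(raw: str) -> List[str]:
    """Split a comma- or newline-separated pattern string, dropping blanks."""
    return [part.strip().lower() for part in re.split(r"[,\n]", raw) if part.strip()]


def is_allowed(model: str, raw_patterns: str) -> bool:
    """Case-insensitive glob match; an empty pattern list matches every model."""
    patterns = parse_patterns(raw_patterns)
    if not patterns:
        return True
    return any(fnmatchcase(model.lower(), pattern) for pattern in patterns)


print(is_valid_router_name("production-chat"))
print(is_allowed("openai/gpt-4o-mini", "openai/*, anthropic/claude-haiku-*"))
```

Uniqueness within the workspace cannot be checked locally, so the dashboard remains the final authority on whether a name is accepted.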

Strategies

The editor exposes four strategy cards. Adaptive bundles two backend sub-modes, for five total enum values you can persist via the API.

Cheapest

Picks the model with the lowest per-token price among live candidates. Default for the seeded orcarouter/auto router. Best when you want the cheapest live chat model on every request and don’t care about output-style consistency across calls.

Quality

Picks the model with the highest quality score among live candidates, regardless of price. Best when output quality dominates cost.

Balanced

Picks a low-cost option that still meets a quality bar; if nothing meets the bar, falls back to the highest-quality option. Default for new routers you create yourself. Runs without per-router tuning.
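The three fixed strategies above (Cheapest, Quality, Balanced) can be sketched as selection functions over a list of live candidates. The `Candidate` type, the `quality_bar` parameter, and the choice to take the cheapest model that clears the bar are assumptions for illustration; the actual scores and thresholds OrcaRouter uses are not documented here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    model: str
    price: float    # per-token price, lower is cheaper
    quality: float  # quality score, higher is better


def pick_cheapest(candidates: List[Candidate]) -> Candidate:
    """Cheapest: lowest per-token price among live candidates."""
    return min(candidates, key=lambda c: c.price)


def pick_quality(candidates: List[Candidate]) -> Candidate:
    """Quality: highest quality score, regardless of price."""
    return max(candidates, key=lambda c: c.quality)


def pick_balanced(candidates: List[Candidate], quality_bar: float) -> Candidate:
    """Balanced: cheapest candidate that meets the bar, else the highest-quality one."""
    good_enough = [c for c in candidates if c.quality >= quality_bar]
    if good_enough:
        return pick_cheapest(good_enough)
    return pick_quality(candidates)


# Hypothetical candidates with made-up prices and scores.
live = [
    Candidate("small", price=1.0, quality=0.5),
    Candidate("mid", price=2.0, quality=0.7),
    Candidate("large", price=5.0, quality=0.9),
]
print(pick_cheapest(live).model, pick_quality(live).model, pick_balanced(live, 0.6).model)
```

With a bar of 0.6 the Balanced sketch skips the cheapest model because it falls below the bar; with a bar no candidate meets, it falls back to the highest-quality one.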

Adaptive

A per-router LinUCB contextual bandit that learns from your real production traffic. Weighs quality, cost, latency, and reliability per request to pick the best model. New routers behave like Balanced during a short cold-start period (a per-model warm-up) before the bandit starts steering picks — that’s expected, not a bug. Two sub-modes:
  • Standard (API enum: linucb) — considers every Allowed model for each request. Best when traffic is roughly uniform and you want the router to find the best option across your full list.
  • Gated (API enum: gated_adaptive) — requests are first classified as mundane or hard; mundane requests draw from a smaller Mundane models pool, hard requests from a stronger Hard models pool, and mid-difficulty requests from the full Allowed list. Best when your traffic mixes simple and complex calls. Each pool is intersected with Allowed models; empty or non-overlapping pools quietly fall back to the full Allowed list, so requests are never starved. Configure the two pools (weak_pool and strong_pool at the API level — up to 2000 chars each) in the editor when you pick Gated.
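The pool selection that Gated performs before the bandit picks a model can be sketched as follows. The function name, the difficulty labels as plain strings, and the treatment of pools as lists of exact model names are assumptions for illustration; the request classifier and the bandit itself are out of scope.

```python
from typing import List


def candidate_pool(
    difficulty: str,
    allowed: List[str],
    weak_pool: List[str],
    strong_pool: List[str],
) -> List[str]:
    """Return the models a Gated router would choose among for one request."""
    if difficulty == "mundane":
        pool = set(weak_pool)
    elif difficulty == "hard":
        pool = set(strong_pool)
    else:
        # Mid-difficulty requests draw from the full Allowed list.
        return list(allowed)

    # Each pool is intersected with the Allowed models.
    narrowed = [model for model in allowed if model in pool]

    # Empty or non-overlapping pools fall back to the full Allowed list.
    return narrowed or list(allowed)


# Hypothetical model names.
allowed = ["a/small", "a/large", "b/mid"]
print(candidate_pool("mundane", allowed, ["a/small"], ["a/large"]))
print(candidate_pool("hard", allowed, ["a/small"], ["not/allowed"]))
```

The second call shows the fallback: the Hard pool names no Allowed model, so the request is served from the full Allowed list.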

Seeded router: orcarouter/auto

Every OrcaRouter account is seeded with a default router called auto on signup — see Auto Router. You can use it immediately without any configuration.