The built-in strategies — cheapest, quality, balanced, adaptive — pick a model by price and quality. The Routing DSL is the tier below that, for when the right model depends on what the request actually is: a long agentic coding turn, a cheap classification call, a vision request, a retry after a failed test. You write rules; the gateway evaluates them per request and routes accordingly. It’s the dsl strategy of a named router — so your application keeps calling orcarouter/{name} and the routing logic lives in the dashboard, versioned and editable without a redeploy.

When to reach for the DSL

Use a built-in strategy when “cheapest live model” or “best quality” captures your intent. Reach for the DSL when routing depends on the content or context of the request:
  • Task specialization — send code to a coding model, vision to a vision model, cheap chat to a cheap model.
  • Difficulty-aware routing — escalate only the hard requests to an expensive model; keep the easy ones cheap.
  • Agent-aware routing — route differently based on session state (which tools the agent has used, whether tests just failed, how many turns in it is).
  • Time / tenant / header rules — different routing by hour, user group, or a request header.

Enabling it

In the dashboard under Routing, open a router and set its Strategy to DSL. That reveals the DSL editor for this router. Everything else about the router still applies — the Allowed models glob, the Default model safety net, and the orcarouter/{name} invocation.

The editor

The editor is built to get you from intent to a working ruleset quickly:
  • Templates seeded with your workspace’s real models (via a one-time tier-mapping dialog), so you never start from a blank file or hit an “unknown model” wall.
  • Insert — drop in a Model, a Router (orcarouter/<name>), or a Pool from autocomplete instead of typing identifiers by hand.
  • Generate — describe the routing you want in plain language and get back compiled, lint-clean DSL grounded in your real models.
  • Explain — a plain-English paraphrase of what the current ruleset does.
  • Inline lint — every error reports {line, column, message}, and every lint code links to an explainer (the ? icon). Precedence (first-match-wins) and the common CEL patterns are surfaced in place.

File structure

A ruleset is YAML with three top-level keys:
```yaml
version: 1              # required: currently must be 1
rules: [...]            # required: 1 to 30 rules, evaluated in order
default: {...}          # required: the effect when no rule matches
```
A rule is a when: condition and a use: effect:
```yaml
rules:
  - id: hard_code              # required: ^[a-z][a-z0-9_]{0,39}$, unique
    when: |                    # optional CEL boolean; absent ⇒ always matches
      task_class == "code" && difficulty > 0.6
    use:
      model: "anthropic/claude-sonnet-4-6"
default:
  delegate: balanced           # fall back to a built-in strategy
```
Rules are evaluated top to bottom; the first rule whose when: is true wins. If none match, default: applies. Order your rules most-specific first — a broad early rule shadows everything below it.
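A minimal sketch of first-match-wins ordering, reusing model names from this page (the any_code rule and its model choice are illustrative). The specific hard_code rule sits above the broad any_code rule; with the two swapped, any_code would match every code request first and hard_code would never fire.

```yaml
version: 1
rules:
  # Specific rule first: only hard code requests reach the expensive model.
  - id: hard_code
    when: task_class == "code" && difficulty > 0.6
    use:
      model: "anthropic/claude-sonnet-4-6"

  # Broad rule second: catches every remaining code request.
  # Moved above hard_code, it would match first and shadow hard_code entirely.
  - id: any_code
    when: task_class == "code"
    use:
      model: "openai/gpt-4o-mini"

default:
  delegate: balanced
```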

when: — the condition

Conditions are written in CEL (Common Expression Language): safe by design — no loops, no I/O, microsecond evaluation, RE2 regex only. These six patterns cover the vast majority of real rules:
| Pattern | Example |
|---|---|
| Field access | `task_class == "agent"` |
| Numeric compare | `difficulty > 0.6 && request.input_tokens < 50000` |
| Boolean logic | `agent_state.has_edited && !agent_state.has_run_tests` |
| List membership | `"Edit" in agent_state.tools_used` |
| Regex macro | `system_prompt_matches("(?i)planning agent")` |
| Tool macro | `tool_calls_present_any(["Edit","Write","apply_patch"])` |
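The patterns compose freely inside one when:. This fragment is illustrative (the rule id and reason_tag are made up; add version: and default: for a complete ruleset) and routes planning-style agents that aren't issuing edit tool calls:

```yaml
rules:
  # Regex macro + tool macro + boolean logic in one condition.
  - id: planning_agent
    when: |
      system_prompt_matches("(?i)planning agent") &&
      !tool_calls_present_any(["Edit", "Write", "apply_patch"])
    use:
      model: "anthropic/claude-sonnet-4-6"
      reason_tag: planning
```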

Variables

Request shape
| Variable | Type |
|---|---|
| `model` | string |
| `request.input_tokens` | int |
| `request.output_max_tokens` | int |
| `request.stream` | bool |
| `request.vision` | bool |
| `request.message_count` | int |
| `request.has_system_prompt` | bool |
| `request.has_tools` | bool |
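A sketch of request-shape rules. The 100000-token threshold and the rule ids are illustrative; choose thresholds that fit the context windows of the models in your workspace. Note the quoted when: on the second rule: YAML reads a bare leading ! as a tag, so a condition that starts with negation needs quotes or a | block.

```yaml
rules:
  # Very long prompts go to one model; the threshold is illustrative.
  - id: long_context
    when: request.input_tokens > 100000
    use:
      model: "anthropic/claude-sonnet-4-6"
      reason_tag: long_context

  # Short, tool-free exchanges take the cheapest live model.
  # Quoted because a bare leading ! is a YAML tag indicator.
  - id: short_no_tools
    when: "!request.has_tools && request.message_count <= 2"
    use: { delegate: cheapest }
```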
Classification (computed by the gateway per request)
| Variable | Type | Meaning |
|---|---|---|
| `task_class` | string | chat / code / agent / vision / audio / rag / creative |
| `difficulty` | double | 0.0 to 1.0 |
| `code_keyword_density` | double | 0.0 to 1.0 |
| `reasoning_cue_count` | int | reasoning cues detected in the prompt |
| `tool_count` | int | distinct tool definitions on the request |
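Classification variables make difficulty-aware rules short to write. A sketch with illustrative thresholds and rule ids; the gateway's classifier decides the actual values, so read the shadow-mode diff report before relying on a particular cut-off.

```yaml
rules:
  # Several reasoning cues on a hard prompt: allow more thinking.
  - id: reasoning_heavy
    when: reasoning_cue_count >= 3 && difficulty > 0.5
    use:
      model: "anthropic/claude-sonnet-4-6"
      thinking_budget_tokens: 16000      # inside the documented [1024, 64000]
      reason_tag: reasoning_heavy

  # Agent traffic that declares many tools goes to the quality strategy.
  - id: tool_heavy_agent
    when: task_class == "agent" && tool_count >= 5
    use: { delegate: quality }
```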
Agent session (agent_state.*, persisted across a conversation)
| Variable | Type |
|---|---|
| `agent_state.turn` | int |
| `agent_state.tools_used` | list<string> |
| `agent_state.files_read` | list<string> |
| `agent_state.has_edited` | bool |
| `agent_state.has_run_tests` | bool |
| `agent_state.last_test_failed` | bool |
| `agent_state.consecutive_errors` | int |
| `agent_state.elapsed_seconds` | int |
| `agent_state.models_tried` | list<string> |
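A sketch of agent-aware routing over agent_state.*, with illustrative thresholds and rule ids. The first rule escalates once: assuming models_tried records the models this session has already been routed to, the in check stops it from firing again after the escalation. It sits first because, per first-match-wins, the more specific rule must come before the broader one.

```yaml
rules:
  # Repeated errors: escalate once, only if the strong model hasn't been tried.
  - id: escalate_on_errors
    when: agent_state.consecutive_errors >= 3 && !("anthropic/claude-sonnet-4-6" in agent_state.models_tried)
    use:
      model: "anthropic/claude-sonnet-4-6"
      reason_tag: escalate

  # Edited files but never ran tests: use the stronger model.
  - id: edited_untested
    when: agent_state.has_edited && !agent_state.has_run_tests
    use:
      model: "anthropic/claude-sonnet-4-6"
      reason_tag: edited_untested
```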
Context
| Variable | Type |
|---|---|
| `headers["x-foo"]` | string |
| `user.id` / `user.group` | int / string |
| `token.id` / `token.name` | int / string |
| `time.hour` / `time.weekday` | int (UTC) |
| `workspace.id` | int |
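Time, tenant, and header rules draw on the context variables. In this sketch the enterprise group name and the x-cost-mode header are hypothetical, and header_matches is one of the macros listed under Macros.

```yaml
rules:
  # Tenant rule: a (hypothetical) "enterprise" user group gets the quality strategy.
  - id: enterprise_users
    when: user.group == "enterprise"
    use: { delegate: quality }

  # Header rule: callers opt into the cheap path with a (hypothetical) header.
  - id: cost_saver_header
    when: header_matches("x-cost-mode", "(?i)^saver$")
    use: { delegate: cheapest }

  # Time rule: 00:00 to 05:59 UTC goes to the cheapest live model.
  - id: overnight
    when: time.hour < 6
    use: { delegate: cheapest }
```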

Macros

Registered CEL functions for the common “look inside the request” checks:
| Macro | Returns true when |
|---|---|
| `system_prompt_matches(regex)` | the RE2 regex matches the joined system messages |
| `user_message_matches(regex)` | the RE2 regex matches the last user message |
| `tool_definitions_include(name)` | a tool with this name is declared on the request |
| `tool_calls_present_any(list)` | the request carries any of these tool calls |
| `tool_results_from_any(list)` | the request has tool-role messages from any of these tools |
| `header_matches(name, regex)` | the RE2 regex matches the named header's value |
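A sketch using the request-inspection macros. The run_tests and web_search tool names are hypothetical; substitute the tool names your agent harness actually declares.

```yaml
rules:
  # Output from a (hypothetical) run_tests tool plus a request to fix something.
  - id: fix_after_tests
    when: |
      tool_results_from_any(["run_tests"]) &&
      user_message_matches("(?i)(fix|repair)")
    use:
      model: "anthropic/claude-sonnet-4-6"
      reason_tag: fix_after_tests

  # A (hypothetical) web_search tool is declared on the request.
  - id: search_enabled
    when: tool_definitions_include("web_search")
    use: { delegate: balanced }
```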

use: — the effect

A use: block names a destination (exactly one) and any number of optional per-call knobs.

Destination

```yaml
use:                                         # pick exactly one of:
  model:    "anthropic/claude-sonnet-4-6"    # one upstream model
  models:   ["openai/gpt-4o-mini", "..."]    # load-balance across a list
  pool:     "@pool:<name>"                   # an admin-curated pool
  delegate: balanced                         # hand off to a built-in
                                             #   strategy: cheapest |
                                             #   quality | balanced |
                                             #   linucb | gated_adaptive
```
delegate: dsl is rejected (it would recurse). Pinning to specific channels (channels: / @channel:) is not currently available and lints as unsupported — route by model, models, or pool instead.
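A complete ruleset that uses each destination form. The model names come from this page; the retrieval pool name is hypothetical and, like every model and pool reference, must resolve in your workspace or the save fails lint. The default: block uses a specific safety-net model rather than a delegate.

```yaml
version: 1
rules:
  - id: vision
    when: request.vision
    use: { model: "openai/gpt-4o" }                           # one model

  - id: chat
    when: task_class == "chat"
    use: { models: ["openai/gpt-4o-mini", "openai/gpt-4o"] }  # load-balanced list

  - id: rag
    when: task_class == "rag"
    use: { pool: "@pool:retrieval" }                          # hypothetical pool

  - id: code
    when: task_class == "code"
    use: { delegate: quality }                                # built-in strategy

default:
  model: "openai/gpt-4o-mini"                                 # safety-net model
```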

Per-call knobs

Combine with any destination to shape the upstream call:
```yaml
# Value ranges are shown in place of concrete values.
use:
  reasoning_effort:       low | medium | high     # OpenAI o-series, Gemini
  thinking_budget_tokens: 1024..64000             # Claude / Gemini thinking
  samples:                1..16                   # the n parameter
  temperature:            0.0..2.0
  param_override:         { ... }                 # merged into upstream params
  header_override:        { ... }                 # merged into upstream headers
  reason_tag:             "<[a-z0-9_]+>"          # shows up in logs/telemetry
  affinity_ttl:           "5m"                    # channel stickiness window
  model_rewrite:          "<upstream-model>"      # send under a different name
```
param_override and header_override enforce a denylist — you can’t override model, messages, stream, tools, auth headers, etc. (those would subvert billing, audit, or agent state).
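A sketch combining destinations with knobs. The values are illustrative but inside the documented ranges; samples is sent upstream as n, so it is paired with an OpenAI model here, and the x-eval-run header is hypothetical.

```yaml
rules:
  - id: deep_reasoning
    when: difficulty > 0.8 && reasoning_cue_count >= 2
    use:
      model: "anthropic/claude-sonnet-4-6"   # destination (exactly one)
      thinking_budget_tokens: 32000          # within [1024, 64000]
      reason_tag: deep_reasoning             # shows up in logs/telemetry
      affinity_ttl: "10m"                    # channel stickiness window

  - id: eval_sampling
    when: header_matches("x-eval-run", ".+")  # hypothetical header
    use:
      model: "openai/gpt-4o-mini"
      samples: 4                             # sent upstream as n
      temperature: 1.0
```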

Confidence cascades & ensembles (advanced)

Two advanced effects let a rule react to a weak first answer or fan out across several models. They're authored the same way as any rule.

Cascade: retry with a stronger effect when a low-confidence signal fires:
```yaml
rules:
  - id: code_with_repair
    when: task_class == "code"
    use:
      model: "openai/gpt-4o-mini"
    on_low_confidence:
      signals: [patch_invalid, self_doubt, next_turn_test_failed]
      use:
        model: "anthropic/claude-sonnet-4-6"   # repair attempt
```
Ensemble: issue several legs in parallel and let an arbiter pick:
```yaml
use:
  parallel:
    - { model: "anthropic/claude-sonnet-4-6" }
    - { model: "openai/gpt-4o-mini", samples: 2 }
  arbiter:
    strategy: best_of_n        # or majority | first | tests_pass
    model:    "anthropic/claude-sonnet-4-6"   # judge (best_of_n only)
  max_latency_ms: 120000
```
The ensemble / cascade runtime is gated and off by default. Because each parallel leg and each cascade repair bills as its own call, the fan-out runtime is behind a server flag while per-leg billing is validated. With it off, a parallel: rule serves the first leg only and a cascade records its signal but doesn’t re-dispatch — the ruleset still lints, saves, and routes its primary effect normally. Contact us to enable the ensemble runtime for your workspace.

Rolling out safely

A new ruleset doesn’t take over your traffic the moment you save it:
  • Shadow mode — for a window after the first save, the DSL is evaluated but not used: your previous strategy still serves traffic while the gateway records what the DSL would have done. The dashboard shows a diff report — percentage of differing routes, projected cost delta, per-rule fire counts, and how often it fell through to default:. Read it before you trust the rules.
  • Canary — ramp the DSL onto a percentage of live traffic (5 → 25 → 50 → 100), watching per-slice metrics, and roll back instantly by sliding the percentage to 0.
You can also dry-run a ruleset against a synthetic request (task class, difficulty, agent state, request shape) right in the editor and see the trace and matched rule — no traffic, nothing persisted.

Limits & validation

Every save runs a strict lint; invalid rulesets are rejected with {line, column, message, rule}:
  • Schema — required keys, correct types/enums, no unknown fields.
  • Size — ≤ 30 rules, ≤ 16 KiB of YAML, ≤ 200 chars per when:.
  • CEL — parses, type-checks against the variable environment, no unknown identifiers, and when: must evaluate to a bool.
  • Effect — exactly one destination per use: block; all model / models / @pool: references must resolve in your workspace.
  • Knob ranges: thinking_budget_tokens ∈ [1024, 64000], temperature ∈ [0, 2], samples ∈ [1, 16].
  • Reserved — rule ids starting with _ are reserved; default as a rule id is rejected (use the top-level default: block).
Every save and rollback writes an audit row; concurrent edits are detected and the second save is asked to retry against fresh state.
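To make the checks concrete, here is a deliberately broken ruleset. Each commented line should be rejected by the save-time lint for the reason given in its comment; the exact message text the linter returns is not reproduced here.

```yaml
version: 1
rules:
  - id: _internal                # rejected: ids starting with _ are reserved
    when: difficulty > 0.5
    use: { model: "openai/gpt-4o-mini" }

  - id: default                  # rejected: use the top-level default: block
    use: { delegate: balanced }

  - id: mixed
    when: task_class             # rejected: when: must evaluate to a bool
    use:
      model: "openai/gpt-4o-mini"
      delegate: cheapest         # rejected: more than one destination
      temperature: 2.5           # rejected: temperature must be in [0, 2]

default:
  delegate: dsl                  # rejected: delegate: dsl would recurse
```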

A complete example

```yaml
version: 1
rules:
  - id: vision
    when: request.vision
    use: { model: "openai/gpt-4o" }

  - id: cheap_chat
    when: task_class == "chat" && difficulty < 0.3
    use: { delegate: cheapest }

  - id: hard_code
    when: task_class == "code" && difficulty > 0.6
    use:
      model: "anthropic/claude-sonnet-4-6"
      thinking_budget_tokens: 8000
      reason_tag: hard_code

  - id: agent_after_failed_test
    when: agent_state.last_test_failed && agent_state.consecutive_errors >= 2
    use:
      model: "anthropic/claude-sonnet-4-6"
      reason_tag: repair

default:
  delegate: balanced
```
To confirm which model a request resolved to, read the X-Orca-Router and X-Orca-Resolved-Model response headers.

API reference

The DSL is managed per router; writes require Developer+.
| Method & path | Role | Purpose |
|---|---|---|
| `GET /api/user/routers/:id/dsl` | Member | Source, version, and shadow/canary state. |
| `PUT /api/user/routers/:id/dsl` | Developer+ | Lint and save (new version, audited). |
| `POST /api/user/routers/:id/dsl/lint` | Member | Lint a draft → {errors:[…]}. |
| `POST /api/user/routers/dsl/lint` | Member | Stateless lint (no router id). |
| `POST /api/user/routers/:id/dsl/dryrun` | Member | Evaluate a synthetic request → trace + matched rule. |
| `GET /api/user/routers/:id/dsl/history` | Member | Version history, newest first. |
| `POST /api/user/routers/:id/dsl/rollback/:version` | Developer+ | Re-lint and restore an older version. |

FAQ

Is the DSL a strategy or a separate feature?
It is a strategy: the dsl option alongside cheapest / quality / balanced / adaptive. The others pick by price and quality; the DSL picks by rules you write over the request’s shape, classification, and agent state. You can still delegate: to a built-in strategy as a rule’s effect or as the default.
What happens when no rule matches?
The top-level default: effect applies. It’s required, so there’s always a defined outcome, commonly delegate: balanced or a specific safety-net model.
Is it safe to run user-written CEL on the gateway?
Yes. CEL runs in a sandbox with standard-library functions only, a few-millisecond evaluation deadline, RE2 regex (linear-time, no ReDoS), and no access to the database, network, or filesystem. The variable environment is a fixed set of scalars and lists.
How do I test a ruleset before it serves live traffic?
Three ways: dry-run it against a synthetic request in the editor, leave it in shadow mode and read the diff report, then canary it onto a small percentage of live traffic before ramping to 100%.