Routing DSL - OrcaRouter

The built-in strategies — cheapest, quality, balanced, adaptive — pick a model by price and quality. The Routing DSL is the tier below that, for when the right model depends on what the request actually is: a long agentic coding turn, a cheap classification call, a vision request, a retry after a failed test. You write rules; the gateway evaluates them per request and routes accordingly. It’s the dsl strategy of a named router — so your application keeps calling orcarouter/{name} and the routing logic lives in the dashboard, versioned and editable without a redeploy.

When to reach for the DSL

Use a built-in strategy when “cheapest live model” or “best quality” captures your intent. Reach for the DSL when routing depends on the content or context of the request:

Task specialization — send code to a coding model, vision to a vision model, cheap chat to a cheap model.
Difficulty-aware routing — escalate only the hard requests to an expensive model; keep the easy ones cheap.
Agent-aware routing — route differently based on session state (which tools the agent has used, whether tests just failed, how many turns in it is).
Time / tenant / header rules — different routing by hour, user group, or a request header.

Enabling it

In the dashboard under Routing, open a router and set its Strategy to DSL. That reveals the DSL editor for this router. Everything else about the router still applies — the Allowed models glob, the Default model safety net, and the orcarouter/{name} invocation.

The editor

The editor is built to get you from intent to a working ruleset quickly:

Templates seeded with your workspace’s real models (via a one-time tier-mapping dialog), so you never start from a blank file or hit an “unknown model” wall.
Insert — drop in a Model, a Router (orcarouter/<name>), or a Pool from autocomplete instead of typing identifiers by hand.
Generate — describe the routing you want in plain language and get back compiled, lint-clean DSL grounded in your real models.
Explain — a plain-English paraphrase of what the current ruleset does.
Inline lint — every error reports {line, column, message} and every lint code has a ? explainer. Precedence (first-match-wins) and the common CEL patterns are surfaced in-place.

File structure

A ruleset is YAML with three top-level keys:

version: 1              # required — currently must be 1
rules: [...]            # required — 1 to 30 rules, evaluated in order
default: {...}          # required — the effect when no rule matches

A rule is a when: condition and a use: effect:

rules:
  - id: hard_code              # required: ^[a-z][a-z0-9_]{0,39}$, unique
    when: |                    # optional CEL boolean; absent ⇒ always matches
      task_class == "code" && difficulty > 0.6
    use:
      model: "anthropic/claude-sonnet-4-6"
default:
  delegate: balanced           # fall back to a built-in strategy

Rules are evaluated top to bottom; the first rule whose when: is true wins. If none match, default: applies. Order your rules most-specific first — a broad early rule shadows everything below it.

`when:` — the condition

Conditions are written in CEL (Common Expression Language): safe by design — no loops, no I/O, microsecond evaluation, RE2 regex only. These six patterns cover the vast majority of real rules:

Pattern	Example
Field access	`task_class == "agent"`
Numeric compare	`difficulty > 0.6 && request.input_tokens < 50000`
Boolean logic	`agent_state.has_edited && !agent_state.has_run_tests`
List membership	`"Edit" in agent_state.tools_used`
Regex macro	`system_prompt_matches("(?i)planning agent")`
Tool macro	`tool_calls_present_any(["Edit","Write","apply_patch"])`

Variables

Request shape

Variable	Type
`model`	string
`request.input_tokens`	int
`request.output_max_tokens`	int
`request.stream`	bool
`request.vision`	bool
`request.message_count`	int
`request.has_system_prompt`	bool
`request.has_tools`	bool

Classification (computed by the gateway per request)

Variable	Type	Meaning
`task_class`	string	`chat` / `code` / `agent` / `vision` / `audio` / `rag` / `creative`
`difficulty`	double	`0.0`–`1.0`
`code_keyword_density`	double	`0.0`–`1.0`
`reasoning_cue_count`	int	reasoning cues detected in the prompt
`tool_count`	int	distinct tool definitions on the request

Agent session (agent_state.*, persisted across a conversation)

Variable	Type
`agent_state.turn`	int
`agent_state.tools_used`	list<string>
`agent_state.files_read`	list<string>
`agent_state.has_edited`	bool
`agent_state.has_run_tests`	bool
`agent_state.last_test_failed`	bool
`agent_state.consecutive_errors`	int
`agent_state.elapsed_seconds`	int
`agent_state.models_tried`	list<string>

Context

Variable	Type
`headers["x-foo"]`	string
`user.id` / `user.group`	int / string
`token.id` / `token.name`	int / string
`time.hour` / `time.weekday`	int (UTC)
`workspace.id`	int

Macros

Registered CEL functions for the common “look inside the request” checks:

Macro	Returns
`system_prompt_matches(regex)`	RE2 over the joined system messages
`user_message_matches(regex)`	RE2 over the last user message
`tool_definitions_include(name)`	a tool is declared on the request
`tool_calls_present_any(list)`	the request carries any of these tool calls
`tool_results_from_any(list)`	the request has tool-role messages from any
`header_matches(name, regex)`	RE2 over a header value

`use:` — the effect

A use: block names a destination (exactly one) and any number of optional per-call knobs.

Destination

use:
  model:    "anthropic/claude-sonnet-4-6"   # one upstream model
  models:   ["openai/gpt-4o-mini", "..."]   # load-balance across a list
  pool:     "@pool:<name>"                   # an admin-curated pool
  delegate: balanced                         # hand off to a built-in
                                             #   strategy: cheapest |
                                             #   quality | balanced |
                                             #   linucb | gated_adaptive

delegate: dsl is rejected (it would recurse). Pinning to specific channels (channels: / @channel:) is not currently available and lints as unsupported — route by model, models, or pool instead.

Per-call knobs

Combine with any destination to shape the upstream call:

use:
  reasoning_effort:       low | medium | high     # OpenAI o-series, Gemini
  thinking_budget_tokens: 1024..64000             # Claude / Gemini thinking
  samples:                1..16                    # the n parameter
  temperature:            0.0..2.0
  param_override:         { ... }                  # merged into upstream params
  header_override:        { ... }                  # merged into upstream headers
  reason_tag:             "<[a-z0-9_]+>"           # shows up in logs/telemetry
  affinity_ttl:           "5m"                      # channel stickiness window
  model_rewrite:          "<upstream-model>"       # send under a different name

param_override and header_override enforce a denylist — you can’t override model, messages, stream, tools, auth headers, etc. (those would subvert billing, audit, or agent state).

Confidence cascades & ensembles (advanced)

Two advanced effects let a rule react to a weak first answer or fan out across several models. They’re authored the same way as any rule. Cascade — retry on a low-confidence signal with a stronger effect:

rules:
  - id: code_with_repair
    when: task_class == "code"
    use:
      model: "openai/gpt-4o-mini"
    on_low_confidence:
      signals: [patch_invalid, self_doubt, next_turn_test_failed]
      use:
        model: "anthropic/claude-sonnet-4-6"   # repair attempt

Ensemble — issue several legs in parallel and let an arbiter pick:

use:
  parallel:
    - { model: "anthropic/claude-sonnet-4-6" }
    - { model: "openai/gpt-4o-mini", samples: 2 }
  arbiter:
    strategy: best_of_n        # or majority | first | tests_pass
    model:    "anthropic/claude-sonnet-4-6"   # judge (best_of_n only)
  max_latency_ms: 120000

The ensemble / cascade runtime is gated and off by default. Because each parallel leg and each cascade repair bills as its own call, the fan-out runtime is behind a server flag while per-leg billing is validated. With it off, a parallel: rule serves the first leg only and a cascade records its signal but doesn’t re-dispatch — the ruleset still lints, saves, and routes its primary effect normally. Contact us to enable the ensemble runtime for your workspace.

Rolling out safely

A new ruleset doesn’t take over your traffic the moment you save it:

Shadow mode — for a window after the first save, the DSL is evaluated but not used: your previous strategy still serves traffic while the gateway records what the DSL would have done. The dashboard shows a diff report — percentage of differing routes, projected cost delta, per-rule fire counts, and how often it fell through to default:. Read it before you trust the rules.
Canary — ramp the DSL onto a percentage of live traffic (5 → 25 → 50 → 100), watching per-slice metrics, and roll back instantly by sliding the percentage to 0.

You can also dry-run a ruleset against a synthetic request (task class, difficulty, agent state, request shape) right in the editor and see the trace and matched rule — no traffic, nothing persisted.

Limits & validation

Every save runs a strict lint; invalid rulesets are rejected with {line, column, message, rule}:

Schema — required keys, correct types/enums, no unknown fields.
Size — ≤ 30 rules, ≤ 16 KiB of YAML, ≤ 200 chars per when:.
CEL — parses, type-checks against the variable environment, no unknown identifiers, and when: must evaluate to a bool.
Effect — exactly one destination per use: block; all model / models / @pool: references must resolve in your workspace.
Knob ranges — thinking_budget_tokens ∈ [1024, 64000], temperature ∈ [0, 2], samples ∈ [1, 16].
Reserved — rule ids starting with _ are reserved; default as a rule id is rejected (use the top-level default: block).

Every save and rollback writes an audit row; concurrent edits are detected and the second save is asked to retry against fresh state.

A complete example

version: 1
rules:
  - id: vision
    when: request.vision
    use: { model: "openai/gpt-4o" }

  - id: cheap_chat
    when: task_class == "chat" && difficulty < 0.3
    use: { delegate: cheapest }

  - id: hard_code
    when: task_class == "code" && difficulty > 0.6
    use:
      model: "anthropic/claude-sonnet-4-6"
      thinking_budget_tokens: 8000
      reason_tag: hard_code

  - id: agent_after_failed_test
    when: agent_state.last_test_failed && agent_state.consecutive_errors >= 2
    use:
      model: "anthropic/claude-sonnet-4-6"
      reason_tag: repair

default:
  delegate: balanced

To confirm which model a request resolved to, read the X-Orca-Router and X-Orca-Resolved-Model response headers.

API reference

The DSL is managed per router; writes require Developer+.

Method & path	Role	Purpose
`GET /api/user/routers/:id/dsl`	Member	Source + version + shadow/canary state.
`PUT /api/user/routers/:id/dsl`	Developer+	Lint + save (new version, audited).
`POST /api/user/routers/:id/dsl/lint`	Member	Lint a draft → `{errors:[…]}`.
`POST /api/user/routers/dsl/lint`	Member	Stateless lint (no router id).
`POST /api/user/routers/:id/dsl/dryrun`	Member	Evaluate a synthetic request → trace + matched rule.
`GET /api/user/routers/:id/dsl/history`	Member	Version history, newest first.
`POST /api/user/routers/:id/dsl/rollback/:version`	Developer+	Re-lint and restore an older version.

FAQ

How is this different from a named router's strategy?

It is a strategy — the dsl option alongside cheapest / quality / balanced / adaptive. The others pick by price and quality; the DSL picks by rules you write over the request’s shape, classification, and agent state. You can still delegate: to a built-in strategy as a rule’s effect or as the default.

What happens if no rule matches?

The top-level default: effect applies. It’s required, so there’s always a defined outcome — commonly delegate: balanced or a specific safety-net model.

Is it safe to run untrusted CEL on the hot path?

Yes. CEL runs in a sandbox with standard-library functions only, a few-millisecond evaluation deadline, RE2 regex (linear-time, no ReDoS), and no access to the database, network, or filesystem. The variable environment is a fixed set of scalars and lists.

Can I test a ruleset before it touches real traffic?

Three ways: dry-run it against a synthetic request in the editor, leave it in shadow mode and read the diff report, then canary it onto a small percentage of live traffic before ramping to 100%.

​When to reach for the DSL

​Enabling it

​The editor

​File structure

​when: — the condition

​Variables

​Macros

​use: — the effect

​Destination

​Per-call knobs

​Confidence cascades & ensembles (advanced)

​Rolling out safely

​Limits & validation

​A complete example

​API reference

​FAQ

When to reach for the DSL

Enabling it

The editor

File structure

`when:` — the condition

Variables

Macros

`use:` — the effect

Destination

Per-call knobs

Confidence cascades & ensembles (advanced)

Rolling out safely

Limits & validation

A complete example

API reference

FAQ