Input-stage guardrails

An input-stage guardrail screens the caller’s request before it reaches the model. It is the cheapest place to enforce a content policy: the gateway inspects the prompt on its way in, and if a rule blocks, the request is rejected before metering — you pay nothing for the call. This is where you stop a leaked secret, a PII field, or an injection attempt from ever hitting the upstream model. For the full engine — every rule type, field, and route — see the Guardrails reference. This page is the focused take on the input stage: what runs before the model, and why a block here costs no quota.

1. Input guardrails for LLM apps, before the model

Every guardrail rule carries a stage — input, output, or both. An input rule runs against the request text the moment it arrives, on the way to the upstream model:

caller → [ input guardrail ] → metering → model → [ output guardrail ] → caller

That ordering is the point. An input rule sees the prompt before the gateway pre-consumes any quota, so a block at this stage is free — the request never reaches the model and never bills. Compare the output stage, which screens the model’s response after it comes back (an output block refunds the pre-consumed quota instead).

Input rules screen the caller’s request. If you also use registry prompts, the injected system message is added later in routing — so input rules see the messages your app sent, not the injected prompt. Output rules screen the response either way.
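The ordering above can be sketched as a small simulation. This is an illustrative model only, not the gateway's implementation: `Ledger`, `handle`, the `cost` value, and the callback parameters are all hypothetical names introduced here to show why an input block leaves the quota untouched while an output block has to refund it.

```python
from dataclasses import dataclass


@dataclass
class Ledger:
    """Hypothetical quota ledger that only tracks a running charge."""
    charged: int = 0

    def pre_consume(self, amount: int) -> None:
        self.charged += amount

    def refund(self, amount: int) -> None:
        self.charged -= amount


def handle(prompt, ledger, input_blocks, output_blocks, model, cost=10):
    """Run one request through the stages in the documented order."""
    # Input stage runs first: nothing has been metered yet.
    if input_blocks(prompt):
        return {"status": 400, "stage": "input", "upstream_called": False}

    # Metering happens only after the input stage passes.
    ledger.pre_consume(cost)
    reply = model(prompt)  # the upstream call

    # Output stage screens the response; a block refunds the pre-consumed quota.
    if output_blocks(reply):
        ledger.refund(cost)
        return {"status": 400, "stage": "output", "upstream_called": True}

    return {"status": 200, "reply": reply, "upstream_called": True}
```

In this sketch both kinds of block end with a zero charge, but only the input block avoids calling `model` at all.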

2. What you can run at the input stage

Almost any rule type can run at input; grounding is the one exception, because it checks the answer rather than the request. The most common reasons to gate the request before the model:

Mask PII in the prompt

A pii rule with the mask action rewrites entities to typed tags (jane@acme.com → [EMAIL]) so the upstream model never sees the raw value. See PII Shield.

Block secrets before they leak

A request carrying an API key or cloud credential is rejected at the door — pre-metering, no upstream call. See Block secrets.

Stop injection attempts

The Prompt-Injection basics preset pairs keyword/regex detectors with an llm_judge rule for injection intent. See Prompt injection.

Cap prompt size

A max_chars rule rejects an oversized prompt before it bills any tokens. See Cost guardrails.

The seven rule types — keyword, regex, pii, max_chars, external, llm_judge, grounding — and the five actions block, mask, flag, annotate, and spotlight all apply here. (spotlight wraps matched untrusted text in delimiters so the model treats it as data, not instructions — an input-stage prompt-injection defense; annotate attaches a note without changing the traffic.) One exception worth knowing: grounding measures the answer against retrieved sources, so it is inherently an output-stage check. Everything else is a natural fit for the input stage.

Input-stage masking is live today — streaming or not. The gateway rewrites the request before the model sees it on every path. Output mask redacts non-streaming responses only; in-stream output rewriting is on the roadmap, so a mask rule does not redact a streaming reply yet. Output block, by contrast, is enforced both ways — streaming and non-streaming (see Streaming coverage).
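To make the mask action concrete, here is a minimal local sketch of rewriting an email address to a typed tag before a prompt is sent. The regular expression is a deliberately simple stand-in, not the detector the pii rule actually uses, and `mask_emails` and `mask_messages` are hypothetical helper names.

```python
import re

# Simplified email pattern for illustration; a real PII detector covers far more.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str) -> str:
    """Replace each email address in text with the typed tag [EMAIL]."""
    return EMAIL.sub("[EMAIL]", text)


def mask_messages(messages: list[dict]) -> list[dict]:
    """Return a copy of chat messages with emails masked in string content."""
    masked = []
    for message in messages:
        copy = dict(message)
        if isinstance(copy.get("content"), str):
            copy["content"] = mask_emails(copy["content"])
        masked.append(copy)
    return masked
```

The original messages are left unmodified, which makes it easy to compare what the caller sent with what the model would see.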

3. One concrete example

Author the rule in the console, signed in under your own session rather than with a relay key (guardrail config needs Developer+). Add a single input rule to a guardrail named secrets-shield:

{
  "type": "regex",
  "stage": "input",
  "action": "block",
  "pattern": "sk-[A-Za-z0-9]{20,}"
}

Attach the guardrail to a key (set guardrail_id, or mark it the workspace default — see Attach to a key), then call the gateway with that sk-orca-... relay key:

curl https://api.orcarouter.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-orca-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Debug this: OPENAI_API_KEY=sk-abcdefghij1234567890"}
    ]
  }'

The request matches at the input stage and is rejected with HTTP 400 guardrail_blocked before the gateway forwards anything upstream:

{
  "error": {
    "type": "guardrail_blocked",
    "message": "request blocked by guardrail \"secrets-shield\": regex(...)"
  }
}

See the guardrail_blocked error for the full response shape.
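As a quick sanity check, the rule's pattern can be run locally against the sample request body. This sketch uses Python's `re` module and assumes the gateway's regex engine treats this simple pattern the same way; `would_block` is a hypothetical helper, not part of any SDK.

```python
import re

# Same pattern as the secrets-shield rule in this section.
PATTERN = re.compile(r"sk-[A-Za-z0-9]{20,}")


def would_block(messages: list[dict]) -> bool:
    """Return True if any string message content matches the rule's pattern."""
    return any(
        PATTERN.search(m["content"])
        for m in messages
        if isinstance(m.get("content"), str)
    )


sample = [
    {"role": "user", "content": "Debug this: OPENAI_API_KEY=sk-abcdefghij1234567890"}
]
print(would_block(sample))
```

The sample key has exactly 20 alphanumeric characters after `sk-`, so it sits right at the `{20,}` threshold; a shorter string such as `sk-short` would pass through.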

4. Why an input block costs no quota

This is the structural advantage of catching things on the way in. An input-stage block sits before pre-consume, so:

Property	Input-stage block
HTTP status	`400 guardrail_blocked`
Quota charged	None — fires before metering
Upstream call	Never made
Retry	Marked skip-retry — re-running blocks again

Because the request never reaches a channel, an input block is marked skip-retry: re-running the same prompt against another channel would just block again and waste effort. The output stage differs — a block there refunds the quota the gateway already pre-consumed. Same 400, different accounting.
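The same reasoning applies on the client side: resending a blocked prompt unchanged will block again. The sketch below recognizes the error shape shown in this section and excludes it from retries. The retry policy for other statuses (429 and 5xx) is an assumption added for illustration, and both function names are hypothetical.

```python
import json


def is_guardrail_block(status: int, body: str) -> bool:
    """Return True for an HTTP 400 whose error type is guardrail_blocked."""
    if status != 400:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("type") == "guardrail_blocked"


def should_retry(status: int, body: str) -> bool:
    """Hypothetical client policy: never retry a guardrail block."""
    if is_guardrail_block(status, body):
        return False
    # Assumed policy for everything else: retry rate limits and server errors.
    return status == 429 or status >= 500
```

A caller would typically surface the block message to the user or strip the offending content instead of looping on the same request.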

5. Resolution and fallback

An input-stage rule only runs if a guardrail actually resolves on the request. Resolution is explicit:

The key’s explicit guardrail_id, if it exists and is enabled.
Otherwise the workspace default guardrail.
Otherwise none — the request is byte-identical to a workspace with no policy.

An explicit attachment never silently falls back. Disabling an attached guardrail is the off switch — it does not drop through to the workspace default. (Firewall policies behave differently here; see Guardrails vs. firewall.)
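The resolution order can be written as a short function. This is a sketch of one reading of the rules above, not the gateway's code: `Guardrail` and `resolve` are hypothetical names, and two details are assumptions, namely that an explicit guardrail_id pointing at a missing guardrail resolves to none, and that a disabled workspace default is treated as none.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Guardrail:
    id: str
    enabled: bool = True


def resolve(
    key_guardrail_id: Optional[str],
    guardrails: dict[str, Guardrail],
    workspace_default: Optional[Guardrail],
) -> Optional[Guardrail]:
    """Pick the guardrail that applies to a request, or None."""
    if key_guardrail_id is not None:
        attached = guardrails.get(key_guardrail_id)
        # An explicit attachment never falls back to the workspace default:
        # a disabled (or, by assumption, missing) guardrail means no policy.
        if attached is not None and attached.enabled:
            return attached
        return None

    # No explicit attachment: use the workspace default if there is one.
    if workspace_default is not None and workspace_default.enabled:
        return workspace_default
    return None
```

The early `return None` inside the first branch is the off switch described above: once a key names a guardrail, the workspace default is never consulted for that key.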

6. Prove it before you ship

Don’t attach a blocking input rule to live traffic on faith. There are two ways to validate first, plus a match record that shows what fired:

Test tab — one sample

Open the Test tab in the guardrail editor, paste a sample, pick the input stage, and run. The sandbox evaluates the current policy locally — no upstream call, no quota — and returns the verdict plus (for mask rules) the rendered text. See Testing & eval.

Flag before you block

Set the action to flag first. A flag changes nothing about the traffic — it only records a match — so you can measure how often a rule would fire on real input before you flip it to block. See Tune false positives.

See what fired

Every rule that fires records a match — type, action, stage, and a detail string. The matched substring is recorded only when Log raw content is on (off by default). See the Matches feed and Logging & privacy.
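Before relying on flag data from live traffic, it can also help to estimate offline how often a pattern would fire on a set of sample prompts. The sketch below is a local approximation using Python's `re` module, not the gateway's Matches feed; `fire_rate` and the sample prompts are hypothetical.

```python
import re


def fire_rate(pattern: str, prompts: list[str]) -> float:
    """Return the fraction of prompts in which the pattern matches at least once."""
    if not prompts:
        return 0.0
    rx = re.compile(pattern)
    hits = sum(1 for prompt in prompts if rx.search(prompt))
    return hits / len(prompts)


samples = [
    "Summarize this meeting transcript",
    "Debug this: OPENAI_API_KEY=sk-abcdefghij1234567890",
    "What does the sk- prefix mean?",
    "Translate this paragraph to French",
]
print(fire_rate(r"sk-[A-Za-z0-9]{20,}", samples))
```

A rate that looks too high on benign samples is a signal to tighten the pattern while the rule is still set to flag.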

7. Where to go next

The input stage stops bad input from reaching the model. To gate the model’s response, pair it with the output stage; to govern an agent’s tool calls, use the firewall.

Output-stage rules — screen the model’s response after it comes back.
Stages and both — when to run a rule on input, output, or both.
Securing AI agents — where input guardrails sit in the full control stack.
Prompt-injection threat and data exfiltration — the attacks an input rule is built to stop.

Read the Guardrails reference for the complete engine, or the security quickstart to wire input guardrails and the firewall together for an agent baseline.
