1. Input guardrails for LLM apps, before the model
Every guardrail rule carries a stage —input, output, or both.
An input rule runs against the request text the moment it arrives, on
the way to the upstream model:
Input rules screen the caller’s request. If you also use
registry prompts, the injected system message is
added later in routing — so input rules see the messages your app sent,
not the injected prompt. Output rules screen the response either way.
2. What you can run at the input stage
Any rule type can run atinput. The most common reasons to gate the
request before the model:
Mask PII in the prompt
A
pii rule with the mask action rewrites entities to typed tags
(jane@acme.com → [EMAIL]) so the upstream model never sees the
raw value. See PII Shield.Block secrets before they leak
A request carrying an API key or cloud credential is rejected at the
door — pre-metering, no upstream call. See
Block secrets.
Stop injection attempts
The Prompt-Injection basics preset pairs keyword/regex detectors with
an
llm_judge rule for injection intent. See
Prompt injection.Cap prompt size
A
max_chars rule rejects an oversized prompt before it bills any
tokens. See Cost guardrails.keyword, regex, pii, max_chars,
external, llm_judge, grounding — and the five actions block,
mask, flag, annotate, and spotlight all apply here. (spotlight
wraps matched untrusted text in delimiters so the model treats it as
data, not instructions — an input-stage prompt-injection defense;
annotate attaches a note without changing the traffic.) One exception
worth knowing:
grounding measures the
answer against retrieved sources, so it is inherently an output-stage
check. Everything else is a natural fit for the input stage.
3. One concrete example
Author the rule in the console (under your own session — guardrail config needs Developer+), not with a relay key. Add a singleinput
rule to a guardrail named secrets-shield:
guardrail_id, or mark it the
workspace default — see Attach to a key),
then call the gateway with that sk-orca-... relay key:
guardrail_blocked before the gateway forwards anything upstream:
guardrail_blocked error
for the full response shape.
4. Why an input block costs no quota
This is the structural advantage of catching things on the way in. An input-stage block sits before pre-consume, so:| Property | Input-stage block |
|---|---|
| HTTP status | 400 guardrail_blocked |
| Quota charged | None — fires before metering |
| Upstream call | Never made |
| Retry | Marked skip-retry — re-running blocks again |
Because the request never reaches a channel, an input block is marked
skip-retry: re-running the same prompt against another channel would
just block again and waste effort. The output stage differs — a block
there refunds the quota the gateway already pre-consumed. Same
400,
different accounting.5. Resolution and fallback
An input-stage rule only runs if a guardrail actually resolves on the request. Resolution is explicit:- The key’s explicit
guardrail_id, if it exists and is enabled. - Otherwise the workspace default guardrail.
- Otherwise none — the request is byte-identical to a workspace with no policy.
6. Prove it before you ship
Don’t attach a blocking input rule to live traffic on faith. Two ways to validate first:Test tab — one sample
Test tab — one sample
Open the Test tab in the guardrail editor, paste a sample, pick
the
input stage, and run. The sandbox evaluates the current
policy locally — no upstream call, no quota — and returns the verdict
plus (for mask rules) the rendered text. See
Testing & eval.Flag before you block
Flag before you block
Set the action to flag first. A flag changes nothing about the
traffic — it only records a match — so you can measure how often a
rule would fire on real input before you flip it to block. See
Tune false positives.
See what fired
See what fired
Every rule that fires records a match — type, action, stage, and a
detail string. The matched substring is recorded only when Log raw
content is on (off by default). See the
Matches feed and
Logging & privacy.
7. Where to go next
The input stage stops bad input from reaching the model. To gate the model’s response, pair it with the output stage; to govern an agent’s tool calls, use the firewall.- Output-stage rules — screen the model’s response after it comes back.
- Stages and
both— when to run a rule on input, output, or both. - Securing AI agents — where input guardrails sit in the full control stack.
- Prompt-injection threat and data exfiltration — the attacks an input rule is built to stop.
