.env file
into a prompt for “help debugging.” A retrieved document carries an
embedded API key. A model, asked to “show the config,” echoes an AWS
access key straight back to the client. An agent constructs a tool call
with a live token baked into the arguments. Every one of these is a path
for a credential to escape — into an upstream provider’s logs, into a
client transcript, or into a third-party tool.
This page covers how OrcaRouter’s Guardrails and
Agent Firewall let you defend against LLM secret
leakage without changing your application code.
Detection happens at the gateway, in front of every bound key — so a
single policy covers every provider, every model, and every agent without
an SDK change.
1. Where secrets leak
A credential can escape at three distinct points in a request:
In the prompt (input)
The credential is in the request before the model ever runs — a
pasted key, a
.env snippet, a token inside a retrieved RAG chunk.
Left unchecked it reaches the upstream provider and may land in their
logs. Stop it with the Secrets Blocker input guardrail
(§2).
In the response (output)
The model emits a secret back to your client — it regurgitates a key
from its context, or hallucinates a credential-shaped string. Catch it
with an output secrets rule
(§3).
In a tool-call argument
Your agent builds a tool call with a token in the arguments. The
Firewall’s sanitize verdict redacts matched substrings from the
arguments before the call dispatches
(§4).
2. Secrets Blocker — stop credentials in the prompt
The Secrets Blocker is a guardrail preset under the secrets category that runs at the input stage. It scans the request for credential shapes (AWS access keys, OpenAI-style keys, and GitHub tokens) and blocks the call before it leaves the gateway, so the credential never reaches a model. Author it once in the console, attach it to a key, and every request through that key is screened:
Create the guardrail
In the console, open
/console/guardrails, click New guardrail,
and apply the Secrets & API-Key Blocker preset from the
secrets category. It seeds a guardrail with input-stage block
rules for the common credential shapes; edit freely from there.
Attach a key
Open
/console/token, edit an API key, and pick the guardrail from
the Guardrail dropdown — or set it as the workspace default so
every unattached key inherits it.
If you would rather mask than block (replacing each match with typed
[JWT] / [AWS_ACCESS_KEY] tags), a pii rule covering
jwt, aws_access_key, and api_key_openai is the entity-driven
alternative; see the Guardrails reference.
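As a rough illustration of what an input-stage block rule does, the sketch below screens a prompt for the three credential shapes named in this section. The regexes, the `CREDENTIAL_PATTERNS` table, and the `screen_prompt` function are assumptions made for this example; they are not OrcaRouter's actual detectors, which may be broader or stricter.

```python
import re

# Illustrative credential shapes. These regexes are assumptions for the
# example, not the patterns the Secrets & API-Key Blocker preset ships with.
CREDENTIAL_PATTERNS = {
    "aws_access_key": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "openai_style_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
}


def screen_prompt(text: str) -> dict:
    """Return a block verdict naming every credential shape found in the prompt."""
    hits = [name for name, pattern in CREDENTIAL_PATTERNS.items() if pattern.search(text)]
    return {"action": "block" if hits else "allow", "matched": hits}
```

Note that the verdict carries only the rule names, not the matched substring, which mirrors the idea that the screening layer should not become another place where secrets accumulate.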
3. Block secrets in model output
A secret can also leave in the response: the model echoes a key from its context or emits a credential-shaped string. Add a rule on the output stage to screen the model’s reply before it returns to the client. The secrets category ships a Code Secret in Output preset for exactly this: output-stage block rules for PEM private keys, AWS access keys, and OpenAI-style secret tokens.
Output masking (replacing a match with a typed tag rather than
rejecting the whole response) currently applies to non-streaming responses
only. For credentials in output, the block action is the reliable
choice on streaming traffic. Prove your stage/stream combination in the
guardrail Test tab before depending on it.
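To see why blocking is the simpler action on a stream, consider one way a filter could screen streamed output: hold back a short tail of text so that a credential split across two chunks is caught before any part of it is released. The sketch below is a generic illustration under that assumption. The patterns, the `holdback` size, and the `guard_stream` and `SecretInOutput` names are all hypothetical, and nothing here describes how OrcaRouter implements its output stage.

```python
import re
from typing import Iterable, Iterator

# Illustrative output patterns (assumptions, not the preset's actual rules).
OUTPUT_PATTERNS = [
    re.compile(r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----"),
    re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
]


class SecretInOutput(Exception):
    """Raised when the streamed reply contains a credential-shaped string."""


def guard_stream(chunks: Iterable[str], holdback: int = 64) -> Iterator[str]:
    """Yield streamed text, keeping the last `holdback` characters unreleased.

    The holdback must be at least as long as the shortest string any
    pattern can match, so a partially received secret is never released.
    """
    pending = ""
    for chunk in chunks:
        pending += chunk
        if any(p.search(pending) for p in OUTPUT_PATTERNS):
            raise SecretInOutput("blocked: credential shape in model output")
        if len(pending) > holdback:
            yield pending[:-holdback]      # safe to release
            pending = pending[-holdback:]  # keep the tail for the next scan
    if pending:
        yield pending                      # stream ended cleanly
```

Blocking only needs to stop the stream at the point of detection. Masking would also have to rewrite text in place while it is still arriving, which is harder to do reliably.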
4. Sanitize secrets from tool-call arguments
When your agent constructs a tool call, a credential can ride along in the arguments. The Firewall’s sanitize verdict redacts matched substrings from the tool-call arguments and forwards the cleaned call. It applies on the response and mcp surfaces, where there are live call-time arguments to
rewrite.
A sanitize rule names which detectors to redact in its sanitize_json
config: a set of built-in presets plus optional custom regexes.
Matched material is replaced with [redacted:<preset>] (custom matches
with [redacted:custom]).
Built-in presets include aws_access_key,
aws_secret_key, openai_key, anthropic_key, and bearer_token (plus
email, ssn_us, and credit_card for PII). A sanitize rule must name at
least one preset or custom pattern; an empty sanitizer is rejected on
save.
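The redaction behavior described above can be sketched in a few lines. The example walks a tool call's argument structure and replaces matches with `[redacted:<preset>]` or `[redacted:custom]`. The preset names come from this page, but the regexes behind them, the `PRESET_PATTERNS` table, and the `sanitize` function are assumptions for illustration; the actual sanitize_json schema and detectors are documented in the Firewall rules reference.

```python
import re
from typing import Any, List, Optional

# Hypothetical regexes for a few of the preset names; the real detectors may differ.
PRESET_PATTERNS = {
    "aws_access_key": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "openai_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "bearer_token": re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/-]+=*"),
}


def sanitize(value: Any, presets: List[str], custom: Optional[List[str]] = None) -> Any:
    """Recursively redact matches inside tool-call arguments."""
    if not presets and not custom:
        # Mirrors the rule that an empty sanitizer is rejected.
        raise ValueError("a sanitizer needs at least one preset or custom pattern")
    if isinstance(value, dict):
        return {k: sanitize(v, presets, custom) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize(v, presets, custom) for v in value]
    if not isinstance(value, str):
        return value  # numbers, booleans, and None pass through unchanged
    for name in presets:
        value = PRESET_PATTERNS[name].sub(f"[redacted:{name}]", value)
    for pattern in custom or []:
        value = re.sub(pattern, "[redacted:custom]", value)
    return value
```

Only the matched substring is replaced, so the surrounding argument (for example the `AWS_KEY=` prefix of an environment entry) survives and the cleaned call can still be forwarded.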
The Secrets Blocker guardrail (§2)
remains your primary defense for credentials in the request body — the
firewall sanitizer is the action-layer complement for secrets that appear
specifically inside tool-call arguments.
5. Layering the three defenses
| Where the secret is | Layer that stops it | Action |
|---|---|---|
| In the prompt | Secrets Blocker (input guardrail) | block |
| In the model’s reply | Output secrets rule (output guardrail) | block |
| In a tool-call argument | Firewall sanitizer | sanitize |
6. Observe what fired
Every guardrail rule that fires records a match — rule type, action, stage, and a detail string — to the workspace Matches feed (GET /api/guardrail/match, Member). The matched substring is recorded
only when “Log raw content” is on, which is off by default — the
privacy-conservative posture, so the Matches feed doesn’t itself become a
place secrets accumulate. Leave it off for credential rules unless you
specifically need the substring for triage.
Firewall sanitize decisions land in the Firewall Events feed
(GET /api/workspace/firewall/events, Developer+), with secrets and rule
blobs never logged.
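A minimal sketch of polling the two feeds follows. The paths are the ones named in this section; the base URL, the bearer-style `Authorization` header, and the `feed_request` helper are assumptions for the example, so check the API reference for the real authentication scheme and any query parameters.

```python
import urllib.request

# Paths come from this page. The base URL and auth header below are assumptions.
MATCHES_PATH = "/api/guardrail/match"
FIREWALL_EVENTS_PATH = "/api/workspace/firewall/events"


def feed_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one of the two feeds."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The token needs the role listed for each feed (Member for Matches, Developer or higher for Firewall Events).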
7. Where to go next
Guardrails reference
Rule types, PII entities, presets, the test sandbox, and the eval
harness in full.
Firewall rules reference
The matching language — tool globs, argument clauses, and sanitizers.
PII exposure
The sibling content threat: personal data in prompts and responses.
Data exfiltration
When a leaked credential becomes the payload of an outbound exfil call.
Guardrails vs Firewall
Which plane stops which class of leak, and how they compose.
Secure-agents baseline
The starter posture that turns these defenses on together.
