Secret and credential leakage

Secrets end up where they don’t belong. A developer pastes a .env file into a prompt for “help debugging.” A retrieved document carries an embedded API key. A model, asked to “show the config,” echoes an AWS access key straight back to the client. An agent constructs a tool call with a live token baked into the arguments. Each of these is a path for a credential to escape: into an upstream provider’s logs, into a client transcript, or into a third-party tool. This page covers how OrcaRouter’s Guardrails and Agent Firewall defend against LLM secret leakage without changes to your application code.

Detection happens at the gateway, in front of every bound key — so a single policy covers every provider, every model, and every agent without an SDK change.

1. Where secrets leak

A credential can escape at three distinct points in a request:

In the prompt (input)

The credential is in the request before the model ever runs: a pasted key, a .env snippet, or a token inside a retrieved RAG chunk. Left unchecked, it reaches the upstream provider and may land in that provider’s logs. Stop it with the Secrets Blocker input guardrail (§2).

In the response (output)

The model emits a secret back to your client — it regurgitates a key from its context, or hallucinates a credential-shaped string. Catch it with an output secrets rule (§3).

In a tool-call argument

Your agent builds a tool call with a token in the arguments. The Firewall’s sanitize verdict redacts matched substrings from the arguments before the call dispatches (§4).

The first two are content checks (Guardrails); the third is an action check (Firewall). Layer all three for defense in depth.

2. Secrets Blocker — stop credentials in the prompt

The Secrets Blocker is a guardrail preset under the secrets category that runs at the input stage. It scans the request for credential shapes (AWS access keys, OpenAI-style keys, and GitHub tokens) and blocks the call before it leaves the gateway, so the credential never reaches a model. Author it once in the console, attach it to an API key, and every request through that key is screened:

Create the guardrail

In the console, open /console/guardrails, click New guardrail, and apply the Secrets & API-Key Blocker preset from the secrets category. It seeds a guardrail with input-stage block rules for the common credential shapes — edit freely from there.

Attach a key

Open /console/token, edit an API key, and pick the guardrail from the Guardrail dropdown — or set it as the workspace default so every unattached key inherits it.

Send a request

Call the gateway exactly as before with that sk-orca-... key:

curl https://api.orcarouter.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-orca-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "why is AKIAIOSFODNN7EXAMPLE rejected"}
    ]
  }'

The AWS-key shape trips the guardrail and the request is rejected with HTTP 400 guardrail_blocked. The key never reaches the model.

A blocked request costs no quota — an input-stage block fires before metering — and is marked skip-retry, so re-running the same prompt just blocks again instead of burning a fallback channel.
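On the client side, a guardrail block is best treated as a terminal error rather than a transient one. The sketch below is a minimal illustration that relies only on what this page states (HTTP 400 and the guardrail_blocked code). The layout of the error body is not documented here, so the helper simply searches the raw body for that string. Both function names and the retry policy are hypothetical.

```python
def is_guardrail_block(status_code: int, body_text: str) -> bool:
    """Return True when a gateway response looks like a guardrail block.

    Assumption: the body contains the literal code "guardrail_blocked"
    somewhere; the exact JSON layout is not specified on this page.
    """
    return status_code == 400 and "guardrail_blocked" in body_text


def should_retry(status_code: int, body_text: str) -> bool:
    """Decide whether resending the same prompt could plausibly help."""
    if is_guardrail_block(status_code, body_text):
        # Same prompt, same verdict: fix the prompt instead of retrying.
        return False
    # Hypothetical client policy: retry only rate limits and server errors.
    return status_code == 429 or status_code >= 500
```

A caller would surface the block to the user (for example, "remove the credential and resend") instead of looping on it.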

Need broader coverage? The secrets category also ships a Private Keys & Cloud Tokens preset that blocks PEM private keys, Slack and Stripe tokens, Google API keys, and JWTs. Apply both — a guardrail can hold any number of rules. To catch JWTs and credential shapes as typed entities (with [JWT] / [AWS_ACCESS_KEY] tags), a pii rule covering jwt, aws_access_key, and api_key_openai is the entity-driven alternative; see the Guardrails reference.

The Firewall’s tight autonomy level turns the Secrets Blocker guardrail on as part of its posture, alongside PII Shield and destructive-shell denials. If you want one switch instead of authoring rules by hand, that’s the fast path. The credential check itself is always the guardrail — not the firewall’s argument scanner.

3. Block secrets in model output

A secret can also leave in the response — the model echoes a key from its context or emits a credential-shaped string. Add a rule on the output stage to screen the model’s reply before it returns to the client. The secrets category ships a Code Secret in Output preset for exactly this: output-stage block rules for PEM private keys, AWS access keys, and OpenAI-style secret tokens.

{
  "type": "regex",
  "stage": "output",
  "action": "block",
  "pattern": "AKIA[0-9A-Z]{16}"
}

Output block is enforced on both non-streaming and streaming responses — on a stream, a scanner cuts the response mid-flight before any blocked content reaches the client. An output block refunds the pre-consumed quota.
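To make the streaming case concrete, here is a minimal sketch of the general carry-over technique a stream scanner can use. It is an illustration of the idea only, not OrcaRouter’s implementation. It reuses the AKIA pattern from the rule above and holds back the last 19 characters of the stream so that a 20-character key split across two chunks is still caught before any part of it is forwarded. The names scan_stream and BlockedOutput are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Pattern from the output rule: "AKIA" plus 16 uppercase letters or digits.
AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")
MATCH_LEN = 20  # every match of this pattern is exactly 20 characters long


class BlockedOutput(Exception):
    """Raised when the stream must be cut."""


def scan_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield text that is safe to forward; raise as soon as a key appears."""
    held = ""
    for chunk in chunks:
        held += chunk
        if AWS_KEY.search(held):
            raise BlockedOutput("credential-shaped string in output")
        # Keep the last MATCH_LEN - 1 characters back: a key that starts
        # there can only complete in a later chunk.
        safe_len = max(len(held) - (MATCH_LEN - 1), 0)
        if safe_len:
            yield held[:safe_len]
            held = held[safe_len:]
    if held:
        # End of stream: the tail was already scanned and found clean.
        yield held
```

The trade-off in this sketch is latency: the client always trails the model by up to 19 characters. A pattern with no fixed maximum length (a PEM block, for instance) would need a different buffering strategy.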

Output masking (replacing a match with a typed tag rather than rejecting the whole response) currently applies to non-streaming responses only. For credentials in output, the block action is the reliable choice on streaming traffic. Prove your stage/stream combination in the guardrail Test tab before depending on it.

4. Sanitize secrets from tool-call arguments

When your agent constructs a tool call, a credential can ride along in the arguments. The Firewall’s sanitize verdict redacts matched substrings from the tool-call arguments and forwards the cleaned call. It applies on the response and mcp surfaces, where there are live call-time arguments to rewrite. A sanitize rule names the detectors to redact in its sanitize_json config: a set of built-in presets plus optional custom regexes. Matched material is replaced with [redacted:<preset>], and custom matches with [redacted:custom]:

{
  "priority": 10,
  "label": "Redact AWS keys from tool args",
  "stage": "response",
  "tool_name_glob": "*",
  "verdict": "sanitize",
  "sanitize_json": {
    "presets": ["aws_access_key", "aws_secret_key", "openai_key", "anthropic_key", "bearer_token"],
    "custom": []
  }
}

The secret-shape presets available to a sanitizer are aws_access_key, aws_secret_key, openai_key, anthropic_key, and bearer_token (plus email, ssn_us, and credit_card for PII). A sanitize rule must name at least one preset or custom pattern — an empty sanitizer is rejected on save.
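The redaction behaviour described above can be sketched in a few lines. This is a rough model for reasoning about what a tool receives, not the Firewall’s code. The preset regexes are assumptions: aws_access_key reuses the AKIA pattern from §3, bearer_token is a guess, and the remaining presets are omitted because their patterns are not given on this page. build_sanitizer is a hypothetical name.

```python
import re
from typing import Any, Callable

# Hypothetical preset patterns, for illustration only.
PRESET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "bearer_token": re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]+=*"),
}


def build_sanitizer(config: dict) -> Callable[[Any], Any]:
    """Build a redactor from a sanitize_json-style config dict."""
    presets = config.get("presets", [])
    custom = config.get("custom", [])
    if not presets and not custom:
        # Mirrors the rule that an empty sanitizer is rejected on save.
        raise ValueError("sanitizer needs at least one preset or custom pattern")
    # Unknown preset names raise KeyError in this sketch.
    rules = [(f"[redacted:{name}]", PRESET_PATTERNS[name]) for name in presets]
    rules += [("[redacted:custom]", re.compile(p)) for p in custom]

    def sanitize(value: Any) -> Any:
        """Redact matches in every string nested inside the arguments."""
        if isinstance(value, str):
            for tag, pattern in rules:
                value = pattern.sub(tag, value)
            return value
        if isinstance(value, dict):
            return {k: sanitize(v) for k, v in value.items()}
        if isinstance(value, list):
            return [sanitize(v) for v in value]
        return value  # numbers, booleans, and None pass through unchanged

    return sanitize
```

Only the matched substring is replaced, so surrounding text such as "key " in "key AKIA..." survives and the tool call stays structurally valid.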

Sanitize redacts arguments, not results. It cleans the arguments of a tool call your agent authored; it does not scrub the content a tool returns. And on the inbound surface — where there are no call-time arguments yet — sanitize escalates to a deny. See the firewall rules reference for the matching language.

The Secrets Blocker guardrail (§2) remains your primary defense for credentials in the request body — the firewall sanitizer is the action-layer complement for secrets that appear specifically inside tool-call arguments.

5. Layering the three defenses

Where the secret is	Layer that stops it	Action
In the prompt	Secrets Blocker (input guardrail)	block
In the model’s reply	Output secrets rule (output guardrail)	block
In a tool-call argument	Firewall sanitizer	sanitize

Roll any new rule out in shadow mode (firewall) or with the flag action (guardrail) first. Watch the Firewall Events feed or the guardrail Matches feed to confirm it fires on real credentials and not on benign look-alikes, then switch to the enforcing action.
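Before a pattern reaches shadow mode at all, a quick local check against known positives and look-alikes can catch obvious mistakes. The sketch below assumes the gateway’s regex dialect treats this simple pattern the same way Python’s re module does, which this page does not confirm. It complements the shadow rollout rather than replacing it. The audit helper and the sample strings are illustrative.

```python
import re

AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")

# Strings the rule must catch.
should_fire = ["why is AKIAIOSFODNN7EXAMPLE rejected"]

# Benign look-alikes the rule must leave alone.
should_pass = [
    "AKIA is the prefix AWS uses for access key IDs",  # prefix only
    "akiaiosfodnn7example",                            # lowercase
    "AKIAIOSFODNN7EXAMPL",                             # one character short
]


def audit(pattern, positives, negatives):
    """Return (missed positives, false hits) for a candidate pattern."""
    missed = [s for s in positives if not pattern.search(s)]
    false_hits = [s for s in negatives if pattern.search(s)]
    return missed, false_hits


missed, false_hits = audit(AWS_KEY, should_fire, should_pass)
```

Both lists should come back empty before the pattern is promoted to an enforcing action.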

6. Observe what fired

Every guardrail rule that fires records a match — rule type, action, stage, and a detail string — to the workspace Matches feed (GET /api/guardrail/match, Member). The matched substring is recorded only when “Log raw content” is on, which is off by default — the privacy-conservative posture, so the Matches feed doesn’t itself become a place secrets accumulate. Leave it off for credential rules unless you specifically need the substring for triage. Firewall sanitize decisions land in the Firewall Events feed (GET /api/workspace/firewall/events, Developer+), with secrets and rule blobs never logged.
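Once match records have been pulled from the feed, a small tally makes it easy to see which rules are firing. The sketch below works on records already loaded into memory. The field names (rule_type, stage, action, detail) are hypothetical stand-ins for the four recorded attributes listed above, since the response schema is not shown on this page, and the sample records are invented for illustration.

```python
from collections import Counter

# Invented sample records; field names are assumptions, not the real schema.
matches = [
    {"rule_type": "regex", "stage": "input", "action": "block", "detail": "aws key shape"},
    {"rule_type": "regex", "stage": "input", "action": "block", "detail": "aws key shape"},
    {"rule_type": "regex", "stage": "output", "action": "flag", "detail": "pem header"},
]


def summarize(records):
    """Count matches per (rule type, stage, action) combination."""
    return Counter((r["rule_type"], r["stage"], r["action"]) for r in records)


summary = summarize(matches)
for (rule_type, stage, action), count in summary.most_common():
    print(f"{rule_type:8} {stage:8} {action:8} {count}")
```

A rule that is still on the flag action and shows steady, correct hits in this tally is a reasonable candidate for promotion to block.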

7. Where to go next

Guardrails reference

Rule types, PII entities, presets, the test sandbox, and the eval harness in full.

Firewall rules reference

The matching language — tool globs, argument clauses, and sanitizers.

PII exposure

The sibling content threat: personal data in prompts and responses.

Data exfiltration

When a leaked credential becomes the payload of an outbound exfil call.

Guardrails vs Firewall

Which plane stops which class of leak, and how they compose.

Secure-agents baseline

The starter posture that turns these defenses on together.
