Any prompt your app sends to a model can carry personal data it shouldn’t: an email pasted into a support ticket, an SSN in a CRM note, a card number a user typed into a chat box. Once that text reaches an upstream provider it’s out of your control: logged, cached, maybe used for training. The model’s response can leak PII back too, echoing or inferring details that then land in your application logs. This page shows how to stop an LLM PII leak at the gateway with a PII guardrail, a workspace-scoped rule that masks or blocks sensitive entities on the request before the model ever sees them. It’s the content-layer peer of the Agent Firewall, and it needs no change to your application code.
A PII guardrail screens the text of prompts and responses. To govern the actions an agent takes with data — fetch tools, egress hosts — see Data exfiltration. The two planes compose; most teams run both.

1. How the exposure happens

PII reaches an upstream provider through ordinary, well-intentioned traffic:
  • A user pastes their own contact details into a chat and your app forwards the whole message verbatim.
  • A RAG pipeline retrieves a document containing customer records and stuffs it into the prompt as context.
  • An agent reads a database row and includes raw fields in a tool argument or a follow-up prompt.
  • The model’s response restates or infers PII, which your app then writes to its own logs.
None of these is an attack — they’re the normal shape of LLM apps. The fix is a policy that screens every request and response at one choke point, instead of auditing every call site in your code.

2. Defend against LLM PII leaks with a PII guardrail

A guardrail is a workspace-scoped, named content policy. A pii rule inside it detects sensitive entities and applies one action to each match:
Action | Effect
mask | Replace each match with a typed tag (jane@acme.com becomes [EMAIL]) and forward the cleaned text. The model never sees the original.
block | Reject the whole request with HTTP 400 guardrail_blocked. Use when PII must never reach the provider at all.
flag | Change nothing about the traffic; record a match. Measure exposure before you enforce.
The detector set is built-in and deterministic — pure pattern matching, no network call, safe on the hot path. Built-in entities: email, phone, credit_card, ssn, ip, iban, mac_address, jwt, aws_access_key, api_key_openai, bitcoin_address, plus the checksum-gated regional identifiers jp_mynumber, kr_rrn, and cn_resident_id. On a mask action each match renders as its typed tag — [EMAIL], [SSN], [CREDIT_CARD], and so on — so the structure of the prompt survives while the value is gone.
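To make the mask action concrete, here is a minimal sketch of deterministic pattern masking. The regexes and the `mask` helper are illustrative assumptions, not the gateway's actual detectors, which are not published on this page; only the typed-tag output format ([EMAIL], [SSN]) comes from the description above.

```python
import re

# Illustrative patterns only (assumptions): the gateway's real detectors
# are built in and may be stricter or broader than these.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask(text: str) -> str:
    """Replace every match with its typed tag, leaving the rest of the text intact."""
    for tag, pattern in PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text


if __name__ == "__main__":
    print(mask("reply to jane@acme.com"))
```

Because the replacement is a fixed tag per entity type, the sentence structure survives and the model can still reason about "an email address" without seeing the value.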
Need a detector that isn’t built in (an internal employee ID, an account number)? Add a custom entity — a regex with optional Luhn checksum, up to 25 per rule — right alongside the built-ins. See the Guardrails reference.
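As a rough illustration of what a regex plus Luhn checksum buys you, the sketch below pairs a hypothetical account-number pattern with a standard Luhn check. The `ACCT-` format, `ACCOUNT_RE`, and `find_accounts` are invented for this example; the actual custom-entity schema lives in the Guardrails reference.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Standard Luhn check: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Hypothetical internal format (an assumption): "ACCT-" followed by 10 digits.
ACCOUNT_RE = re.compile(r"\bACCT-(\d{10})\b")


def find_accounts(text: str) -> list[str]:
    """Return only the regex matches whose digits also pass the Luhn check."""
    return [
        m.group(0)
        for m in ACCOUNT_RE.finditer(text)
        if luhn_valid(m.group(1))
    ]
```

The checksum gate is what keeps a broad digit pattern from flagging every 10-digit number: a string that matches the regex but fails Luhn is not reported.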

3. Concrete example — mask PII on the request

The fastest start is the PII Shield preset: a single pii rule that masks email, phone, ssn, credit_card, and ip. Configure it in the console — no code changes, no key in this step.
Step 1. Create the guardrail

In the console, open Guardrails and click New guardrail. Pick the PII Shield preset from the pii category, or hand-author one pii rule with action mask over the entities above. Save. (Writes require the Developer role or higher.)

Step 2. Prove it in the sandbox

Open the Test tab, paste “reply to jane@acme.com”, pick the input stage, and run. The sandbox returns “reply to [EMAIL]” locally, with no upstream call and no quota spent.

Step 3. Attach it to a key

In API Keys, edit a key and select the guardrail from the Guardrail dropdown, or set the guardrail as the workspace default so every unattached key inherits it. The binding lives on the key in the gateway.

Step 4. Call the gateway as usual

Using that key, your relay call is unchanged:
curl https://api.orcarouter.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-orca-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Draft a reply to jane@acme.com"}
    ]
  }'
The gateway rewrites the email to [EMAIL] before forwarding. The upstream model never receives the address.
PII Shield is a both-stage rule, but live request-stage masking is what ships today — the gateway masks the prompt before it leaves for the model. Output-stage (response) masking on the live relay is on the roadmap. To verify how a response-stage rule behaves, evaluate it in the Test tab. For streaming, see §5.
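The same request from Python, as a minimal standard-library sketch. The URL, model name, and headers are copied from the curl example; `build_request` and `send` are helper names introduced here, and the response is returned as parsed JSON without assuming any particular fields.

```python
import json
import urllib.request

GATEWAY_URL = "https://api.orcarouter.ai/v1/chat/completions"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the same POST the curl example sends; no guardrail-specific fields."""
    body = {
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Nothing in the payload mentions the guardrail: the binding lives on the key, so the application code stays the same whether or not a guardrail is attached.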

4. Mask most, block the worst — per-entity overrides

A single rule can apply different actions to different entities via entity_actions. Mask low-risk identifiers but hard-block the entities you never want forwarded — one rule instead of three overlapping ones:
{
  "type": "pii",
  "stage": "input",
  "action": "mask",
  "entities": ["email", "phone", "ip", "credit_card", "ssn"],
  "entity_actions": {
    "credit_card": "block",
    "ssn": "block"
  }
}
Here emails, phones, and IPs are masked and pass through; a prompt carrying a card number or SSN is rejected with HTTP 400 guardrail_blocked instead. A blocked request costs no quota — an input-stage block fires before metering — and is marked skip-retry. Each entity_actions key must be an entity declared on the rule (built-in or custom); its action is validated against the rule’s action set.
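One way to read the override semantics is sketched below: each detected entity takes its entity_actions entry if one exists, otherwise the rule's default action, and a single block outranks any number of masks. The `resolve_action` and `decide` helpers are assumptions made for illustration; the gateway's internal evaluation order is not documented on this page.

```python
# The rule from the example above, as a Python dict.
RULE = {
    "type": "pii",
    "stage": "input",
    "action": "mask",
    "entities": ["email", "phone", "ip", "credit_card", "ssn"],
    "entity_actions": {"credit_card": "block", "ssn": "block"},
}


def resolve_action(rule: dict, entity: str) -> str:
    """Per-entity override if present, otherwise the rule's default action."""
    return rule.get("entity_actions", {}).get(entity, rule["action"])


def decide(rule: dict, matched_entities: list[str]) -> str:
    """Assumed precedence: any block rejects the request, else mask, else pass."""
    actions = {
        resolve_action(rule, entity)
        for entity in matched_entities
        if entity in rule["entities"]  # ignore entities the rule does not declare
    }
    if "block" in actions:
        return "block"
    if "mask" in actions:
        return "mask"
    return "pass"
```

Under this reading, a prompt containing both an email and an SSN is rejected outright rather than partially masked, which matches the behavior described for the rule above.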

5. What works on streaming today

Action and stage interact with streaming differently — know the matrix before you depend on it:
Input stage (mask or block): Fully live. The prompt is screened before the upstream call, so masking and blocking work identically whether or not the response streams. This is the surface PII Shield enforces today.
Output stage, block: Enforced on both streaming and non-streaming responses. On a stream, a scanner cuts the stream mid-flight and emits a replacement message before any blocked content reaches the client; an output block refunds the pre-consumed quota.
Output stage, mask: Currently non-streaming only. On a streamed response the original chunk passes through unmasked; in-band stream rewriting is a planned enhancement. For response masking today, use non-streaming requests, or rely on input-stage masking. Prove your exact stage/stream combination in the Test tab first.

6. See what was caught

Every rule that fires records a match — its type, action, stage, and a detail string — visible on the workspace Matches feed (GET /api/guardrail/match, open to any member). From there you can group, filter, export to CSV, and mark false positives.
Raw values are not logged by default. A guardrail’s Log raw content toggle is off — the privacy-conservative posture — so the Matches feed records that a PII rule fired and which entity, but not the matched substring (the email address itself). Turn it on per guardrail only when you need the value for triage; the setting is non-retroactive. Capturing PII in your own audit trail to debug a PII leak would be self-defeating.

7. Take it further

For full residency, retention, and right-to-erasure controls — including installing a compliance pack that materializes these guardrails for GDPR, HIPAA, or PCI DSS — start from the reference pages below.

Guardrails reference

Every rule type, stage, action, custom entities, versioning, and the eval harness — the deep reference behind this page.

Secret leakage

The credential-shaped sibling — AWS, OpenAI, GitHub tokens — caught by the Secrets Blocker guardrail.

Unsafe output

Screening what the model sends back, not just what it receives.

Guardrails vs Firewall

When to screen text and when to govern actions — and why you usually want both.