A PII guardrail screens the text of prompts and responses. To govern the
actions an agent takes with data — fetch tools, egress hosts — see
Data exfiltration. The two planes
compose; most teams run both.
1. How the exposure happens
PII reaches an upstream provider through ordinary, well-intentioned traffic:
- A user pastes their own contact details into a chat and your app forwards the whole message verbatim.
- A RAG pipeline retrieves a document containing customer records and stuffs it into the prompt as context.
- An agent reads a database row and includes raw fields in a tool argument or a follow-up prompt.
- The model’s response restates or infers PII, which your app then writes to its own logs.
2. Defend against LLM PII leaks with a PII guardrail
A guardrail is a workspace-scoped, named content policy. A pii rule inside it
detects sensitive entities and applies one action to each match:
| Action | Effect |
|---|---|
| mask | Replace each match with a typed tag (jane@acme.com → [EMAIL]) and forward the cleaned text. The model never sees the original. |
| block | Reject the whole request with HTTP 400 guardrail_blocked. Use when PII must never reach the provider at all. |
| flag | Change nothing about the traffic; record a match. Measure exposure before you enforce. |
Built-in entities: email, phone, credit_card, ssn, ip, iban, mac_address, jwt,
aws_access_key, api_key_openai, bitcoin_address, plus the
checksum-gated regional identifiers jp_mynumber, kr_rrn, and
cn_resident_id, which match only when the identifier's check digit validates.
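"Checksum-gated" means a string with the right shape is not reported unless its check digit also verifies, which keeps arbitrary long numbers from being flagged. The gateway's detectors are not shown on this page, so the following is only a minimal sketch of the idea for cn_resident_id, assuming the standard ISO 7064 MOD 11-2 check character used by 18-character Chinese resident IDs. The function names are hypothetical.

```python
import re

# Candidate shape only: 17 digits followed by a digit or 'X' check character.
CN_ID_SHAPE = re.compile(r"\b\d{17}[\dXx]\b")

# Position weights and remainder-to-check-character map (ISO 7064 MOD 11-2).
WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CHARS = "10X98765432"


def cn_id_checksum_ok(candidate: str) -> bool:
    """True if the 18th character equals the check character computed from the first 17."""
    if len(candidate) != 18 or not candidate[:17].isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(candidate[:17], WEIGHTS))
    return CHECK_CHARS[total % 11] == candidate[17].upper()


def find_cn_ids(text: str) -> list[str]:
    """The regex proposes candidates; only checksum-valid ones count as matches."""
    return [m.group(0) for m in CN_ID_SHAPE.finditer(text)
            if cn_id_checksum_ok(m.group(0))]


print(find_cn_ids("id 11010519491231002X on file"))  # ['11010519491231002X']
print(find_cn_ids("id 110105194912310021 on file"))  # []
```

Both inputs have the same shape; only the first survives the checksum gate.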
On a mask action each match renders as its typed tag — [EMAIL], [SSN],
[CREDIT_CARD], and so on — so the structure of the prompt survives while the
value is gone.
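To make the three actions and the typed tags concrete, here is a toy model. It is an illustrative sketch, not the gateway's implementation: the two regex detectors, the Outcome class, and apply_pii_rule are hypothetical, and a block is represented as a boolean rather than an HTTP 400 guardrail_blocked response.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Toy detectors for two entities only; they stand in for the real detectors.
DETECTORS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


@dataclass
class Outcome:
    text: Optional[str]      # what would be forwarded upstream; None if blocked
    blocked: bool = False    # stands in for HTTP 400 guardrail_blocked
    matches: list = field(default_factory=list)  # one (entity, action) per match


def apply_pii_rule(text: str, action: str) -> Outcome:
    # Detect on the original text so every match is recorded, whatever the action.
    matches = [(name, action) for name, rx in DETECTORS.items()
               for _ in rx.finditer(text)]
    if action == "block" and matches:
        return Outcome(text=None, blocked=True, matches=matches)
    if action == "mask":
        for name, rx in DETECTORS.items():
            # Typed tag keeps the prompt's structure: jane@acme.com -> [EMAIL]
            text = rx.sub(f"[{name.upper()}]", text)
    # "flag" forwards the text untouched; only the match record is kept.
    return Outcome(text=text, matches=matches)


print(apply_pii_rule("reply to jane@acme.com", "mask").text)  # reply to [EMAIL]
print(apply_pii_rule("reply to jane@acme.com", "flag").text)  # reply to jane@acme.com
```

The same input produces three different outcomes; only the action changes.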
3. Concrete example — mask PII on the request
The fastest start is the PII Shield preset: a single pii rule that masks
email, phone, ssn, credit_card, and ip. Configure it in the console, with
no code changes and no key required for this step.
Create the guardrail
In the console, open Guardrails and click New guardrail. Pick the
PII Shield preset from the pii category, or hand-author one pii
rule with action mask over the entities above. Save. (Writes require
the Developer role or higher.)
Prove it in the sandbox
Open the Test tab, paste “reply to jane@acme.com”, pick the input
stage, and run. The sandbox returns “reply to [EMAIL]”, computed locally with
no upstream call and no quota spent.
Attach it to a key
In API Keys, edit a key and select the guardrail from the
Guardrail dropdown, or set the guardrail as the workspace default so
every unattached key inherits it. The binding lives on the key in the
gateway.
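The inheritance rule reduces to a one-line precedence check. This is a hypothetical model, assuming an explicitly attached guardrail takes precedence over the workspace default (the page says only unattached keys inherit it). ApiKey and effective_guardrail are illustrative names, not gateway APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiKey:
    name: str
    guardrail: Optional[str] = None  # chosen in the key's Guardrail dropdown, if at all


def effective_guardrail(key: ApiKey, workspace_default: Optional[str]) -> Optional[str]:
    """Guardrail applied to traffic on this key (hypothetical model).

    An attached guardrail wins; an unattached key inherits the workspace
    default; with neither set, no guardrail applies.
    """
    return key.guardrail if key.guardrail is not None else workspace_default


print(effective_guardrail(ApiKey("batch-job"), "PII Shield"))                      # PII Shield
print(effective_guardrail(ApiKey("support-bot", "Secrets Blocker"), "PII Shield"))  # Secrets Blocker
```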
4. Mask most, block the worst — per-entity overrides
A single rule can apply different actions to different entities via
entity_actions. Mask low-risk identifiers but hard-block the entities you
never want forwarded, using one rule instead of three overlapping ones. A
match on an entity overridden to block rejects the whole request with HTTP
400 guardrail_blocked instead of masking it. A blocked request costs no
quota (an input-stage block fires before metering) and is marked skip-retry.
Each entity_actions key must be an entity declared on the rule (built-in or
custom); its action is validated against the rule’s action set.
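The override semantics can be sketched in three steps: resolve every declared entity to an action, validate the overrides, and reject the request if any matched entity resolves to block. This is a hypothetical model, not the gateway's configuration format. It assumes the rule's action set is exactly mask, block, and flag, and the choice of ssn and credit_card as the blocked entities is only an example.

```python
# Assumption: a pii rule's action set is the three actions on this page.
ACTIONS = {"mask", "block", "flag"}


def resolve_actions(entities: list[str], action: str,
                    entity_actions: dict[str, str]) -> dict[str, str]:
    """Map every declared entity to its effective action, validating overrides."""
    if action not in ACTIONS:
        raise ValueError(f"unknown rule action {action!r}")
    for entity, override in entity_actions.items():
        if entity not in entities:
            raise ValueError(f"{entity!r} is not declared on this rule")
        if override not in ACTIONS:
            raise ValueError(f"{entity!r}: unknown action {override!r}")
    # Entities without an override fall back to the rule-level action.
    return {e: entity_actions.get(e, action) for e in entities}


def request_outcome(matched: list[str], resolved: dict[str, str]) -> str:
    """A single block-mapped match rejects the whole request."""
    if any(resolved[e] == "block" for e in matched):
        return "blocked"    # the gateway would answer HTTP 400 guardrail_blocked
    return "forwarded"      # remaining matches are masked or flagged


# Illustrative rule: PII Shield's entities, with ssn and credit_card hard-blocked.
rule = resolve_actions(
    ["email", "phone", "ssn", "credit_card", "ip"],
    "mask",
    {"ssn": "block", "credit_card": "block"},
)
print(request_outcome(["email"], rule))         # forwarded
print(request_outcome(["email", "ssn"], rule))  # blocked
```

Validating at configuration time means an override for an undeclared entity fails loudly instead of silently never firing.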
5. What works on streaming today
Action and stage interact with streaming differently; know the matrix before you depend on it:
Input-stage mask or block (any response mode)
Fully live. The prompt is screened before the upstream call, so
masking and blocking work identically whether or not the response streams.
This is the surface PII Shield enforces today.
Output-stage block
Enforced on both streaming and non-streaming responses. On a stream, a
scanner cuts the stream mid-flight and emits a replacement message before
any blocked content reaches the client; an output block refunds the
pre-consumed quota.
Output-stage mask
Currently non-streaming only. On a streamed response the original
chunk passes through unmasked — in-band stream rewriting is a planned
enhancement. For response masking today, use non-streaming requests, or
rely on input-stage masking. Prove your exact stage/stream combination in
the Test tab first.
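If you want the matrix in a pre-deploy check, it fits in a small lookup. The helper below is hypothetical and encodes only the three cases described above; any other combination (flag, for example) is reported as not covered rather than guessed.

```python
def enforcement(stage: str, action: str, streaming: bool) -> str:
    """Summarize the streaming matrix for one stage/action/response mode."""
    if stage == "input" and action in ("mask", "block"):
        return "enforced"  # prompt screened before the upstream call
    if stage == "output" and action == "block":
        return "enforced"  # on a stream, the scanner cuts it mid-flight
    if stage == "output" and action == "mask":
        # In-band stream rewriting is not available yet.
        return "passes through unmasked" if streaming else "enforced"
    return "not covered by the matrix"


print(enforcement("output", "mask", streaming=True))  # passes through unmasked
```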
6. See what was caught
Every rule that fires records a match — its type, action, stage, and a detail string — visible on the workspace Matches feed (GET /api/guardrail/match, open to any member). From there you can group, filter,
export to CSV, and mark false positives.
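The console already groups, filters, and exports. If you pull records yourself, for example from GET /api/guardrail/match, a minimal local equivalent looks like this. The record shape (a dict with type, action, stage, and detail keys) is an assumption based on the fields listed above, not a documented response schema.

```python
import csv
import io
from collections import Counter

# Assumed field names, taken from the match fields this page lists.
FIELDS = ["type", "action", "stage", "detail"]


def group_matches(matches: list[dict]) -> Counter:
    """Count match records per (type, action, stage)."""
    return Counter((m["type"], m["action"], m["stage"]) for m in matches)


def matches_to_csv(matches: list[dict]) -> str:
    """Serialize match records to CSV, ignoring any fields beyond FIELDS."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(matches)
    return buf.getvalue()
```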
Raw values are not logged by default. A guardrail’s Log raw content
toggle starts off, the privacy-conservative posture, so the Matches feed records
that a PII rule fired and which entity, but not the matched substring
(the email address itself). Turn it on per guardrail only when you need the
value for triage. The setting is not retroactive: it affects only matches
recorded after the change. Capturing PII in your own audit trail to debug a
PII leak would be self-defeating.
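The default can be pictured as a record builder that drops the matched value unless the toggle is on. This is a hypothetical sketch: the dict layout, the entity=... detail format, and the log_raw parameter are illustrative stand-ins for the Log raw content setting.

```python
def match_record(entity: str, action: str, stage: str, raw: str,
                 log_raw: bool = False) -> dict:
    """Build a match record at the moment a rule fires (hypothetical shape).

    Only the entity name goes into the detail string by default; the matched
    substring is stored only when log_raw is on.
    """
    record = {"type": "pii", "action": action, "stage": stage,
              "detail": f"entity={entity}"}
    if log_raw:
        record["raw"] = raw
    return record


print(match_record("email", "mask", "input", "jane@acme.com"))
```

Because the record is assembled when the rule fires, flipping log_raw later cannot change records already written, which mirrors the non-retroactive behavior.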
7. Take it further
For full residency, retention, and right-to-erasure controls, including installing a compliance pack that materializes these guardrails for GDPR, HIPAA, or PCI DSS, start from the reference pages below.
Guardrails reference
Every rule type, stage, action, custom entities, versioning, and the eval
harness — the deep reference behind this page.
Secret leakage
The credential-shaped sibling — AWS, OpenAI, GitHub tokens — caught by the
Secrets Blocker guardrail.
Unsafe output
Screening what the model sends back, not just what it receives.
Guardrails vs Firewall
When to screen text and when to govern actions — and why you usually want
both.
