1. Guardrail privacy logging: off by default
Every guardrail carries a single per-policy toggle, Log raw content, and it ships off. With it off, a match records the metadata of what fired but never copies the offending text into the feed:
Recorded with the toggle OFF
Rule type, action, stage, and a short detail string: enough to know a pii rule masked an email on the request, without storing the address.
Added only when ON
The matched substring(s), meaning the literal text the rule caught. Captured only for matches recorded after you enable the toggle.
Off by default is the privacy-conservative posture. The matched
substring is the most sensitive thing a guardrail could log — it is, by
definition, the data the rule exists to catch. OrcaRouter does not store
it unless you opt in per guardrail.
2. What a match record holds
A match is a small, workspace-scoped diagnostic record. With Log raw content off, it carries metadata only:

| Field | Example | Present when toggle is off? |
|---|---|---|
| Rule type | pii, regex, keyword | Yes |
| Action | block, mask, flag | Yes |
| Stage | input, output | Yes |
| Detail | short classifier string (e.g. the entity) | Yes |
| Matched substring | jane@acme.com | Only when ON |
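
As a rough illustration of the table above, a match can be pictured as a small data class whose substring field stays empty unless the toggle is on. This is a sketch under assumptions, not OrcaRouter's actual schema: the class name `MatchRecord`, its field names, and the `build_match` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MatchRecord:
    # Metadata fields: recorded whether the toggle is on or off.
    rule_type: str  # e.g. "pii", "regex", "keyword"
    action: str     # e.g. "block", "mask", "flag"
    stage: str      # "input" or "output"
    detail: str     # short classifier string, e.g. the entity name
    # Hypothetical field: populated only when Log raw content is on.
    matched_substrings: List[str] = field(default_factory=list)


def build_match(rule_type, action, stage, detail, matched, log_raw_content=False):
    """Build a record, copying matched text only when the toggle is on."""
    record = MatchRecord(rule_type, action, stage, detail)
    if log_raw_content:
        record.matched_substrings = list(matched)
    return record


# Same violation, recorded with the toggle off (default) and on.
off = build_match("pii", "mask", "input", "email", ["jane@acme.com"])
on = build_match("pii", "mask", "input", "email", ["jane@acme.com"],
                 log_raw_content=True)
```

With the default arguments the record holds metadata only; the address appears in `matched_substrings` only in the second call.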
3. One concrete example
Take a guardrail with a pii rule that masks email on the request,
attached to a key. A caller sends a prompt containing jane@acme.com.
The rule replaces the address with [EMAIL] before the model sees it, and a
match lands in the feed. What that match contains depends entirely on
the toggle:
Log raw content OFF (default)
The match records: rule type pii, action mask, stage input, and
a detail string naming the email entity. It does not store
jane@acme.com. You know an email was masked on the request; you
cannot read the email back out of the feed.
Log raw content ON
The same match additionally carries the matched substring,
jane@acme.com, so you can confirm precisely what the rule caught
during a triage pass.
4. Turning it on (and the non-retroactive guarantee)
Log raw content is a per-guardrail setting. Editing a guardrail is a console action under your own session and requires Developer+ in the workspace; only the final /v1/* call uses an sk-orca-... relay key.
Open the guardrail
In the console, open Guardrails and edit the policy you want to
capture substrings for.
Enable Log raw content
Turn on the Log raw content toggle and save. Saving writes a
versioned history row, so the change is auditable and revertable —
see Versioning.
5. What gets captured when it is on
When Log raw content is on, the engine attaches the literal matched text to each violation, with two hard caps that keep one pathological input from ballooning a single match record:

- At most 32 matched entries per violation.
- Each entry is capped at 256 characters.
Even with the toggle on, a guardrail only ever records text that a rule
actually matched. The surrounding prompt and the rest of the response
are never copied into the Matches feed. Full request/response payloads
are a separate concern from guardrail diagnostics.
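
The two caps above can be sketched as a single list operation. This is an illustration only: the function name `cap_matches` is hypothetical, and it assumes the engine keeps the first entries and truncates over-long ones rather than dropping them, which the text above does not specify.

```python
MAX_ENTRIES = 32  # at most 32 matched entries per violation
MAX_CHARS = 256   # each entry capped at 256 characters


def cap_matches(matches, max_entries=MAX_ENTRIES, max_chars=MAX_CHARS):
    """Keep the first max_entries items and truncate each to max_chars.

    Assumption: first-N selection and truncation are illustrative choices.
    """
    return [m[:max_chars] for m in matches[:max_entries]]


# A pathological input: 50 matches of 1000 characters each.
capped = cap_matches(["x" * 1000] * 50)
```

Under these assumptions the worst case for one violation is 32 × 256 = 8192 characters of matched text, no matter how large the input was.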
6. Removing substrings you have already captured
Because the toggle is non-retroactive, turning it off leaves prior substrings in place. Two surfaces clear them:

| Want to remove | How |
|---|---|
| One noisy match | Mark it a false positive — POST /api/guardrail/match/:id/mark-fp (workspace Admin), or the Mark false positive action in the feed. |
| All guardrail matches for a user | A user self-deletion triggers a 30-day grace window, then a PII scrub that cascades through guardrail matches, request logs, and firewall events. See Compliance. |
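
For scripting the first row, the request could be assembled roughly as follows. This is a hedged sketch: only the route and methods come from this page, while the base URL `https://orcarouter.example`, the helper name `mark_fp_request`, and the way credentials are passed via `headers` are assumptions. The code builds the request object without sending it.

```python
from urllib.parse import quote
from urllib.request import Request


def mark_fp_request(base_url, match_id, undo=False, headers=None):
    """Build (but do not send) a mark or un-mark false-positive request."""
    safe_id = quote(str(match_id), safe="")
    url = f"{base_url.rstrip('/')}/api/guardrail/match/{safe_id}/mark-fp"
    # POST marks the match; DELETE un-marks it.
    method = "DELETE" if undo else "POST"
    # Assumption: the caller supplies whatever auth headers the console uses.
    return Request(url, headers=headers or {}, method=method)


# Hypothetical host and match id, for illustration only.
req = mark_fp_request("https://orcarouter.example", 123)
```

Passing the result to `urllib.request.urlopen` would send it; the caller needs workspace Admin for the call to succeed.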
7. Who can read what
The Matches feed is workspace-scoped diagnostic data. Read access is open to every active member; the destructive false-positive action is gated higher:

| Action | Route | Role |
|---|---|---|
| List / group / stats / export matches | GET /api/guardrail/match* | Member |
| Single match detail | GET /api/guardrail/match/:id | Member |
| Mark / un-mark false positive | POST / DELETE /api/guardrail/match/:id/mark-fp | Admin |
| Edit a guardrail (incl. Log raw content) | PUT /api/guardrail/ | Developer+ |
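
The table above amounts to a minimum-role check per action. A minimal sketch follows, assuming the roles form a simple ladder of Member, then Developer, then Admin. That ordering is inferred from "Developer+" and is not stated on this page; the action keys and the `can` helper are hypothetical names.

```python
# Assumed ordering, lowest to highest; inferred, not confirmed by the source.
ROLE_RANK = {"Member": 0, "Developer": 1, "Admin": 2}

# Minimum role per action, taken from the table above.
MIN_ROLE = {
    "list_matches": "Member",
    "match_detail": "Member",
    "mark_false_positive": "Admin",
    "edit_guardrail": "Developer",
}


def can(role, action):
    """Return True if the role meets the minimum required for the action."""
    return ROLE_RANK[role] >= ROLE_RANK[MIN_ROLE[action]]
```

For example, `can("Member", "list_matches")` is true, while `can("Developer", "mark_false_positive")` is false under this assumed ordering.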
8. A practical privacy default
For most workspaces the right shape is: leave Log raw content off, run your guardrails on metadata, and flip the toggle on temporarily for a single policy when you are actively debugging why a rule fires the way it does. Then flip it back off; new matches stop carrying substrings immediately.
9. Where to go next
Matches feed
Browse, group, filter, and export every recorded match.
Tune false positives
Mark and refine matches to quiet a noisy rule.
Versioning
Every toggle flip is a versioned, revertable change.
Compliance
Retention, data-subject erasure, and signed reports.
