Guardrail logging and privacy

When a guardrail rule fires, OrcaRouter records a match so you can see what tripped and how often. This page answers the privacy question: does that record contain the actual sensitive text (the real email, the SSN, the API key) or only the fact that a rule matched? By default it contains only the fact. Guardrail privacy logging on the hosted gateway is conservative on purpose: the matched substring is not stored unless you explicitly turn on Log raw content for that guardrail, and flipping the toggle never changes matches that were already recorded.

This page covers only the privacy posture of the Matches feed. For the feed itself (browsing, grouping, exporting), see Matches feed. For the full engine, see the Guardrails reference.

1. Guardrail privacy logging: off by default

Every guardrail carries a single per-policy toggle, Log raw content, and it ships off. With it off, a match records the metadata of what fired but never copies the offending text into the feed:

Recorded with the toggle OFF

Rule type, action, stage, and a short detail string — enough to know a pii rule masked an email on the request, without storing the address.

Added only when ON

The matched substring(s) — the literal text the rule caught. Captured only for matches recorded after you enable the toggle.

The rationale is the one most compliance teams want by default: you learn that an SSN appeared in your traffic and how the policy handled it, without copying regulated data back out of the request and into your own diagnostic store.

Off by default is the privacy-conservative posture. The matched substring is the most sensitive thing a guardrail could log — it is, by definition, the data the rule exists to catch. OrcaRouter does not store it unless you opt in per guardrail.

2. What a match record holds

A match is a small, workspace-scoped diagnostic record. With Log raw content off, it carries metadata only:

Field	Example	Present when toggle is off?
Rule type	`pii`, `regex`, `keyword`	Yes
Action	`block`, `mask`, `flag`	Yes
Stage	`input`, `output`	Yes
Detail	short classifier string (e.g. the entity)	Yes
Matched substring	`jane@acme.com`	Only when ON

The matched-substring field is the only thing the toggle gates. Everything else is recorded either way, so the feed is useful for volume, trend, and action-mix analysis even with raw content off.
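
As a rough mental model of the table above, a match can be pictured as a small record with one optional field. The sketch below is illustrative only: the class name, attribute names, and the choice of `None` for "not captured" are assumptions, not the actual OrcaRouter schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MatchRecord:
    """Hypothetical in-memory shape of one Matches-feed record."""

    rule_type: str  # e.g. "pii", "regex", "keyword"
    action: str     # e.g. "block", "mask", "flag"
    stage: str      # "input" or "output"
    detail: str     # short classifier string, e.g. the entity name
    # Assumed representation: None unless Log raw content was on
    # when the match was recorded.
    matched_substrings: Optional[Tuple[str, ...]] = None

    def has_raw_content(self) -> bool:
        return self.matched_substrings is not None


metadata_only = MatchRecord("pii", "mask", "input", "email")
with_raw = MatchRecord("pii", "mask", "input", "email", ("jane@acme.com",))
```

Volume, trend, and action-mix analysis only ever needs the first four attributes, which is why the feed stays useful with raw content off.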

You can run an entire observe-or-enforce program — see where PII enters, which rules fire most, whether a policy is noisy — purely on the metadata. Turn the substring on only for the narrow window where you need to eyeball exactly what matched during triage.

3. One concrete example

Take a guardrail with a pii rule that masks email on the request, attached to a key. A caller sends:

curl https://api.orcarouter.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-orca-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Reply to jane@acme.com please"}
    ]
  }'

The rule masks the address to [EMAIL] before the model sees it, and a match lands in the feed. What that match contains depends entirely on the toggle:

Log raw content OFF (default)

The match records: rule type pii, action mask, stage input, and a detail string naming the email entity. It does not store jane@acme.com. You know an email was masked on the request; you cannot read the email back out of the feed.

Log raw content ON

The same match additionally carries the matched substring — jane@acme.com — so you can confirm precisely what the rule caught during a triage pass.

The request itself is identical in both cases. The toggle changes only what the diagnostic feed retains, never what the caller or the upstream model experiences.
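
The two outcomes above can be simulated locally. This is a minimal sketch, not the engine's implementation: the email pattern is deliberately simplified, and the function name and dictionary keys are hypothetical.

```python
import re

# Simplified email pattern for illustration; the real pii detector is not
# documented on this page.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def apply_email_mask(text, log_raw_content=False):
    """Return (masked_text, match_record) for a pii/mask rule on the request."""
    found = EMAIL_RE.findall(text)
    masked = EMAIL_RE.sub("[EMAIL]", text)
    if not found:
        return masked, None
    record = {"rule_type": "pii", "action": "mask", "stage": "input", "detail": "email"}
    if log_raw_content:
        # Only this key depends on the toggle.
        record["matched_substrings"] = found
    return masked, record


prompt = "Reply to jane@acme.com please"
masked_off, rec_off = apply_email_mask(prompt)
masked_on, rec_on = apply_email_mask(prompt, log_raw_content=True)
```

In both calls the masked text is the same; only the record differs, which mirrors the point that the toggle affects the feed and not the request.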

4. Turning it on (and the non-retroactive guarantee)

Log raw content is a per-guardrail setting. Editing a guardrail is a console action: it runs under your own console session and requires Developer+ in the workspace. A relay key (sk-orca-...) is used only for the final /v1/* call, not for guardrail management.

Open the guardrail

In the console, open Guardrails and edit the policy you want to capture substrings for.

Enable Log raw content

Turn on the Log raw content toggle and save. Saving writes a versioned history row, so the change is auditable and revertable — see Versioning.

Capture begins going forward

From the next request onward, matches on this guardrail include the matched substring. Matches recorded before you flipped the toggle stay metadata-only.

The toggle is non-retroactive — both ways. Turning it on does not back-fill substrings onto matches you already logged; those older records stay metadata-only forever. Turning it off stops capturing new substrings but does not erase substrings already stored on past matches. If you need those gone, see §6.
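
One way to picture the non-retroactive behavior is that the toggle is read only at the moment a match is written, and stored matches are never rewritten afterwards. The toy model below assumes exactly that; the class and field names are hypothetical.

```python
class GuardrailSim:
    """Toy model: the toggle is consulted only when a match is written."""

    def __init__(self):
        self.log_raw_content = False  # ships off
        self.feed = []                # append-only list of match records

    def record(self, detail, substring):
        entry = {"detail": detail}
        if self.log_raw_content:
            entry["matched_substrings"] = [substring]
        self.feed.append(entry)


g = GuardrailSim()
g.record("email", "a@example.com")  # toggle off: metadata only
g.log_raw_content = True
g.record("email", "b@example.com")  # toggle on: substring captured
g.log_raw_content = False
g.record("email", "c@example.com")  # toggle off again: metadata only

# Which records carry a substring, in order of arrival.
captured = ["matched_substrings" in m for m in g.feed]
```

The first record never gains a substring, and the second keeps its substring after the toggle is turned off again.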

5. What gets captured when it is on

When Log raw content is on, the engine attaches the literal matched text to each violation, with two hard caps that keep one pathological input from ballooning a single match record:

At most 32 matched entries per violation.
Each entry is capped at 256 characters.

So a guardrail that fires on a huge document stores a bounded, representative sample of what matched — not the entire body. The detail string is independently length-clamped as well. These caps exist for storage hygiene; treat the captured set as evidence of what matched, not a verbatim transcript of the whole request.
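
The two caps can be sketched as a simple clamp. The limits come from the list above; which entries are kept (here, the first 32) and how an over-long entry is shortened (here, from the start, counting Python characters) are assumptions made for illustration.

```python
MAX_ENTRIES = 32        # at most 32 matched entries per violation
MAX_ENTRY_CHARS = 256   # each entry is capped at 256 characters


def clamp_matches(matches):
    """Keep at most MAX_ENTRIES matches, each cut to MAX_ENTRY_CHARS."""
    return [m[:MAX_ENTRY_CHARS] for m in matches[:MAX_ENTRIES]]


# A pathological input: 50 matches of 1,000 characters each.
sample = ["x" * 1000] * 50
clamped = clamp_matches(sample)
```

Under these caps a single violation holds at most 32 × 256 = 8,192 characters of matched text, however large the request was.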

Even with the toggle on, a guardrail only ever records text that a rule actually matched. The surrounding prompt and the rest of the response are never copied into the Matches feed. Full request/response payloads are a separate concern from guardrail diagnostics.

6. Removing substrings you have already captured

Because the toggle is non-retroactive, turning it off leaves prior substrings in place. Two surfaces clear them:

Want to remove	How
One noisy match	Mark it a false positive — `POST /api/guardrail/match/:id/mark-fp` (workspace Admin), or the Mark false positive action in the feed.
All guardrail matches for a user	A user self-deletion triggers a 30-day grace window, then a PII scrub that cascades through guardrail matches, request logs, and firewall events. See Compliance.

For tuning a chatty rule rather than scrubbing data, the Tune false positives flow walks through marking and refining matches.

7. Who can read what

The Matches feed is workspace-scoped diagnostic data. Read access is open to every active member; the destructive false-positive action is gated higher:

Action	Route	Role
List / group / stats / export matches	`GET /api/guardrail/match*`	Member
Single match detail	`GET /api/guardrail/match/:id`	Member
Mark / un-mark false positive	`POST` / `DELETE /api/guardrail/match/:id/mark-fp`	Admin
Edit a guardrail (incl. Log raw content)	`PUT /api/guardrail/`	Developer+

These management routes authenticate with your console session, not a relay key. Reads never expose a substring that the toggle did not capture — there is nothing extra to redact at read time, because nothing extra was stored.
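
The table above can be read as a minimum-role lookup. The sketch below assumes a linear role order (Member, then Developer, then Admin) and that "Developer+" means Developer or higher; the function names are hypothetical and the real authorization logic is not shown on this page.

```python
# Assumed ordering: a higher number includes the rights of lower ones.
ROLE_RANK = {"member": 0, "developer": 1, "admin": 2}


def required_role(method, path):
    """Minimum role for a management route, per the table above."""
    if path.startswith("/api/guardrail/match"):
        if path.endswith("/mark-fp") and method in ("POST", "DELETE"):
            return "admin"
        if method == "GET":
            return "member"
    if method == "PUT" and path == "/api/guardrail/":
        return "developer"
    raise ValueError(f"route not covered by this sketch: {method} {path}")


def allowed(role, method, path):
    return ROLE_RANK[role] >= ROLE_RANK[required_role(method, path)]
```

For example, `allowed("member", "GET", "/api/guardrail/match/42")` is true, while `allowed("developer", "POST", "/api/guardrail/match/42/mark-fp")` is false.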

8. A practical privacy default

For most workspaces the right shape is: leave Log raw content off, run your guardrails on metadata, and flip the toggle on temporarily for a single policy when you are actively debugging why a rule fires the way it does. Then flip it back off — new matches stop carrying substrings immediately.

This pairs naturally with an observe-only rollout. Start with the Compliance Logger (flag-only), watch the Matches feed on metadata, and only reach for raw content if a specific match needs a closer look.
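
If you script the temporary debugging window described above, a context manager helps guarantee the toggle is turned back off even when triage is interrupted. This is a sketch under assumptions: `set_toggle` is a hypothetical caller-supplied function (for instance, a wrapper around your own console automation), not an OrcaRouter API.

```python
from contextlib import contextmanager


@contextmanager
def raw_content_enabled(set_toggle, guardrail_id):
    """Enable Log raw content for the duration of a with-block."""
    set_toggle(guardrail_id, True)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions alike.
        set_toggle(guardrail_id, False)


# Stand-in for a real toggle call, so the sketch runs on its own.
calls = []


def fake_set_toggle(guardrail_id, value):
    calls.append((guardrail_id, value))


with raw_content_enabled(fake_set_toggle, "g1"):
    pass  # inspect new matches here
```

Substrings captured inside the window remain on those matches after the block exits; clearing them still goes through the routes in §6.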

9. Where to go next

Matches feed

Browse, group, filter, and export every recorded match.

Tune false positives

Mark and refine matches to quiet a noisy rule.

Versioning

Every toggle flip is a versioned, revertable change.

Compliance

Retention, data-subject erasure, and signed reports.

For how this fits the broader control stack, see Guardrails vs firewall and Data exfiltration. For the complete engine — stages, advanced rules, and routes — read the Guardrails reference.
