Tune false positives

A guardrail that’s too eager is worse than no guardrail: your team learns to ignore the Matches feed, or you loosen the rule and lose the catch you actually wanted. OrcaRouter gives you a precise middle path. Mark a single match as a false positive, and the engine remembers that finding and skips it on future requests, without touching the rule, loosening the pattern, or shipping an SDK change. This page covers only the false-positive workflow. For the full guardrail engine (every rule type, field, and route), see the Guardrails reference.

Every step here is a console action on the hosted gateway (api.orcarouter.ai). You triage matches under your own session; only the final /v1/* call uses an sk-orca-... relay key. Marking a match as a false positive requires the workspace Admin role; reading the Matches feed and the resulting suppression list is open to every member.

1. Reduce guardrail false positives without weakening the rule

The instinct when a rule over-fires is to loosen it — widen a regex exclusion, drop an entity, flip block to flag. That trades one false positive for a hole in the policy. Mark-false-positive suppression is the surgical alternative:

Suppress one finding

Mute the exact match that misfired — a specific substring under a specific rule — not the whole rule. The next genuinely sensitive hit still fires.

No rule edit, no redeploy

The suppression lives in the gateway as workspace memory. The rule stays exactly as written; your app keeps calling /v1/* unchanged.

Workspace-wide memory

One Admin marks it once; the suppression is deduped across the workspace, so every member’s traffic benefits — no per-key fan-out.

Reversible

Un-mark the match (or delete the suppression) and the finding fires again on the next request. Nothing is destroyed.

Suppression is for a finding you’ve judged benign. If a whole rule is miscalibrated — wrong shape, wrong stage — fix the rule and prove it in the Eval harness instead of muting match after match.

2. How a match becomes a suppression

Every rule that fires records a match in the workspace Matches feed — rule type, action, stage, and a detail string. When you mark one of those matches as a false positive, the gateway derives a stable fingerprint for the finding and writes it to the workspace’s suppression list. On every future request, the engine checks each finding’s fingerprint against that list and skips a suppressed one before it can block, mask, or flag. Two kinds of finding produce a fingerprint:

Code-security findings carry their own fingerprint

A CVE / SBOM finding already ships with a stable identity — the advisory or component identity travels with the finding. Suppressing one mutes that exact CVE/component, and only that one. This is the native case the suppression store was built for.

Deterministic rules get a synthetic fingerprint

Keyword, regex, PII, and the other deterministic rule types don’t carry an identity of their own, so the gateway synthesizes one from data that’s identical on the write side (your mark-FP click) and the enforce side (the next request): the guardrail, the rule’s matching identity, and — when raw capture is on — the matched substrings themselves.

The synthetic fingerprint’s precision depends on Log raw content, which is off by default. With capture on, the fingerprint keys on the exact matched substring, so suppressing ORD-48291507 mutes that order number and nothing else. With capture off, there’s no substring to key on, so the suppression falls back to a rule-level mute — it silences that one rule (at that stage) for the workspace. The fallback never reaches beyond the rule it came from. See Logging & privacy.
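The write-side and enforce-side symmetry described above can be sketched in shell. This is a minimal illustration, not the gateway’s actual algorithm: the hash (SHA-256 via `sha256sum`), the field layout, and the names `fingerprint`, `is_suppressed`, `support-guardrail`, and `regex:order-number` are all assumptions made for the example.

```shell
# Illustrative only: the gateway's real fingerprint format is not documented here.
# Usage: fingerprint GUARDRAIL RULE_IDENTITY [MATCHED_SUBSTRING]
fingerprint() {
  if [ -n "${3-}" ]; then
    # Capture on: key on the exact matched substring.
    printf '%s\n%s\n%s' "$1" "$2" "$3" | sha256sum | cut -d' ' -f1
  else
    # Capture off: only the guardrail and rule are available (rule-level mute).
    printf '%s\n%s' "$1" "$2" | sha256sum | cut -d' ' -f1
  fi
}

# Write side: an Admin marks the ORD-48291507 match as a false positive.
suppressions="$(fingerprint "support-guardrail" "regex:order-number" "ORD-48291507")"

# Enforce side: each new finding's fingerprint is checked against the list.
is_suppressed() {
  printf '%s\n' "$suppressions" | grep -Fxq "$(fingerprint "$@")"
}

is_suppressed "support-guardrail" "regex:order-number" "ORD-48291507" && echo "skip"
is_suppressed "support-guardrail" "regex:order-number" "ORD-11112222" || echo "enforce"
```

In this sketch, a capture-off suppression would be written and checked with no third argument on both sides, so it would match every finding from that one rule and nothing outside it.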

3. One concrete example

Say you run a regex rule that masks internal order numbers shaped like ORD- plus eight digits. A support ticket legitimately quotes ORD-48291507 in a way you’ve decided is fine to pass through. You don’t want to weaken the rule — you just want this one number to stop firing.

Open the Matches feed

In the console, open Guardrails → Matches. Filter by guardrail and rule type to find the row for the ORD-48291507 hit. (To see the literal substring, the guardrail’s Log raw content must have been on when the match was recorded — it’s off by default.)

Mark it a false positive

Open the match detail and choose Mark as false positive. This action requires the workspace Admin role. It stamps the match and writes a matching workspace suppression keyed on the finding’s fingerprint.

Confirm it's suppressed

Open the Suppressions list — the new entry appears, labelled with the guardrail and rule it came from and the reason “Marked as false positive from Matches”. Every member of the workspace can read this list.

Send the same request again

Using your relay key, call OrcaRouter exactly as before — no new headers, no SDK change:

curl https://api.orcarouter.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-orca-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Status of order ORD-48291507?"}
    ]
  }'

The suppressed finding is skipped — ORD-48291507 passes through — while any other order number still matches and is masked as before.
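To make the expected outcome concrete, here is a local simulation of that behaviour. It does not call the gateway, which applies masking server-side; the `[MASKED]` token, the sample text, and the variable names are assumptions made for the example, and the pattern is simply "ORD- plus eight digits" written as an extended regex.

```shell
# Local simulation only; nothing here talks to OrcaRouter.
suppressed="ORD-48291507"   # the substring marked as a false positive
text="Compare ORD-48291507 with ORD-11112222."

masked="$text"
# Find every distinct order-number hit, then mask all but the suppressed one.
for hit in $(printf '%s\n' "$text" | grep -Eo 'ORD-[0-9]{8}' | sort -u); do
  [ "$hit" = "$suppressed" ] && continue   # suppressed finding: skip it
  masked="$(printf '%s\n' "$masked" | sed "s/$hit/[MASKED]/g")"
done

printf '%s\n' "$masked"
# prints: Compare ORD-48291507 with [MASKED].
```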

4. Suppress vs. the alternatives

Suppression is one of four ways to quiet a noisy rule. Pick the narrowest one that fits:

| Approach | What it changes | When to reach for it |
| --- | --- | --- |
| Mark FP | One finding (or one rule, capture-off) | A specific benign hit; rule is otherwise right |
| Edit the rule | The matching itself | Wrong shape/stage: fix it, then re-eval |
| `flag` action | Observe-only, no blocking | A new rule you don’t trust yet |
| Eval harness | Nothing live; it only measures | Proving precision before you ship |

Don’t paper over a systematically wrong rule by marking FP after FP. If you’re suppressing the same shape repeatedly, the rule is miscalibrated — anchor the regex, narrow the keyword list, or pick a tighter PII entity, and verify with an eval run.

5. Reverse a suppression

Nothing here is one-way:

Un-mark the match — the same Admin action, reversed, removes the match’s FP stamp and (when no other FP-marked match still maps to it) drops the suppression. The finding fires again on the next request.
Delete the suppression directly — from the Suppressions list, a Developer+ action removes the entry. Same effect: the finding is live again.

Because suppressions are workspace memory, reversing one restores the catch for every member’s traffic at once, just as marking it suppressed the finding for everyone.
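The un-mark rule, where a suppression is dropped only when no other FP-marked match still maps to it, behaves like a reference count. A minimal sketch, assuming hypothetical match ids (`m1`, `m2`, `m3`) and fingerprint labels (`fp-A`, `fp-B`); the gateway’s real bookkeeping is not documented here:

```shell
# One "match_id fingerprint" line per FP-marked match (all values hypothetical).
marks="m1 fp-A
m2 fp-A
m3 fp-B"

# Un-mark a match: remove its line from the list.
unmark() { marks="$(printf '%s\n' "$marks" | grep -v "^$1 ")"; }

# A suppression stays active while at least one marked match maps to it.
is_active() { printf '%s\n' "$marks" | grep -q " $1\$"; }

unmark m1; is_active fp-A && echo "fp-A still suppressed"
unmark m2; is_active fp-A || echo "fp-A fires again"
```

Two matches point at `fp-A`, so un-marking the first leaves the suppression in place; only un-marking the second makes the finding live again.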

6. API surface

These are console routes, authenticated by your session, not by relay keys. Each action is role-gated: marking a match as a false positive requires Admin; reading suppressions requires Member; writing suppressions requires Developer+.

| Method & path | Role | Purpose |
| --- | --- | --- |
| `GET /api/guardrail/match` | Member | List matches to triage. |
| `POST /api/guardrail/match/:id/mark-fp` | Admin | Mark a match as a false positive (mirrors a suppression). |
| `DELETE /api/guardrail/match/:id/mark-fp` | Admin | Un-mark and restore the finding. |
| `GET /api/guardrail/suppressions` | Member | List the workspace’s active suppressions. |
| `POST /api/guardrail/suppressions` | Developer+ | Add a suppression directly. |
| `DELETE /api/guardrail/suppressions/:id` | Developer+ | Remove a suppression. |

The mark-FP endpoints are rate-limited — they’re a deliberate, low-volume triage action, not a bulk API. Reach for the Eval harness, not a loop of mark-FP calls, when you’re tuning a whole policy.
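For occasional scripted triage, the routes above could be wrapped in small shell helpers. This is a sketch under stated assumptions: the paths come from the table, but the host (`https://api.orcarouter.ai`), the idea that the session travels as a cookie, and the `ORCA_SESSION_COOKIE` variable are guesses, not documented behaviour. Request and response bodies are not shown because this page does not specify them.

```shell
# Assumptions (not from the docs): console routes share the gateway host, and
# the session is sent as a cookie. ORCA_SESSION_COOKIE is a hypothetical
# variable holding a "name=value" cookie string copied from your console session.
BASE_URL="https://api.orcarouter.ai"

# URL builders for the routes that take an id.
match_fp_url()    { printf '%s/api/guardrail/match/%s/mark-fp\n' "$BASE_URL" "$1"; }
suppression_url() { printf '%s/api/guardrail/suppressions/%s\n' "$BASE_URL" "$1"; }

# Admin: mark or un-mark a match as a false positive.
mark_fp()   { curl -sS -X POST   --cookie "$ORCA_SESSION_COOKIE" "$(match_fp_url "$1")"; }
unmark_fp() { curl -sS -X DELETE --cookie "$ORCA_SESSION_COOKIE" "$(match_fp_url "$1")"; }

# Member: list active suppressions. Developer+: delete one by id.
list_suppressions()  { curl -sS --cookie "$ORCA_SESSION_COOKIE" "$BASE_URL/api/guardrail/suppressions"; }
delete_suppression() { curl -sS -X DELETE --cookie "$ORCA_SESSION_COOKIE" "$(suppression_url "$1")"; }

# Example (MATCH_ID is a placeholder taken from the Matches feed):
#   mark_fp "$MATCH_ID" && list_suppressions
```

The helpers only define functions, so sourcing the file sends nothing; each call is still a single deliberate action, in keeping with the rate limit.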

7. Where to go next

Matches feed

Where every fired rule lands — the place you triage from before you mark anything.

Testing & eval

Prove a rule’s precision against a corpus before you ship it — the systematic fix when suppression is treating a symptom.

Logging & privacy

How Log raw content controls whether suppression keys on the exact substring or falls back to a rule-level mute.

Guardrails reference

The complete engine — every rule type, action, and route.

Suppression governs content findings. Quieting a noisy agent firewall rule (a tool match you’ve judged safe) is a separate surface; see the Firewall and its anomaly feed. To understand where guardrails and the firewall divide, read Guardrails vs Firewall.
