Writing regex detectors

Some of what you need to catch isn’t a literal word and isn’t a typed PII entity; it’s a shape: a SKU format, an order-number layout, an internal URL pattern, a coupon code, a contract reference. A regex rule lets you match that shape on every call and then block, mask, or flag it, before the prompt reaches the model and before the response reaches your user. This page focuses on the structured-pattern use case. For the full guardrail engine (every rule type, field, and route), see the Guardrails reference.

Every step here is a console action on the hosted gateway (api.orcarouter.ai). You author the guardrail under your own session; only the final /v1/* call uses an sk-orca-... relay key. Creating and editing guardrails requires Developer+ in the workspace.

1. When you need a regex guardrail

A regex rule is the right tool when the thing you want to catch has structure that a literal denylist can’t express, but isn’t a standard identity that the PII detector already covers.

Structured codes

SKUs, coupon codes, contract refs, internal ticket IDs — a fixed prefix plus a digit or alphanumeric run.

Format-shaped tokens

Anything matched by shape rather than a finite word list — an order number layout, a serial format, an internal URL pattern.

Output leak patterns

A response that shouldn’t surface an internal hostname, a file path, or a record-ID format — scan the model’s output for the shape.

Cheap, deterministic checks

Pure pattern matching, no model call, no network — safe to run on every request in either direction.

Pick the lightest tool that fits. A finite list of literal terms → keyword denylist. A named identity shape you want a typed mask tag for ([EMAIL], [SSN]) or a Luhn check → a PII / custom entity. A structural pattern with no per-entity typing → a regex rule, covered here.

2. RE2 — linear-time, no backreferences

A regex rule’s pattern is a Go RE2 regex. RE2 is the engine that makes a regex rule safe to run on every request:

Linear-time matching — no catastrophic backtracking

RE2 guarantees matching time linear in the length of the input, regardless of the pattern. A backtracking engine can blow up exponentially on an adversarial input (a “ReDoS”); RE2 cannot. That’s why your pattern is safe to evaluate on the hot path on every call.

No backreferences, no lookaround

RE2 does not support backreferences (\1), lookahead, or lookbehind. If you’re porting a PCRE pattern that relies on those, rewrite it without them. Character classes, anchors, quantifiers, alternation, and non-capturing groups all work as expected.

Case-insensitivity and flags go in the pattern

There’s no separate “ignore case” switch — set flags inline. Prefix with (?i) for case-insensitive, (?m) for multiline. Example: (?i)\bproject-orca\b.

The pattern must compile — checked on save

The rule builder compiles your pattern when you save the guardrail. A pattern that doesn’t compile is rejected with the rule index in the error, so a bad detector never reaches the relay path.

RE2 patterns are not PCRE. The most common porting surprise is a backreference or a lookahead — neither is supported. Write the match as a character-class / alternation pattern instead and verify it in the Test tab before attaching a key.

3. Anatomy of a regex rule

A regex rule is the smallest rule in the engine after keyword: a pattern, a stage, and an action.

Field	What it does
`pattern`	A Go RE2 regex (linear-time, no backreferences). Must compile.
`stage`	`input` (request), `output` (response), or `both`.
`action`	`block`, `mask`, or `flag`.

On a mask action, every match is replaced in place with a single literal [REDACTED] tag — a regex rule isn’t typed, so it doesn’t render a per-entity tag like [EMAIL]. If you want a typed tag or a custom replacement token, model the shape as a custom PII entity instead.

4. One concrete example

Suppose your internal order numbers look like ORD- followed by eight digits, and you never want one echoed back in a model’s response. Add a single regex rule on the output stage:

{
  "type": "regex",
  "stage": "output",
  "action": "mask",
  "pattern": "ORD-\\d{8}"
}
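To see what this rule does to a piece of text before you save it, you can reproduce the mask locally. This is a minimal sketch using Go's `regexp` package (RE2 syntax); `maskOrderIDs` is a hypothetical helper, not part of any OrcaRouter SDK, and the gateway's own replacement logic may differ in detail.

```go
package main

import (
	"fmt"
	"regexp"
)

// orderID is the pattern from the rule above. In JSON the backslash is
// escaped (\\d); in a Go raw string it is written once.
var orderID = regexp.MustCompile(`ORD-\d{8}`)

// maskOrderIDs replaces every match with the literal tag a mask action uses.
func maskOrderIDs(text string) string {
	return orderID.ReplaceAllLiteralString(text, "[REDACTED]")
}

func main() {
	fmt.Println(maskOrderIDs("Your order ORD-48291507 has shipped."))
	// Prints: Your order [REDACTED] has shipped.
}
```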

Author it in the console:

Create a guardrail

Open Guardrails, click New guardrail, and name it (≤ 64 chars), e.g. order-id-filter.

Add a regex rule

Add one rule — Type: Regular expression, Stage: Output, Action: Mask — and paste the pattern ORD-\d{8}. Save.

Test it in the sandbox

Open the Test tab, paste a sample, pick the output stage, and run the current policy locally — no upstream call, no quota:

Sample: Your order ORD-48291507 has shipped.

Result: Your order [REDACTED] has shipped.

Attach a key

Edit an API key and pick order-id-filter from the Guardrail dropdown (sets guardrail_id on the key), or mark the guardrail the workspace default. See Attach to a key and Account default.

Then call OrcaRouter exactly as before — no new headers, no SDK change:

curl https://api.orcarouter.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-orca-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "What is the status of my order?"}
    ]
  }'

The order number is redacted in the response before it reaches your user.

5. Stage and streaming coverage

The action you pick interacts with whether the response streams:

Action	Non-streaming	Streaming
`block` (output)	Enforced	Enforced — scanner cuts the stream
`mask` (output)	Enforced	Enforced — scanner rewrites the buffer

Input-stage rules run before the upstream call, so streaming doesn’t affect them: an input-stage mask is always applied before the model sees the request. Output mask and output block are both enforced on streaming and non-streaming responses. See Streaming coverage.

6. Pick an action

A regex rule picks one action per rule:

Block — reject the call

Any match rejects the request with HTTP 400 guardrail_blocked. A blocked request costs no quota: an input-stage block fires before metering, and an output-stage block refunds the pre-consumed quota. The rejection is also marked skip-retry. See the guardrail_blocked error.

Mask — redact the match

Each match is replaced in place with [REDACTED] and the request continues with the sanitized text — the upstream model (input stage) or your user (output stage) never sees the original. See Actions.

Flag — observe only

Records a match and changes nothing about the traffic. The right starting point for a new pattern: ship it as flag, watch the Matches feed, then promote to mask/block once you trust it.

Annotate — attach a note

Records a match and attaches a note (e.g. a finding to surface in triage) without altering the traffic. See Actions.

Spotlight — wrap as untrusted data

An input-stage defense: each match is wrapped in delimiters (e.g. ⟦UNTRUSTED⟧…⟦/UNTRUSTED⟧) that tell the model to treat the text as data, not instructions — a prompt-injection mitigation. See Actions.
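On the client side, a blocked call (the Block action above) is worth treating as final rather than retrying, since the same text will match again. The sketch below shows one way to encode that. The error body shape is an assumption (this page only names the HTTP 400 status and the guardrail_blocked code), `shouldRetry` is a hypothetical helper, and retrying on 429 and 5xx is a generic client convention rather than documented gateway behavior.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// shouldRetry decides whether a failed relay call is worth retrying.
// Assumption: a guardrail block is an HTTP 400 whose body contains the
// string "guardrail_blocked"; the exact JSON shape is not specified here.
func shouldRetry(status int, body string) bool {
	if status == http.StatusBadRequest && strings.Contains(body, "guardrail_blocked") {
		return false // the same content would be blocked again
	}
	// Generic client policy (an assumption): retry rate limits and server errors.
	return status == http.StatusTooManyRequests || status >= 500
}

func main() {
	// Hypothetical error body, for illustration only.
	blocked := `{"error":{"code":"guardrail_blocked"}}`
	fmt.Println(shouldRetry(http.StatusBadRequest, blocked))     // false
	fmt.Println(shouldRetry(http.StatusServiceUnavailable, "{}")) // true
}
```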

7. See what fired — and tune precision

Every rule that fires records a match — rule type, action, stage, and a detail string — in the workspace Matches feed.

The matched substring is recorded only when Log raw content is on, which is off by default — the privacy-conservative posture. With it off you still see that a regex rule fired and how often, just not the literal text it caught. Turn it on per guardrail when you need the substring for triage; the setting is non-retroactive. See Matches feed and Logging & privacy.

A too-broad pattern is the classic regex pitfall — \d{8} matches every eight-digit run, not just your order numbers. Anchor it (a fixed prefix like ORD-, word boundaries \b), watch the Matches feed, and mark false positives to tighten as you go. For an A/B grid against a corpus — proving a pattern catches what it should without flagging benign traffic — the Eval harness lives one tab over. See Tune false positives.

8. Where to go next

Custom PII entities

When the shape is an identity you want a typed mask tag or a Luhn checksum for — not a bare [REDACTED].

Sensitive words

A finite list of literal terms — simpler than a pattern when you don’t need structure.

Actions

How block, mask, and flag differ and when to use each.

Guardrails reference

The complete engine — every rule type, field, and route.

A regex rule governs content. To govern an agent’s tool calls — deny destructive actions, redact tool-call arguments, require approval — use the Firewall and its rule matchers. For fuzzy policies no pattern can express (toxicity, off-topic, injection intent), an llm_judge rule runs a semantic check against a workspace model. To see where regex fits in the overall design, read Guardrails vs Firewall.
