1. Brand safety ai in one preset
The Brand category in the guardrail template picker is a set of keyword denylists. Each preset is a singlekeyword rule you apply
in one click and then edit — swap the seed terms for your own list. There
is no model call, no network hop, and no SDK change: the policy lives in
the gateway, and your app keeps calling /v1/chat/completions exactly as
before.
Profanity
A denylist that blocks swearing or banned terms on the request —
or a mask variant that redacts them instead.
Competitor mentions
Blocks (or flags) any mention of names you list — keep a copilot
from talking up the competition.
Child safety
A conservative denylist for child-safety terms you populate from your
own standards, blocked on the request.
2. The Brand presets, exactly as shipped
Open the New guardrail split-button in the console Guardrails view and pick the Brand template category. Five seeds live there:Profanity / Brand Safety (block)
Profanity / Brand Safety (block)
A single
keyword rule, stage input, action block. Ships
with placeholder terms — edit the list to your real banned words,
competitor names, or off-limits phrases. A match returns HTTP 400
guardrail_blocked before the prompt leaves the gateway.Profanity Filter (mask)
Profanity Filter (mask)
Same denylist, but action mask and stage both — denylisted
words are replaced with
[REDACTED] instead of rejecting the call.
The softer alternative when you want the request to go through
cleaned rather than refused.Profanity Multilingual
Profanity Multilingual
A
keyword block rule seeded with per-market placeholders (zh, es,
fr, de, ja, ar). Replace each with the region-specific terms your
policy bans — the seed terms are deliberately generic.Competitor Mentions
Competitor Mentions
A
keyword rule, stage input, action block, seeded with a
single placeholder. Add your competitor names; switch the action to
flag to monitor mentions without rejecting traffic.Child Safety Keywords
Child Safety Keywords
A conservative
keyword denylist, stage input, action block.
The seed is an intentional placeholder — populate it with the exact
terms from your own safety policy or standards before you rely on it.A preset is a seed, not a lock. Every Brand preset ships with
placeholder terms so the rule is valid out of the box — you are expected
to edit the denylist for your brand before attaching a key. The presets
intentionally do not ship real banned-word or child-safety lists.
3. Apply a Brand preset in the console
Every step here is a console action under your own session. Creating and editing guardrails requires Developer+ in the workspace. Only the final/v1/* call uses an sk-orca-... relay key.
Open the template
In the console, open Guardrails, click the New guardrail
split-button, and pick Competitor Mentions (or any Brand preset)
from the Brand template category.
Edit the denylist
Replace the seed placeholder with your real terms — e.g. your
competitors’ names. Give the guardrail a name (≤ 64 chars), like
brand-safety, and save.Test it
Open the Test tab, paste a sample at the
input stage, and run
the policy locally — no upstream call, no quota (see
§5).Attach a key
Edit an API key and pick
brand-safety from the Guardrail
dropdown (sets guardrail_id on the key), or mark it the workspace
default. See Attach to a key
and Account default.4. One concrete example
A competitor-mention guardrail namedbrand-safety is attached to a key.
The seed placeholder has been replaced with the real name Acme. Call
the gateway exactly as before — no new headers:
keyword rule matches Acme on the request, and the gateway rejects
the call with HTTP 400 guardrail_blocked — naming the guardrail and
rule that fired — before anything reaches the upstream model.
Prefer mask over block for profanity when you’d rather clean the
prompt than refuse it — denylisted words render to [REDACTED] and the
request goes through. Prefer flag for competitor mentions when you
want to measure exposure before you start blocking. The
Actions page covers the full block / mask
/ flag trade-off.
5. Test before you attach
Prove the denylist does what you expect before any key points at it. Open the Test tab inside the editor, paste a sample, pick theinput
stage, and run:
6. See what fired
Every rule that fires records a match — rule type, action, stage, and a detail string — surfaced in the workspace Matches feed (GET /api/guardrail/match, Member). The matched substring itself (the
banned word, the competitor name) is recorded only when Log raw
content is on, which is off by default.
For a child-safety denylist, leaving Log raw content off is usually
the point: you get to see that a term was blocked and how often without
copying the term back into your own telemetry. Turn it on per guardrail
only when you need the substring for triage; the setting is
non-retroactive. See Matches feed
and Logging & privacy.
7. Where to go next
Sensitive-word filters
The keyword-denylist mechanics behind every Brand preset, in depth.
Block secrets
Catch API keys and credentials with the Secrets Blocker preset.
Tune false positives
Mark false positives and tighten denylists from the Matches feed.
Templates
The full preset library across every category.
