1. Two classes of check
Every guardrail rule and every firewall evaluation falls into one of two classes.
Built-in / deterministic checks
Keyword denylist (keyword), regular expression (regex), PII detection
(pii), and max-length (max_chars) guardrail rules are pure local
string and regex operations — no model call, no network hop, nothing
that can time out. Firewall rule evaluation (tool-name glob matching,
argument predicates, egress scope) is the same: deterministic and local.
For practical purposes, these checks add negligible latency to your
request. They are safe to run on the hot path and are what the built-in
guardrail templates are made of.
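To illustrate why these checks are cheap, here is a minimal sketch of what local rule evaluation could look like. It is not the engine's implementation: the function names, the rule dictionary shape, and the email pattern standing in for PII detection are assumptions made for illustration, and it uses Python's `re` module rather than RE2.

```python
import re

# Hypothetical stand-in for PII detection: matches email addresses only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def check_text(text: str, rule: dict) -> bool:
    """Return True if the text violates the rule. Pure local work, no I/O."""
    kind = rule["type"]
    if kind == "keyword":
        lowered = text.lower()
        return any(word.lower() in lowered for word in rule["words"])
    if kind == "regex":
        return re.search(rule["pattern"], text) is not None
    if kind == "pii":
        return EMAIL_RE.search(text) is not None
    if kind == "max_chars":
        return len(text) > rule["limit"]
    raise ValueError(f"unknown rule type: {kind}")


def first_violation(text: str, rules: list) -> "str | None":
    """Return the type of the first violated rule, or None if all pass."""
    for rule in rules:
        if check_text(text, rule):
            return rule["type"]
    return None
```

Every branch is a string scan, a regex match, or a length comparison, so there is nothing that can block on a network call or time out.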
Advanced / semantic checks
llm_judge, grounding, and external vendor rules delegate the check
to a model or a vendor. They do cost a round-trip. Three properties bound
that cost:
- Concurrent dispatch. If your policy has multiple advanced rules, they are dispatched in parallel, so one slow check never serializes behind another.
- Per-rule timeout. Each advanced rule has a timeout (judge_timeout_ms / grounding_timeout_ms / timeout_ms). The grounding check defaults to ~3 000 ms; the judge uses a configurable value (0 → engine default). The rule is bounded: it cannot hang indefinitely.
- Fail-open by default. When a rule times out or its vendor returns an error, the event is recorded but the request continues. Set judge_fail_open: false (judge) or fail_open: false (external) to flip to fail-closed instead.
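The three properties above can be sketched with Python's `asyncio`. This is an illustrative model only, not the engine's code: `run_rule`, `evaluate`, the rule dictionary keys, and the verdict strings are hypothetical names chosen for the example.

```python
import asyncio


async def run_rule(name, check, timeout_ms, fail_open=True):
    """Run one advanced check under its own timeout.

    `check` is a zero-argument coroutine function returning True (pass)
    or False (violation). Returns (name, verdict).
    """
    try:
        passed = await asyncio.wait_for(check(), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        # Per-rule timeout: the wait is bounded, then fail-open or fail-closed.
        return name, "error_allowed" if fail_open else "block"
    except Exception:
        # Vendor or model error: handled the same way as a timeout.
        return name, "error_allowed" if fail_open else "block"
    return name, "pass" if passed else "block"


async def evaluate(rules):
    """Dispatch every rule at once; wall time tracks the slowest rule."""
    results = await asyncio.gather(*(run_rule(**rule) for rule in rules))
    return dict(results)
```

Calling `asyncio.run(evaluate([...]))` with several rules returns one verdict per rule name. Because the rules are gathered rather than awaited one by one, a rule that hits its timeout delays the result by at most that timeout, regardless of how many other rules are in the policy.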
2. At a glance
| Check type | Adds latency? | How it’s bounded |
|---|---|---|
| keyword denylist | Negligible (local string scan) | No network; no timeout needed |
| regex | Negligible (RE2 local match) | No network; no timeout needed |
| pii detection | Negligible (local regex/entity scan) | No network; no timeout needed |
| max_chars | Negligible (character count) | No network; no timeout needed |
| Firewall rule evaluation | Negligible (glob + predicate matching) | No network; no timeout needed |
| llm_judge | One bounded model call | judge_timeout_ms; fail-open by default |
| grounding | One bounded model call | grounding_timeout_ms (default ~3 000 ms); fail-open by default |
| external vendor | One bounded vendor call | timeout_ms; fail-open by default |
| Multiple advanced rules | One bounded round-trip (concurrent dispatch) | Worst case = max single timeout, not sum |
3. Where in the request lifecycle checks run
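Enforcement doesn’t all happen at the same point. Input and output screening add time at different places:
Input guardrails run before the model call. A built-in input rule adds
negligible overhead. An advanced input rule (e.g. an llm_judge that checks
for prompt injection) adds a bounded model call
before the main model call starts.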
Output guardrails run after the model responds. A built-in output rule
adds negligible overhead to the tail. An advanced output rule (e.g.
grounding checking RAG faithfulness) adds a bounded call
after you already have the model’s answer.
Firewall rule evaluation is deterministic and happens inline on
tool-call routing — negligible, as noted above.
An input-stage block costs no model tokens and adds no upstream latency.
It fires before metering and before the upstream call, so you pay neither
quota nor upstream round-trip time.
An output-stage block refunds pre-consumed quota after the response is
rejected.
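The ordering described above can be sketched as a single request handler. Everything here is hypothetical and exists only to show where each stage sits relative to metering: the `Quota` class, `handle_request`, its callback parameters, and the shape of the returned dictionaries are assumptions, as is the use of the `guardrail_blocked` error for every kind of block.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """Hypothetical quota counter, used only to show the accounting order."""
    used: int = 0

    def consume(self, units: int) -> None:
        self.used += units

    def refund(self, units: int) -> None:
        self.used -= units


def handle_request(prompt, quota, input_ok, call_model, output_ok, cost=1):
    """Walk one request through input screening, the model call, and output screening."""
    # 1. Input guardrails run first: a block here never touches quota or the model.
    if not input_ok(prompt):
        return {"status": 400, "error": "guardrail_blocked", "stage": "input"}

    # 2. Meter the request, then make the upstream call.
    quota.consume(cost)
    answer = call_model(prompt)

    # 3. Output guardrails run on the finished answer; a block refunds the quota.
    if not output_ok(answer):
        quota.refund(cost)
        return {"status": 400, "error": "guardrail_blocked", "stage": "output"}

    return {"status": 200, "answer": answer}
```

In this model an input block returns before `call_model` is ever invoked, while an output block has already paid the upstream round-trip and only recovers the quota.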
4. How timeouts and fail-open cap the worst case
Advanced rules have two dials:
Timeout: the maximum wall time the check is allowed. The request waits
at most this long for that rule. Concurrent dispatch means this cap
applies per-rule, not per-policy. If you have three llm_judge rules each
with a 2 000 ms timeout, all three run at once and the total wait is
~2 000 ms, not ~6 000 ms.
Fail-open vs fail-closed: what happens when the rule doesn’t complete
in time (or the vendor errors).
| Setting | Behavior on timeout / error |
|---|---|
| fail_open: true (default) | Record the event; let the request continue as if the check passed |
| fail_open: false | Treat the timeout / error as a block; return HTTP 400 guardrail_blocked |
5. Practical guidance
Keep hot-path rules built-in. If your primary concern is PII, credential
leakage, prompt length, or a keyword denylist, all of those are built-in
rules. They add no measurable latency and should be your default choice
for any check that text matching can handle.
Use llm_judge and grounding where you need semantics. Toxicity,
harassment, off-topic detection, prompt-injection intent, and RAG
faithfulness are genuinely fuzzy, and no regex captures them reliably. These
are the right cases for an advanced rule. Accept that each adds a bounded
extra model call.
Tune timeouts to your latency budget. If your end-to-end target is
1 000 ms, set judge_timeout_ms: 800 (or less) so the judge can’t consume
your entire budget. The engine’s default timeout is a safe starting point;
lower it if you have tight requirements.
For output grounding, the model call is already done. The grounding
check runs after the upstream model responds — the extra latency is only
in the tail, not on the critical path for time-to-first-token. This makes
it a low-risk place to add semantic enforcement.
Multiple advanced rules? Spread the work. Because advanced rules run
concurrently, stacking three llm_judge rules costs roughly the same as
one — the longest individual timeout determines the wall time, not the
count. Use this to layer semantic checks without additive cost.
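The timeout guidance above comes down to simple arithmetic, sketched here. The helper names and the `other_work_ms` parameter are hypothetical, and the model assumes all advanced rules at a given stage are dispatched concurrently, as described earlier.

```python
def worst_case_wait_ms(timeouts_ms):
    """Concurrent dispatch: the wait is capped by the slowest rule, not the sum."""
    return max(timeouts_ms, default=0)


def fits_budget(timeouts_ms, budget_ms, other_work_ms=0):
    """True if the guardrail wait plus all other work stays within the budget."""
    return worst_case_wait_ms(timeouts_ms) + other_work_ms <= budget_ms
```

For three llm_judge rules with 2 000 ms timeouts, `worst_case_wait_ms([2000, 2000, 2000])` gives 2000 rather than 6000. With a 1 000 ms end-to-end target, `fits_budget([800], 1000)` holds, but only if everything else in the request fits in the remaining 200 ms.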
