Security checks only matter if they actually run — but you shouldn’t have to trade throughput for safety. This page answers the question developers ask most: will enforcement slow down my agent, and by how much? The short answer: built-in rules cost nothing measurable. Advanced rules cost at most one bounded, concurrent, fail-open model call. Here is why, and how to control it.

1. Two classes of check

Every guardrail rule and every firewall evaluation falls into one of two classes.

Built-in / deterministic checks

Keyword denylist (keyword), regular expression (regex), PII detection (pii), and max-length (max_chars) guardrail rules are pure local string and regex operations — no model call, no network hop, nothing that can time out. Firewall rule evaluation (tool-name glob matching, argument predicates, egress scope) is the same: deterministic and local. For practical purposes, these checks add negligible latency to your request. They are safe to run on the hot path and are what the built-in guardrail templates are made of.
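To make "pure local string and regex operations" concrete, here is a minimal sketch of the four built-in rule kinds. The rule values, the function name `run_builtin_checks`, and the email pattern standing in for PII detection are all assumptions for illustration, not the product's actual implementation, and Python's `re` module stands in for the real regex engine.

```python
import re

# Hypothetical rule values for illustration; not the product's actual schema.
DENYLIST = ("password", "api_key")
SECRET_PATTERN = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # crude stand-in for a PII detector
MAX_CHARS = 4000


def run_builtin_checks(text: str) -> list[str]:
    """Return the names of the built-in rule kinds that `text` violates."""
    violations = []
    lowered = text.lower()
    if any(word in lowered for word in DENYLIST):  # keyword: substring scan
        violations.append("keyword")
    if SECRET_PATTERN.search(text):                # regex: one local match
        violations.append("regex")
    if EMAIL_PATTERN.search(text):                 # pii: local pattern scan
        violations.append("pii")
    if len(text) > MAX_CHARS:                      # max_chars: character count
        violations.append("max_chars")
    return violations
```

Every branch is an in-process scan over the text. Nothing awaits a network response, which is why these rules need no timeout.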

Advanced / semantic checks

llm_judge, grounding, and external vendor rules delegate the check to a model or a vendor. They do cost a round-trip. Three properties bound that cost:
  1. Concurrent dispatch. If your policy has multiple advanced rules, they are dispatched in parallel — one slow check never serializes behind another.
  2. Per-rule timeout. Each advanced rule has a timeout (judge_timeout_ms / grounding_timeout_ms / timeout_ms). The grounding check defaults to ~3 000 ms; the judge uses a configurable value (0 → engine default). The rule is bounded — it cannot hang indefinitely.
  3. Fail-open by default. When a rule times out or its vendor returns an error, the event is recorded but the request continues. Set judge_fail_open: false (judge) or fail_open: false (external) to flip to fail-closed instead.
So the worst case for any number of advanced rules is the longest single timeout, not the sum of all timeouts.
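The three properties above can be sketched with `asyncio`. This is an illustration under stated assumptions, not the engine's code: `run_rule`, `run_advanced_rules`, and `fake_judge` are hypothetical names, and the sleeping coroutine stands in for a model or vendor round-trip.

```python
import asyncio
import time


async def run_rule(name, check, timeout_ms, fail_open=True):
    """Run one advanced rule under its own timeout; return (name, allowed)."""
    try:
        allowed = await asyncio.wait_for(check(), timeout=timeout_ms / 1000)
    except Exception:
        # Timeout or vendor error: fail-open lets the request continue,
        # fail-closed treats the incomplete check as a block.
        allowed = fail_open
    return name, allowed


async def run_advanced_rules(rules):
    """Dispatch every rule at once so no check waits behind another."""
    results = await asyncio.gather(*(run_rule(*rule) for rule in rules))
    return dict(results)


async def fake_judge(delay_s, verdict=True):
    # Stand-in for a model or vendor round-trip.
    await asyncio.sleep(delay_s)
    return verdict


async def demo():
    rules = [
        ("judge_fast", lambda: fake_judge(0.05), 200),
        ("judge_slow", lambda: fake_judge(0.10), 200),
        ("judge_hung", lambda: fake_judge(5.0), 200),  # exceeds its timeout
    ]
    start = time.perf_counter()
    results = await run_advanced_rules(rules)
    return results, time.perf_counter() - start
```

Running `asyncio.run(demo())` finishes in roughly the 200 ms timeout rather than the 5 s the hung judge would take, and the hung rule resolves as allowed because `fail_open` defaults to `True` in this sketch.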

2. At a glance

| Check type | Adds latency? | How it’s bounded |
| --- | --- | --- |
| keyword denylist | Negligible: local string scan | No network; no timeout needed |
| regex | Negligible: RE2 local match | No network; no timeout needed |
| pii detection | Negligible: local regex/entity scan | No network; no timeout needed |
| max_chars | Negligible: character count | No network; no timeout needed |
| Firewall rule evaluation | Negligible: glob + predicate matching | No network; no timeout needed |
| llm_judge | One bounded model call | judge_timeout_ms; fail-open by default |
| grounding | One bounded model call | grounding_timeout_ms (default ~3 000 ms); fail-open by default |
| external vendor | One bounded vendor call | timeout_ms; fail-open by default |
| Multiple advanced rules | One bounded round-trip (concurrent dispatch) | Worst case = max single timeout, not sum |

3. Where in the request lifecycle checks run

Enforcement doesn’t all happen at the same point. Input and output screening add time at different places:
Client
  ↓
[Input guardrail screening]     ← adds time HERE, before upstream
  ↓
Upstream model call
  ↓
[Output guardrail screening]    ← adds time HERE, after model responds
  ↓
Client
Input guardrails run before the upstream model call. A built-in input rule adds negligible overhead up front. An advanced input rule (e.g. an llm_judge that checks for prompt injection) adds a bounded model call before the main model call starts.
Output guardrails run after the model responds. A built-in output rule adds negligible overhead to the tail. An advanced output rule (e.g. grounding checking RAG faithfulness) adds a bounded call after you already have the model’s answer.
Firewall rule evaluation is deterministic and happens inline on tool-call routing, so its cost is negligible, as noted above.
An input-stage block costs no model tokens and adds no upstream latency: it fires before metering and before the upstream call, so you pay neither quota nor upstream round-trip time. An output-stage block refunds the pre-consumed quota once the response is rejected.
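The ordering described here can be sketched as a single request handler. The `Ledger` class, the `handle_request` signature, and the `stage` field in the response are hypothetical and exist only to show when metering, the upstream call, and the refund happen relative to the two screening stages.

```python
from dataclasses import dataclass


@dataclass
class Ledger:
    """Hypothetical quota meter, used only to show when charges and refunds happen."""
    charged: int = 0
    upstream_calls: int = 0


def handle_request(prompt, input_ok, output_ok, call_model, ledger, cost=1):
    # 1. Input screening runs first: a block here skips metering and the upstream call.
    if not input_ok(prompt):
        return {"status": 400, "error": "guardrail_blocked", "stage": "input"}

    # 2. Meter the request, then call the upstream model.
    ledger.charged += cost
    ledger.upstream_calls += 1
    answer = call_model(prompt)

    # 3. Output screening runs on the model's answer; a block refunds the quota.
    if not output_ok(answer):
        ledger.charged -= cost
        return {"status": 400, "error": "guardrail_blocked", "stage": "output"}

    return {"status": 200, "answer": answer}
```

In this sketch an input block leaves both counters untouched, while an output block leaves `upstream_calls` incremented (the round-trip already happened) but returns `charged` to its previous value.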

4. How timeouts and fail-open cap the worst case

Advanced rules have two dials:
  1. Timeout. The maximum wall time the check is allowed. The request waits at most this long for that rule. Concurrent dispatch means this cap applies per-rule, not per-policy. If you have three llm_judge rules each with a 2 000 ms timeout, all three run at once and the total wait is ~2 000 ms, not ~6 000 ms.
  2. Fail-open vs fail-closed. What to do when the rule doesn’t complete in time (or the vendor errors):
| Setting | Behavior on timeout / error |
| --- | --- |
| fail_open: true (default) | Record the event; let the request continue as if the check passed |
| fail_open: false | Treat the timeout / error as a block; return HTTP 400 guardrail_blocked |
Fail-open preserves availability at the cost of a missed check. Fail-closed preserves the policy guarantee at the cost of availability if the judge is slow or unreachable. Choose based on what matters more for your use case.
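As a worked version of the two dials, the sketch below computes the worst-case wait for a set of concurrent rules and maps the fail mode to an outcome. Both function names and the returned dictionary shape are assumptions for illustration; only the HTTP 400 guardrail_blocked response comes from the table above.

```python
def worst_case_wait_ms(timeouts_ms):
    """Concurrent dispatch: the cap is the longest single timeout, not the sum."""
    return max(timeouts_ms, default=0)


def on_incomplete_check(fail_open=True):
    """Decide what happens when a rule times out or its vendor errors."""
    if fail_open:
        # The event is recorded and the request continues as if the check passed.
        return {"action": "continue", "recorded": True}
    # Fail-closed: the incomplete check is treated as a block.
    return {"action": "block", "status": 400, "error": "guardrail_blocked"}
```

For three llm_judge rules at 2 000 ms each, `worst_case_wait_ms([2000, 2000, 2000])` gives 2000, matching the ~2 000 ms figure above rather than ~6 000 ms.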

5. Practical guidance

Keep hot-path rules built-in. If your primary concern is PII, credential leakage, prompt length, or keyword denylists, all of those are covered by built-in rules. They add no measurable latency and should be your default choice for any check that text matching can handle.
Use llm_judge and grounding where you need semantics. Toxicity, harassment, off-topic detection, prompt-injection intent, and RAG faithfulness are genuinely fuzzy; no regex captures them reliably. These are the right cases for an advanced rule. Accept that each adds a bounded extra model call.
Tune timeouts to your latency budget. If your end-to-end target is 1 000 ms, set judge_timeout_ms: 800 (or less) so the judge can’t consume your entire budget. The engine’s default timeout is a safe starting point; lower it if you have tight requirements.
For output grounding, the model call is already done. The grounding check runs after the upstream model responds, so the extra latency is only in the tail, not on the critical path for time-to-first-token. This makes it a low-risk place to add semantic enforcement.
Multiple advanced rules? Spread the work. Because advanced rules run concurrently, stacking three llm_judge rules costs roughly the same as one: the longest individual timeout determines the wall time, not the count. Use this to layer semantic checks without additive cost.
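One way to apply the timeout advice is to audit a policy against a latency budget before deploying it. The sketch below assumes a policy is a list of dictionaries with a `type` key, which is a hypothetical shape and not the documented schema; the `external` type name is likewise assumed. Only the timeout field names are taken from this page.

```python
# Timeout field per advanced rule type, as named on this page.
# The dictionary keys (rule type names) are assumptions for illustration.
TIMEOUT_FIELD = {
    "llm_judge": "judge_timeout_ms",
    "grounding": "grounding_timeout_ms",
    "external": "timeout_ms",
}


def rules_over_budget(rules, budget_ms):
    """Return (type, timeout) for advanced rules whose timeout exceeds the budget."""
    flagged = []
    for rule in rules:
        field = TIMEOUT_FIELD.get(rule["type"])
        if field is None:
            continue  # built-in rule: no timeout to tune
        # A missing or 0 value means "engine default" and is not judged here.
        timeout = rule.get(field, 0)
        if timeout > budget_ms:
            flagged.append((rule["type"], timeout))
    return flagged


policy = [
    {"type": "pii"},
    {"type": "llm_judge", "judge_timeout_ms": 800},
    {"type": "grounding", "grounding_timeout_ms": 3000},
]
```

With a 1 000 ms budget, `rules_over_budget(policy, 1000)` flags only the grounding rule, since the judge's 800 ms timeout already fits. Whether a flagged output-stage rule matters depends on whether your budget covers the tail or only time-to-first-token.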

Enforcement modes

Fail-open vs fail-closed — the full reference for tuning your policy’s behavior under timeout and error conditions.

Guardrails

Rule types, judge fields, grounding thresholds, and the full guardrail configuration reference.
Built-in rules are negligible on every path; advanced rules cost one bounded, concurrent, fail-open call — tune the timeout and the fail mode, and enforcement adds no uncontrolled latency to your agents.