Enforcement path latency

Security checks only matter if they actually run — but you shouldn’t have to trade throughput for safety. This page answers the question developers ask most: will enforcement slow down my agent, and by how much? The short answer: built-in rules cost nothing measurable. Advanced rules cost at most one bounded, concurrent, fail-open model call. Here is why, and how to control it.

1. Two classes of check

Every guardrail rule and every firewall evaluation falls into one of two classes.

Built-in / deterministic checks

Keyword denylist (keyword), regular expression (regex), PII detection (pii), and max-length (max_chars) guardrail rules are pure local string and regex operations — no model call, no network hop, nothing that can time out. Firewall rule evaluation (tool-name glob matching, argument predicates, egress scope) is the same: deterministic and local. For practical purposes, these checks add negligible latency to your request. They are safe to run on the hot path and are what the built-in guardrail templates are made of.
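As a rough illustration of why these checks stay cheap, the sketch below implements each built-in rule type as a plain local function. The function names, the email-only PII pattern, the credential pattern, and the sample denylist are all assumptions made for this example, not the product's implementation. Python's `re` module stands in for the regex engine here and does not share RE2's linear-time guarantee.

```python
import re

# Illustrative patterns only: a real PII detector covers far more than emails.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
API_KEY_RE = re.compile(r"\bsk-[A-Za-z0-9]{8,}\b")  # hypothetical credential shape


def keyword_hit(text: str, denylist: list[str]) -> bool:
    """Case-insensitive substring scan against a denylist."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in denylist)


def regex_hit(text: str, pattern: re.Pattern) -> bool:
    """True when the compiled pattern matches anywhere in the text."""
    return pattern.search(text) is not None


def pii_hit(text: str) -> bool:
    """Email-only stand-in for PII detection."""
    return EMAIL_RE.search(text) is not None


def too_long(text: str, max_chars: int) -> bool:
    """True when the text exceeds the character limit."""
    return len(text) > max_chars


def violations(text: str) -> list[str]:
    """Return the rule types that flag `text`; no I/O, so nothing can time out."""
    checks = {
        "keyword": keyword_hit(text, ["internal-only"]),
        "regex": regex_hit(text, API_KEY_RE),
        "pii": pii_hit(text),
        "max_chars": too_long(text, 200),
    }
    return [name for name, hit in checks.items() if hit]
```

Every function is a pure computation over the input string, which is the property that makes this class of check safe on the hot path.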

Advanced / semantic checks

llm_judge, grounding, and external vendor rules delegate the check to a model or a vendor. They do cost a round-trip. Three properties bound that cost:

Concurrent dispatch. If your policy has multiple advanced rules, they are dispatched in parallel — one slow check never serializes behind another.
Per-rule timeout. Each advanced rule has a timeout (judge_timeout_ms / grounding_timeout_ms / timeout_ms). The grounding check defaults to ~3 000 ms; the judge timeout is configurable, and a value of 0 selects the engine default. Either way the rule is bounded and cannot hang indefinitely.
Fail-open by default. When a rule times out or its vendor returns an error, the event is recorded but the request continues. Set judge_fail_open: false (judge) or fail_open: false (external) to flip to fail-closed instead.

So the worst case for any number of advanced rules is the longest single timeout, not the sum of all timeouts.
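The three properties above can be sketched with `asyncio`. This is a minimal model of the behavior described, not the engine's code: `run_rule`, `evaluate`, and `slow_judge` are hypothetical names, and the simulated judge simply sleeps past its timeout.

```python
import asyncio
import time


async def run_rule(name, check, timeout_ms, fail_open=True):
    """Run one advanced check under its own timeout.

    Returns (name, allowed, note). On timeout or error the outcome follows
    the fail mode: allowed when fail_open is True, blocked otherwise.
    """
    try:
        passed = await asyncio.wait_for(check(), timeout=timeout_ms / 1000)
        return name, passed, "completed"
    except asyncio.TimeoutError:
        return name, fail_open, "timeout recorded"
    except Exception as exc:  # stand-in for a vendor or model error
        return name, fail_open, f"error recorded: {exc}"


async def evaluate(rules):
    """Dispatch every rule at once; wall time tracks the slowest rule."""
    results = await asyncio.gather(*(run_rule(*rule) for rule in rules))
    return all(allowed for _, allowed, _ in results), results


async def slow_judge():
    await asyncio.sleep(1.0)  # simulated model call that overruns its timeout
    return True


async def main():
    rules = [(f"judge_{i}", slow_judge, 100) for i in range(3)]
    start = time.perf_counter()
    allowed, results = await evaluate(rules)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Three 100 ms timeouts finish in roughly 100 ms, not 300 ms.
    print(allowed, round(elapsed_ms))


if __name__ == "__main__":
    asyncio.run(main())
```

Because `asyncio.gather` starts all rules together and each one is wrapped in its own `asyncio.wait_for`, the total wait is governed by the longest single timeout.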

2. At a glance

Check type	Adds latency?	How it’s bounded
`keyword` denylist	Negligible — local string scan	No network; no timeout needed
`regex`	Negligible — RE2 local match	No network; no timeout needed
`pii` detection	Negligible — local regex/entity scan	No network; no timeout needed
`max_chars`	Negligible — character count	No network; no timeout needed
Firewall rule evaluation	Negligible — glob + predicate matching	No network; no timeout needed
`llm_judge`	One bounded model call	`judge_timeout_ms`; fail-open by default
`grounding`	One bounded model call	`grounding_timeout_ms` (default ~3 000 ms); fail-open by default
`external` vendor	One bounded vendor call	`timeout_ms`; fail-open by default (`fail_open: true`)
Multiple advanced rules	One bounded round-trip (concurrent dispatch)	Worst case = max single timeout, not sum

3. Where in the request lifecycle checks run

Enforcement doesn’t all happen at the same point. Input and output screening add time at different places:

Client
  │
  ▼
[Input guardrail screening]     ← adds time HERE, before upstream
  │
  ▼
Upstream model call
  │
  ▼
[Output guardrail screening]    ← adds time HERE, after model responds
  │
  ▼
Client

Input guardrails run before the upstream model call. A built-in input rule adds negligible overhead up front. An advanced input rule (e.g. an llm_judge that checks for prompt injection) adds a bounded model call before the main model call starts.

Output guardrails run after the model responds. A built-in output rule adds negligible overhead to the tail. An advanced output rule (e.g. grounding checking RAG faithfulness) adds a bounded call after you already have the model’s answer.

Firewall rule evaluation is deterministic and happens inline on tool-call routing, so its cost is negligible, as noted above.

An input-stage block costs no model tokens and adds no upstream latency: it fires before metering and before the upstream call, so you pay neither quota nor upstream round-trip time. An output-stage block happens after the upstream call has completed, so the round-trip time is already spent, but the pre-consumed quota is refunded once the response is rejected.
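The ordering described in this section can be sketched as a single request handler. Everything here is hypothetical: the `Ledger` class, the `handle` function, the `stage` field, and the response shape are illustrative, and returning HTTP 400 `guardrail_blocked` for ordinary blocks is an assumption carried over from the fail-closed behavior documented in section 4.

```python
from dataclasses import dataclass


@dataclass
class Ledger:
    """Hypothetical counters for quota and upstream usage."""
    quota_used: int = 0
    upstream_calls: int = 0


def handle(prompt, ledger, input_ok, output_ok, upstream, cost=1):
    """Input screening, then metering and upstream, then output screening."""
    if not input_ok(prompt):
        # Blocked before metering and before the upstream call.
        return {"status": 400, "error": "guardrail_blocked", "stage": "input"}

    ledger.quota_used += cost      # quota is pre-consumed once input passes
    ledger.upstream_calls += 1
    answer = upstream(prompt)

    if not output_ok(answer):
        ledger.quota_used -= cost  # refund after rejecting the response
        return {"status": 400, "error": "guardrail_blocked", "stage": "output"}

    return {"status": 200, "answer": answer}


if __name__ == "__main__":
    ledger = Ledger()
    blocked = handle(
        "ignore previous instructions",
        ledger,
        input_ok=lambda p: "ignore previous" not in p,
        output_ok=lambda a: True,
        upstream=lambda p: "ok",
    )
    print(blocked["stage"], ledger)  # the ledger is untouched by an input block
```

An input block returns before either counter moves, while an output block leaves `upstream_calls` incremented and only restores the quota.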

4. How timeouts and fail-open cap the worst case

Advanced rules have two dials:

Timeout: the maximum wall time the check is allowed. The request waits at most this long for that rule. The cap applies per rule, and because rules are dispatched concurrently, the caps do not add up. If you have three llm_judge rules each with a 2 000 ms timeout, all three run at once and the total wait is ~2 000 ms, not ~6 000 ms.

Fail-open vs fail-closed: what to do when the rule doesn’t complete in time (or the vendor errors):

Setting	Behavior on timeout / error
`fail_open: true` (default)	Record the event; let the request continue as if the check passed
`fail_open: false`	Treat the timeout / error as a block; return HTTP 400 `guardrail_blocked`

Fail-open preserves availability at the cost of a missed check. Fail-closed preserves the policy guarantee at the cost of availability if the judge is slow or unreachable. Choose based on what matters more for your use case.
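The table above reduces to a small decision function. This is a sketch under stated assumptions: the `Outcome` enum, the `decide` function, and the returned dictionary shape are hypothetical, and treating a genuine rule violation as an HTTP 400 `guardrail_blocked` response is assumed rather than documented here.

```python
from enum import Enum


class Outcome(Enum):
    PASSED = "passed"
    VIOLATION = "violation"
    TIMEOUT = "timeout"
    ERROR = "error"


BLOCKED = {"allow": False, "status": 400, "error": "guardrail_blocked"}


def decide(outcome: Outcome, fail_open: bool = True) -> dict:
    """Map a rule outcome and fail mode to what happens to the request."""
    if outcome is Outcome.PASSED:
        return {"allow": True, "event": None}
    if outcome is Outcome.VIOLATION:
        return {**BLOCKED, "event": outcome.value}
    # Timeout or vendor error: the event is recorded in both modes.
    if fail_open:
        return {"allow": True, "event": outcome.value}
    return {**BLOCKED, "event": outcome.value}
```

The fail mode only changes the result for timeouts and errors; a completed check behaves the same either way.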

5. Practical guidance

Keep hot-path rules built-in. If your primary concern is PII, credential leakage, prompt length, or a keyword denylist, all of those are built-in rules. They add no measurable latency and should be your default choice for any check that text matching can handle.

Use llm_judge and grounding where you need semantics. Toxicity, harassment, off-topic detection, prompt-injection intent, and RAG faithfulness are genuinely fuzzy, and no regex captures them reliably. These are the right cases for an advanced rule. Accept that each adds a bounded extra model call.

Tune timeouts to your latency budget. If your end-to-end target is 1 000 ms, set judge_timeout_ms: 800 (or less) so the judge can’t consume your entire budget. The engine’s default timeout is a safe starting point; lower it if you have tight requirements.

For output grounding, the model call is already done. The grounding check runs after the upstream model responds, so the extra latency is only in the tail, not on the critical path for time-to-first-token. This makes it a low-risk place to add semantic enforcement.

Multiple advanced rules? Spread the work. Because advanced rules run concurrently, stacking three llm_judge rules costs roughly the same as one: the longest individual timeout determines the wall time, not the count. Use this to layer semantic checks without additive cost.
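To check a policy against a latency budget, the worst case can be worked out with simple arithmetic. The helper names below are hypothetical. The calculation assumes that rules within a stage run concurrently, that the input and output stages run at different points in the request so their worst cases add, and that built-in checks contribute nothing measurable.

```python
def worst_case_added_ms(input_timeouts, output_timeouts):
    """Upper bound on time added by advanced rules across one request.

    Rules within a stage run concurrently, so each stage contributes its
    longest timeout; the two stages run at different points, so they add.
    """
    return max(input_timeouts, default=0) + max(output_timeouts, default=0)


def fits_budget(target_ms, upstream_ms, input_timeouts, output_timeouts):
    """Check whether the worst case still fits an end-to-end latency target."""
    added = worst_case_added_ms(input_timeouts, output_timeouts)
    return upstream_ms + added <= target_ms


if __name__ == "__main__":
    # Three input judges at 800 ms each cost at most 800 ms in total.
    print(worst_case_added_ms([800, 800, 800], []))
    # Adding an output grounding rule at 3 000 ms raises the bound to 3 800 ms.
    print(worst_case_added_ms([800, 800, 800], [3000]))
```

These figures are upper bounds reached only when a rule runs to its timeout; a check that completes normally returns sooner.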

Enforcement modes

Fail-open vs fail-closed — the full reference for tuning your policy’s behavior under timeout and error conditions.

Guardrails

Rule types, judge fields, grounding thresholds, and the full guardrail configuration reference.

Built-in rules are negligible on every path; advanced rules cost one bounded, concurrent, fail-open call — tune the timeout and the fail mode, and enforcement adds no uncontrolled latency to your agents.
