1. The LLM cost guardrail use case
The lever is one built-in rule type: max_chars. It caps the character count of the text at a stage. There is no model call and no network hop, only a deterministic length check that runs on the request before metering or on the response after the model returns.
Two shapes, picked by the rule’s action:
Block oversized requests
On a request-stage max_chars rule with action block, any prompt over the limit is rejected with HTTP 400 guardrail_blocked. A blocked request costs no quota, because the block fires before usage is metered.

Clamp oversized responses
On a max_chars rule with action mask, the text is truncated to the limit instead of rejected, so the caller still gets a usable, bounded answer. This is useful on the response stage to cap egress.

The cap counts characters, not tokens, and the count is rune-aware: 日本語 is three characters, not nine (its length in UTF-8 bytes). The shipped token-oriented preset translates a token budget into a character ceiling at the standard char→token ratio (about four characters per token, per the Token Cost Cap preset below); tighten the rule’s max_chars field directly for a stricter budget.

2. The shipped cost presets
Open the New guardrail split-button in the console and pick from the cost template category. Three presets seed a single max_chars rule each:
| Preset | Stage · action | Cap |
|---|---|---|
| Prompt-Size Cap | input · block | 50,000 chars |
| Token Cost Cap (prompt) | input · block | 200,000 chars (~50K tokens) |
| Response Size Cap | output · block | 32,000 chars |
After applying a preset, adjust its max_chars value, stage, or action to fit your budget. Authoring and editing guardrails requires Developer+ in the workspace.
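The presets table implies a ratio of roughly four characters per token (200,000 characters for about 50K tokens). The arithmetic behind a token-oriented cap can be sketched as follows; the `CHARS_PER_TOKEN` constant and both helper functions are hypothetical, and the ratio is read off the table rather than taken from a documented constant.

```python
# Hypothetical helpers. The ratio below is inferred from the Token Cost Cap
# preset (200,000 chars for ~50K tokens); it is not a documented constant.
CHARS_PER_TOKEN = 4


def token_budget_to_max_chars(token_budget: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Translate a token budget into a character ceiling for a max_chars rule."""
    if token_budget <= 0:
        raise ValueError("token budget must be positive")
    return token_budget * chars_per_token


def approx_tokens(max_chars: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Rough token equivalent of an existing character cap (rounded down)."""
    return max_chars // chars_per_token


if __name__ == "__main__":
    print(token_budget_to_max_chars(50_000))  # 200000, the Token Cost Cap value
    print(approx_tokens(50_000))              # 12500, for the Prompt-Size Cap
```

Actual token counts depend on the tokenizer and the text, so treat the result as an estimate and lower max_chars if you need a firmer bound.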
3. Author your own cap
A cost rule is the simplest rule in the engine: a stage, an action, and an integer. To cap requests at 20,000 characters and reject anything larger, use an input-stage max_chars rule with action block and max_chars set to 20000. max_chars must be a positive integer; the validator rejects 0 or negative values.
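One way to picture the rule and its validation is a small typed record. This is a sketch only: `MaxCharsRule`, `VALID_STAGES`, and `VALID_ACTIONS` are hypothetical names, and the stage values `input`/`output` and action values `block`/`mask` are taken from this page rather than from a published schema.

```python
from dataclasses import dataclass

# Hypothetical in-memory shape of a max_chars rule. Field names and values come
# from this page; the dataclass is illustrative, not the console's storage format.
VALID_STAGES = {"input", "output"}
VALID_ACTIONS = {"block", "mask"}


@dataclass(frozen=True)
class MaxCharsRule:
    stage: str
    action: str
    max_chars: int

    def __post_init__(self) -> None:
        if self.stage not in VALID_STAGES:
            raise ValueError(f"unknown stage: {self.stage!r}")
        if self.action not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")
        # Mirrors the documented check: max_chars must be a positive integer.
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(self.max_chars, bool) or not isinstance(self.max_chars, int) or self.max_chars <= 0:
            raise ValueError("max_chars must be a positive integer")


# The 20,000-character request cap described above.
request_cap = MaxCharsRule(stage="input", action="block", max_chars=20_000)
```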
4. Test before you attach
Prove the cap fires where you expect before any key points at it. Open the Test tab inside the guardrail editor, paste a sample, pick the input stage, and run the current policy locally, with no upstream call and no quota. An over-limit sample returns a blocked verdict; an under-limit sample passes through untouched.
For a clamp rule, the sandbox shows the truncated rendered text, so you
can confirm the cap lands on a rune boundary before depending on it.
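The verdicts the Test tab reports can be approximated locally. The sketch below assumes a single max_chars rule and uses Python string length and slicing, which count Unicode code points and so match the rune-aware behavior described above; `Verdict` and `evaluate_max_chars` are hypothetical names, not the engine's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    blocked: bool
    text: Optional[str]  # text passed through (possibly clamped); None when blocked


def evaluate_max_chars(text: str, max_chars: int, action: str) -> Verdict:
    """Hypothetical local check of one max_chars rule against a sample."""
    if len(text) <= max_chars:  # len() counts code points, not UTF-8 bytes
        return Verdict(blocked=False, text=text)  # under the cap: untouched
    if action == "block":
        return Verdict(blocked=True, text=None)  # over the cap: rejected
    if action == "mask":
        # Clamp to the first max_chars code points, so the cut lands on a
        # rune boundary and never inside a multi-byte UTF-8 sequence.
        return Verdict(blocked=False, text=text[:max_chars])
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    sample = "日本語です"  # 5 code points, 15 UTF-8 bytes
    print(evaluate_max_chars(sample, 3, "block").blocked)  # True
    print(len(evaluate_max_chars(sample, 3, "mask").text))  # 3
```

Code-point slicing never splits a UTF-8 sequence, but it can still split a multi-code-point grapheme such as an emoji with a skin-tone modifier; this sketch makes no attempt to handle that.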
5. Attach the cap to a key
A cost guardrail resolves exactly like any other: attach it to an API key, or set it as the workspace default. Every step here is a console action under your own session.

Save the guardrail
Create or open a guardrail in the console, add a max_chars rule (or apply a cost preset), and save.

Attach a key
Edit an API key and pick the guardrail from the Guardrail dropdown (this sets guardrail_id on the key), or mark the guardrail as the workspace default. See Attach to a key and Account default.

6. What a blocked request costs
A request-stage cap is the cheapest guardrail to enforce: it runs before usage is metered, so an oversized prompt is rejected at zero quota cost.

Does a blocked oversized request cost quota?
No. An input-stage block fires before metering. An output-stage block refunds the pre-consumed quota after the response is rejected. Either way the caller pays no quota, gets HTTP 400 guardrail_blocked, and the request is marked skip-retry: re-running the same oversized prompt would just block again. See the guardrail_blocked error.
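The ordering that makes both paths free can be sketched with a toy ledger. Everything here is hypothetical (`Ledger`, `GuardrailBlocked`, `handle_request`), and the model call is stubbed by passing the response in directly; the point is only where metering, the two checks, and the refund sit relative to each other.

```python
from dataclasses import dataclass


@dataclass
class Ledger:
    used: int = 0  # hypothetical quota counter


class GuardrailBlocked(Exception):
    """Stands in for HTTP 400 guardrail_blocked; such requests are skip-retry."""
    status = 400
    skip_retry = True


def handle_request(ledger: Ledger, prompt: str, response: str, cost: int,
                   input_cap: int, output_cap: int) -> str:
    # Input stage: checked before metering, so a block consumes nothing.
    if len(prompt) > input_cap:
        raise GuardrailBlocked("input")
    ledger.used += cost  # pre-consume quota before the (stubbed) model call
    # Output stage: checked after the model returns; a block refunds the quota.
    if len(response) > output_cap:
        ledger.used -= cost
        raise GuardrailBlocked("output")
    return response
```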
Is the response cap enforced on streaming?
A max_chars block on the output stage is enforced both ways: on a non-streaming response the answer is screened before it returns, and on a streaming response a scanner cuts the stream mid-flight once the buffer crosses the cap. A mask (clamp) on output currently applies to non-streaming responses only. See Streaming coverage.
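A streaming cut of this kind can be sketched as a generator that keeps a running character count and stops once it crosses the cap. `cap_stream` and `StreamBlocked` are hypothetical names; how the real scanner buffers chunks and signals the cut to the client is not specified on this page.

```python
from typing import Iterable, Iterator


class StreamBlocked(Exception):
    """Hypothetical signal that the running count crossed max_chars."""


def cap_stream(chunks: Iterable[str], max_chars: int) -> Iterator[str]:
    """Forward chunks until the cumulative character count exceeds max_chars."""
    seen = 0
    for chunk in chunks:
        seen += len(chunk)  # code points, as on the non-streaming path
        if seen > max_chars:
            raise StreamBlocked(f"stream crossed max_chars={max_chars}")
        yield chunk
```

Chunks yielded before the crossing have already been delivered; only the remainder of the stream is cut off.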
Does a cost rule show the matched text in the feed?
No. A max_chars rule has no substring concept, so the Matches feed records that the cap fired (its type, action, and stage) but never a matched substring, even with Log raw content on. You get the signal that it fired without re-capturing the oversized payload.

7. Where this fits
A max_chars cap is a blunt cost lever: a hard ceiling, not a per-key spend budget. To cap dollars rather than characters, set credit_limit_usd on the API key itself (0 = unlimited); the gateway enforces it independently of any guardrail. The two stack: the key budget bounds total spend, and the cost guardrail bounds the size of any single request or response.
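The two limits are independent checks, which a short sketch makes concrete. `ApiKey`, `admit`, and the `credit_limit` reason string are hypothetical; the order of the checks and the pre-check against an estimated cost are assumptions, since this page does not say how the gateway compares spend with credit_limit_usd.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiKey:
    credit_limit_usd: float  # 0 means unlimited, as documented above
    spent_usd: float = 0.0   # hypothetical running total


def within_key_budget(key: ApiKey, estimated_cost_usd: float) -> bool:
    # Assumption: the budget is pre-checked against an estimated request cost.
    if key.credit_limit_usd == 0:
        return True
    return key.spent_usd + estimated_cost_usd <= key.credit_limit_usd


def within_size_cap(text: str, max_chars: int) -> bool:
    return len(text) <= max_chars


def admit(key: ApiKey, prompt: str, max_chars: int, estimated_cost_usd: float) -> Optional[str]:
    """Return None if admitted, else the name of the independent check that failed."""
    if not within_size_cap(prompt, max_chars):
        return "guardrail_blocked"
    if not within_key_budget(key, estimated_cost_usd):
        return "credit_limit"  # hypothetical label, not a documented error code
    return None
```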
8. Where to go next
Input-stage rules
How request screening runs before the upstream call and before
metering.
Output-stage rules
Screening and clamping the model’s response, streaming and not.
The guardrail_blocked error
The HTTP 400 shape, the no-quota guarantee, and skip-retry.
Test & eval
Prove a cap against a corpus before you attach a key.
