https://api.orcarouter.ai/v1 exactly as before.
1. The threat: agents act, not just chat
Prompt-level safety was built for chat. It assumes the model produces text and a human reads it. Agents break that assumption:
- They ingest untrusted content (a web page, a retrieved document, a tool result) that can carry instructions. This is prompt injection.
- They call tools such as `shell.exec`, `db.query`, or a payment API that do irreversible things.
- They reach the network, fetching URLs that an attacker can steer toward internal services or exfiltration endpoints.
- They self-extend, installing skills, plugins, and MCP servers you never reviewed.
2. The control stack
OrcaRouter applies four layers to every request. Each is independent, each is workspace-scoped, and each attaches to an API key with no code change.
- Scoped keys: least-agency identity. A key is bound to specific models, IPs, a spend cap, an expiry, and the exact guardrail and firewall policy that applies.
- Guardrails: content control. Screen prompts and responses, and block, mask, or flag PII, secrets, injection, and unsafe output.
- Agent Firewall: action control. Allow-list tools, validate and sanitize tool-call arguments, hold calls for approval, and cap egress and cost.
- Audit: attribution. Every match, verdict, and approval is logged and correlated to the agent run that caused it.
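To make the Agent Firewall layer concrete, here is a minimal Python sketch of a default-deny tool check, assuming a simple in-process allow-list. The `evaluate` function, the validators, and the tool names `http.fetch` and `payments.send` are hypothetical illustrations, not OrcaRouter's configuration format or enforcement engine. A tool missing from the allow-list is denied, argument validators can still deny an allow-listed call, and some calls are held for approval. The caller is deliberately not an input: the verdict depends only on the tool and its arguments.

```python
"""Hypothetical default-deny tool firewall; names and rules are illustrative only."""
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urlparse

# A validator inspects a tool call's arguments and returns a reason to deny, or None.
Validator = Callable[[dict], Optional[str]]


@dataclass(frozen=True)
class Verdict:
    action: str  # "allow", "deny", or "hold" (wait for human approval)
    reason: str


def no_destructive_shell(args: dict) -> Optional[str]:
    # Toy substring check; a real firewall would parse the command properly.
    command = args.get("command", "")
    return "destructive shell command" if "rm -rf" in command else None


def no_internal_egress(args: dict) -> Optional[str]:
    # Deny URLs whose host is a private, loopback, or link-local IP literal.
    # Hostnames would also need DNS resolution before checking; omitted here.
    host = urlparse(args.get("url", "")).hostname or ""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return None  # not an IP literal
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        return f"egress to internal address {ip}"
    return None


# Explicit allow-list: any tool not listed here is denied by default.
ALLOW_LIST: dict[str, list[Validator]] = {
    "shell.exec": [no_destructive_shell],
    "http.fetch": [no_internal_egress],  # hypothetical tool name
    "payments.send": [],                 # hypothetical tool name
}
# Allow-listed tools that still need human approval before they run.
NEEDS_APPROVAL = {"payments.send"}


def evaluate(tool: str, args: dict) -> Verdict:
    """Judge a call on what it is; who issued it is not a parameter."""
    if tool not in ALLOW_LIST:
        return Verdict("deny", f"{tool} is not on the allow-list")
    for validate in ALLOW_LIST[tool]:
        reason = validate(args)
        if reason is not None:
            return Verdict("deny", reason)
    if tool in NEEDS_APPROVAL:
        return Verdict("hold", f"{tool} requires approval")
    return Verdict("allow", "allow-listed, arguments valid")


print(evaluate("db.query", {"sql": "DROP TABLE users"}).action)  # deny
print(evaluate("http.fetch", {"url": "http://169.254.169.254/latest"}).action)  # deny
print(evaluate("shell.exec", {"command": "ls -la"}).action)  # allow
print(evaluate("payments.send", {"amount": 10}).action)  # hold
```

The order of checks is a design choice in this sketch: deny first, then validate arguments, and only then consider approval, so a human is never asked to approve a call that would be rejected anyway.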
3. Why “zero trust”
Zero trust means no request is trusted because of where it came from. A tool call is judged on what it is, not on the fact that your own agent issued it, because the agent may be acting on injected instructions it read from an untrusted page. OrcaRouter enforces this with default-deny on the actions that matter and explicit allow-lists for the ones you intend. Why AI agents need zero trust covers the model in depth.
4. Everything lives in the gateway
The control stack is configured in your workspace and enforced at the gateway, not in your application:
- Attach once, applies everywhere. Bind a guardrail and a firewall policy to an API key, and every call that key makes is screened. Edit the policy, and every attached key picks up the change on its next request.
- No redeploy, no SDK change. Your agent keeps issuing the same OpenAI-shaped calls. Enforcement is invisible until a rule fires.
- Provider-agnostic. The same policy applies across GPT, Claude, Gemini, and other providers, because it screens text and actions, not the model choice.
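One way to picture "attach once, applies everywhere" is that a key stores a reference to a policy rather than a copy of it, and the gateway resolves that reference on each request. The Python sketch below assumes exactly that. `Policy`, `POLICIES`, `KEY_TO_POLICY`, `is_blocked`, and the key and policy names are hypothetical, and real guardrail and firewall policies carry far more than a blocked-tool set.

```python
"""Hypothetical sketch: API keys reference a shared policy resolved per request."""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Policy:
    blocked_tools: set[str] = field(default_factory=set)


# Workspace-level stores; names and shape are illustrative, not OrcaRouter's schema.
POLICIES: dict[str, Policy] = {"agents-default": Policy(blocked_tools={"shell.exec"})}
KEY_TO_POLICY: dict[str, str] = {
    "key-agent-a": "agents-default",
    "key-agent-b": "agents-default",
}


def is_blocked(api_key: str, tool: str) -> bool:
    # Look the policy up on every request instead of copying it into the key,
    # so a single edit reaches every attached key on its next call.
    policy = POLICIES[KEY_TO_POLICY[api_key]]
    return tool in policy.blocked_tools


print(is_blocked("key-agent-a", "db.query"))  # False
POLICIES["agents-default"].blocked_tools.add("db.query")  # edit the policy once
print(is_blocked("key-agent-a", "db.query"))  # True
print(is_blocked("key-agent-b", "db.query"))  # True
```

Neither agent's code changed between the two calls; only the workspace-side policy did.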
Configuration is role-gated inside your workspace. Throughout these docs, each configuration step notes the role it needs.

| Action | Required role |
|---|---|
| Read policies and settings | Any member |
| View the firewall Events and Runs feeds | Developer |
| Create or change guardrails, firewall policies, and keys | Developer |
| Change compliance settings and gateway keys | Admin |
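The role requirements can be read as a minimum-role lookup. The Python sketch below encodes them that way, assuming the roles are strictly ordered (an Admin can do anything a Developer can). That ordering, the `Role` enum, the `can` helper, and the action names are all hypothetical and are not OrcaRouter identifiers.

```python
"""Hypothetical role gate mirroring the workspace role requirements (illustrative only)."""
from enum import IntEnum


class Role(IntEnum):
    # Assumes roles are ordered, so a higher role can do everything a lower one can.
    MEMBER = 1
    DEVELOPER = 2
    ADMIN = 3


# Minimum role per action; these action names are made up for the example.
MIN_ROLE: dict[str, Role] = {
    "read_policies": Role.MEMBER,
    "read_settings": Role.MEMBER,
    "view_firewall_events": Role.DEVELOPER,
    "view_firewall_runs": Role.DEVELOPER,
    "edit_guardrails": Role.DEVELOPER,
    "edit_firewall_policies": Role.DEVELOPER,
    "edit_api_keys": Role.DEVELOPER,
    "edit_compliance": Role.ADMIN,
    "edit_gateway_keys": Role.ADMIN,
}


def can(role: Role, action: str) -> bool:
    # Unknown actions are refused rather than allowed.
    required = MIN_ROLE.get(action)
    return required is not None and role >= required


print(can(Role.MEMBER, "read_policies"))  # True
print(can(Role.MEMBER, "view_firewall_runs"))  # False
print(can(Role.DEVELOPER, "edit_gateway_keys"))  # False
print(can(Role.ADMIN, "edit_gateway_keys"))  # True
```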
5. The fast path: one switch
You do not have to author rules to get protected. An autonomy level sets your whole Firewall and Guardrails posture in a single step, with one-click undo:

| Level | What you get |
|---|---|
| tight | Default-deny; blocks destructive tools and SSRF egress; PII and secrets guardrails on. |
| balanced | Audits by default, denies destructive shell commands, and flags PII. The recommended starting posture. |
| permissive | Nothing is enforced, but everything is observed, so you still see your agent’s behavior. |
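A single-step switch with undo behaves like replacing the whole posture at once while keeping the previous one aside. The Python sketch below models that, assuming the posture is a flat set of settings. The `Workspace` class, its methods, and the preset keys and values are hypothetical; the presets only restate the table and are not OrcaRouter's internal schema.

```python
"""Hypothetical model of applying an autonomy level in one step, with undo."""
from __future__ import annotations

# Each preset only restates the table; keys and values are illustrative.
PRESETS: dict[str, dict[str, str]] = {
    "tight": {"default": "deny", "destructive_tools": "block",
              "ssrf_egress": "block", "pii": "guardrail", "secrets": "guardrail"},
    "balanced": {"default": "audit", "destructive_shell": "deny", "pii": "flag"},
    "permissive": {"default": "observe"},
}


class Workspace:
    def __init__(self, posture: dict[str, str]) -> None:
        self.posture = dict(posture)
        self._history: list[dict[str, str]] = []

    def set_autonomy(self, level: str) -> None:
        # Swap the whole posture at once and remember what it replaced.
        self._history.append(self.posture)
        self.posture = dict(PRESETS[level])

    def undo(self) -> None:
        # Restore the posture that was active before the last set_autonomy call.
        if not self._history:
            raise RuntimeError("nothing to undo")
        self.posture = self._history.pop()


ws = Workspace({"default": "audit", "custom_rule": "hold payments.send"})
ws.set_autonomy("tight")
print(ws.posture["default"])  # deny
ws.undo()
print(ws.posture)  # {'default': 'audit', 'custom_rule': 'hold payments.send'}
```

Keeping the replaced posture whole, rather than diffing individual rules, is what lets undo bring back hand-authored rules (here the hypothetical `custom_rule`) exactly as they were.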
6. Where to go next
- Quickstart: turn on zero trust in 5 minutes.
- Why zero trust: the threat model behind the design.
- Guardrails vs. Firewall: which layer catches which threat.
- What you're responsible for: what the gateway secures, and what stays yours.
