Securing AI agents with OrcaRouter

An AI agent is not a chatbot. It reads untrusted web pages, calls tools, spends money, reaches internal hosts, and loads capabilities it found at runtime. Every one of those is an action with real-world consequences, and most of them happen without a human in the loop. OrcaRouter sits on the path between your agent and every model it calls, so it is the one place that sees every request and response — and every tool call and outbound destination your agent routes through it — regardless of which provider served it. That choke point is where zero-trust enforcement belongs. You configure it once in your workspace; your agent keeps calling https://api.orcarouter.ai/v1 exactly as before.

1. The threat: agents act, not just chat

Prompt-level safety was built for chat. It assumes the model produces text and a human reads it. Agents break that assumption:

They ingest untrusted content — a web page, a retrieved document, a tool result — that can carry instructions (prompt injection).
They call tools — shell.exec, db.query, a payment API — that do irreversible things.
They reach the network — fetching URLs an attacker can steer toward internal services or exfiltration endpoints.
They self-extend — installing skills, plugins, and MCP servers you never reviewed.

None of that is visible to a content filter that only reads the prompt. Securing an agent means controlling identity, content, actions, and the network, and keeping an audit trail of all of it.

2. The control stack

OrcaRouter applies four layers to every request. Each is independent, each is workspace-scoped, and each attaches to an API key with no code change.

Scoped keys

Least-agency identity. Bound to specific models, IPs, a spend cap, an expiry, and the exact guardrail + firewall policy that applies.

Guardrails

Content control. Screen prompts and responses — block, mask, or flag PII, secrets, injection, and unsafe output.

Agent Firewall

Action control. Allow-list tools, validate and sanitize tool-call arguments, hold for approval, and cap egress and cost.

Audit

Attribution. Every match, verdict, and approval is logged and correlated to the agent run that caused it.

A request flows through them in order: the key decides whether the call is even allowed and which policies bind; guardrails screen the input text; the model runs; the firewall judges any tool calls and outbound destinations; guardrails screen the output; and every decision lands in the audit trail. See The control stack for the full path.

3. Why “zero trust”

Zero trust means no request is trusted because of where it came from. A tool call is judged on what it is, not on the fact that your own agent issued it — because the agent may be acting on injected instructions it read from an untrusted page. OrcaRouter enforces this by default-deny on the actions that matter and explicit allow-lists for the ones you intend. Why AI agents need zero trust covers the model in depth.

4. Everything lives in the gateway

The control stack is configured in your workspace and enforced at the gateway, not in your application:

Attach once, applies everywhere. Bind a guardrail and a firewall policy to an API key; every call that key makes is screened. Edit the policy and every attached key shifts on the next request.
No redeploy, no SDK change. Your agent keeps issuing the same OpenAI-shaped calls. Enforcement is invisible until a rule fires.
Provider-agnostic. The same policy rides over GPT, Claude, Gemini, and the rest — it screens text and actions, not the model choice.

Configuration is role-gated inside your workspace. Reading policies and settings is open to any member; the firewall Events and Runs feeds require the Developer role; creating or changing guardrails, firewall policies, and keys requires Developer; compliance and gateway-key changes require Admin. Throughout these docs, each configuration step notes the role it needs.

5. The fast path: one switch

You do not have to author rules to get protected. An autonomy level sets your whole Firewall and Guardrails posture in a single step, with one-click undo:

Level	What you get
`tight`	Default-deny; blocks destructive tools and SSRF egress; PII + secrets guardrails on.
`balanced`	Audit by default, deny destructive shell, flag PII. The recommended starting posture.
`permissive`	Nothing enforced, but everything observed so you still see your agent’s behavior.

This is the Secure Agents baseline — start there, watch what your agents actually do, then tighten.

6. Where to go next

Quickstart

Turn on zero trust in 5 minutes.

Why zero trust

The threat model behind the design.

Guardrails vs. Firewall

Which layer catches which threat.

What you're responsible for

What the gateway secures, and what stays yours.

Why zero trust

​1. The threat: agents act, not just chat

​2. The control stack

Scoped keys

Guardrails

Agent Firewall

Audit

​3. Why “zero trust”

​4. Everything lives in the gateway

​5. The fast path: one switch

​6. Where to go next

Quickstart

Why zero trust

Guardrails vs. Firewall

What you're responsible for

1. The threat: agents act, not just chat

2. The control stack

3. Why “zero trust”

4. Everything lives in the gateway

5. The fast path: one switch

6. Where to go next