An agent that can reach the network can be turned into a data pipe. An attacker plants instructions in a document the agent reads (a web page, a retrieved chunk, a tool result), and those instructions steer the agent to POST sensitive data to an attacker-controlled host, or to probe internal services (SSRF). The agent never “decides” to exfiltrate; it executes what looks, to it, like a legitimate instruction. This page covers how the Agent Firewall and Guardrails stack in OrcaRouter let you defend against AI data exfiltration without changing your agent code.
The firewall sees egress only for destinations routed through the gateway via the MCP dispatch path or the evaluate hook. A tool your agent executes entirely inside its own process is outside its view. Route your agent’s network-bound tool calls through the gateway and they are governed.

1. How the attack works

The canonical path through an agent runs in three steps:
  1. Injection — the agent reads untrusted content carrying embedded instructions (a web page, a fetched document, a CRM note).
  2. Collection — the injected instructions tell the agent to gather sensitive material — API keys, database rows, user PII — using tools it already holds.
  3. Exfiltration — the agent is told to send that material out via a fetch-shaped tool: http_fetch, web_search, fetch_url, or request. The destination is attacker-controlled.
SSRF is the same shape redirected inward: instead of an external host the agent is steered toward 169.254.169.254 (cloud metadata), an internal Redis port, or another private service. See Prompt injection for the injection step; this page focuses on the network step.
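To make the inward-facing case concrete, here is a minimal sketch of the check an SSRF defense has to perform on a destination IP. It uses Python's standard `ipaddress` module and only the ranges this page names; the function name and the network list are illustrative assumptions, not OrcaRouter's implementation.

```python
import ipaddress

# Ranges named on this page. Illustrative, not an exhaustive SSRF blocklist.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("169.254.169.254/32"),  # cloud metadata endpoint
    ipaddress.ip_network("10.0.0.0/8"),          # RFC 1918
    ipaddress.ip_network("172.16.0.0/12"),       # RFC 1918
    ipaddress.ip_network("192.168.0.0/16"),      # RFC 1918
]


def is_internal_target(ip_text):
    """Return True when an IP literal falls inside one of the listed ranges."""
    addr = ipaddress.ip_address(ip_text)
    return any(addr in net for net in INTERNAL_NETWORKS)
```

A fetch toward `169.254.169.254` or `10.1.2.3` is flagged by this check, while a public address such as `8.8.8.8` is not.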

2. Egress allow-list — lock outbound destinations

The most durable defense is an egress allow-list: enumerate the hosts your agents are legitimately allowed to reach and deny everything else. An egress rule uses stage: egress and the egress_json field — a JSON-encoded string holding the host/CIDR allow/deny list. The verdict controls polarity — allow passes listed destinations; a lower-priority deny catch-all blocks the rest:
[
  {
    "priority": 10,
    "label": "Allow known API endpoints",
    "stage": "egress",
    "tool_name_glob": "*",
    "verdict": "allow",
    "egress_json": "{\"allow\":[\"api.openai.com\",\"api.anthropic.com\",\"api.orcarouter.ai\"]}"
  },
  {
    "priority": 20,
    "label": "Deny all other outbound destinations",
    "stage": "egress",
    "tool_name_glob": "*",
    "verdict": "deny"
  }
]
Entries match as a CIDR, an IP literal, or a case-insensitive hostname. Hostnames are resolved on a best-effort basis and the resulting addresses are re-checked against IP and CIDR entries, so a hostname that DNS resolves to 169.254.169.254 is still caught by a 169.254.0.0/16 CIDR deny entry. A blocked call returns HTTP 400 with error code firewall_blocked. To deny known-bad ranges without an explicit allow list, write a targeted egress deny rule listing the cloud metadata endpoint (169.254.169.254) and the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). If you also keep an allow-list, give it a higher priority number than these deny rules so the deny rules are evaluated first.
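The sketch below illustrates two points from this section: `egress_json` is a JSON-encoded string (so it is safer to generate it with `json.dumps` than to hand-escape quotes), and rules are checked in ascending priority order with a catch-all at the end. The helper names are hypothetical, the first-match evaluation is an assumption drawn from the example above, and the DNS re-check is omitted; the gateway's real matching may differ.

```python
import ipaddress
import json


def build_allow_rule(priority, hosts):
    """Build an egress allow rule; egress_json must be a JSON string, not an object."""
    return {
        "priority": priority,
        "label": "Allow known API endpoints",
        "stage": "egress",
        "tool_name_glob": "*",
        "verdict": "allow",
        "egress_json": json.dumps({"allow": hosts}),
    }


def entry_matches(entry, host):
    """Match one list entry as a CIDR, an IP literal, or a case-insensitive hostname."""
    try:
        network = ipaddress.ip_network(entry, strict=False)
    except ValueError:
        return entry.lower() == host.lower()  # entry is a hostname
    try:
        return ipaddress.ip_address(host) in network
    except ValueError:
        return False  # host is a name, entry is an IP or CIDR


def evaluate(rules, host):
    """Return the verdict of the first matching rule (assumed: lowest number first)."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        raw = rule.get("egress_json")
        if raw is None:
            return rule["verdict"]  # no list attached: treated as a catch-all
        entries = json.loads(raw).get("allow", [])
        if any(entry_matches(entry, host) for entry in entries):
            return rule["verdict"]
    return None  # no rule matched; the gateway's default is not modeled here


rules = [
    build_allow_rule(10, ["api.openai.com", "api.anthropic.com", "api.orcarouter.ai"]),
    {"priority": 20, "label": "Deny all other outbound destinations",
     "stage": "egress", "tool_name_glob": "*", "verdict": "deny"},
]
```

With these two rules, `evaluate(rules, "API.OpenAI.com")` yields the allow verdict because hostname matching ignores case, and any host not on the list falls through to the deny catch-all.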

3. Block fetch-shaped tools at the name layer

Before an egress destination is even evaluated, you can remove the capability entirely. The tight autonomy level denies http_fetch, web_search, fetch_url, and request by tool-name glob as an SSRF and exfiltration backstop. If your agent doesn’t need any of those tools, tight removes the attack surface in one step:
POST /api/workspace/firewall/autonomy
{ "level": "tight" }
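If you script this call, the request could be assembled as in the sketch below. Only the path and the body come from the example above; the base URL, the bearer-token `Authorization` header, and the helper name are assumptions to replace with whatever your workspace actually uses.

```python
import json
import urllib.request


def build_autonomy_request(base_url, token, level="tight"):
    """Assemble (but do not send) the POST that sets the autonomy level."""
    body = json.dumps({"level": level}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/workspace/firewall/autonomy",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check your gateway's API documentation.
            "Authorization": "Bearer " + token,
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends it; building it separately makes the payload easy to inspect first.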
To deny fetch tools without adopting the full tight posture, write a deny rule on the inbound stage. An inbound rule blocks the tool before the model can choose it, so the agent never receives the capability in its tool list:
{
  "priority": 5,
  "label": "Deny fetch-shaped tools",
  "stage": "inbound",
  "tool_name_glob": "http_fetch",
  "verdict": "deny"
}
Repeat for each fetch-shaped tool name your agent stack uses.
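Rather than copying the rule by hand, the repetition can be generated. This is a minimal sketch: the field names follow the rule shown above, while the helper name, the generated labels, and the choice to give each rule its own consecutive priority are assumptions.

```python
# Tool names this page lists as fetch-shaped; extend with your own stack's names.
FETCH_SHAPED_TOOLS = ["http_fetch", "web_search", "fetch_url", "request"]


def inbound_deny_rules(tool_names, start_priority=5):
    """Produce one inbound deny rule per tool name, with consecutive priorities."""
    return [
        {
            "priority": start_priority + offset,
            "label": "Deny " + name,
            "stage": "inbound",
            "tool_name_glob": name,
            "verdict": "deny",
        }
        for offset, name in enumerate(tool_names)
    ]
```

Serializing `inbound_deny_rules(FETCH_SHAPED_TOOLS)` with `json.dumps` gives a list of rules in the same shape as the single rule above.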

4. Secrets Blocker guardrail — stop credentials at the prompt

The Secrets Blocker guardrail runs at the input stage, scanning the prompt for AWS-style access keys, OpenAI keys, Anthropic keys, GitHub tokens, and similar credential patterns before the request leaves the gateway. If a secret is detected the request is blocked — the credential never reaches a model and never appears in a tool call. Enable it from the Guardrails panel, or as part of the tight autonomy level. It is independent of the firewall egress rules.
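For intuition about what an input-stage credential scan involves, here is a rough sketch using regular expressions. The patterns are approximations of publicly known key prefixes and are not the guardrail's actual detectors; the names `SECRET_PATTERNS` and `find_secrets` are hypothetical.

```python
import re

# Approximate patterns for illustration only; real detectors are broader.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "anthropic_key": re.compile(r"\bsk-ant-[A-Za-z0-9_-]{20,}"),
    # Negative lookahead keeps Anthropic-style keys out of this bucket.
    "openai_key": re.compile(r"\bsk-(?!ant-)[A-Za-z0-9_-]{20,}"),
}


def find_secrets(prompt):
    """Return the names of every pattern that appears in the prompt."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(prompt)]


def should_block(prompt):
    """Block the request when any credential pattern is present."""
    return bool(find_secrets(prompt))
```

A scan like this runs before the request is forwarded, which is why a detected credential never reaches the model or a tool call.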
| Threat | Layer that stops it |
| --- | --- |
| Prompt carries an API key | Secrets Blocker (input guardrail) |
| Agent calls a fetch tool toward an attacker host | Egress allow/deny rule |
| Fetch-shaped tool advertised to the model | Inbound deny rule or tight autonomy |
| Agent reaches cloud metadata or RFC 1918 | Egress deny rule listing those CIDRs |

5. Roll out with shadow mode

If you’re not sure which hosts your agent legitimately reaches today, start in shadow mode before enforcing:
  1. Create the egress rules with your intended allow list and set shadow_mode: true on the policy.
  2. Watch the Events feed — calls that would be blocked appear as [shadow] would deny with the destination.
  3. Adjust the allow list until every destination your agent legitimately needs passes and only unwanted destinations would be denied, then disable shadow mode to start enforcing.
No traffic is blocked while shadow mode is on.
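When reviewing shadow-mode results, it helps to tally which destinations would have been denied and how often. The sketch below assumes you have exported events as a list of dictionaries; the field names `shadow`, `verdict`, and `destination` are hypothetical and need to be mapped to whatever the Events feed actually provides.

```python
from collections import Counter


def summarize_shadow_denies(events):
    """Count would-be-denied destinations, most frequent first."""
    counts = Counter(
        event["destination"].lower()
        for event in events
        if event.get("shadow") and event.get("verdict") == "deny"
    )
    return counts.most_common()


# Hypothetical exported events, for illustration only.
sample_events = [
    {"shadow": True, "verdict": "deny", "destination": "api.example.com"},
    {"shadow": True, "verdict": "deny", "destination": "API.example.com"},
    {"shadow": True, "verdict": "deny", "destination": "169.254.169.254"},
    {"shadow": True, "verdict": "allow", "destination": "api.openai.com"},
]
```

A frequently denied host that your agent legitimately needs is a candidate for the allow list; a rare hit on an internal address is exactly what the rule should keep blocking.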

6. Where to go next

Firewall rules reference: Complete matching language, covering egress lists, CIDRs, argument clauses, and all verdicts.
Agent Firewall overview: Policies, surfaces, autonomy levels, and observability.
Prompt injection: The injection step that steers agents toward exfiltration.
MCP tool poisoning: Malicious MCP tools that register fetch-shaped capabilities.