Everything here binds to your workspace and is configured from the
console. Your agent keeps calling https://api.orcarouter.ai/v1/... with
the same sk-orca-... key; only the policy in the gateway changes.
Configuration actions need the roles called out per step; relay calls use
the scoped key. The firewall sees egress only for destinations routed
through the gateway (the MCP dispatch path or the evaluate hook), so route
your network-bound tool calls through it and they are governed.
1. The three layers that prevent AI data exfiltration
Each layer catches the attack at a different point in the request lifecycle. Stack all three: they’re independent and complementary.
Credentials in the prompt
A secret pasted into (or pulled into) the request is caught at the
input stage by the Secrets Blocker guardrail — before any model
sees it.
Secrets in tool args
A tool call that the model emits with a credential in its arguments is
cleaned by a sanitize firewall rule, which redacts the matched argument.
Outbound destination
The actual network step is bounded by an egress allow-list — only
enumerated hosts pass; everything else is denied.
2. Stop credentials at the prompt — the Secrets Blocker guardrail
The first thing to lock down is the credential itself. The Secrets & API-Key Blocker guardrail runs at the input stage and scans the request for credential patterns (AWS-style access keys, OpenAI keys, JWTs, and similar tokens) before the request leaves the gateway. On a match the request is blocked: the credential never reaches a model and never lands in a tool call.
In the console, open Guardrails → New guardrail (the Developer role; reads and the Test sandbox are open to any member), name it exfil-shield, and apply the Secrets & API-Key Blocker preset from
the template library (category secrets). The preset seeds three
input-stage regex block rules, one per credential shape: AWS access
keys, OpenAI-style keys, and GitHub tokens.
A blocked request returns guardrail_blocked, costs no
quota (an input-stage block fires before metering), and is marked
skip-retry. Prove it in the Test tab: paste a sample AWS key, pick
the input stage, and confirm the verdict before you attach a key.
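To build a few sample strings for the Test tab, it can help to see what an input-stage regex block looks like in miniature. The following is a minimal Python sketch, not the preset's actual rules: the three patterns, the `scan_input` and `input_verdict` names, and the `block`/`allow` strings are all assumptions made for illustration.

```python
import re
from typing import List

# Illustrative patterns only. These are assumptions, not the rules the
# Secrets & API-Key Blocker preset actually seeds.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "openai_style_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
}


def scan_input(text: str) -> List[str]:
    """Return the name of every credential shape found in the request text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


def input_verdict(text: str) -> str:
    """Block on any match, otherwise allow (a stand-in for the input-stage verdict)."""
    return "block" if scan_input(text) else "allow"
```

Calling `input_verdict` on a prompt that contains a string shaped like an AWS access key returns `block`, while ordinary prose returns `allow`. Real credential formats vary, so treat the patterns as a starting point for test data rather than a complete detector.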
3. Sanitize secrets out of tool-call arguments
A guardrail screens the prompt; it doesn’t see the tool calls a model emits. When the model produces a tool_call whose arguments carry a
credential, a firewall sanitize rule catches it. Sanitize redacts
the matched substrings from the tool-call arguments and forwards the
cleaned call: the tool runs, but with the secret stripped out.
In Firewall → Policies → New policy (Developer role), name it
exfil-firewall and add a sanitize rule on the response surface, that is, the
tool_calls the model emits in its reply.
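The redact-and-forward behavior described above can be sketched in Python. This is an illustration under stated assumptions, not the gateway's implementation: it assumes an OpenAI-style tool call whose `function.arguments` field is a JSON string, and the pattern, the `[REDACTED]` placeholder, and the function names are hypothetical.

```python
import json
import re

# Hypothetical pattern and placeholder, chosen for illustration only.
SECRET_RE = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b|\bsk-[A-Za-z0-9_-]{20,}\b")
PLACEHOLDER = "[REDACTED]"


def sanitize_value(value):
    """Recursively redact matched substrings in strings, lists, and dicts."""
    if isinstance(value, str):
        return SECRET_RE.sub(PLACEHOLDER, value)
    if isinstance(value, list):
        return [sanitize_value(item) for item in value]
    if isinstance(value, dict):
        return {key: sanitize_value(item) for key, item in value.items()}
    return value  # numbers, booleans, and None pass through unchanged


def sanitize_tool_call(tool_call: dict) -> dict:
    """Return a copy of the tool call with secrets stripped from its arguments."""
    # Assumption: arguments arrive as a JSON-encoded string.
    args = json.loads(tool_call["function"]["arguments"])
    cleaned = sanitize_value(args)
    return {
        **tool_call,
        "function": {**tool_call["function"], "arguments": json.dumps(cleaned)},
    }
```

The tool name and every non-matching argument survive untouched, which is the point of sanitize as opposed to a block: the call still runs, only the matched substring is replaced.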
4. Lock outbound destinations — the egress allow-list
The most durable defense is the network boundary itself: enumerate the hosts your agents are legitimately allowed to reach and deny everything else. An egress rule uses stage: egress and the egress field; the
verdict sets polarity. An allow rule passes listed destinations, and a
lower-priority deny catch-all blocks the rest.
Add the egress rules to the same exfil-firewall policy, including deny
entries for the cloud metadata address (169.254.169.254) and the RFC-1918
private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). A denied call
returns HTTP 400 firewall_blocked.
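How allow entries, CIDR deny entries, and a catch-all interact is easier to see when evaluated in order. Below is a minimal Python sketch of that evaluation, assuming first-match-wins ordering where a lower `priority` number is checked first. The `EgressRule` shape, the priority numbers, and `api.example.com` are hypothetical and do not reflect the gateway's rule schema.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    priority: int       # assumption: a lower number is evaluated first
    verdict: str        # "allow" or "deny"
    hosts: tuple = ()   # exact hostnames; "*" matches any destination
    cidrs: tuple = ()   # CIDR blocks, matched against IP-literal destinations


RULES = [
    EgressRule(10, "deny", cidrs=("169.254.169.254/32", "10.0.0.0/8",
                                  "172.16.0.0/12", "192.168.0.0/16")),
    EgressRule(20, "allow", hosts=("api.example.com",)),  # hypothetical allowed host
    EgressRule(1000, "deny", hosts=("*",)),               # catch-all
]


def matches(rule: EgressRule, destination: str) -> bool:
    """True when the destination is covered by the rule's hosts or CIDRs."""
    if "*" in rule.hosts or destination.lower() in rule.hosts:
        return True
    try:
        address = ipaddress.ip_address(destination)
    except ValueError:
        return False  # a hostname, so CIDR entries do not apply
    return any(address in ipaddress.ip_network(cidr) for cidr in rule.cidrs)


def evaluate_egress(destination: str, rules=RULES) -> str:
    """Return the verdict of the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if matches(rule, destination):
            return rule.verdict
    return "deny"  # fail closed if nothing matched
```

This sketch matches only IP literals against CIDR entries and does not resolve hostnames, so it says nothing about a hostname that resolves to a private address. How the gateway handles that case is not covered here.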
No preset ships CIDR egress rules; you author the host/CIDR allow and
deny entries yourself. The tight autonomy level
is the adjacent fast path: it denies the fetch-shaped tool names
(http_fetch, web_search, fetch_url, request) outright, removing
the network capability before a destination is ever evaluated. Use it
when your agent doesn’t need those tools at all.
5. Attach one scoped key
A policy only enforces on keys that resolve to it. Give the agent its own key, scoped to the minimum it needs, never your account-wide key. In API Keys → New key (Developer role):
Attach both policies
Pick exfil-shield from the Guardrail dropdown (sets
guardrail_id) and exfil-firewall from the Firewall policy
dropdown (sets firewall_policy_id). Both bindings live on the key in
the gateway. An explicit guardrail attachment never silently falls
back; disabling it is the off switch. A disabled firewall policy,
by contrast, falls back to the workspace default policy.
Cap the blast radius
Set credit_limit_usd to a sane ceiling (0 = unlimited) so a
compromised key can’t drain quota, and allow_ips to your backend’s
egress IPs if the agent calls from a fixed server. Set an
expired_time for temporary keys (-1 = never expires).
Every request made with this key now passes through exfil-shield and every tool call through
exfil-firewall, with no code aware that enforcement is happening.
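The key settings and the two fallback behaviors described above can be modeled in a few lines of Python for a pre-rollout review. This is a sketch under assumptions: the field names come from this page, but the field types, the string policy identifiers, and the `review_key` and `effective_policies` helpers are hypothetical and are not part of any OrcaRouter SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ScopedKey:
    guardrail_id: Optional[str] = None
    firewall_policy_id: Optional[str] = None
    credit_limit_usd: float = 0.0                        # 0 = unlimited
    allow_ips: List[str] = field(default_factory=list)   # assumption: empty = no restriction
    expired_time: int = -1                               # -1 = never expires


def review_key(key: ScopedKey) -> List[str]:
    """List the settings that leave the key's blast radius uncapped."""
    findings = []
    if key.guardrail_id is None:
        findings.append("no guardrail attached")
    if key.firewall_policy_id is None:
        findings.append("no firewall policy attached")
    if key.credit_limit_usd == 0:
        findings.append("credit_limit_usd is 0 (unlimited)")
    return findings


def effective_policies(key: ScopedKey, guardrail_enabled: bool,
                       firewall_enabled: bool,
                       workspace_default_firewall: Optional[str]
                       ) -> Tuple[Optional[str], Optional[str]]:
    """Resolve what enforces on the key, following the fallback rules above."""
    # A disabled guardrail is simply off; there is no fallback.
    guardrail = key.guardrail_id if guardrail_enabled else None
    # A disabled firewall policy falls back to the workspace default policy.
    firewall = key.firewall_policy_id if firewall_enabled else workspace_default_firewall
    return guardrail, firewall
```

Running `review_key` on a key with both policies attached and a nonzero credit limit returns an empty list. The asymmetry in `effective_policies` is the part worth remembering: disabling the guardrail removes input screening entirely, while disabling the firewall policy swaps in whatever the workspace default enforces.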
6. Roll out with shadow mode, then watch
If you don’t yet know every host your agent legitimately reaches, don’t enforce blind; observe first. See enforcement modes for the full observe → shadow → enforce path.
Shadow the egress rules
Set shadow_mode: true on exfil-firewall. Every enforcing verdict
is downgraded to audit and logged as [shadow] would deny with the
destination. No traffic is blocked while shadow mode is on.
Watch the feeds
Firewall → Events / Runs (Developer+) shows every tool call and
egress destination your agent hit and what would have been denied.
Guardrails → Matches (any Member) shows every secret the input
guardrail caught. Tune the egress allow list until only
attacker-reachable hosts would be denied.
The Matches feed records the matched substring only when Log raw
content is on for the guardrail (off by default — the
privacy-conservative posture). Mark a false positive (Admin) to tune
the policy. Every guardrail change writes a version-history row you can
diff and revert; firewall policy changes are recorded in the audit trail.
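The shadow-mode downgrade and the tuning loop it supports can be sketched as follows. This Python example is illustrative only: which verdicts count as enforcing, the logger name, and the event tuple shape are assumptions, and the real feed is read in the console rather than computed locally.

```python
import logging
from collections import Counter

logger = logging.getLogger("firewall.shadow")

# Assumption: these are the verdicts that would change traffic if enforced.
ENFORCING_VERDICTS = {"deny", "sanitize"}


def apply_shadow_mode(verdict: str, destination: str, shadow_mode: bool) -> str:
    """Downgrade an enforcing verdict to audit while shadow mode is on."""
    if shadow_mode and verdict in ENFORCING_VERDICTS:
        logger.info("[shadow] would %s %s", verdict, destination)
        return "audit"
    return verdict


def would_deny_report(events, shadow_mode: bool = True) -> Counter:
    """Count destinations that were let through only because of shadow mode."""
    report = Counter()
    for destination, verdict in events:
        effective = apply_shadow_mode(verdict, destination, shadow_mode)
        if verdict == "deny" and effective == "audit":
            report[destination] += 1
    return report
```

A report like this is what you tune against: any legitimate host that shows up in it belongs on the allow list before shadow mode is switched off.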
7. Coverage at a glance
| Exfiltration step | Layer that stops it |
|---|---|
| Credential enters the request | Secrets Blocker guardrail (input) |
| Model emits a tool call carrying a secret | sanitize firewall rule (response surface) |
| Tool dials an attacker host | Egress allow / deny rule |
| Agent reaches cloud metadata or RFC-1918 | Egress deny rule listing those CIDRs |
| Fetch-shaped tool offered to the model | tight autonomy level (tool-name deny) |
8. Where to go next
Firewall rules reference
The full matching language — egress lists, CIDRs, sanitizers, and all
verdicts.
Data exfiltration threat
The attack anatomy this recipe defends against, end to end.
Harden an MCP agent
Govern every tools/call an agent dispatches through an MCP server.
PII-safe logging
Keep sensitive data out of your request logs and the Matches feed.
