Harden an MCP-based agent

An MCP agent is an agent with reach. Every Model Context Protocol server it connects to is a fresh set of tools, credentials, and network destinations that nobody reviewed — and the agent can pull in a new one mid-run. This recipe shows the four moves that turn a sprawling MCP setup into a governed one on the hosted gateway: a single audited MCP gateway, skill quarantine, egress denial, and encrypted server auth. You configure all of it from the console (or the REST API) against your workspace. Your agent keeps speaking MCP exactly as before.

1. Why secure an MCP agent

Point an agent at five MCP servers directly and you have five trust boundaries, five credential stores, and zero shared audit trail. The tools/call that reads a customer record and the one that runs a shell command look identical to the model, and a community server can quietly request shell.exec and an external network scope the first time it loads. The fix is to make OrcaRouter the one choke point every call crosses. To secure MCP agent traffic end to end, route all MCP dispatch through the Firewall’s MCP gateway, so every tools/call is policy-evaluated before it reaches the real server, with skills risk-scored, egress governed, and credentials encrypted at rest.

This is a recipe — it stitches existing features into one concrete hardening pass. For the full reference, follow the links into Firewall, MCP servers, and Skills.

2. Start from the Secure Agents baseline

Before authoring anything bespoke, set a posture. In the console open Firewall → Posture and apply the balanced autonomy level (Developer role). In one transaction it audits tool calls and flags PII while denying the most destructive actions — you watch before you broadly enforce, with one-click undo. When the Events and Runs feeds look right, move to tight: default-deny, destructive shell denied, SSRF-shaped egress denied, plus the PII Shield and Secrets Blocker guardrails enforced. That single switch is the floor this recipe builds on.

Prefer to ramp without flipping the whole workspace? Author the rules below into one named policy and turn on its shadow mode — it evaluates and logs but downgrades every enforcing verdict to audit (reason prefixed [shadow] would …) until you’re sure. See enforcement modes.
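To make the shadow-mode downgrade concrete, here is a minimal Python sketch of the behavior described above. It is not the Firewall's implementation. The `Verdict` type, the `apply_shadow` helper, and the choice of which verdicts count as non-enforcing are assumptions made for illustration, and because the source elides the text after `[shadow] would`, the reason format shown here is illustrative only.

```python
from dataclasses import dataclass

# Assumption: any verdict other than these two counts as "enforcing".
NON_ENFORCING = {"allow", "audit"}


@dataclass(frozen=True)
class Verdict:
    """Hypothetical container for a rule outcome."""
    verdict: str
    reason: str


def apply_shadow(result: Verdict, shadow: bool) -> Verdict:
    """Downgrade an enforcing verdict to audit when the policy is in shadow mode."""
    if not shadow or result.verdict in NON_ENFORCING:
        return result
    # The exact wording after "[shadow] would" is not given in the source;
    # appending the original verdict and reason is a stand-in.
    return Verdict("audit", f"[shadow] would {result.verdict}: {result.reason}")


if __name__ == "__main__":
    print(apply_shadow(Verdict("deny", "destructive shell"), shadow=True))
```

The call is still evaluated and logged, so the Events feed shows what would have been enforced without the agent being blocked.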

3. Route every tools/call through one MCP gateway

Register each MCP server once; the gateway aggregates their tools under a single connection (namespaced <server>.<tool>) and runs every tools/call through the firewall engine. Register a server from the console (or the REST API, Developer+):

curl https://api.orcarouter.ai/api/workspace/firewall/mcp_servers \
  -H "Authorization: Bearer <your-session-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "github",
    "endpoint": "https://api.githubcopilot.com/mcp",
    "auth_mode": "bearer",
    "auth_json": "{\"token\":\"ghp_x\"}",
    "enabled": true
  }'
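
If you script registration rather than using curl, the detail most likely to trip you up is that auth_json is a JSON string nested inside the JSON body, so it is encoded twice. The sketch below builds the same request with Python's standard library. The `build_register_request` helper is a hypothetical name, and the request is only constructed, not sent.

```python
import json
import urllib.request

API = "https://api.orcarouter.ai/api/workspace/firewall/mcp_servers"


def build_register_request(session_token: str, name: str,
                           endpoint: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the registration request shown in the curl example."""
    body = {
        "name": name,
        "endpoint": endpoint,
        "auth_mode": "bearer",
        # auth_json is itself a JSON string, so it is serialized separately
        # and then embedded as a plain string value.
        "auth_json": json.dumps({"token": token}),
        "enabled": True,
    }
    return urllib.request.Request(
        API,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {session_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_register_request(
        "<your-session-token>", "github",
        "https://api.githubcopilot.com/mcp", "ghp_x",
    )
    print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the sketch can be run without touching a live workspace.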

Then point your MCP client at the gateway — not at the upstream servers — using a dedicated firewall-gateway-scoped key:

https://api.orcarouter.ai/api/v1/firewall/mcp

Now github.create_issue and shell.exec show up side by side under one connection, and each dispatch is evaluated before it runs. A blocked call comes back to the model as a tool error (firewall deny: …), not a transport crash, so the agent can adapt.
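
As a rough illustration of what crosses the gateway, the sketch below builds a JSON-RPC tools/call body with a namespaced tool name and checks a tool result for a firewall denial. The request shape follows the MCP specification. How the denial is encoded is an assumption: the sketch expects an MCP tool result with `isError` set and a text item starting with `firewall deny:`. The helper names are hypothetical.

```python
from typing import Optional


def namespaced(server: str, tool: str) -> str:
    """Join server and tool into the gateway's <server>.<tool> form."""
    return f"{server}.{tool}"


def tools_call(request_id: int, server: str, tool: str, arguments: dict) -> dict:
    """JSON-RPC 2.0 body for an MCP tools/call aimed at a namespaced tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": namespaced(server, tool), "arguments": arguments},
    }


def firewall_denial(result: dict) -> Optional[str]:
    """Return the denial text if a tool result looks like a firewall deny.

    Assumption: the deny arrives as a tool-level error (isError true) whose
    text content begins with "firewall deny:".
    """
    if not result.get("isError"):
        return None
    for item in result.get("content", []):
        if item.get("type") == "text" and item.get("text", "").startswith("firewall deny:"):
            return item["text"]
    return None


if __name__ == "__main__":
    print(tools_call(1, "github", "create_issue", {"title": "Example"}))
```

Because the denial is an ordinary tool error, an agent loop can feed the text back to the model and let it choose another tool instead of aborting the run.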

A regular relay key gets 403 on the gateway route /api/v1/firewall/mcp. Mint a dedicated gateway token (is_firewall_gateway) for the MCP connection; reading that gateway key’s plaintext requires Admin+.

Before you can write rules against a server’s tools, probe it to discover their names and schemas:

curl -X POST \
  https://api.orcarouter.ai/api/workspace/firewall/mcp_servers/42/probe \
  -H "Authorization: Bearer <your-session-token>"

4. Quarantine skills the agent pulls in

The MCP gateway governs calls; skill governance governs the capabilities an agent loads. Every installable skill, BYO MCP server, or plugin is scanned into a risk band and an enforcement mode that rides on top of every rule verdict:

Mode	Effect at runtime
`allow`	Rule verdicts decide; the skill adds nothing.
`quarantine`	Anything short of deny is held for `pending_approval`.
`block`	The skill’s tools are force-denied.
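
The table can be read as a small function that combines a skill's mode with the verdict a rule produced. The sketch below is one way to express that reading; the function name and the rule verdict strings other than `deny` and `pending_approval` are assumptions for illustration.

```python
def effective_verdict(skill_mode: str, rule_verdict: str) -> str:
    """Overlay a skill's enforcement mode on top of a rule verdict."""
    if skill_mode == "allow":
        # The skill adds nothing; the rule verdict stands.
        return rule_verdict
    if skill_mode == "quarantine":
        # A deny stays a deny; anything softer is held for approval.
        return "deny" if rule_verdict == "deny" else "pending_approval"
    if skill_mode == "block":
        # The skill's tools are force-denied regardless of the rule.
        return "deny"
    raise ValueError(f"unknown skill mode: {skill_mode!r}")


if __name__ == "__main__":
    for mode in ("allow", "quarantine", "block"):
        print(mode, effective_verdict(mode, "audit"))
```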

The point for an MCP agent: a capability nobody approved doesn’t get a free pass. When an agent self-installs something and its tools first cross the gateway, the Firewall auto-detects it and quarantines it until a human reviews it — even if it scanned clean. Pre-approve the servers you trust; let the rest land in the review queue.

Keep balanced/observe on while you learn what your agent actually installs, then promote the trusted skills to allow and leave the long tail quarantined. See Skills.

5. Deny SSRF-shaped egress

A compromised or confused MCP tool reaching for cloud-metadata or an intranet host is the classic exfiltration path. Two layers cover it.

First, the gateway validates every remote MCP endpoint and its resolved dial IP against an SSRF policy on registration and on each dispatch hop: intranet ranges and the cloud-metadata address are refused, re-checked to defeat DNS rebinding. That’s built in; you don’t configure it.

Second, the tight autonomy level ships an SSRF egress preset that denies fetch-shaped tool names (http_fetch, web_search, fetch_url, request, and their <server>.* namespaced forms), so a tool whose whole job is “go fetch this URL” is stopped before it dials.

To govern where tools may reach by destination, author your own egress rule with a host/CIDR deny list. That’s the surface for pinning outbound reach:

// firewall rule, egress stage — deny outbound to an internal range.
// egress_json is a JSON *string*: {"deny":[…],"allow":[…]} of hosts/CIDRs.
{
  "stage": "egress",
  "verdict": "deny",
  "egress_json": "{\"deny\":[\"10.0.0.0/8\",\"169.254.169.254\"]}"
}

No preset ships CIDR egress rules — the SSRF preset matches tool names, not destinations. Author the host/CIDR deny list yourself when you need destination-level control. See egress lists and stop exfiltration.
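
The difference between name matching and destination matching is easier to see side by side. The sketch below is a local approximation, not the Firewall's matcher: `matches_ssrf_preset` checks a tool name against the fetch-shaped names listed above, and `denied_by_egress` checks a destination against the `deny` list inside an egress_json string. Both function names are hypothetical, and the exact hostname comparison is an assumption.

```python
import ipaddress
import json

FETCH_SHAPED = {"http_fetch", "web_search", "fetch_url", "request"}


def matches_ssrf_preset(tool_name: str) -> bool:
    """True for a fetch-shaped tool name, bare or in <server>.<tool> form."""
    bare = tool_name.rsplit(".", 1)[-1]
    return bare in FETCH_SHAPED


def denied_by_egress(destination: str, egress_json: str) -> bool:
    """True if destination (IP or hostname) matches an entry in the deny list."""
    deny = json.loads(egress_json).get("deny", [])
    try:
        addr = ipaddress.ip_address(destination)
    except ValueError:
        addr = None  # destination is a hostname, not an IP literal
    for entry in deny:
        try:
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            # Entry is a hostname. Assumption: compare case-insensitively.
            if addr is None and entry.lower() == destination.lower():
                return True
            continue
        if addr is not None and addr in network:
            return True
    return False


if __name__ == "__main__":
    rule = json.dumps({"deny": ["10.0.0.0/8", "169.254.169.254"]})
    print(denied_by_egress("10.1.2.3", rule), matches_ssrf_preset("web.fetch_url"))
```

A tool named github.create_issue passes the name check even if it dials 10.1.2.3, which is why the destination rule is worth authoring when outbound reach matters.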

6. Keep server credentials encrypted

Every MCP server’s auth_json is encrypted at rest and masked on read; the gateway injects credentials at dispatch time, so they never reach the model or the client. Supported auth_mode values:

bearer

{ "token": "…" } — a static bearer token, sent as Authorization: Bearer.

oauth

{ "client_id": "…", "client_secret": "…", "token_url": "…" } — client-credentials OAuth; the gateway fetches and refreshes the token.

basic

{ "username": "…", "password": "…" } — HTTP Basic auth.

none

"" — an unauthenticated server. The default.

On read the secret is masked; echo the mask back on update to keep the stored value. The server’s status (ok / degraded / down) from the last probe tells you whether it’s reachable before you depend on it.
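
A small helper can keep the auth_json shapes above consistent when you register servers from a script. The sketch below validates the fields for each auth_mode and shows the Authorization header that the two static modes imply under standard HTTP conventions. The helper names are hypothetical, and how the gateway actually injects credentials is not specified here; the OAuth token exchange is omitted.

```python
import base64
import json

REQUIRED_FIELDS = {
    "bearer": ("token",),
    "oauth": ("client_id", "client_secret", "token_url"),
    "basic": ("username", "password"),
    "none": (),
}


def build_auth_json(auth_mode: str, **fields: str) -> str:
    """Return the auth_json string for a mode, or "" for an unauthenticated server."""
    if auth_mode not in REQUIRED_FIELDS:
        raise ValueError(f"unknown auth_mode: {auth_mode!r}")
    required = REQUIRED_FIELDS[auth_mode]
    missing = [key for key in required if key not in fields]
    if missing:
        raise ValueError(f"{auth_mode} is missing fields: {missing}")
    if auth_mode == "none":
        return ""
    return json.dumps({key: fields[key] for key in required})


def static_auth_header(auth_mode: str, auth_json: str) -> dict:
    """Illustrative Authorization header for the bearer and basic modes."""
    if auth_mode == "bearer":
        return {"Authorization": f"Bearer {json.loads(auth_json)['token']}"}
    if auth_mode == "basic":
        creds = json.loads(auth_json)
        raw = f"{creds['username']}:{creds['password']}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    return {}


if __name__ == "__main__":
    print(build_auth_json("basic", username="u", password="p"))
```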

7. Add a content guardrail on the request

The Firewall governs actions; pair it with a Guardrail so the text moving through your MCP agent is screened too. The Secrets Blocker preset catches credentials in a request before the model — or any tool — ever sees them, and a PII Shield masks identifiers on the way in. Both come on with the tight autonomy level, or attach a named guardrail to the agent’s relay key via guardrail_id.

The firewall’s sanitize verdict redacts tool-call arguments, never the content a tool returns. Strip secrets from the request with the Secrets Blocker guardrail; sanitize the arguments an agent emits with a firewall rule. They cover different halves of the flow.

8. Verify and watch

Confirm the policy does what you expect before you trust it, then keep an eye on the feeds:

Test a tool call

Dry-run a sample tools/call against your policy and see the verdict, the matched rule, and the reason — nothing dispatched, nothing logged.

Discovered tools

Every tool the workspace has seen, flagged covered or gap — author rules straight from real MCP traffic.

Events & Runs

Every dispatch, its verdict, and the surface it hit, rolled up per agent run.

Anomaly feed

Rate/cost spikes, retry loops, and novel tool paths against a learned baseline.

9. Where to go next

MCP tool poisoning

The threat model behind quarantine and the MCP gateway.

Excessive agency

Why default-deny and HITL matter for autonomous tool use.

Autonomous agent recipe

Harden a high-autonomy agent end to end.

Stop exfiltration

Lock down outbound egress in depth.

