AI supply chain security; the Firewall and
Skills references carry the full mechanics.
1. AI supply chain security for agents, at the gateway
The choke point is the relay path. Whether a capability was hand-registered, auto-installed by the agent, or pulled from a community registry, its first tool call crosses api.orcarouter.ai, and that’s where the Firewall
evaluates it. Four controls compose into a single posture:
MCP gateway, per-call eval
Every tools/call is evaluated against your policy before dispatch;
the manifest is never the source of truth.
Skill risk-bands & quarantine
Installed capabilities are scanned, scored, and held for review until
a human approves them.
Encrypted MCP credentials
Server auth secrets are encrypted at rest and injected at dispatch —
never exposed to the model, the agent, or call arguments.
Egress allow-lists
Pin where tool calls may send data, so a compromised dependency can’t
exfiltrate to a host you never approved.
Detection is at the gateway, on first use — not in your package manager or
filesystem. That’s deliberate: it’s the one path that sees every agent
and every tool call regardless of how the capability got there.
2. The threat: a dependency that grows after you trust it
| Vector | What happens |
|---|---|
| Rug-pull | A registered MCP server adds a tool (shell.exec, a new fetch) you never approved. |
| Skill creep | An installed skill uses tools or hosts its manifest never declared. |
| Credential theft | A compromised server’s tool implementation reads its own auth secret to call home. |
| Egress exfiltration | A retrieve→send chain ships your data to an attacker-controlled host. |
3. One concrete example — registering and pinning an MCP server
You register a third-party MCP server from the console (Settings → Firewall → MCP servers; writes need Developer+). The server’s auth secret is stored encrypted: you supply it once, the gateway injects it at dispatch, and it’s masked on every read after that. An MCP server record carries:
| Field | Values |
|---|---|
| auth_mode | none, bearer, oauth, basic |
| status | ok, degraded, down (set by the health probe) |
| credentials | encrypted at rest, never returned in plaintext |
Registration is a management-plane (/api/workspace/firewall/*) operation that
needs Developer+, not a relay key; register, probe, and rule authoring
all happen there.
Start with a rule scoped to tool_name_glob: <server>.* and set it to
pending_approval until you’ve seen a clean call history — every call from
that server is held for a human before it runs. Once you trust it, relax
the rule to audit or allow. From that point on the MCP gateway
evaluates every tools/call on the mcp surface before dispatch — so if
a rug-pull later adds an undeclared tool, your policy, not the server’s
manifest, decides whether it runs.
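The per-call decision described above can be sketched as a small rule matcher. This is a minimal illustration, not OrcaRouter's implementation: the `Rule` class, the `evaluate_call` function, first-match-wins ordering, and the fallback to `pending_approval` when no rule matches are all assumptions made for the example. Only the `tool_name_glob` field name and the action names come from this page.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List


@dataclass(frozen=True)
class Rule:
    tool_name_glob: str  # e.g. "acme.*" (field name taken from this page)
    action: str          # "allow", "audit", "pending_approval", or "deny"


def evaluate_call(tool_name: str, rules: List[Rule],
                  default: str = "pending_approval") -> str:
    """Return the action of the first rule whose glob matches the tool name.

    The server's manifest is never consulted here: a tool that shows up
    after registration is judged by the same rules as any other call.
    First-match-wins and the default action are assumptions of this sketch.
    """
    for rule in rules:
        if fnmatchcase(tool_name, rule.tool_name_glob):
            return rule.action
    return default


# Hypothetical policy for a server registered under the name "acme".
rules = [
    Rule("acme.shell.*", "deny"),        # never run shell tools
    Rule("acme.*", "pending_approval"),  # hold everything else for a human
]
```

With this policy, a tool such as `acme.shell.exec` that a rug-pull adds later resolves to `deny` without anyone editing the rules, because the decision depends on the tool name at call time rather than on what the server declared.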
4. Skill risk-bands & quarantine
Every installable capability, whether you registered it or the gateway auto-detected it at runtime, is run through the skill scanner. Findings roll up to a risk band and an enforcement mode:
Risk bands
low · medium · high · critical. The band is derived from
deterministic scanner passes over the manifest and declared scopes
(undeclared tool use, network egress outside approved scopes, unsafe
filesystem writes, injection-shaped manifest text).
Enforcement modes
allow (your policy rules decide), quarantine (any non-deny verdict
escalates to pending_approval — a human approves each call), block
(force deny on all of this skill’s tools regardless of rules).
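One way to read the three modes is as a function applied on top of whatever verdict your policy rules produce. The sketch below assumes that shape; `apply_enforcement` and `DEFAULT_MODE_BY_BAND` are hypothetical names, and mapping the low and medium bands to `allow` is an assumption (this page only states that high quarantines and critical blocks).

```python
def apply_enforcement(mode: str, rule_verdict: str) -> str:
    """Combine a skill's enforcement mode with the verdict from policy rules."""
    if mode == "block":
        # Force deny on all of the skill's tools, regardless of rules.
        return "deny"
    if mode == "quarantine":
        # Any non-deny verdict escalates to a human approval.
        return "deny" if rule_verdict == "deny" else "pending_approval"
    if mode == "allow":
        # Policy rules decide on their own.
        return rule_verdict
    raise ValueError(f"unknown enforcement mode: {mode!r}")


# Default mode per risk band. The high and critical entries follow this
# page; the low and medium entries are assumptions of this sketch.
DEFAULT_MODE_BY_BAND = {
    "low": "allow",
    "medium": "allow",
    "high": "quarantine",
    "critical": "block",
}
```

Keeping the mode separate from the rule verdict means a rule can never loosen a quarantine or a block; it can only tighten an `allow`.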
A high-band skill quarantines automatically; critical blocks.
Why auto-detected = always quarantined
A capability an agent self-installs, or a tool a rug-pull adds, is held
in pending_approval regardless of its scan score until a human
reviews it. An operator can’t quietly add a tool and have your agents
start using it.
5. Egress allow-lists: contain the “call home”
The most damaging supply-chain outcome is a compromised dependency that exfiltrates. The Firewall’s egress surface evaluates the outbound
destination (host / IP / CIDR) a tool reports, so you can pin where data is
allowed to go.
You author an egress rule yourself: a host/CIDR allow-list with a
cidr_match predicate denies everything off-list. Combine it with a
sequence rule that breaks the retrieve→egress chain, and a poisoned
tool that tries to ship a retrieved document to an unknown host is denied
at the gateway.
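The allow-list check itself is simple to reason about. The sketch below uses Python's standard `ipaddress` module to show the idea of a CIDR match that denies everything off-list; it is not the Firewall's rule syntax. `ALLOWED_HOSTS`, `ALLOWED_CIDRS`, and `egress_allowed` are hypothetical names, and the hosts and ranges are placeholder values.

```python
import ipaddress

# Placeholder allow-list; a real one would list the hosts you approved.
ALLOWED_HOSTS = {"api.example.com"}
ALLOWED_CIDRS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def egress_allowed(destination: str) -> bool:
    """Return True only if the reported destination is on the allow-list.

    IP literals are matched against the CIDR list; anything else is
    treated as a hostname and must match an allowed host exactly.
    """
    try:
        addr = ipaddress.ip_address(destination)
    except ValueError:
        # Not an IP literal: compare as a normalized hostname.
        return destination.lower().rstrip(".") in ALLOWED_HOSTS
    return any(addr in net for net in ALLOWED_CIDRS)
```

Because the default answer is "deny", a poisoned tool that reports a host you never listed fails the check without any rule naming that host.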
6. Encrypted credentials — a compromised server can’t read your keys
Server auth secrets are encrypted at rest and injected by the gateway at dispatch time. They never reach the model, the agent, or the tool-call arguments, so a compromised or malicious server can’t exfiltrate your API keys by reading its own credential blob. The console always returns the secret masked, even to an Admin. The decrypted value is handed out on exactly one path: a request bearing a firewall-gateway-scoped token (a dedicated token type an Admin explicitly mints for the gateway/proxy), so an ordinary leaked relay key can’t enumerate your MCP credentials.
7. Rolling it up for an audit
Supply-chain governance is also an audit artifact. OrcaRouter maps to the OWASP Top 10 for LLM Applications, including the LLM05 Supply Chain control, as part of the compliance engine, alongside frameworks like soc2, iso_27001, iso_42001, nist_ai_rmf, and the eu_ai_act.
Installing a compliance pack
(POST /api/compliance/packs/:key/install, workspace Admin, paid plan)
materializes the matching guardrails and firewall policies and starts in an
observe-first posture. Compliance reports include an AI-supply-chain
evidence section — the upstream providers your workspace actually routed
to, plus a privileged-access and key-hygiene review — and are Ed25519-signed
and publicly verifiable. Browsing the catalog and readiness is free to every
Member; see Compliance for the full
lifecycle.
MCP governance is two complementary layers: per-call firewall evaluation on
the mcp surface (enforcement on what a dependency does), plus a
tool-schema integrity baseline (a trust-on-first-use hash of the advertised
tool set, re-checked on every probe; drift flips the server’s
schema_status to changed and fails dispatch closed until an admin
re-baselines or quarantines it). Together with skill risk-bands and
quarantine, that gives you both enforcement on what a dependency does and a
verifiable record of what it declared.
8. A supply-chain baseline
Before you trust a new MCP server or skill
Register it, probe its tool set, and scope a <server>.* rule to
pending_approval or audit. Read the scan findings: any
undeclared-tool or external-egress finding is a reason to keep it
quarantined. Verify who controls the endpoint URL.
In steady state
Keep an egress allow-list pinned for any agent with fetch/search/export
tools. Watch the Discovered tools view for
capabilities that appeared without a rule, and the anomaly feed for
novel tool-to-tool paths.
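Part of what makes this steady-state watching practical is the tool-schema integrity baseline from section 7: a trust-on-first-use hash that is re-checked on every probe. A minimal sketch of that idea follows. It assumes SHA-256 over a canonical JSON encoding of the advertised tools, which this page does not specify; `tool_set_hash` and `check_schema` are hypothetical names, and the `ok` status value is an assumption (only `changed` is named here).

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple


def tool_set_hash(tools: List[Dict]) -> str:
    """Hash the advertised tool set independently of listing order."""
    ordered = sorted(tools, key=lambda tool: tool["name"])
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_schema(baseline: Optional[str], tools: List[Dict]) -> Tuple[str, str]:
    """Return (schema_status, baseline_hash) for one probe.

    With no baseline yet, the first observed tool set is trusted and
    pinned. Afterwards any difference is reported as "changed" and the
    old baseline is kept until someone deliberately re-baselines.
    """
    current = tool_set_hash(tools)
    if baseline is None:
        return "ok", current       # trust on first use
    if current == baseline:
        return "ok", baseline
    return "changed", baseline     # caller should fail dispatch closed
```

The check never updates the baseline on its own when drift is found, which mirrors the requirement that an admin re-baselines or quarantines the server before calls resume.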
After a suspected rug-pull
Disable the server (PUT .../mcp_servers, "enabled": false); its
credentials are never decrypted while disabled. Re-probe to surface new
tools, rescan the skill, and review the pending_approval queue rather
than bulk-approving.
9. Related threats & concepts
- MCP tool poisoning & rug-pulls — the deep dive on malicious and hijacked MCP servers.
- Data exfiltration — egress rules that restrict where tool calls may send data.
- Dangerous tool calls — blocking destructive actions regardless of where the tool came from.
- Secret leakage — keeping credentials out of prompts, arguments, and logs.
- Securing AI agents and the control stack — how these controls fit the broader posture.
Firewall: MCP Servers
Register MCP servers behind the gateway, probe their tools, and apply a
per-call verdict before any call reaches the real server.
Firewall: Skills
Scan and risk-score every installable capability. Quarantine or block
risky skills before their tools run.
