A capability that requests shell.exec and an external network scope is exactly the kind of thing
that should be reviewed before it runs, not discovered in an incident.
The Firewall’s Skills governance is that review. Every installable
capability is registered as a workspace-scoped record, scanned by a
deterministic risk engine, and assigned a risk band and an enforcement mode.
At runtime, that mode is applied on top of the firewall’s rule verdicts.
1. What a “skill” is here
A skill record is one installable agent capability. A single model generalizes three kinds so one scanning, scoring, and approval plane governs everything an agent self-installs:
| Kind | What it is |
|---|---|
| skill | A packaged capability: a manifest plus a set of tools and a system-prompt fragment. |
| mcp_server | A bring-your-own MCP server registered as a governed artifact. |
| plugin | A plugin-style extension. |
Each record also carries a source (builtin, registry, private, byo_mcp, or auto_detected) that feeds the trust assessment.
2. The scanner
On registration (and on demand), the scanner runs a set of deterministic, dependency-free passes over the manifest and the declared scopes. Each pass emits findings with a severity of info, warn, or error:
| Pass | Flags | Severity |
|---|---|---|
| prompt_injection | Manifest text that tries to override instructions (ignore previous instructions, you are now, a leading system:…). | warn |
| tool_creep | Tool names the manifest uses but didn’t declare in allowed_tools. | error |
| network_egress | HTTP(S) hosts in the manifest that aren’t approved in the skill’s network scopes. | warn |
| fs_write_unsafe | A write-mode filesystem scope on a path outside /tmp (traversal-safe). | error |
| data_scope | Sensitive data scopes (pii, financial, customer). | info |
| unsigned | A registry skill with no signature. | warn |
The findings roll up into a verdict: any error → blocked; otherwise any warn → flagged; otherwise clean.
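As a rough illustration of how passes and the verdict roll-up fit together, the sketch below implements two of the passes from the table and the error/warn/clean rule. The Finding type, the function signatures, and the choice to resolve relative paths against / are assumptions made for the example, not the scanner's actual interface.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One scanner finding (hypothetical shape, for illustration only)."""
    pass_name: str
    severity: str  # "info", "warn", or "error"
    detail: str


def tool_creep(used_tools, allowed_tools):
    """Flag tools the manifest uses but did not declare in allowed_tools."""
    undeclared = sorted(set(used_tools) - set(allowed_tools))
    return [Finding("tool_creep", "error", tool) for tool in undeclared]


def fs_write_unsafe(write_paths):
    """Flag write scopes that land outside /tmp once '..' segments are resolved."""
    findings = []
    for path in write_paths:
        # Assumption: relative paths are resolved against the filesystem root.
        resolved = posixpath.normpath(posixpath.join("/", path))
        if resolved != "/tmp" and not resolved.startswith("/tmp/"):
            findings.append(Finding("fs_write_unsafe", "error", path))
    return findings


def verdict(findings):
    """Any error -> blocked; otherwise any warn -> flagged; otherwise clean."""
    severities = {finding.severity for finding in findings}
    if "error" in severities:
        return "blocked"
    if "warn" in severities:
        return "flagged"
    return "clean"
```

Normalizing the path before the prefix check is what makes the example traversal-safe: /tmp/../etc/passwd resolves to /etc/passwd and is flagged even though it begins with /tmp.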
3. Risk score & bands
The same findings feed a deterministic risk score (0–100, additive with per-category caps). The heaviest contributors are dangerous capabilities:
| Capability | Weight |
|---|---|
| Shell execution | +30 |
| Arbitrary code eval | +30 |
| Filesystem write outside /tmp | +25 |
| Secrets read | +25 |
| External network egress | +20 |
The score maps to a risk band:
| Band | Score |
|---|---|
| low | 0–25 |
| medium | 26–50 |
| high | 51–75 |
| critical | 76–100 |
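A simplified sketch of the arithmetic, using only the capability weights and band boundaries listed above. The capability key names are hypothetical, and the sketch leaves out the per-category caps, the other findings, and the signing mitigation, so treat it as an approximation of the additive idea rather than the engine's real formula.

```python
# Weights from the capability table; the key names are invented for this example.
CAPABILITY_WEIGHTS = {
    "shell_exec": 30,
    "code_eval": 30,
    "fs_write_outside_tmp": 25,
    "secrets_read": 25,
    "network_egress": 20,
}

# Inclusive upper bound of each band, in ascending order.
BANDS = [(25, "low"), (50, "medium"), (75, "high"), (100, "critical")]


def risk_score(capabilities):
    """Add the weight of each distinct capability and clamp to 0-100."""
    raw = sum(CAPABILITY_WEIGHTS[cap] for cap in set(capabilities))
    return max(0, min(100, raw))


def risk_band(score):
    """Map a 0-100 score to its band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for upper, band in BANDS:
        if score <= upper:
            return band


score = risk_score(["shell_exec", "network_egress"])
print(score, risk_band(score))
```

Under these assumptions, shell execution plus external network egress adds up to 30 + 20 = 50, which sits at the top of the medium band.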
4. Enforcement mode
The band and verdict together derive an enforcement mode, which is what the firewall actually does when a tool owned by this skill is called:
| Mode | Effect at runtime |
|---|---|
| allow | The skill imposes nothing of its own; rule verdicts decide. |
| quarantine | Escalate anything short of a deny to pending_approval; the skill’s tools run only after a human approves. |
| block | Force a deny on the skill’s tools. |
An error finding that makes the verdict blocked will quarantine-or-block
even when the numeric band is low, which is the cautious direction. An
operator can set the mode explicitly; on a re-scan the mode only ever
ratchets tighter, never relaxing a block or quarantine you set.
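The ratchet described above can be sketched as taking the stricter of the stored mode and the newly derived one. The function name and the strictness ordering table are illustrative; only the three mode names come from this page.

```python
# Strictness order of the three enforcement modes.
STRICTNESS = {"allow": 0, "quarantine": 1, "block": 2}


def ratchet(stored_mode, derived_mode):
    """Return the stricter of the stored mode and the re-scan's derived mode."""
    return max(stored_mode, derived_mode, key=STRICTNESS.__getitem__)


# A stored quarantine survives a re-scan that now derives allow.
print(ratchet("quarantine", "allow"))
# A worsened scan can still tighten an allow to block.
print(ratchet("allow", "block"))
```

Because the result is never less strict than the stored mode, a clean re-scan cannot silently undo a block or quarantine that an operator set.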
5. Trust signals
Two signals beyond the static scan inform how a skill is treated:
- Signed publishers. A skill carrying a signature from a trusted publisher is treated as more trustworthy (the signing mitigation lowers its risk score); an unsigned registry skill is penalized. You manage which publishers your workspace trusts.
- Resource reputation. A skill’s standing can be adjusted by its live behavior over time — denials and anomalies raise its risk, clean streaks lower it — so an artifact that misbehaves in production drifts toward quarantine even if its manifest scanned clean.
6. Auto-detected capabilities
The scanner doesn’t only run when you register something by hand. When an agent self-installs a capability and its tools first cross the gateway, the Firewall auto-detects it (off the hot path, asynchronously), synthesizes a manifest from what it observed, and runs the same scan, score, and mode derivation, with source = auto_detected.
Auto-detected capabilities are quarantined until reviewed. Anything
auto-detected that would otherwise resolve to allow is floored to
quarantine (and critical stays block) until a human reviews it. A
capability nobody approved doesn’t get a free pass just because it scanned
benign: it runs only after you’ve looked at it.
7. Runtime enforcement
When a tool call reaches the firewall engine, it’s attributed to an owning skill, then the skill’s mode is applied on top of the rule verdict:
- Attribution. The call is matched to a skill by its declared allowed_tools, then by mcp_server namespace prefix, then by a workspace-wide most-restrictive enforcing fallback.
- Rule verdict. The policy’s rules run as usual, and a rule’s skill_name_glob lets you scope a rule to specific skills.
- Mode override. A block skill forces a deny; a quarantine skill escalates anything short of deny to pending_approval; allow leaves the verdict untouched.
Skill attribution fails closed. If a tool can’t be attributed (a DB
error with no cache, or an undeclared tool under a curated source), the
call is held for review rather than allowed. And skill mode is
independent of shadow mode — a quarantined or blocked skill is still
enforced even while a policy is in shadow rollout.
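The mode override and the fail-closed rule can be sketched as a small function. The verdict strings allow, pending_approval, and deny follow the wording on this page; passing None for an unattributed call, and leaving an existing deny in place in that case, are assumptions made for the example.

```python
def apply_skill_mode(rule_verdict, skill_mode):
    """Combine a rule verdict with the owning skill's enforcement mode.

    skill_mode is "allow", "quarantine", "block", or None when the call
    could not be attributed to any skill (hypothetical convention).
    """
    if skill_mode is None:
        # Fail closed: hold for review rather than allow.
        # Assumption: a rule-level deny is left as a deny.
        return "deny" if rule_verdict == "deny" else "pending_approval"
    if skill_mode == "block":
        return "deny"
    if skill_mode == "quarantine":
        # Escalate anything short of a deny to human approval.
        return "deny" if rule_verdict == "deny" else "pending_approval"
    if skill_mode == "allow":
        return rule_verdict
    raise ValueError(f"unknown skill mode: {skill_mode!r}")


# A rule allows the call, but the owning skill is quarantined.
print(apply_skill_mode("allow", "quarantine"))
```

In this sketch the skill mode can only tighten the outcome: no mode turns a deny into anything more permissive.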
8. Lifecycle
- Register: POST /skills validates and scans synchronously, returning the skill plus its findings and verdict. The mode is derived (or your explicit mode is honored).
- Update: re-scans the new manifest; the mode ratchets tighter on a worsened scan but never relaxes your stored block/quarantine.
- Rescan: POST /skills/:id/rescan re-runs the scan; if the verdict newly degrades to flagged or blocked it emits a firewall event so the drift shows up in your feed.
- Delete: soft-deletes and frees the name slot for re-registration.
API reference
Workspace-scoped; list reads are open to any member (and redact secret-bearing fields), everything else requires Developer+.
| Method & path | Role | Purpose |
|---|---|---|
| GET /api/workspace/firewall/skills | Member | List skills (redacted; filter by ?kind= and ?source=). |
| GET /api/workspace/firewall/skills/:id | Developer+ | Full skill record. |
| POST /api/workspace/firewall/skills | Developer+ | Register + scan (409 on duplicate name). |
| PUT /api/workspace/firewall/skills/:id | Developer+ | Update + re-scan. |
| POST /api/workspace/firewall/skills/:id/rescan | Developer+ | Re-scan; emits an event on degradation. |
| DELETE /api/workspace/firewall/skills/:id | Developer+ | Soft-delete. |
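For orientation, here is a minimal sketch of building the register call with Python's standard library. Only the method and path come from the table above; the base URL, the bearer-token header, and every field in the request body are assumptions, since this page does not specify the authentication scheme or the payload schema.

```python
import json
import urllib.request


def build_register_request(base_url, token, skill):
    """Build (but do not send) a POST to the skills registration endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/workspace/firewall/skills",
        data=json.dumps(skill).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; check your workspace's actual scheme.
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical payload: these field names are illustrative, not a documented schema.
request = build_register_request(
    "https://firewall.example.com",
    "YOUR_TOKEN",
    {"name": "github", "kind": "mcp_server", "source": "byo_mcp"},
)
print(request.get_method(), request.full_url)
```

Sending the request with urllib.request.urlopen raises urllib.error.HTTPError on a non-success status, so the 409 for a duplicate name would surface as an exception whose code attribute is 409.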
Names are unique per workspace across kinds: a skill named github
and an mcp_server named github collide in the same workspace. Pick
distinct names per artifact.
FAQ
How is this different from the rule DSL?
Rules gate tool calls by name and
arguments. Skills gate the capabilities an agent loads — the package,
its manifest, and its requested permissions — before any of its tools
run. The skill’s mode then rides on top of whatever the rules decide,
so the two compose: a rule can
allow http.fetch in general while a
quarantined skill that owns it still gets held.
What stops a malicious skill from declaring a clean manifest?
Several things. Tool-creep detection flags tools used but not
declared; auto-detection re-scans from what actually crossed the
gateway, not just the claimed manifest; the mode ratchets tighter (not
looser) on re-scan; resource reputation drifts a misbehaving artifact
toward quarantine over time; and attribution fails closed when a tool
can’t be tied to a declared skill.
Do I have to register every skill manually?
No. Register the ones you want to pre-approve; the rest are
auto-detected on first use and quarantined until you review them. Turn
on observe mode to surface everything an agent installs without
blocking, then tighten from real data.
See also
Going deeper on agent security? The Secure Your Agents (Zero Trust) guides put this feature in a zero-trust workflow.
- Secure Agents baseline: Apply a zero-trust posture to every agent capability in one switch.
- Agentic guardrails: Guardrails built for autonomous, tool-using agents.
