Everything here is workspace-scoped. Members see their workspace’s trail;
nothing crosses tenant boundaries. The trail is produced by features you
already configure — Guardrails and the
Firewall — so turning on enforcement turns on
forensics at the same time.
1. The four records behind an AI agent audit trail
Attribution comes from four independent streams, each correlated to the same run and session so you can pivot between them:
Guardrail Matches
Every content rule that fired on a request or response — rule type,
action, stage, and a detail string. Member-readable.
Firewall Events & Runs
Every tool-call verdict: allow, audit, deny, sanitize, pending_approval (hold-for-approval), and the resolved verdict of a cap_cost rule. Verdicts are rolled up by agent run and session. Developer+.
Approval decisions
Who approved or rejected each held tool call, recorded as an audit
action.
Policy-change history
Every guardrail and firewall edit — versioned, diffable, revertible —
plus a workspace audit row per change.
2. Guardrail Matches — what was screened (Member)
Every time a guardrail rule fires, the gateway writes a match. The feed lives on the Guardrails page (Matches tab) and is readable by any workspace member. Each match records the rule type, the action taken (block / mask / flag / annotate / spotlight), the stage (input / output), a detail string, and the run lineage of the request that triggered it. List it, group it by guardrail or rule type, filter by action, drill into one match, or export the feed to CSV.
A noisy rule is part of the trail too. Mark a match as a false positive
with POST /api/guardrail/match/:id/mark-fp (Admin) so your signal stays
clean and your reports don’t over-count.
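A minimal sketch of building that request in Python. Only the method and path come from the text above; the base URL, the match id, and the bearer-token header are hypothetical placeholders, and the real authentication scheme may differ. The request is constructed but not sent.

```python
import urllib.parse
import urllib.request


def build_mark_fp_request(base_url: str, match_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the Admin-only false-positive request."""
    safe_id = urllib.parse.quote(match_id, safe="")
    url = f"{base_url.rstrip('/')}/api/guardrail/match/{safe_id}/mark-fp"
    return urllib.request.Request(
        url,
        method="POST",
        # Assumption: bearer-token auth; the actual scheme is not stated here.
        headers={"Authorization": f"Bearer {token}"},
    )


# Hypothetical host, match id, and token, for illustration only.
req = build_mark_fp_request("https://gateway.example.com", "match_123", "ADMIN_TOKEN")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; that step is left out so the sketch has no side effects.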
3. Firewall Events & Runs — what the agent did (Developer+)
Where Matches cover text, firewall Events cover actions. Every tool-call evaluation is logged with its verdict, surface, tool name, and, critically, the agent run and session it belongs to. Reads on Events, the Runs/sessions rollup, and the per-run trace require Developer+; the lighter Discovered-tools and anomaly feeds are open to every Member.
The Runs & sessions view is the forensic workhorse: it rolls events up by agent run into a verdict breakdown, the distinct tools and models the run touched, and first/last-seen timestamps. It answers “what did this agent actually do” in one screen.
Beyond static verdicts, the anomaly feed flags deviations from each workspace’s learned hour-of-week baseline (a 14-day rolling average): rate and cost spikes, retry_loop, and novel_path transitions. An allowed-but-abnormal pattern therefore still surfaces in the record.
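To make the hour-of-week baseline concrete, here is a rough sketch of one way such a check could work. It is an illustration under assumptions, not the gateway's actual algorithm: the function names, the reading of "14-day rolling average" as the mean of same-slot samples in the last 14 days, and the 3x spike factor are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def hour_of_week(ts: datetime) -> int:
    """Map a timestamp to one of 168 hour-of-week slots (Monday 00:00 = 0)."""
    return ts.weekday() * 24 + ts.hour


def baseline_for(now: datetime, hourly_counts: dict, window_days: int = 14) -> float:
    """Average the counts recorded in the same hour-of-week slot within the window."""
    slot = hour_of_week(now)
    cutoff = now - timedelta(days=window_days)
    samples = [
        count
        for ts, count in hourly_counts.items()
        if cutoff <= ts < now and hour_of_week(ts) == slot
    ]
    return mean(samples) if samples else 0.0


def is_rate_spike(current: int, baseline: float, factor: float = 3.0) -> bool:
    """Assumption: a spike means exceeding `factor` times the baseline."""
    return baseline > 0 and current > factor * baseline


now = datetime(2024, 1, 15, 10, tzinfo=timezone.utc)  # a Monday, 10:00 UTC
history = {
    now - timedelta(days=7): 10,   # same slot, one week ago
    now - timedelta(days=14): 14,  # same slot, two weeks ago
    now - timedelta(days=1): 500,  # different slot (Sunday), ignored
}
baseline = baseline_for(now, history)
print(baseline, is_rate_spike(40, baseline), is_rate_spike(30, baseline))
```

Comparing against the same hour-of-week slot keeps a busy Monday morning from being judged against a quiet Sunday night.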
4. Approval decisions — who said yes (audit action)
When a rule resolves to pending_approval, the held call becomes an
out-of-band review (see the Firewall’s HITL flow).
The decision is part of the trail: approving or rejecting writes a
workspace audit row — firewall_approval_approve or
firewall_approval_reject — naming the actor. Decisions are
first-writer-wins and idempotent, and if the underlying rule changed after
the hold, the enrichment notes that context shifted.
So a held-then-approved tool call is fully attributable end to end: the
firewall event shows the hold, the audit row shows who released it, and
both correlate to the same run.
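The first-writer-wins, idempotent behavior described above can be sketched with a small in-memory model. The class, its fields, and the `hold_id` key are hypothetical stand-ins; only the two audit action names come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalLog:
    """In-memory stand-in for the decision store (hypothetical)."""

    decisions: dict = field(default_factory=dict)
    audit_rows: list = field(default_factory=list)

    def decide(self, hold_id: str, actor: str, approve: bool) -> tuple:
        """Record a decision once; later calls return the first decision unchanged."""
        if hold_id in self.decisions:
            # First writer already won: no new decision, no second audit row.
            return self.decisions[hold_id]
        action = "firewall_approval_approve" if approve else "firewall_approval_reject"
        self.decisions[hold_id] = (action, actor)
        self.audit_rows.append({"action": action, "actor": actor, "hold_id": hold_id})
        return self.decisions[hold_id]


log = ApprovalLog()
first = log.decide("hold_1", "alice", approve=True)
second = log.decide("hold_1", "bob", approve=False)  # arrives late, changes nothing
print(first, second, len(log.audit_rows))
```

A real store would need an atomic check-and-write (for example a unique constraint on the hold) so that two concurrent reviewers cannot both win.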
5. Policy-change audit — who changed the rules
A trail of agent behavior is only trustworthy if you can also prove what the policy was at the time, and who changed it.
Guardrails keep a full version history. Every create, update, and delete writes a versioned history row in the same transaction as the change. Open History on a guardrail to see every version with author and timestamp, diff any two, and revert to an older one (revert is recorded as a new version; history is never mutated).
Firewall policy, rule, and settings changes each write a workspace audit row after the change commits (firewall_policy_update, firewall_rule_create, firewall_settings_update, and so on), and autonomy-level changes (firewall_autonomy_applied / firewall_autonomy_undone) capture the before-state snapshot that powers one-click undo. Secrets and rule blobs are never logged.
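The "revert is a new version" rule is easiest to see in a toy model. The sketch below is an assumption-laden illustration of an append-only history with diff and revert; the class and field names are hypothetical, timestamps are omitted for brevity, and the rule body is an invented one-line string.

```python
import difflib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    number: int
    author: str
    body: str


@dataclass
class GuardrailHistory:
    """Append-only version list: rows are added, never edited or removed."""

    versions: list = field(default_factory=list)

    def save(self, author: str, body: str) -> Version:
        version = Version(len(self.versions) + 1, author, body)
        self.versions.append(version)
        return version

    def diff(self, a: int, b: int) -> list:
        old, new = self.versions[a - 1], self.versions[b - 1]
        return list(difflib.unified_diff(
            old.body.splitlines(), new.body.splitlines(),
            fromfile=f"v{a}", tofile=f"v{b}", lineterm="",
        ))

    def revert(self, author: str, to: int) -> Version:
        # Reverting copies the old body into a brand-new version.
        return self.save(author, self.versions[to - 1].body)


history = GuardrailHistory()
history.save("alice", "action: flag")
history.save("bob", "action: block")
reverted = history.revert("carol", to=1)
print(reverted.number, reverted.body)
print("\n".join(history.diff(1, 2)))
```

Because the revert lands as version 3, the record still shows that version 2 existed and who wrote it.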
6. A worked example: trace one suspect run
Suppose a run is flagged for an unexpected outbound call. From the console, with a Developer+ session:
Open the run in Firewall → Runs
Find the run by its id. The rollup shows every tool it called and the verdict on each, including the deny on the fetch-shaped tool that flagged it.
Pivot to the events
Drill into the denied event. It carries the tool name, the matched
rule and reason, the surface, and the run/session lineage — the same
lineage you’ll use to line up the guardrail side.
Check what was screened on the same run
Open Guardrails → Matches and filter to that run. If a Secrets
Blocker or PII rule fired on the prompt, you now know the agent was
handed sensitive material before it tried to exfiltrate it.
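The same pivot can be sketched offline, for instance over exported data. Everything in this example is illustrative: the record shapes, field names such as `run_id`, and the sample rows are assumptions, not the product's export schema.

```python
from collections import defaultdict

# Hypothetical exported rows; real field names may differ.
firewall_events = [
    {"run_id": "run_a", "tool": "http_fetch", "verdict": "deny"},
    {"run_id": "run_a", "tool": "read_file", "verdict": "allow"},
    {"run_id": "run_b", "tool": "read_file", "verdict": "allow"},
]
guardrail_matches = [
    {"run_id": "run_a", "rule_type": "secrets", "action": "block", "stage": "input"},
]


def trace_run(run_id: str, events: list, matches: list) -> dict:
    """Line up firewall verdicts and guardrail matches that share one run id."""
    verdicts = defaultdict(list)
    for event in events:
        if event["run_id"] == run_id:
            verdicts[event["verdict"]].append(event["tool"])
    screened = [m for m in matches if m["run_id"] == run_id]
    return {"run_id": run_id, "verdicts": dict(verdicts), "screened": screened}


trace = trace_run("run_a", firewall_events, guardrail_matches)
print(trace["verdicts"])
print(trace["screened"])
```

The shared run lineage is what makes the join possible; without it the two feeds could only be matched by timestamp guesswork.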
7. Signed compliance reports — a trail an auditor can verify
For external proof, the Compliance surface turns this trail into a single artifact. Browsing the framework catalog, packs, and readiness is open to every Member and free; installing a pack, generating a report, going live, and setting data residency are workspace Admin actions on a paid plan (server-gated).
A compliance report is Ed25519-signed with a SHA-256 content hash and is publicly verifiable: the recipient checks it without an OrcaRouter account.
| Endpoint | Purpose |
|---|---|
| GET /api/public/compliance/pubkey | The public key to verify against. |
| POST /api/public/compliance/verify | Verify a report’s signature + hash. |
| GET /api/public/compliance/share/:token | An auditor-share link to a report. |
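For a recipient who wants to check the content hash independently, the hashing half can be sketched with the Python standard library. This covers only the SHA-256 comparison; verifying the Ed25519 signature additionally needs the published public key and a signature library, or a call to the verify endpoint, and is not shown. Which exact bytes of the report are hashed is an assumption here, as is the sample payload.

```python
import hashlib
import hmac


def content_hash_matches(report_bytes: bytes, claimed_sha256_hex: str) -> bool:
    """Recompute SHA-256 over the report bytes and compare in constant time."""
    actual = hashlib.sha256(report_bytes).hexdigest()
    return hmac.compare_digest(actual, claimed_sha256_hex.lower())


# Hypothetical report payload, for illustration only.
report = b'{"framework": "soc2"}'
claimed = hashlib.sha256(report).hexdigest()

print(content_hash_matches(report, claimed))         # untouched report
print(content_hash_matches(report + b" ", claimed))  # one extra byte
```

A hash match alone shows only that the bytes are unchanged; it is the signature over that hash that ties the report to its issuer.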
The framework catalog includes soc2, hipaa, gdpr, iso_27001, iso_42001, nist_ai_rmf, pci_dss, the EU AI Act (eu_ai_act), and the OWASP Top 10 for LLM Applications (owasp_llm), among others. Installing a pack materializes the matching guardrails and firewall policies so the controls you report on are the controls actually enforced.
Data residency here is the report artifact’s region (us / eu / uk / ap / cn / global), settable via PUT /api/compliance/residency (Admin); cross-region reads are withheld. It governs where the evidence artifact lives; it is not geo-pinning of your inference traffic.
8. Retention and the right to erasure
A forensic record is bounded, not forever. Request logs default to 30 days of retention and are server-clamped to a hard maximum of 180 days. When a user self-deletes, a 30-day grace window applies, after which their PII is scrubbed and the cascade purges their guardrail matches, request logs, and firewall events. This satisfies right-to-erasure / DSAR obligations while keeping aggregate audit history intact.
9. Where to go next
Guardrails reference
Matches, raw-content logging, version history, and the full rule set.
Firewall reference
Events, Runs, anomalies, approvals, and the audit log.
Excessive agency
Constrain what an agent is allowed to do before it acts.
Enforcement modes
Audit, shadow, and observe — how to build a trail before you enforce.
