Lack of attribution and forensics

When something goes wrong with an agent, the first question is always the same: what did it actually do, and who changed the policy that let it? Without a trail you can’t answer either. You can’t show an auditor that a control was in force on the day in question, you can’t tell a real attack from a noisy false positive, and you can’t reconstruct the run that leaked the row. OrcaRouter records the answer as you go. Every screened prompt, every tool call, every approval, and every policy edit lands in a workspace-scoped, queryable record, correlated back to the agent run and session that produced it. This page shows how to use that record as an AI agent audit trail, from a single suspect run to a signed report you hand an auditor.

Everything here is workspace-scoped. Members see their workspace’s trail; nothing crosses tenant boundaries. The trail is produced by features you already configure — Guardrails and the Firewall — so turning on enforcement turns on forensics at the same time.

1. The four records behind an AI agent audit trail

Attribution comes from four independent streams, each correlated to the same run and session so you can pivot between them:

Guardrail Matches

Every content rule that fired on a request or response — rule type, action, stage, and a detail string. Member-readable.

Firewall Events & Runs

Every tool-call verdict — allow, audit, deny, sanitize, pending_approval (hold-for-approval), and the resolved verdict of a cap_cost rule — rolled up by agent run and session. Developer+.

Approval decisions

Who approved or rejected each held tool call, recorded as an audit action.

Policy-change history

Every guardrail and firewall edit — versioned, diffable, revertible — plus a workspace audit row per change.

The connective tissue is the agent run and session id. A guardrail match and a firewall event from the same conversation carry the same run lineage, so “this run masked an email, then tried a fetch we denied, then was approved for a write” reads as one story instead of three disconnected logs.
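To make the pivot concrete, here is a minimal sketch of stitching records from different streams into one chronological story for a run. The `TrailRecord` shape, the stream names, and the sample data are assumptions made for illustration; they are not OrcaRouter's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TrailRecord:
    # Hypothetical flattened shape; real records carry more fields.
    stream: str   # e.g. "guardrail_match", "firewall_event", "approval"
    run_id: str
    at: datetime
    summary: str


def run_timeline(records: list[TrailRecord], run_id: str) -> list[str]:
    """Return one chronological story for a single run across all streams."""
    same_run = sorted(
        (r for r in records if r.run_id == run_id),
        key=lambda r: r.at,
    )
    return [f"{r.at:%H:%M:%S} [{r.stream}] {r.summary}" for r in same_run]


def _t(minute: int) -> datetime:
    # Helper for the sample data only.
    return datetime(2025, 1, 1, 12, minute, tzinfo=timezone.utc)


records = [
    TrailRecord("firewall_event", "run_a", _t(2), "deny fetch"),
    TrailRecord("guardrail_match", "run_a", _t(1), "mask email (input)"),
    TrailRecord("approval", "run_a", _t(5), "write approved"),
    TrailRecord("firewall_event", "run_b", _t(3), "allow search"),
]

timeline = run_timeline(records, "run_a")
```

Because every stream carries the same run id, the join is a filter and a sort rather than a fuzzy match on timestamps or prompt text.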

2. Guardrail Matches — what was screened (Member)

Every time a guardrail rule fires, the gateway writes a match. The feed lives on the Guardrails page (Matches tab) and is readable by any workspace member. Each match records the rule type, the action taken (block / mask / flag / annotate / spotlight), the stage (input / output), a detail string, and the run lineage of the request that triggered it. List it, group it by guardrail or rule type, filter by action, drill into one match, or export the feed to CSV.

The matched substring (the actual email, the SSN) is recorded only when the guardrail’s Log raw content toggle is on. It is off by default, the privacy-conservative posture. With it off, a match records that a rule fired and its detail meta-string, but not the raw value. Turn it on per guardrail when you need the substring for triage. The setting is not retroactive: matches written while it was off do not gain the raw value.

A noisy rule is part of the trail too. Mark a match as a false positive with POST /api/guardrail/match/:id/mark-fp (Admin) so your signal stays clean and your reports don’t over-count.
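As a rough sketch, the mark-fp call can be built like this. The endpoint path comes from this page; the host name, the bearer-token header, and the match id are placeholders assumed for illustration, so check your workspace's actual base URL and auth scheme before using it.

```python
import urllib.parse
import urllib.request

# Assumption: placeholder host. This page does not specify the base URL.
BASE_URL = "https://orcarouter.example"


def mark_false_positive_request(match_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the Admin-only mark-fp request for one match."""
    safe_id = urllib.parse.quote(match_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/api/guardrail/match/{safe_id}/mark-fp",
        method="POST",
        # Assumption: bearer-token auth; the real auth scheme may differ.
        headers={"Authorization": f"Bearer {token}"},
    )


req = mark_false_positive_request("m_123", token="ADMIN_TOKEN")
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```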

3. Firewall Events & Runs — what the agent did (Developer+)

Where Matches cover text, firewall Events cover actions. Every tool-call evaluation is logged with its verdict, surface, tool name, and — critically — the agent run and session it belongs to. Reads on Events, the Runs/sessions rollup, and the per-run trace require Developer+; the lighter Discovered-tools and anomaly feeds are open to every Member. The Runs & sessions view is the forensic workhorse: it rolls events up by agent run into a verdict breakdown, the distinct tools and models the run touched, and first/last-seen timestamps — the “what did this agent actually do” answer in one screen. Beyond static verdicts, the anomaly feed flags deviations from each workspace’s learned hour-of-week baseline (a 14-day rolling average) — rate and cost spikes, retry_loop, and novel_path transitions — so an allowed-but-abnormal pattern still surfaces in the record.

4. Approval decisions — who said yes (audit action)

When a rule resolves to pending_approval, the held call becomes an out-of-band review (see the Firewall’s HITL flow). The decision is part of the trail: approving or rejecting writes a workspace audit row — firewall_approval_approve or firewall_approval_reject — naming the actor. Decisions are first-writer-wins and idempotent, and if the underlying rule changed after the hold, the enrichment notes that context shifted. So a held-then-approved tool call is fully attributable end to end: the firewall event shows the hold, the audit row shows who released it, and both correlate to the same run.
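To illustrate what first-writer-wins and idempotent mean in practice, here is a small in-memory sketch. The `ApprovalLedger` class and its method names are hypothetical stand-ins, not the product's API; only the two audit action names come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    approval_id: str
    actor: str
    action: str  # "firewall_approval_approve" or "firewall_approval_reject"


class ApprovalLedger:
    """In-memory stand-in for the approval store (hypothetical)."""

    def __init__(self) -> None:
        self._decisions: dict[str, Decision] = {}
        self.audit_rows: list[Decision] = []

    def decide(self, approval_id: str, actor: str, approve: bool) -> Decision:
        existing = self._decisions.get(approval_id)
        if existing is not None:
            # A later or repeated call changes nothing: the first decision stands.
            return existing
        action = "firewall_approval_approve" if approve else "firewall_approval_reject"
        decision = Decision(approval_id, actor, action)
        self._decisions[approval_id] = decision
        self.audit_rows.append(decision)  # one audit row per held call
        return decision


ledger = ApprovalLedger()
first = ledger.decide("hold_1", actor="alice", approve=True)
second = ledger.decide("hold_1", actor="bob", approve=False)  # arrives too late
```

In this sketch the second reviewer gets back the first reviewer's decision, so the audit trail names exactly one actor per held call.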

5. Policy-change audit — who changed the rules

A trail of agent behavior is only trustworthy if you can also prove what the policy was at the time — and who changed it. Guardrails keep a full version history. Every create, update, and delete writes a versioned history row in the same transaction as the change. Open History on a guardrail to see every version with author and timestamp, diff any two, and revert to an older one (revert is recorded as a new version — history is never mutated). Firewall policy, rule, and settings changes each write a workspace audit row after the change commits — firewall_policy_update, firewall_rule_create, firewall_settings_update, and so on — and autonomy-level changes (firewall_autonomy_applied / firewall_autonomy_undone) capture the before-state snapshot that powers one-click undo. Secrets and rule blobs are never logged.
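The "revert is a new version" rule is easiest to see in a toy model. This sketch assumes a hypothetical `GuardrailHistory` class with an append-only list; it shows the idea, not the stored schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    number: int
    author: str
    config: dict
    note: str


class GuardrailHistory:
    """Append-only version list (hypothetical): rows are added, never edited."""

    def __init__(self) -> None:
        self.versions: list[Version] = []

    def save(self, author: str, config: dict, note: str = "update") -> Version:
        v = Version(len(self.versions) + 1, author, dict(config), note)
        self.versions.append(v)
        return v

    def revert(self, author: str, to_number: int) -> Version:
        # Reverting copies an old config forward as a brand-new version.
        target = self.versions[to_number - 1]
        return self.save(author, target.config, note=f"revert to v{to_number}")

    def diff(self, a: int, b: int) -> dict:
        """Map each changed key to its (old, new) pair between versions a and b."""
        old, new = self.versions[a - 1].config, self.versions[b - 1].config
        keys = sorted(old.keys() | new.keys())
        return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


history = GuardrailHistory()
history.save("alice", {"action": "block"}, note="create")
history.save("bob", {"action": "flag"})
reverted = history.revert("carol", to_number=1)
```

After the revert, versions 1 and 2 are still present and version 3 records who rolled back and to what, which is what lets you prove the policy state at any past moment.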

Both planes log the change and keep the policy reversible. If a rule edit caused a regression, the policy-change trail tells you which edit and who made it — and you roll it back without redeploying anything.

6. A worked example: trace one suspect run

Suppose a run is flagged for an unexpected outbound call. From the console, with a Developer+ session:

Open the run in Firewall → Runs

Find the run by its id. The rollup shows every tool it called and the verdict on each — including the deny on the fetch-shaped tool that flagged it.

Pivot to the events

Drill into the denied event. It carries the tool name, the matched rule and reason, the surface, and the run/session lineage — the same lineage you’ll use to line up the guardrail side.

Check what was screened on the same run

Open Guardrails → Matches and filter to that run. If a Secrets Blocker or PII rule fired on the prompt, you now know the agent was handed sensitive material before it tried to exfiltrate it.

Confirm the policy was in force

Open History on the guardrail and the firewall policy’s audit rows. Confirm no one weakened the relevant rule before the run — and if they did, you have the author and timestamp.

One run, four correlated records, no log-grep archaeology. For the exfiltration defenses themselves, see Data exfiltration and Dangerous tool calls.

7. Signed compliance reports — a trail an auditor can verify

For external proof, the Compliance surface turns this trail into a single artifact. Browsing the framework catalog, packs, and readiness is open to every Member and free; installing a pack, generating a report, going live, and setting data residency are workspace Admin actions on a paid plan (server-gated). A compliance report is Ed25519-signed with a SHA-256 content hash and is publicly verifiable, so the recipient can check it without an OrcaRouter account using these endpoints:

Endpoint	Purpose
`GET /api/public/compliance/pubkey`	The public key to verify against.
`POST /api/public/compliance/verify`	Verify a report’s signature + hash.
`GET /api/public/compliance/share/:token`	An auditor-share link to a report.
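An auditor-side check might look roughly like the sketch below: hash the report bytes locally, then ask the public verify endpoint to confirm the signature. The endpoint path is from the table above; the host name and the JSON field names (`sha256`, `signature`) are assumptions, since this page does not document the request body.

```python
import hashlib
import json
import urllib.request

# Assumption: placeholder host. This page does not specify the base URL.
BASE_URL = "https://orcarouter.example"


def content_hash(report_bytes: bytes) -> str:
    """SHA-256 of the report exactly as received, hex-encoded."""
    return hashlib.sha256(report_bytes).hexdigest()


def verify_request(report_bytes: bytes, signature_b64: str) -> urllib.request.Request:
    """Build (but do not send) a verify call. Body field names are hypothetical."""
    body = json.dumps(
        {"sha256": content_hash(report_bytes), "signature": signature_b64}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/public/compliance/verify",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


report = b'{"framework": "soc2"}'  # sample bytes, not a real report
req = verify_request(report, signature_b64="c2lnbmF0dXJl")
```

Hashing locally first means a report that was altered in transit fails the comparison before any signature check is attempted.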

Reports export as CSV / JSON / PDF. Frameworks include soc2, hipaa, gdpr, iso_27001, iso_42001, nist_ai_rmf, pci_dss, the EU AI Act (eu_ai_act), and the OWASP Top 10 for LLM Applications (owasp_llm), among others — installing a pack materializes the matching guardrails and firewall policies so the controls you report on are the controls actually enforced.

Data residency here is the report artifact’s region (us / eu / uk / ap / cn / global), settable via PUT /api/compliance/residency (Admin); cross-region reads are withheld. It governs where the evidence artifact lives — it is not geo-pinning of your inference traffic.

8. Retention and the right to erasure

A forensic record is bounded, not forever. Request logs default to 30 days of retention and are server-clamped to a hard maximum of 180 days. When a user self-deletes, a 30-day grace window applies, after which their PII is scrubbed and the cascade purges their guardrail matches, request logs, and firewall events — satisfying right-to-erasure / DSAR obligations while keeping aggregate audit history intact.
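The numbers above reduce to two small calculations, sketched here. The function names are hypothetical, and the lower bound of one day on a requested retention is an assumption made for this sketch; only the 30-day default, the 180-day maximum, and the 30-day grace window come from this page.

```python
from datetime import date, timedelta
from typing import Optional

DEFAULT_RETENTION_DAYS = 30
MAX_RETENTION_DAYS = 180
ERASURE_GRACE_DAYS = 30


def effective_retention(requested_days: Optional[int] = None) -> int:
    """Apply the default, then clamp to the server maximum."""
    if requested_days is None:
        return DEFAULT_RETENTION_DAYS
    # Assumption: a floor of 1 day; this page only states the upper bound.
    return max(1, min(requested_days, MAX_RETENTION_DAYS))


def purge_eligible_on(self_deleted_on: date) -> date:
    """First day on which the erasure cascade may run for a self-deleted user."""
    return self_deleted_on + timedelta(days=ERASURE_GRACE_DAYS)


retention = effective_retention(365)         # a request above the maximum
purge_day = purge_eligible_on(date(2025, 3, 1))
```

When planning an investigation, treat the smaller of the two windows as your deadline for exporting evidence tied to a specific user.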

9. Where to go next

Guardrails reference

Matches, raw-content logging, version history, and the full rule set.

Firewall reference

Events, Runs, anomalies, approvals, and the audit log.

Excessive agency

Constrain what an agent is allowed to do before it acts.

Enforcement modes

Audit, shadow, and observe — how to build a trail before you enforce.
