After an agent does something consequential, you will eventually need to answer two very different questions. One is about access: who did what, and who allowed it. The other is about behavior: what did the model see, and why did it produce what it did.
These are not the same question. Their answers live in different places, and a system that tries to answer both from one log answers neither well. So we keep two independent records.
Two records, written at two moments
The first record is the audit log. Every authorization event (a role granted or revoked, a share, a denial, a step in an API key's lifecycle) is written to an append-only log synchronously, at the moment the permission decision is made and before the action proceeds. It records the principal, the action, the resource, and a timestamp. It is the answer to who did what, and who allowed it.
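To make the ordering concrete, here is a minimal sketch in Python, assuming a SQLite table stands in for the audit store. The table name `audit_log`, the `gated` helper, and the `Denied` exception are hypothetical. The point is only that the decision row is committed before the action runs, so a failed write means no action.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical schema: one row per authorization decision.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    ts        TEXT NOT NULL,
    principal TEXT NOT NULL,
    action    TEXT NOT NULL,
    resource  TEXT NOT NULL,
    decision  TEXT NOT NULL CHECK (decision IN ('allow', 'deny'))
)
"""


class Denied(Exception):
    """Raised after a 'deny' decision has been recorded."""


def gated(conn: sqlite3.Connection, principal: str, action: str,
          resource: str, allowed: bool, run: Callable[[], T]) -> T:
    """Record the decision synchronously, then run the action only if allowed."""
    decision = "allow" if allowed else "deny"
    # `with conn` commits on success; if the insert fails, run() is never reached.
    with conn:
        conn.execute(
            "INSERT INTO audit_log (ts, principal, action, resource, decision) "
            "VALUES (?, ?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), principal, action,
             resource, decision),
        )
    if not allowed:
        raise Denied(f"{principal} may not {action} {resource}")
    return run()


# Usage: the row exists before "shared" is ever produced.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
result = gated(conn, "user:ana", "share", "doc:42", True, lambda: "shared")
```

A denied request still leaves a row, which is what lets the log answer "who was refused" as well as "who was allowed".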
The second record is the trace store. Every model call, from any agent or sub-agent, is recorded after the call completes: the prompt, the response, the model and its version, the token counts, the latency. It is the answer to what the model saw and what it produced.
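A minimal sketch of recording a trace after the call, assuming an in-memory list as the trace sink. `TraceRecord`, `traced_call`, and the whitespace-based `rough_tokens` counter are hypothetical stand-ins; a real deployment would take token counts from the model provider's response.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class TraceRecord:
    model: str
    model_version: str
    prompt: str
    response: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float


def rough_tokens(text: str) -> int:
    # Placeholder tokenizer (whitespace split); real counts come from the provider.
    return len(text.split())


def traced_call(sink: List[TraceRecord], model: str, version: str,
                call: Callable[[str], str], prompt: str) -> str:
    start = time.perf_counter()
    response = call(prompt)  # the model call itself
    latency_ms = (time.perf_counter() - start) * 1000
    # Only once the call has completed do we have everything the trace needs.
    sink.append(TraceRecord(model, version, prompt, response,
                            rough_tokens(prompt), rough_tokens(response),
                            latency_ms))
    return response
```

Because the record is assembled after the response arrives, it never sits in front of the call the way an authorization check does.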
The trace covers swarms too. Every model call from a sub-agent is recorded and attributed to both the sub-process and the parent session, so when an agent spawns helpers to parallelize work, you can still see which sub-process saw what and which model produced each piece. A swarm is as auditable as a single agent, because every call in it leaves a trace.
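A sketch of that attribution, assuming each trace record carries hypothetical `session_id` and `process_id` fields, plus the parent process so you can tell who spawned whom. Grouping one session's calls by process answers which sub-process made which call, and with which model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CallTrace:
    call_id: str
    session_id: str                   # the parent session the swarm belongs to
    process_id: str                   # the agent or sub-agent that made the call
    parent_process_id: Optional[str]  # None for the root agent
    model: str


def calls_by_process(traces: List[CallTrace],
                     session_id: str) -> Dict[str, List[Tuple[str, str]]]:
    """For one session, list (call_id, model) pairs per (sub-)process."""
    out: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for t in traces:
        if t.session_id == session_id:
            out[t.process_id].append((t.call_id, t.model))
    return dict(out)
```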
Why two stores, not two tables
The audit log and the trace store are not just two tables; they are two kinds of database, because the data has two shapes. The audit log is small, append-only, transactional, and queried by principal and time, which is what a relational store is good at. The trace store is enormous and write-heavy, full of prompts and responses, and queried analytically across millions of calls, which is what a columnar analytics store is good at.
The same reasoning that splits the durable runtime into a workflow engine and a streaming log splits the record into a transactional log and an analytics store. Match the store to the shape of the data, and each kind of query stays cheap. Force both into one store, and either the access queries or the trace queries become the slow ones.
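A toy illustration of the two query shapes, using SQLite for the audit side and plain Python lists as columns for the trace side. It is not a benchmark and every name in it is hypothetical. It only shows that the access query is an indexed lookup by principal and time, while the trace query scans just the two columns it aggregates and never touches the bulky prompt column.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List

# Audit side: row-oriented, indexed for "this principal, this time window".
audit = sqlite3.connect(":memory:")
audit.execute("CREATE TABLE audit_log (ts TEXT, principal TEXT, action TEXT, resource TEXT)")
audit.execute("CREATE INDEX by_principal_time ON audit_log (principal, ts)")
audit.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?)",
    [
        ("2024-05-02T10:00:00Z", "user:ana", "grant_role", "ws:finance"),
        ("2024-05-03T09:00:00Z", "user:bo", "share", "doc:7"),
        ("2024-06-10T08:00:00Z", "user:ana", "revoke_role", "ws:finance"),
    ],
)


def access_history(principal: str, start: str, end: str) -> List[tuple]:
    # ISO-8601 strings in one format sort chronologically, so range filters work.
    return audit.execute(
        "SELECT ts, action, resource FROM audit_log "
        "WHERE principal = ? AND ts >= ? AND ts < ? ORDER BY ts",
        (principal, start, end),
    ).fetchall()


# Trace side: column-oriented; each field is its own array.
traces: Dict[str, list] = {
    "model": ["model-a", "model-b", "model-a"],
    "tokens": [120, 300, 80],
    "prompt": ["...", "...", "..."],  # large, and unread by the aggregate below
}


def tokens_by_model(cols: Dict[str, list]) -> Dict[str, int]:
    totals: Dict[str, int] = defaultdict(int)
    for model, n in zip(cols["model"], cols["tokens"]):
        totals[model] += n
    return dict(totals)
```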
Independent on purpose
The two do not tail each other, and that is deliberate. They are different stores, written at different times, for different questions. The audit log is synchronous and append-only because an access decision has to be recorded before the action it gates, and that record must never be quietly edited afterward. The trace store is written asynchronously after the call, because a model call's record is large and does not gate anything.
Append-only is what makes the audit log trustworthy. A record you can edit after the fact is a record you cannot rely on in a dispute. Because the log is written before the action it gates and never mutated, what it says about an access decision is what was actually decided, not what someone wrote down later.
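One way to enforce append-only at the database level, sketched with SQLite triggers that abort any `UPDATE` or `DELETE` on a hypothetical `audit_log` table. This stops edits made through the application's connection. It does not stop someone with direct write access to the database file from dropping the triggers, so a production store would pair it with storage-level controls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_log (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    ts        TEXT NOT NULL,
    principal TEXT NOT NULL,
    action    TEXT NOT NULL,
    resource  TEXT NOT NULL
);
-- Reject any attempt to rewrite or remove history.
CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
""")


def append(ts: str, principal: str, action: str, resource: str) -> None:
    # Inserts are the only write path the triggers leave open.
    with conn:
        conn.execute(
            "INSERT INTO audit_log (ts, principal, action, resource) VALUES (?, ?, ?, ?)",
            (ts, principal, action, resource),
        )
```

An attempted `UPDATE` or `DELETE` surfaces as a `sqlite3` database error, and the row is left exactly as it was written.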
Keeping them independent means a problem in one does not blind you in the other. If the trace pipeline hiccups, the authorization record still stands, and the reverse holds too. Two witnesses are better than one, especially when neither depends on the other to keep telling the truth.
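A sketch of that independence, assuming a background thread drains a queue of trace records while the audit append stays on the caller's path. `AsyncTraceWriter` and `act` are hypothetical names, and the trace payload is a stand-in for a full model-call record. If the trace sink raises, the failure is counted on the worker thread, and neither the audit record nor the action's result is affected.

```python
import queue
import threading
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class AsyncTraceWriter:
    """Drains trace records on a background thread so callers never wait on it."""

    def __init__(self, sink: Callable[[Record], None]) -> None:
        self._q: queue.Queue = queue.Queue()
        self._sink = sink
        self.dropped = 0  # written only by the worker thread
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, record: Record) -> None:
        self._q.put(record)  # returns immediately

    def close(self) -> None:
        self._q.put(None)  # sentinel: finish what is queued, then stop
        self._worker.join()

    def _run(self) -> None:
        while True:
            record = self._q.get()
            if record is None:
                return
            try:
                self._sink(record)
            except Exception:
                # A real pipeline would retry or spill to disk; here we just count.
                self.dropped += 1


def act(audit_log: List[Record], traces: AsyncTraceWriter, principal: str,
        action: str, run: Callable[[], Any]) -> Any:
    # Synchronous: the authorization record exists before the action runs.
    audit_log.append({"principal": principal, "action": action})
    result = run()
    # Asynchronous: the trace is handed off and the caller moves on.
    traces.submit({"principal": principal, "action": action, "result": repr(result)})
    return result
```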
What each answers
The split maps cleanly onto the questions a review actually asks. What did the agent do? The audit log. Why did it do it? The trace, with the prompt, the retrieved context, and the model. What data did it consider? The trace, plus the file-access events in the audit log.
Who authorized the action? The audit log, with the severity, the approval, and the approver. Which model produced the output? The trace, with the model ID, version, and routing. No single record answers all of these, and forcing one to would mean it answers each of them only partially.
Together, a chain of custody
Put the two records beside each other and you can reconstruct any permissioned action end to end. For a given action, the combined record shows the principal that took it, the access path that allowed it, the resource it touched, the policies that applied, the approval context if there was one, the credential boundary it crossed, and the final effect.
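A simplified sketch of reconstructing the chain, assuming both stores tag their records with a shared correlation key (here a hypothetical `action_id`). The stores stay independent; the join happens only at read time, merging both into one timeline ordered by timestamp. The record shapes are deliberately reduced and do not cover every field listed above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AuditEvent:
    action_id: str  # hypothetical correlation key shared with the trace store
    ts: str
    principal: str
    kind: str       # e.g. "decision", "approval", "effect"
    detail: str


@dataclass(frozen=True)
class TraceEntry:
    action_id: str
    ts: str
    model: str
    model_version: str
    prompt_summary: str


def custody_chain(action_id: str, audit: List[AuditEvent],
                  traces: List[TraceEntry]) -> List[Tuple[str, str, str]]:
    """Merge both records for one action into (ts, source, summary) rows."""
    rows = [(e.ts, "audit", f"{e.principal} {e.kind}: {e.detail}")
            for e in audit if e.action_id == action_id]
    rows += [(t.ts, "trace", f"{t.model}@{t.model_version} saw: {t.prompt_summary}")
             for t in traces if t.action_id == action_id]
    # ISO-8601 timestamps in one format sort chronologically.
    return sorted(rows)
```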
That is a chain of custody, and it is what a serious audit is really after: not just that something happened, but the entire authorized path from who, through why and what-the-model-saw, to the result. An agent action stops being a black box you have to trust and becomes a sequence you can read.
A worked example
Say an agent sent an external email, and weeks later a reviewer asks two things. Was it authorized? The audit log answers: the send was a high-severity action, and either it was within the user's authorized ceiling or it ran on a specific intent, approved by a named person at a recorded time.
What did the model base it on? The trace answers: the prompt, the retrieved customer history, and the model and version that wrote the message. Lay the two side by side and you have the whole story: who allowed the send, and what the agent was looking at when it wrote the message. Neither record has to stand in for the other, and nobody has to take the agent's own word for any of it.
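The authorization half of that answer can be read mechanically off the audit record. A sketch, assuming a hypothetical three-level severity scale and hypothetical field names for the user's ceiling and the approved intent:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ordering; a real deployment defines its own severity levels.
SEVERITY = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class SendRecord:
    severity: str                       # severity assigned to the action
    user_ceiling: str                   # highest severity the user may run unattended
    intent_id: Optional[str] = None     # set when the action ran on an approved intent
    approver: Optional[str] = None
    approved_at: Optional[str] = None


def explain_authorization(r: SendRecord) -> str:
    # Path 1: the action fell within the user's authorized ceiling.
    if SEVERITY[r.severity] <= SEVERITY[r.user_ceiling]:
        return f"within ceiling ({r.severity} <= {r.user_ceiling})"
    # Path 2: a named person approved a specific intent at a recorded time.
    if r.intent_id and r.approver and r.approved_at:
        return f"intent {r.intent_id} approved by {r.approver} at {r.approved_at}"
    return "not authorized"
```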
It goes to your tools, and it stays in your boundary
Two things make this usable rather than merely thorough. First, none of it is a walled garden. The records export to the observability and security tools a company already runs: distributed traces through OpenTelemetry, structured logs into any SIEM, metrics into Prometheus. You watch agent activity with the same dashboards and alerts you point at everything else.
Second, all of it stays inside the deployment. The trace data does not leave the customer's boundary, and nothing is sent back to us for training or analytics. Retention is configurable per workspace and classification, and legal hold can freeze specific records as immutable when something must be preserved. Thorough is only useful if it is also yours.
Retention is set independently for each store, so you keep model-call detail exactly as long as policy says and no longer, while the authorization record follows its own schedule. An in-cluster dashboard stack is available if a team does not already have one, but the design assumes it does and meets teams where their tools already are.
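A sketch of per-store retention with legal hold, assuming a hypothetical policy table keyed by store, workspace, and classification. Held records are skipped regardless of age, and records with no matching policy are kept rather than guessed at.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Record:
    record_id: str
    store: str           # "audit" or "trace"
    workspace: str
    classification: str
    created: datetime


# Hypothetical policy: (store, workspace, classification) -> days to keep.
RETENTION: Dict[Tuple[str, str, str], int] = {
    ("trace", "support", "confidential"): 30,
    ("trace", "support", "internal"): 90,
    ("audit", "support", "confidential"): 2555,  # about seven years
}


def expired(records: List[Record], now: datetime, legal_hold: Set[str]) -> List[str]:
    """Return IDs of records past retention that are not under legal hold."""
    out = []
    for r in records:
        if r.record_id in legal_hold:
            continue  # frozen: never purged while the hold stands
        days = RETENTION.get((r.store, r.workspace, r.classification))
        if days is None:
            continue  # no policy: keep it rather than guess
        if now - r.created > timedelta(days=days):
            out.append(r.record_id)
    return out
```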
Who did what, and what the model saw
An agent that acts on your systems has to be auditable, and auditable means more than a log file. It means answering both the access question and the behavior question, separately and completely, after the fact, with records that do not depend on each other to stay honest. Two independent records, written where the events happen, exported to the tools you already use, kept inside your boundary. Who did what, and what the model saw, both on the record, and together enough to reconstruct anything.