<?xml version="1.0" encoding="utf-8"?>
<!-- name="GENERATOR" content="github.com/mmarkdown/mmark Mmark Markdown Processor - mmark.miek.nl" -->
<rfc version="3" ipr="trust200902" docName="draft-xkumakichi-xaip-receipts-00" submissionType="independent" category="info" xml:lang="en" xmlns:xi="http://www.w3.org/2001/XInclude" indexInclude="true">

<front>
<title abbrev="XAIP Receipts">Signed Execution Receipts for AI Agent Tool Calls (XAIP Receipts)</title><seriesInfo value="draft-xkumakichi-xaip-receipts-00" stream="independent" status="informational" name="Internet-Draft"></seriesInfo>
<author fullname="xkumakichi"><organization></organization><address><postal><street></street>
</postal><email>kuma.github@gmail.com</email>
</address></author><date year="2026" month="May" day="22"></date>
<area>Applications and Real-Time</area>
<workgroup>Independent Submission</workgroup>
<keyword>AI agents</keyword>
<keyword>tool calls</keyword>
<keyword>signed receipts</keyword>
<keyword>trust</keyword>
<keyword>DID</keyword>

<abstract>
<t>This document defines a wire format for signed execution receipts produced by AI agents when they invoke tools, services, or other agents. A receipt records the minimum facts needed to make a trust decision about a future call: who acted, who delegated, what tool was used, whether the call succeeded, how long it took, and how the call's inputs and outputs are identified (without disclosing their contents).</t>
<t>The format is intentionally tool-system-agnostic. The same receipt structure can be emitted by MCP (Model Context Protocol) servers, LangChain.js callback handlers, OpenAI tool-calling loops, HTTP clients, or proprietary agent runtimes. Receipts use Ed25519 signatures over a JSON-canonicalized payload, and identities are W3C Decentralized Identifiers (DIDs).</t>
<t>Scoring policy, aggregation architecture, and reactive behavior in response to receipts are explicitly out of scope and left to deployments.</t>
</abstract>

</front>

<middle>

<section anchor="introduction"><name>Introduction</name>

<section anchor="motivation"><name>Motivation</name>
<t>AI agents increasingly act on behalf of users: they pick tools, call APIs, delegate to other agents, and -- in some deployments -- participate in transaction workflows. Each of those actions is preceded by an implicit trust decision: which tool should I use, and is it likely to do what I expect?</t>
<t>Today, that decision is mostly answered by upstream proxies -- whether the tool's name appears in a model's training data, whether a registry surfaces it, whether a platform recommends it. None of these proxies records what the tool actually did in real calls. There is no widely deployed, interoperable record format that an agent (or an agent-payment protocol, or an audit system) can use to look back and answer &quot;what happened the last N times this tool was called?&quot;</t>
<t>This document defines such a format. It is intentionally narrow: it covers the wire format for one receipt. How receipts are stored, aggregated, queried, scored, or reacted to is a deployment-policy concern and is out of scope.</t>
</section>

<section anchor="design-principles"><name>Design Principles</name>

<ul spacing="compact">
<li>Wire format only. Scoring models, aggregation topologies, and decision logic are deployment choices, not protocol requirements.</li>
<li>Tool-system-agnostic. The same receipt can be produced by MCP, LangChain, OpenAI tool calling, plain HTTP, or proprietary runtimes.</li>
<li>Privacy-preserving by construction. Receipts identify inputs and outputs by hash, not by content. A receipt does not require disclosure of user data, prompts, or tool outputs.</li>
<li>Independently verifiable. Anyone holding the receipt and the public keys can verify the signatures without consulting any registry or trusted third party.</li>
<li>Co-signed where possible. Both the Executor (Agent) and the Caller sign the same canonical payload, so neither can unilaterally fabricate the record.</li>
</ul>
</section>

<section anchor="out-of-scope"><name>Out of Scope</name>
<t>This document does NOT define:</t>

<ul spacing="compact">
<li>A scoring model. Trust scores derived from receipts are deployment policy.</li>
<li>An aggregation architecture. Receipts can be stored locally, federated, anchored, or relayed in any pattern.</li>
<li>A query API. Consumers may serve receipts and/or derived data over any protocol they choose.</li>
<li>Identity priors. If a deployment chooses to weight different DID methods differently, that is deployment policy.</li>
<li>A specific transport. Receipts may be exchanged over HTTP, MCP, message queues, or any other carrier.</li>
</ul>
</section>

<section anchor="conventions-and-definitions"><name>Conventions and Definitions</name>
<t>The key words &quot;MUST&quot;, &quot;MUST NOT&quot;, &quot;REQUIRED&quot;, &quot;SHALL&quot;, &quot;SHALL NOT&quot;, &quot;SHOULD&quot;, &quot;SHOULD NOT&quot;, &quot;RECOMMENDED&quot;, &quot;NOT RECOMMENDED&quot;, &quot;MAY&quot;, and &quot;OPTIONAL&quot; in this document are to be interpreted as described in BCP 14 <xref target="RFC2119"></xref> <xref target="RFC8174"></xref> when, and only when, they appear in all capitals, as shown here.</t>

<section anchor="terminology"><name>Terminology</name>

<dl spacing="compact">
<dt>Agent:</dt>
<dd>An automated system, typically an AI agent, that invokes tools, services, or other agents on a principal's behalf.</dd>
<dt>Caller:</dt>
<dd>The party that delegated the tool call to the Agent. Often (but not always) the same legal entity as the Agent's principal.</dd>
<dt>Tool:</dt>
<dd>A named operation invoked by the Agent. The tool implementation may be local code, an MCP server, an HTTP API, a sub-agent, or any callable target.</dd>
<dt>Receipt:</dt>
<dd>A signed record of a single Tool execution attempt.</dd>
<dt>Executor signature:</dt>
<dd>The signature produced by the Agent that ran the tool.</dd>
<dt>Caller signature:</dt>
<dd>The signature produced by the Caller over the same canonicalized payload as the Executor signature.</dd>
<dt>DID:</dt>
<dd>Decentralized Identifier, as defined in the W3C DID Core specification <xref target="DID-CORE"></xref>.</dd>
</dl>
</section>
</section>
</section>

<section anchor="receipt-structure"><name>Receipt Structure</name>
<t>A receipt is a JSON object with the following fields:</t>
<table>
<thead>
<tr>
<th>Field</th>
<th>Type</th>
<th>Required</th>
<th>Description</th>
</tr>
</thead>

<tbody>
<tr>
<td><tt>agentDid</tt></td>
<td>string (DID)</td>
<td>yes</td>
<td>The Agent that executed the tool.</td>
</tr>

<tr>
<td><tt>callerDid</tt></td>
<td>string (DID)</td>
<td>yes</td>
<td>The Caller that delegated the tool call. MAY equal <tt>agentDid</tt> when there is no delegation.</td>
</tr>

<tr>
<td><tt>toolName</tt></td>
<td>string</td>
<td>yes</td>
<td>A stable identifier for the tool. Format is opaque to this spec.</td>
</tr>

<tr>
<td><tt>taskHash</tt></td>
<td>string (hex, lowercase)</td>
<td>yes</td>
<td>A hash of the canonical task input. SHA-256 RECOMMENDED.</td>
</tr>

<tr>
<td><tt>resultHash</tt></td>
<td>string (hex, lowercase)</td>
<td>yes</td>
<td>A hash of the canonical task output. SHA-256 RECOMMENDED. For failures, the hash MAY be of a canonical failure description.</td>
</tr>

<tr>
<td><tt>success</tt></td>
<td>boolean</td>
<td>yes</td>
<td><tt>true</tt> if the tool call satisfied the agent's success criterion, <tt>false</tt> otherwise.</td>
</tr>

<tr>
<td><tt>latencyMs</tt></td>
<td>integer &gt;= 0</td>
<td>yes</td>
<td>Wall-clock time from invocation to completion, in milliseconds.</td>
</tr>

<tr>
<td><tt>failureType</tt></td>
<td>string</td>
<td>yes</td>
<td>One of the values defined in <xref target="failure-type-classification"></xref> when <tt>success</tt> is <tt>false</tt>. When <tt>success</tt> is <tt>true</tt>, the value MUST be the empty string.</td>
</tr>

<tr>
<td><tt>timestamp</tt></td>
<td>string (RFC 3339)</td>
<td>yes</td>
<td>UTC timestamp of completion.</td>
</tr>

<tr>
<td><tt>signature</tt></td>
<td>string (hex)</td>
<td>yes</td>
<td>Ed25519 signature by the Agent over the canonical payload.</td>
</tr>

<tr>
<td><tt>callerSignature</tt></td>
<td>string (hex)</td>
<td>recommended</td>
<td>Ed25519 signature by the Caller over the same canonical payload.</td>
</tr>

<tr>
<td><tt>toolMetadata</tt></td>
<td>object</td>
<td>optional</td>
<td>Tool-class or capability hints. Format is deployment-defined.</td>
</tr>
</tbody>
</table>
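<t>As a non-normative illustration, the fields above can be modeled as a TypeScript type with a small structural check. The names <tt>Receipt</tt> and <tt>checkReceiptShape</tt> are hypothetical and are not defined by this document. The DID and timestamp checks are deliberately loose approximations, not full syntax validation.</t>

<sourcecode type="typescript"><![CDATA[
// Illustrative model of the receipt fields; not normative.
interface Receipt {
  agentDid: string;
  callerDid: string;
  toolName: string;
  taskHash: string;
  resultHash: string;
  success: boolean;
  latencyMs: number;
  failureType: string;
  timestamp: string;
  signature: string;
  callerSignature?: string;
  toolMetadata?: Record<string, unknown>;
}

const LOWER_HEX = /^[0-9a-f]+$/;

// Hypothetical helper: lists the fields with structural problems
// (empty when none). It checks shapes only; it does not verify
// signatures, resolve DIDs, or fully validate DID or RFC 3339 syntax.
function checkReceiptShape(r: Receipt): string[] {
  const problems: string[] = [];
  if (!r.agentDid.startsWith("did:")) problems.push("agentDid");
  if (!r.callerDid.startsWith("did:")) problems.push("callerDid");
  if (!LOWER_HEX.test(r.taskHash)) problems.push("taskHash");
  if (!LOWER_HEX.test(r.resultHash)) problems.push("resultHash");
  if (!Number.isInteger(r.latencyMs) || r.latencyMs < 0) {
    problems.push("latencyMs");
  }
  // failureType is the empty string exactly when success is true.
  if (r.success !== (r.failureType === "")) {
    problems.push("failureType");
  }
  if (Number.isNaN(Date.parse(r.timestamp))) {
    problems.push("timestamp");
  }
  return problems;
}
]]></sourcecode>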
<section anchor="example"><name>Example</name>

<sourcecode type="json"><![CDATA[{
  "agentDid": "did:web:myagent.example",
  "callerDid": "did:key:z6Mk...",
  "toolName": "translate",
  "taskHash":
"9b74c9897bac770ffc029102a200c5de5f2a1b3c4d5e6f708192a3b4c5d6e7f8",
  "resultHash":
"f0e1d2c3b4a5987612345678abcdef00112233445566778899aabbccddeeff00",
  "success": true,
  "latencyMs": 142,
  "failureType": "",
  "timestamp": "2026-05-14T10:30:00.000Z",
  "signature": "...",
  "callerSignature": "..."
}
]]></sourcecode>
</section>
</section>

<section anchor="canonical-payload-and-signing"><name>Canonical Payload and Signing</name>

<section anchor="canonical-payload"><name>Canonical Payload</name>
<t>The signed payload is a JSON object containing exactly the following fields, listed here in the order produced by JCS property sorting <xref target="RFC8785"></xref>:</t>

<artwork><![CDATA[agentDid, callerDid, failureType, latencyMs, resultHash,
success, taskHash, timestamp, toolName
]]></artwork>
<t>The <tt>signature</tt>, <tt>callerSignature</tt>, and <tt>toolMetadata</tt> fields are excluded from the canonical payload. Implementations producing receipts MUST canonicalize using JCS as defined in <xref target="RFC8785"></xref>.</t>
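<t>The construction above can be sketched as follows. This is a minimal, non-normative example, not a general JCS implementation. It assumes that every signed value is a string, a boolean, or a non-negative safe integer, in which case <tt>JSON.stringify</tt> over sorted keys is expected to match JCS output. The helper name <tt>canonicalPayload</tt> is hypothetical.</t>

<sourcecode type="typescript"><![CDATA[
// The nine signed fields. signature, callerSignature, and
// toolMetadata are deliberately absent.
const SIGNED_FIELDS: string[] = [
  "agentDid", "callerDid", "failureType", "latencyMs", "resultHash",
  "success", "taskHash", "timestamp", "toolName",
];

// Hypothetical helper. Assumption: the receipt is flat and its signed
// values are strings, booleans, or non-negative safe integers. Other
// inputs need a full RFC 8785 implementation.
function canonicalPayload(receipt: Record<string, unknown>): string {
  const ordered: Record<string, unknown> = {};
  // Array.prototype.sort orders strings by UTF-16 code units, which
  // is the ordering JCS uses for property names.
  for (const key of SIGNED_FIELDS.slice().sort()) {
    const value = receipt[key];
    if (value === undefined || value === null) {
      throw new Error(`missing signed field: ${key}`);
    }
    ordered[key] = value;
  }
  // JSON.stringify emits no whitespace and keeps insertion order for
  // these non-numeric keys.
  return JSON.stringify(ordered);
}
]]></sourcecode>

<t>Because the helper copies only the listed fields, adding or changing <tt>toolMetadata</tt> on a receipt does not change the bytes that are signed.</t>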
</section>

<section anchor="signing-algorithm"><name>Signing Algorithm</name>
<t>Signatures are computed using Ed25519, as defined in <xref target="RFC8032"></xref>. The signature input is the UTF-8 encoding of the canonical JSON string produced in the previous subsection.</t>
<t>The <tt>signature</tt> field is the Executor's Ed25519 signature, encoded as a lowercase hexadecimal string. The <tt>callerSignature</tt> field, when present, is the Caller's Ed25519 signature over the same canonical input.</t>
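<t>A minimal signing sketch using the Ed25519 support in the Node.js <tt>crypto</tt> module is shown below. It is illustrative only. The helper name <tt>signCanonicalPayload</tt>, the throwaway key pair, and the abbreviated <tt>demoPayload</tt> string (which stands in for a real canonical payload) are assumptions made for the example.</t>

<sourcecode type="typescript"><![CDATA[
import {
  generateKeyPairSync, sign, verify, KeyObject,
} from "node:crypto";

// Hypothetical helper: Ed25519-sign the UTF-8 bytes of the canonical
// payload and return the 64-byte signature as 128 lowercase hex
// characters.
function signCanonicalPayload(
  canonical: string,
  privateKey: KeyObject,
): string {
  // For Ed25519 keys, crypto.sign takes null as the digest algorithm.
  const bytes = Buffer.from(canonical, "utf8");
  return sign(null, bytes, privateKey).toString("hex");
}

// Usage with a throwaway key pair. demoPayload is abbreviated and is
// not a complete canonical payload.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const demoPayload = '{"agentDid":"did:example:agent","success":true}';
const signatureHex = signCanonicalPayload(demoPayload, privateKey);
const signatureOk = verify(
  null,
  Buffer.from(demoPayload, "utf8"),
  publicKey,
  Buffer.from(signatureHex, "hex"),
);
]]></sourcecode>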
</section>

<section anchor="verification"><name>Verification</name>
<t>A verifier MUST:</t>

<ol spacing="compact">
<li>Recompute the canonical payload from the receipt's fields.</li>
<li>Resolve <tt>agentDid</tt> to its current public key per <xref target="DID-CORE"></xref>.</li>
<li>Verify <tt>signature</tt> against the canonical payload using the Agent's public key.</li>
<li>If <tt>callerSignature</tt> is present, resolve <tt>callerDid</tt> similarly and verify <tt>callerSignature</tt> against the same canonical payload.</li>
<li>Reject the receipt if any signature verification fails.</li>
</ol>
<t>A verifier MAY additionally validate that <tt>timestamp</tt> is within a deployment-defined freshness window.</t>
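<t>The verification steps above can be sketched as follows. This is a non-normative example: DID resolution is assumed to happen elsewhere, so the hypothetical <tt>verifyReceipt</tt> function receives already-resolved public keys, and the canonicalization helper is the simplified form that assumes flat string, boolean, and integer values.</t>

<sourcecode type="typescript"><![CDATA[
import {
  generateKeyPairSync, sign, verify, KeyObject,
} from "node:crypto";

const SIGNED_FIELDS: string[] = [
  "agentDid", "callerDid", "failureType", "latencyMs", "resultHash",
  "success", "taskHash", "timestamp", "toolName",
];

// Simplified canonicalization; see the assumptions in the lead-in.
function canonicalPayload(receipt: Record<string, unknown>): string {
  const ordered: Record<string, unknown> = {};
  for (const key of SIGNED_FIELDS.slice().sort()) {
    ordered[key] = receipt[key];
  }
  return JSON.stringify(ordered);
}

// Hypothetical verifier. Keys are assumed to be the current keys
// obtained by resolving agentDid and callerDid.
function verifyReceipt(
  receipt: Record<string, unknown>,
  agentKey: KeyObject,
  callerKey?: KeyObject,
): boolean {
  // Step 1: recompute the signed bytes from the receipt's fields.
  const message = Buffer.from(canonicalPayload(receipt), "utf8");
  const check = (sig: unknown, key?: KeyObject): boolean =>
    key !== undefined &&
    typeof sig === "string" &&
    /^[0-9a-f]{128}$/.test(sig) &&
    verify(null, message, key, Buffer.from(sig, "hex"));
  // Steps 2-3: the Agent signature is always required.
  if (!check(receipt["signature"], agentKey)) return false;
  // Step 4: the Caller signature is checked only when present.
  const callerSig = receipt["callerSignature"];
  if (callerSig !== undefined && !check(callerSig, callerKey)) {
    return false;
  }
  // Step 5: no verification failed.
  return true;
}

// Usage with throwaway keys and illustrative field values.
const agent = generateKeyPairSync("ed25519");
const caller = generateKeyPairSync("ed25519");
const fields = {
  agentDid: "did:example:agent",
  callerDid: "did:example:caller",
  failureType: "",
  latencyMs: 142,
  resultHash: "f0e1",
  success: true,
  taskHash: "9b74",
  timestamp: "2026-05-14T10:30:00.000Z",
  toolName: "translate",
};
const signedBytes = Buffer.from(canonicalPayload(fields), "utf8");
const receipt = {
  ...fields,
  signature: sign(null, signedBytes, agent.privateKey).toString("hex"),
  callerSignature:
    sign(null, signedBytes, caller.privateKey).toString("hex"),
};
const accepted = verifyReceipt(
  receipt, agent.publicKey, caller.publicKey,
);
]]></sourcecode>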
</section>
</section>

<section anchor="signingdelegate-pattern-caller-co-signature"><name>SigningDelegate Pattern (Caller Co-signature)</name>
<t>To produce a co-signed receipt, a Caller MUST NOT transmit private key material to the Executor. Instead, the Caller exposes a SigningDelegate interface:</t>

<artwork><![CDATA[interface SigningDelegate {
  did: DIDString
  sign(payload: string): Promise<HexString>
}
]]></artwork>
<t>The Executor sends the canonical payload string to the Caller's <tt>sign</tt> method and receives the signature. The private key never leaves the Caller's process boundary.</t>
<t>When the Caller and Executor are not co-located, the transport carrying canonical payloads to the Caller MUST use TLS or an equivalent confidentiality and integrity layer.</t>
<t>A Caller MAY decline to sign -- for example, if the Caller does not consent to the receipt's contents. In that case the Executor publishes the receipt with only its own <tt>signature</tt> and no <tt>callerSignature</tt>. Such receipts remain syntactically valid; consumers may weight them differently as a matter of deployment policy.</t>
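<t>One possible in-process realization of the pattern is sketched below. It is illustrative, not normative. <tt>DIDString</tt> and <tt>HexString</tt> are assumed to be plain string aliases, <tt>makeLocalDelegate</tt> and <tt>requestCallerSignature</tt> are hypothetical names, and signaling a declined request by rejecting the promise is an assumption, since this document does not define how a Caller declines.</t>

<sourcecode type="typescript"><![CDATA[
import {
  generateKeyPairSync, sign as ed25519Sign, KeyObject,
} from "node:crypto";

type DIDString = string; // assumption: plain string alias
type HexString = string; // assumption: plain string alias

interface SigningDelegate {
  did: DIDString;
  sign(payload: string): Promise<HexString>;
}

// Hypothetical Caller-side delegate. The private key is captured in
// a closure, so the Executor only ever sees did and sign().
function makeLocalDelegate(
  did: DIDString,
  privateKey: KeyObject,
  consents: (payload: string) => boolean = () => true,
): SigningDelegate {
  return {
    did,
    async sign(payload: string): Promise<HexString> {
      // Assumption: declining is signaled by rejecting the promise.
      if (!consents(payload)) {
        throw new Error("caller declined to sign");
      }
      const bytes = Buffer.from(payload, "utf8");
      return ed25519Sign(null, bytes, privateKey).toString("hex");
    },
  };
}

// Hypothetical Executor-side helper: returns the co-signature, or
// undefined when the Caller declines, in which case the receipt is
// published with only the Executor's signature.
async function requestCallerSignature(
  delegate: SigningDelegate,
  canonical: string,
): Promise<HexString | undefined> {
  try {
    return await delegate.sign(canonical);
  } catch {
    return undefined;
  }
}

// Usage with a throwaway key pair.
const callerKeys = generateKeyPairSync("ed25519");
const delegate = makeLocalDelegate(
  "did:example:caller", callerKeys.privateKey,
);
]]></sourcecode>

<t>When the Caller runs in a separate process, the same interface could be backed by a remote call over a protected transport. The Executor-side helper would not need to change.</t>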
</section>

<section anchor="failure-type-classification"><name>Failure Type Classification</name>
<t>When <tt>success</tt> is <tt>false</tt>, <tt>failureType</tt> MUST be one of the following values or a deployment-defined extension value as described below:</t>
<table>
<thead>
<tr>
<th>Value</th>
<th>Condition</th>
</tr>
</thead>

<tbody>
<tr>
<td><tt>timeout</tt></td>
<td>The call exceeded a deployment-defined latency bound (RECOMMENDED default: 30000 ms), or the underlying error indicated a timeout.</td>
</tr>

<tr>
<td><tt>validation</tt></td>
<td>The call failed due to input or output validation (schema, parse, type mismatch).</td>
</tr>

<tr>
<td><tt>error</tt></td>
<td>All other failures.</td>
</tr>
</tbody>
</table><t><tt>failureType</tt> MAY be extended by deployments with additional values. Receiving implementations MUST treat unknown <tt>failureType</tt> values as <tt>error</tt> for the purposes of any deployment-policy decision they make.</t>
<t>When <tt>success</tt> is <tt>true</tt>, <tt>failureType</tt> MUST be the empty string. This is a deliberate choice over a null value: it keeps the canonical payload's value type stable (always string) so that JCS canonicalization produces a predictable byte sequence regardless of success state. A verifier that substitutes a null value for an empty <tt>failureType</tt> will compute a different canonical payload and will fail to verify legitimate receipts.</t>
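<t>The two rules above can be illustrated with a short, non-normative sketch. The helper <tt>failureTypeForPolicy</tt> is a hypothetical name. It maps a received value onto the three values defined here for policy purposes only and leaves the receipt itself untouched so that its signature still verifies. The last lines show why a null value cannot stand in for the empty string.</t>

<sourcecode type="typescript"><![CDATA[
const KNOWN_FAILURE_TYPES = ["timeout", "validation", "error"] as const;
type KnownFailureType = (typeof KNOWN_FAILURE_TYPES)[number];

// Hypothetical consumer-side helper. Unknown (extension) values are
// treated as "error" for policy decisions. The stored receipt must
// not be rewritten, or its signature would no longer verify.
function failureTypeForPolicy(
  success: boolean,
  failureType: string,
): KnownFailureType | "" {
  if (success) return "";
  const known = KNOWN_FAILURE_TYPES as readonly string[];
  return known.indexOf(failureType) >= 0
    ? (failureType as KnownFailureType)
    : "error";
}

// The empty string and null serialize to different bytes, so a
// signature computed over one does not verify over the other.
const withEmptyString = JSON.stringify({ failureType: "" });
const withNull = JSON.stringify({ failureType: null });
const sameBytes = withEmptyString === withNull;
]]></sourcecode>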
</section>

<section anchor="tool-metadata-optional"><name>Tool Metadata (Optional)</name>
<t>A receipt MAY carry a <tt>toolMetadata</tt> object describing class or capability hints about the tool. This document does not standardize the schema of <tt>toolMetadata</tt>. A deployment may use it to convey:</t>

<ul spacing="compact">
<li>A tool class (e.g., advisory, data-retrieval, mutation, settlement).</li>
<li>A settlement layer identifier when the tool executes on-chain transactions.</li>
<li>A verifiability hint indicating whether the tool's outcome is externally anchored.</li>
</ul>
<t><tt>toolMetadata</tt> is NOT part of the canonical payload and is NOT signed. Consumers that wish to trust <tt>toolMetadata</tt> MUST validate it through out-of-band means (e.g., the tool's published manifest, signed separately).</t>
<t>A future revision of this document, or a companion document, MAY standardize a portion of the <tt>toolMetadata</tt> schema if interoperability needs emerge.</t>
</section>

<section anchor="identity-did-requirements"><name>Identity (DID) Requirements</name>
<t>Both <tt>agentDid</tt> and <tt>callerDid</tt> MUST be syntactically valid DIDs per <xref target="DID-CORE"></xref>. This document does not constrain the DID method. Common choices in production include <tt>did:key</tt>, <tt>did:web</tt>, and ledger-anchored methods such as <tt>did:xrpl</tt> or <tt>did:ethr</tt>.</t>
<t>A deployment MAY apply policy based on DID method -- for example, treating ledger-anchored identities differently from cryptographic-only identities. Such policy is out of scope for this document; the wire format treats all DID methods uniformly.</t>
</section>

<section anchor="security-considerations"><name>Security Considerations</name>

<section anchor="privacy"><name>Privacy</name>
<t>Receipts identify inputs and outputs by hash. Implementations MUST NOT include raw inputs, outputs, prompts, user data, secrets, or PII in any signed field. <tt>toolMetadata</tt>, while not part of the canonical payload, also SHOULD NOT contain such data.</t>
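<t>Hash construction matters. A deployment that hashes uncanonicalized inputs produces different hashes for semantically identical inputs, so receipts for the same task cannot be matched across implementations. Hashes also do not hide everything: identical inputs produce identical hashes, which allows an observer to correlate repeated calls and to confirm guesses about low-entropy inputs. Implementations SHOULD canonicalize inputs before hashing (for example, with JCS for JSON inputs).</t>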
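<t>As a non-normative sketch, a <tt>taskHash</tt> or <tt>resultHash</tt> for a JSON value could be computed as shown below. The helper names <tt>canonicalize</tt> and <tt>hashCanonical</tt> are hypothetical. The canonicalization is a simplified approximation of JCS (recursively sorted keys, no whitespace), and a vetted RFC 8785 library is the safer choice in production.</t>

<sourcecode type="typescript"><![CDATA[
import { createHash } from "node:crypto";

type Json =
  | null | boolean | number | string
  | Json[]
  | { [key: string]: Json };

// Simplified canonical form: object keys sorted recursively by UTF-16
// code units, no insignificant whitespace. An approximation of JCS.
function canonicalize(value: Json): string {
  if (Array.isArray(value)) {
    return "[" + value.map((v) => canonicalize(v)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value;
    const members = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k]));
    return "{" + members.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Hypothetical helper: SHA-256 over the UTF-8 canonical form, encoded
// as 64 lowercase hex characters.
function hashCanonical(value: Json): string {
  return createHash("sha256")
    .update(canonicalize(value), "utf8")
    .digest("hex");
}
]]></sourcecode>

<t>With this construction, two inputs that differ only in key order hash to the same value, which is what makes hashes comparable across implementations.</t>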
</section>

<section anchor="replay"><name>Replay</name>
<t>A signed receipt is replayable by anyone who possesses it. Receivers SHOULD enforce a freshness window on <tt>timestamp</tt> and SHOULD reject duplicate receipts identified by <tt>(signature)</tt> (which is unique given the inclusion of <tt>timestamp</tt> in the canonical payload). A deployment that needs cross-receipt deduplication MAY additionally store and dedupe by <tt>(agentDid, taskHash, timestamp)</tt>.</t>
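<t>A receiver-side check along these lines might look like the following non-normative sketch. The class name <tt>ReplayGuard</tt>, the window length, and the unbounded in-memory set are assumptions made for illustration. A real deployment would also evict entries older than the freshness window.</t>

<sourcecode type="typescript"><![CDATA[
// Hypothetical receiver-side guard combining a freshness window on
// timestamp with de-duplication by signature.
class ReplayGuard {
  private readonly seen = new Set<string>();
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns true when the receipt is fresh and not seen before.
  // Timestamps too far in the future are rejected as well.
  accept(
    receipt: { signature: string; timestamp: string },
    nowMs: number = Date.now(),
  ): boolean {
    const completedAt = Date.parse(receipt.timestamp);
    if (Number.isNaN(completedAt)) return false;
    if (Math.abs(nowMs - completedAt) > this.windowMs) return false;
    if (this.seen.has(receipt.signature)) return false;
    this.seen.add(receipt.signature);
    return true;
  }
}
]]></sourcecode>

<t>The guard is meant to run after signature verification, so that an unverified receipt cannot occupy a slot in the set of seen signatures.</t>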
</section>

<section anchor="caller-side-forgery"><name>Caller-Side Forgery</name>
<t>A receipt with only <tt>signature</tt> (Executor) and no <tt>callerSignature</tt> represents the Executor's claim alone. A malicious Executor could fabricate such receipts. Co-signature by the Caller prevents such receipts from being accepted as caller-attested: a Caller observing a forged receipt about its own delegations would notice the absence of its <tt>callerSignature</tt> and could repudiate.</t>
<t>When <tt>callerSignature</tt> is missing, a deployment SHOULD weight the receipt accordingly. The exact weighting is policy, but treating co-signed and non-co-signed receipts identically is a security mistake.</t>
</section>

<section anchor="single-observer-dominance"><name>Single-Observer Dominance</name>
<t>If a deployment derives reputation or trust signals from receipts and a single Caller produces most of the receipts about a given tool, that Caller's environment-specific bugs, biases, or hostile behavior propagate directly into the derived signal. This is a deployment-policy concern, not a wire-format concern. Deployments SHOULD record the set of distinct <tt>callerDid</tt> values contributing to any derived statistic so that consumers can reason about observer diversity.</t>
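<t>One way to record observer diversity alongside a derived statistic is sketched below. It is illustrative only. The <tt>ToolStats</tt> shape and the <tt>summarizeByTool</tt> helper are hypothetical, and the success count stands in for whatever statistic a deployment actually derives.</t>

<sourcecode type="typescript"><![CDATA[
interface ToolStats {
  total: number;
  successes: number;
  // Distinct callerDid values that contributed to the counts above.
  callers: Set<string>;
}

interface ReceiptSummaryInput {
  toolName: string;
  callerDid: string;
  success: boolean;
}

// Hypothetical aggregation: per-tool counts plus the set of distinct
// Callers, so consumers can see how many observers back a statistic.
function summarizeByTool(
  receipts: ReceiptSummaryInput[],
): Map<string, ToolStats> {
  const byTool = new Map<string, ToolStats>();
  for (const r of receipts) {
    let stats = byTool.get(r.toolName);
    if (stats === undefined) {
      stats = { total: 0, successes: 0, callers: new Set<string>() };
      byTool.set(r.toolName, stats);
    }
    stats.total += 1;
    if (r.success) stats.successes += 1;
    stats.callers.add(r.callerDid);
  }
  return byTool;
}
]]></sourcecode>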
</section>

<section anchor="key-compromise"><name>Key Compromise</name>
<t>A compromised Agent or Caller key allows arbitrary receipt forgery for the lifetime of that key. DID methods that support key rotation SHOULD rotate routinely. Verifiers MUST resolve DIDs to the current key set at verification time, not at receipt emission time.</t>
</section>

<section anchor="timestamp-trust"><name>Timestamp Trust</name>
<t><tt>timestamp</tt> is asserted by the Executor and is not independently anchored by this format. A deployment that requires verifiable time SHOULD pair receipts with an external time-anchoring mechanism (<xref target="RFC3161"></xref>, blockchain inclusion, etc.).</t>
</section>
</section>

<section anchor="iana-considerations"><name>IANA Considerations</name>
<t>This document has no IANA actions in its current form. A future revision may register a media type (e.g., <tt>application/xaip-receipt+json</tt>) and a failureType registry.</t>
</section>

</middle>

<back>
<references><name>References</name>
<references><name>Normative References</name>
<reference anchor="DID-CORE" target="https://www.w3.org/TR/did-core/">
  <front>
    <title>Decentralized Identifiers (DIDs) v1.0</title>
    <author fullname="Manu Sporny" initials="M." surname="Sporny">
      <organization>W3C</organization>
    </author>
    <date year="2022" month="July"></date>
  </front>
  <seriesInfo name="W3C" value="Recommendation"></seriesInfo>
</reference>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8032.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8785.xml"/>
</references>
<references><name>Informative References</name>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3161.xml"/>
<reference anchor="XAIP-CASE-2026-05" target="https://github.com/xkumakichi/xaip-protocol/blob/main/docs/case-study/single-caller-dominance.md">
  <front>
    <title>XAIP Single-Caller Dominance Case Study</title>
    <author>
      <organization>xkumakichi</organization>
    </author>
    <date year="2026" month="May"></date>
  </front>
</reference>
<reference anchor="XAIP-IMPL" target="https://github.com/xkumakichi/xaip-protocol">
  <front>
    <title>XAIP Protocol Reference Implementation</title>
    <author>
      <organization>xkumakichi</organization>
    </author>
    <date year="2026"></date>
  </front>
</reference>
</references>
</references>

<section anchor="relationship-to-the-xaip-reference-implementation"><name>Relationship to the XAIP Reference Implementation</name>
<t>The XAIP reference implementation <xref target="XAIP-IMPL"></xref> wraps this wire format with an aggregator, a Bayesian trust score, optional metadata display, risk-flag logic, and a decision engine that ranks candidate tools. None of those components are required to produce or consume receipts conformant to this document. A consumer that only wants to verify and store receipts does not need to import any of them.</t>
<t>A consumer that wants a turnkey aggregator and scoring layer may use the reference implementation. A consumer that disagrees with any of those design choices is free to substitute its own implementation while remaining interoperable at the receipt-format layer.</t>
<t>The single-observer dominance failure mode discussed in <xref target="single-observer-dominance"></xref> was first surfaced in the public dataset of that reference implementation <xref target="XAIP-CASE-2026-05"></xref>.</t>
</section>

<section anchor="adoption-path-for-agent-payment-protocols"><name>Adoption Path for Agent-Payment Protocols</name>
<t>This format is intended to be useful to agent-payment protocols (for example, agent-to-agent payment protocols, agent-mediated commerce protocols, and agent escrow systems) that need a &quot;trust precondition&quot; check before committing to a transaction. Such a protocol can:</t>

<ol spacing="compact">
<li>Require that an Agent present a set of recent receipts before being allowed to initiate a payment.</li>
<li>Define its own scoring policy over the receipt set, or consult an external scoring service.</li>
<li>Require that receipts above a certain transaction value include <tt>callerSignature</tt> (co-signed).</li>
<li>Require that receipts for <tt>settlement</tt>-class tools (declared via <tt>toolMetadata</tt>) be additionally anchored to an external ledger.</li>
</ol>
<t>Each of those is a policy decision local to the agent-payment protocol. This document only defines the receipt wire format; it does not define a payment mechanism, a settlement rail, or any value-transfer system.</t>
</section>

<section anchor="change-log"><name>Change Log</name>

<ul spacing="compact">
<li><tt>-00</tt> (2026-05-22): Initial individual draft. Split out from the XAIP reference implementation specification, focused on the receipt wire format only. Removed aggregator, scoring, and decision-engine content; left those to deployment policy.</li>
</ul>
</section>

</back>

</rfc>
