Structured Data Schema Interaction Protocol for Multi-Agent Collaboration

Huawei Technologies

Huawei Bld., No.156 Beiqing Rd. Beijing 100095 China zhoufangtong@huawei.com

Huawei Technologies

Huawei Bld., No.156 Beiqing Rd. Beijing 100095 China pengshuping@huawei.com

Applications and Real-Time Working Group

Keywords: AI Agent, Multi-Agent Collaboration, Structured Data, Schema

Abstract

This document defines a structured data schema interaction protocol for multi-agent collaboration. As AI agents increasingly interoperate across heterogeneous platforms, natural-language-based communication suffers from semantic drift, high inference overhead, and ambiguous data flow. This protocol introduces a standardized key-value schema with semantic annotations, enabling deterministic, efficient, and interoperable agent-to-agent communication. A lightweight schema negotiation mechanism provides initial alignment at the beginning of communication, while an optional key-value update mechanism allows agents to reflect evolving requirements without breaking existing structured interactions.

Introduction

Recent advances in large language models (LLMs) enable AI agents to plan and execute multi-step workflows for complex tasks. Through agent communication protocols, AI agents can call third-party tools and delegate tasks to other AI agents. When an AI agent acts on behalf of a human user, these interoperations require structured, deterministic, and efficient information exchange.

Today's agent ecosystems are characterized by rich, heterogeneous interaction information. Agents communicate through natural language text, structured data, and platform-specific documents. While natural language is expressive and flexible, it introduces three critical problems in multi-agent collaboration:

Semantic drift: Without structured constraints, heterogeneous agents backed by different LLMs parse the same natural-language message into inconsistent semantic interpretations, which causes miscommunication.
High inference overhead: Processing free-form text requires substantial reasoning tokens and computational resources, increasing end-to-end latency and operational costs.
Interoperability barriers: Heterogeneous agents (e.g., payment agents, lifestyle service agents) lack a unified model alignment mechanism, making cross-system communication fragile and costly to integrate.

This document introduces a structured data schema protocol, providing explicit semantic definitions with fixed data keys. The benefits of this approach are:

Deterministic semantic alignment: By pre-defining key-value schemas with explicit semantic descriptions, client agents can parse user intent into structured payloads with minimal ambiguity, effectively suppressing semantic drift.
Reduced token consumption and response latency: Structured communication eliminates the need for server-side agents to perform open-ended natural language understanding.
Enhanced interoperability and decoupling: A standardized key-value format allows client agents to adapt dynamically to server agent interface changes, enabling large-scale onboarding of heterogeneous agents without per-integration custom parsing logic.

This protocol is designed to complement existing agent protocols (e.g., A2A, MCP) by defining the data-model contract. It does not mandate transport or authorization mechanisms; those may be provided by underlying protocols. Existing structured data exchange protocols such as JMAP Sharing demonstrate that standardized key-value data models with explicit sharing semantics can enable scalable cross-system interoperability, providing a design precedent for agent-to-agent schema negotiation.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 when, and only when, they appear in all capitals, as shown here.

Terminology

AI Agent: An entity with built-in intelligence, which can perform actions to accomplish tasks, possibly on behalf of an end-user or another agent.
Client Agent: The agent that initiates a structured interaction request, typically acting on behalf of an end-user.
Server Agent: The agent that exposes structured capabilities and responds to client agent requests with key-value formatted data.
Data Schema (Schema): An LLM-readable template that defines the set of keys, value types, and semantic descriptions required for a specific interaction scenario.
Semantic Annotation: A human/LLM-readable description associated with a schema key, explaining its meaning, expected value range, and key semantics.
Structured Communication: The practice of exchanging information strictly within the boundaries of an agreed-upon data schema, as opposed to free-form natural language.
Schema Negotiation: The process by which a client agent discovers and obtains the data schema and semantic annotations from a server agent prior to sending a structured payload.
Schema Update: An OPTIONAL mechanism by which a server agent refines its published schema based on observed usage patterns, returning new or updated key definitions to the client agent.

Protocol Overview

The Structured Data Schema Interaction Protocol operates in three phases: Schema Negotiation, Structured Data Schema Exchange, and (optionally) Schema Update. The protocol assumes that transport-level connectivity (e.g., HTTP, JSON-RPC, or protocol-specific channels) has already been established by underlying agent communication frameworks.

Interaction Model

The protocol adopts a client-server interaction model between two cooperating agents:

Client Agent (C): The agent that requires a service. It is responsible for (1) discovering the server agent's schema, (2) mapping user intent or upstream data into the negotiated key-value structure, and (3) transmitting the structured payload.
Server Agent (S): The agent that exposes a capability. It is responsible for (1) publishing its key-value schema and semantic annotations, (2) validating incoming structured payloads against the schema, (3) executing the requested capability, and (4) optionally returning schema refinement suggestions.

The interaction is stateless from the protocol perspective; any session state MUST be managed by the application layer or underlying transport protocol.

Message Flow

Figure 1 illustrates the complete message flow of a structured key-value interaction.

Target Server Agent: Which remote agent can fulfill the request (e.g., an airline service agent).
Interaction Scenario: The specific capability or workflow category within the target agent (e.g., "flight_booking", "photo_retouch").

This step MAY leverage an agent registry, capability directory, or historical context to disambiguate between multiple candidate server agents. If no suitable server agent is found, the client agent SHOULD report the failure to the end user.

Step (3) Schema Negotiation: The client agent requests the server agent's schema template and semantic descriptions for a given interaction scenario (e.g., flight booking, photo editing). The server agent returns a JSON object containing the schema keys, value types, and semantic annotations.

Step (4) Structured Exchange: The client agent transmits the populated key-value payload to the server agent. All REQUIRED keys MUST be present; OPTIONAL keys MAY be omitted if not applicable.

Step (5) Execution Result: The server agent validates the payload, executes the requested capability, and returns a result. The result itself SHOULD be structured according to a pre-negotiated response schema when applicable.

Step (6) Optional Schema Update: If the server agent observes recurring patterns in the "other" key or receives explicit capability extension requests, it MAY return a schema update suggestion containing new keys or refined semantic descriptions. This step is OPTIONAL and serves as a lightweight feedback loop for long-term protocol refinement.

Structured Key-Value Format

Structured data formats such as JSContact demonstrate that explicitly typed keys with semantic annotations enable reliable machine parsing. This protocol applies the same principle to agent-to-agent communication: every key in the schema template carries a declared type and a semantic description that constrains the client agent's generation space. This section defines the syntax and semantics of the structured key-value interaction format.

Schema Template

A schema template is a JSON object that declares the expected structure of an interaction payload. It MUST contain the following top-level members:

schema_id: REQUIRED. A string that uniquely identifies the schema version within the server agent's namespace.
scenario: REQUIRED. A human-readable string describing the interaction scenario (e.g., "flight_booking", "photo_retouch").
keys: REQUIRED. An array of key definition objects, each describing a single key expected in the payload.

Each key definition object MUST contain the members marked REQUIRED below and MAY contain the member marked OPTIONAL:

key_name: REQUIRED. A string identifier for the key, using snake_case convention. Key names MUST be unique within the schema.
key_type: REQUIRED. A JSON Schema type identifier (e.g., "string", "integer", "boolean", "array", "object").
semantic_description: REQUIRED. A human-readable and machine-processable string explaining the business meaning of the key, acceptable value enumerations, and mapping examples from natural language. This description serves as a prompt-level anchor for the client agent's LLM, constraining the generation space and reducing hallucinated key mappings. Server agents SHOULD maintain semantic descriptions in the same language as the expected end-user queries, or provide multilingual annotations when serving cross-lingual client agents.
required: REQUIRED. A boolean indicating whether the key MUST be present in the payload.
default_value: OPTIONAL. A default value to be used when the key is omitted and not required. If absent and the key is optional, the server agent MUST apply its own default logic or ignore the key.
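The structural rules above can be checked mechanically before a template is published. The following is a minimal sketch, not part of the protocol: the function name check_schema_template and the wording of the problem messages are illustrative assumptions, and the accepted key_type values are assumed to be the JSON Schema primitive type names.

```python
from typing import Any

# JSON Schema primitive type names (assumed to be the accepted key_type values).
JSON_TYPES = {"string", "integer", "number", "boolean", "array", "object", "null"}

REQUIRED_TOP_LEVEL = ("schema_id", "scenario", "keys")
REQUIRED_KEY_FIELDS = ("key_name", "key_type", "semantic_description", "required")


def check_schema_template(template: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a schema template (empty if none)."""
    problems: list[str] = []
    for member in REQUIRED_TOP_LEVEL:
        if member not in template:
            problems.append(f"missing top-level member: {member}")

    keys = template.get("keys", [])
    if not isinstance(keys, list):
        return problems + ["'keys' must be an array"]

    seen: set[str] = set()
    for i, key_def in enumerate(keys):
        if not isinstance(key_def, dict):
            problems.append(f"keys[{i}] must be an object")
            continue
        for field in REQUIRED_KEY_FIELDS:
            if field not in key_def:
                problems.append(f"keys[{i}] missing {field}")
        name = key_def.get("key_name")
        if isinstance(name, str):
            # Key names must be unique within the schema.
            if name in seen:
                problems.append(f"duplicate key_name: {name}")
            seen.add(name)
        if "key_type" in key_def and key_def["key_type"] not in JSON_TYPES:
            problems.append(f"keys[{i}] has unknown key_type: {key_def['key_type']}")
        if "required" in key_def and not isinstance(key_def["required"], bool):
            problems.append(f"keys[{i}] 'required' must be a boolean")
    return problems
```

A server agent could run such a check at publication time so that client agents never receive a template with duplicate or untyped keys.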

Figure 2 shows an example schema template for a flight booking scenario.

Figure 2: Example Schema Template for Flight Booking

    ... 'PEK'."
        },
        {
          "key_name": "destination",
          "key_type": "string",
          "required": true,
          "default_value": null,
          "semantic_description": "Arrival city or airport code. Acceptable values: IATA airport codes or city names. Example mapping: 'to Shanghai' -> 'SHA'."
        },
        {
          "key_name": "departure_date",
          "key_type": "string",
          "required": true,
          "default_value": null,
          "semantic_description": "Date of departure in ISO 8601 format (YYYY-MM-DD). Example mapping: 'next Monday' -> '2026-05-04'."
        },
        {
          "key_name": "cabin_class",
          "key_type": "string",
          "required": false,
          "default_value": "economy",
          "semantic_description": "Cabin class preference. Acceptable values: economy, premium_economy, business, first. Example mapping: 'business class' -> 'business'."
        },
        {
          "key_name": "passenger_count",
          "key_type": "integer",
          "required": false,
          "default_value": 1,
          "semantic_description": "Number of passengers. Range: 1-9. Example mapping: 'two people' -> 2."
        },
        {
          "key_name": "other",
          "key_type": "string",
          "required": false,
          "default_value": null,
          "semantic_description": "Escape valve for unstructured semantic fragments that cannot be mapped to existing keys. Example mapping: 'window seat please' -> 'window seat'."
        }
      ]
    }

Semantic Annotation

Semantic annotations provide the contextual anchor that enables client agent LLMs to perform accurate intent-to-key mapping. Each key definition in the schema template MUST be augmented with a "semantic_description" field, as required in Section 4.1. The semantic description serves as a prompt-level anchor for the client agent's LLM, constraining the generation space and reducing hallucinated key mappings. Figure 3 extends the flight booking example with a focused semantic annotation.

Figure 3: Semantic Annotation Example

    ... 'business'."
    }

Server agents SHOULD maintain semantic descriptions in the same language as the expected end-user queries, or provide multilingual annotations when serving cross-lingual client agents.

The "other" Key

The key named "other" is reserved within every schema template as an escape valve for unstructured semantic fragments that cannot be mapped to existing keys. Its usage is subject to the following rules:

The "other" key MUST be defined as OPTIONAL (required: false).
The value of the "other" key MUST be a string or an array of strings containing natural language descriptions of unmapped user intent.
Server agents MUST accept payloads containing the "other" key without rejecting the request, even if the key contains information outside the current schema scope.
Client agents SHOULD minimize the use of the "other" key by leveraging LLM-based parsing to exhaust existing keys before falling back to "other".

The presence of meaningful content in the "other" key signals a potential schema coverage gap. Server agents MAY use this signal as input to the OPTIONAL schema update mechanism described in Section 6.2.
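Server-side payload validation, including the rules for the "other" key, might look like the following minimal sketch. The function name validate_payload, the error strings, and the decisions to report unknown keys and to silently drop a malformed "other" value are assumptions made for illustration; the protocol text does not prescribe them.

```python
from typing import Any

# Python types corresponding to JSON Schema type names; unlisted types are not checked.
PY_TYPES = {"string": str, "integer": int, "boolean": bool, "array": list, "object": dict}


def validate_payload(schema: dict[str, Any], payload: dict[str, Any]) -> tuple[dict, list[str]]:
    """Return (normalized payload, errors). Content in "other" never causes an error."""
    errors: list[str] = []
    normalized: dict[str, Any] = {}
    defined = {k["key_name"]: k for k in schema["keys"]}

    for name, key_def in defined.items():
        if name == "other":
            continue  # handled separately below
        if name not in payload:
            if key_def["required"]:
                errors.append(f"missing required key: {name}")
            elif key_def.get("default_value") is not None:
                normalized[name] = key_def["default_value"]
            continue
        value = payload[name]
        expected = PY_TYPES.get(key_def["key_type"])
        # bool is a subclass of int in Python, so reject it explicitly for integer keys.
        wrong_type = expected is not None and (
            not isinstance(value, expected)
            or (expected is int and isinstance(value, bool))
        )
        if wrong_type:
            errors.append(f"wrong type for key: {name}")
        else:
            normalized[name] = value

    # "other" accepts a string or an array of strings and is normalized to a list.
    other = payload.get("other")
    if isinstance(other, str):
        normalized["other"] = [other]
    elif isinstance(other, list) and all(isinstance(x, str) for x in other):
        normalized["other"] = other

    # Assumption: keys outside the negotiated schema are reported rather than ignored.
    for name in payload:
        if name not in defined and name != "other":
            errors.append(f"unknown key: {name}")
    return normalized, errors
```

Normalizing "other" to a list keeps downstream processing uniform regardless of which of the two permitted forms the client agent sent.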

Illustrations of the protocol

This section illustrates the application of the structured key-value interaction protocol.

Flight Booking

Scenario: An end user asks a personal AI assistant (client agent) to book a flight through an airline service agent (server agent).

User utterance: "Book me a flight from Beijing to Shanghai next Monday, business class, and I prefer a window seat."

Schema negotiation: The client agent requests the flight booking schema from the server agent. The server agent returns the schema template shown in Figure 2, augmented with semantic annotations.

Intent parsing and mapping:

"Beijing" -> origin: "PEK"
"Shanghai" -> destination: "SHA"
"next Monday" -> departure_date: "2026-05-04"
"business class" -> cabin_class: "business"
(implicit) -> passenger_count: 1
"window seat" -> other: "window seat"

Structured payload:

[Figure: Structured Payload for Flight Booking]

Execution: The server agent validates the payload, queries the flight inventory, and returns a booking confirmation with flight number, departure time, and seat assignment.

Optional schema update: If the server agent observes frequent "other" entries mentioning seat preferences, it MAY return a schema update suggestion adding a "seat_preference" key with acceptable values "window", "aisle", or "none".
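As an illustration of the flight booking walk-through, the intent mapping listed earlier in this scenario can be assembled into a flat key-value payload, and a client agent that has received the "seat_preference" update can move the fragment out of "other". This is a sketch under assumptions: the flat payload layout (no envelope fields) and the helper remap_seat_preference with its naive substring matching are hypothetical; a real client agent would rely on its LLM for the remapping.

```python
import json

# Key-value pairs taken from the intent mapping list in the flight booking scenario.
payload = {
    "origin": "PEK",
    "destination": "SHA",
    "departure_date": "2026-05-04",
    "cabin_class": "business",
    "passenger_count": 1,
    "other": "window seat",
}

# Serialized form that would be sent to the server agent (layout is an assumption).
print(json.dumps(payload, indent=2))


def remap_seat_preference(original: dict) -> dict:
    """Move a seat preference from "other" to "seat_preference" (hypothetical helper).

    Uses naive substring matching purely for illustration.
    """
    updated = dict(original)  # leave the caller's payload untouched
    fragment = updated.get("other", "")
    if isinstance(fragment, str):
        for value in ("window", "aisle"):
            if value in fragment:
                updated["seat_preference"] = value
                del updated["other"]
                break
    return updated
```

After the update is adopted, the same utterance yields a payload with seat_preference set to "window" and no "other" entry, which is the coverage improvement the update mechanism aims for.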

Photo Editing

Scenario: An end user asks a personal AI assistant to retouch a photo through a cloud-based image editing agent.

User utterance: "Please retouch this photo: smooth the skin, whiten teeth, make the background blurry, and add a vintage filter. Also, I want my eyes to look bigger."

Schema negotiation: The client agent discovers the photo editing schema from the server agent. Example keys include: skin_smoothing (integer 0-10), teeth_whitening (boolean), background_blur (boolean), filter_style (string), eye_enlargement (boolean).

Intent parsing and mapping:

"smooth the skin" -> skin_smoothing: 7
"whiten teeth" -> teeth_whitening: true
"make the background blurry" -> background_blur: true
"add a vintage filter" -> filter_style: "vintage"
"eyes to look bigger" -> other: "increase eye size proportionally"

Structured payload:

[Figure: Structured Payload for Photo Editing]

Note: The client agent mapped "eyes to look bigger" to the "other" key because the current schema does not define a granular eye_size adjustment key (only a boolean eye_enlargement). The server agent MAY later propose a schema update introducing "eye_size_scale" (float, 1.0-1.5) based on aggregated "other" patterns.

Capability Enhancement

This section describes OPTIONAL mechanisms that enhance the core structured interaction protocol. Implementations MAY support none, some, or all of these capabilities. They are designed to be transparent to agents that do not implement them.

LLM-based Semantic Parsing

The protocol assumes that client agents employ an internal LLM to perform natural language understanding (NLU) and map user intent to schema keys. The quality of this mapping directly affects the correctness of the structured payload. Recommended practices for LLM-based semantic parsing include:

In-context learning: Provide the LLM with the schema template and semantic descriptions as part of the prompt context, enabling few-shot or zero-shot key extraction.
Constrained decoding: Where supported, use constrained generation or grammar-based decoding to ensure that the LLM output conforms to the expected key-value JSON structure.
Confidence thresholding: If the LLM's confidence in mapping a semantic fragment to a specific key falls below a configurable threshold, the fragment SHOULD be routed to the "other" key rather than risk a hallucinated mapping.
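The confidence thresholding practice can be sketched as a simple routing step applied to the LLM's candidate mappings. The Candidate structure, the route function, and the default threshold of 0.7 are illustrative assumptions; how a confidence score is obtained from a particular LLM is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Candidate:
    fragment: str      # text span taken from the user utterance
    key_name: str      # schema key proposed by the LLM
    value: Any         # value proposed for that key
    confidence: float  # score in [0, 1]; its source is an assumption


def route(candidates: list[Candidate], threshold: float = 0.7) -> dict[str, Any]:
    """Build a payload, sending low-confidence fragments to the "other" key."""
    payload: dict[str, Any] = {}
    unmapped: list[str] = []
    for c in candidates:
        if c.confidence >= threshold:
            payload[c.key_name] = c.value
        else:
            # Keep the original wording rather than risk a hallucinated mapping.
            unmapped.append(c.fragment)
    if unmapped:
        payload["other"] = unmapped
    return payload
```

For example, a candidate mapping "eyes to look bigger" to eye_enlargement with a score of 0.4 would be carried as free text in "other" rather than as a boolean the user may not have intended.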

Server agents are not required to perform NLU; their role is to validate and execute structured payloads. This separation of concerns reduces server-side inference costs and ensures deterministic execution.

Key-Value Schema Self-Evolution

The schema self-evolution mechanism provides a dynamic, backward-compatible path for server agents to refine their published schemas based on operational feedback and differential semantics extracted from the "other" key. It is OPTIONAL and does not alter the core structured exchange semantics. The mechanism consists of three coordinated components: (1) a trigger mechanism based on a differential semantic pool, (2) a schema self-evolution algorithm that generates dynamic patches, and (3) a long-term evaluation framework that ensures convergence and prevents schema bloat.

Differential Semantic Pool and Trigger Mechanism

To avoid erroneous evolution caused by sporadic anomalies and to ensure that genuine common or personalized needs are not missed, server agents MAY maintain a Differential Semantic Pool (DSP). The DSP collects natural-language fragments from the "other" key that could not be mapped to existing schema keys. Its operation follows three stages:

Semantic Vector Clustering: Each fragment extracted from the "other" key MUST be encoded into a semantic vector. Server agents SHOULD use dense vector embeddings (e.g., sentence-transformer-based) to represent semantic meaning. Vectors are clustered using similarity thresholds (e.g., cosine similarity >= 0.85). Each cluster represents a distinct, recurring semantic intent that is not covered by the current schema.
Heat Decay Counting: Each cluster maintains a heat score that reflects the intensity of the corresponding semantic demand. When a new fragment joins a cluster, its heat score increases by a fixed increment. The heat score MUST decay over time (e.g., exponential decay with a half-life of 24 hours) to prevent obsolete or transient patterns from accumulating undue weight.
Threshold-Based Triggering: A schema evolution trigger fires when a cluster's heat score exceeds a configurable evolution threshold (e.g., 50 accumulated heat units) AND the cluster contains a minimum number of distinct source requests (e.g., >= 5 unique client agents or >= 10 total occurrences within a 7-day window). This dual-gate design ensures that evolution is triggered only when the same semantic appeal has gathered sufficient strength across multiple interactions, filtering out accidental outliers while capturing genuine collective needs.
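The clustering stage can be sketched as a greedy assignment against cluster centroids using the example cosine similarity threshold of 0.85. This is only one possible realization: the Cluster class, the assign function, and the greedy centroid strategy are assumptions, and the embedding step that would produce the vectors (e.g., a sentence-transformer model) is not shown.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Cluster:
    """A group of fragment vectors summarized by their mean (centroid)."""

    def __init__(self, vec: list[float]) -> None:
        self.members = [vec]
        self.centroid = list(vec)

    def add(self, vec: list[float]) -> None:
        self.members.append(vec)
        n = len(self.members)
        self.centroid = [sum(col) / n for col in zip(*self.members)]


def assign(clusters: list[Cluster], vec: list[float], threshold: float = 0.85) -> Cluster:
    """Add vec to the most similar cluster at or above threshold, else start a new one."""
    best, best_sim = None, threshold
    for cluster in clusters:
        sim = cosine(cluster.centroid, vec)
        if sim >= best_sim:
            best, best_sim = cluster, sim
    if best is None:
        best = Cluster(vec)
        clusters.append(best)
    else:
        best.add(vec)
    return best
```

Greedy assignment is order-dependent; implementations that need stable clusters may prefer periodic batch re-clustering.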

Server agents MAY maintain separate DSP instances per scenario or per client-agent cohort, enabling both global schema evolution and personalized schema branching.
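Heat decay counting and the dual-gate trigger can be sketched as follows, using the example figures from the text (24-hour half-life, threshold of 50 heat units, 5 unique client agents or 10 occurrences within 7 days). The ClusterHeat class, the heat increment of 1.0 per fragment, and the use of hours as the time unit are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

HALF_LIFE_HOURS = 24.0        # example half-life from the text
HEAT_INCREMENT = 1.0          # assumption: fixed increment per fragment
EVOLUTION_THRESHOLD = 50.0    # example threshold from the text
MIN_UNIQUE_CLIENTS = 5
MIN_OCCURRENCES = 10
WINDOW_HOURS = 7 * 24.0


@dataclass
class ClusterHeat:
    heat: float = 0.0
    last_update: float = 0.0  # time of last update, in hours
    events: list[tuple[float, str]] = field(default_factory=list)  # (time, client_id)

    def decayed_heat(self, now: float) -> float:
        """Heat after exponential decay since the last update."""
        elapsed = now - self.last_update
        return self.heat * 0.5 ** (elapsed / HALF_LIFE_HOURS)

    def record(self, now: float, client_id: str) -> None:
        """Register one fragment joining the cluster at time now."""
        self.heat = self.decayed_heat(now) + HEAT_INCREMENT
        self.last_update = now
        self.events.append((now, client_id))

    def should_trigger(self, now: float) -> bool:
        """Dual gate: sufficient heat AND sufficient distinct sources in the window."""
        recent = [(t, c) for t, c in self.events if now - t <= WINDOW_HOURS]
        enough_sources = (
            len({c for _, c in recent}) >= MIN_UNIQUE_CLIENTS
            or len(recent) >= MIN_OCCURRENCES
        )
        return self.decayed_heat(now) > EVOLUTION_THRESHOLD and enough_sources
```

With these example values, a burst of 60 fragments triggers evolution immediately, but the same cluster left untouched for one half-life decays to 30 heat units and no longer qualifies.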

Schema Self-Evolution Algorithm

Once the trigger mechanism identifies a mature semantic cluster, the server agent executes a self-evolution algorithm to synthesize a schema patch from the cluster's natural-language content.

Base Schema: The immutable, versioned schema template (see Section 4.1) that defines the stable contract. Base schema keys MUST NOT be removed or have their types changed by dynamic patches; only additive or semantic-description refinements are permitted in the patch layer.
Dynamic Patch Generation: For each triggered semantic cluster, the server agent MUST perform intent induction, conflict detection, and patch packaging.
Online Update Delivery: The server agent MAY deliver active patches through inline suggestions in the schema_update_suggestion field of the execution result response or through a dedicated schema polling endpoint.
Personalized Adaptation: Server agents MAY maintain per-client patch stacks. When a specific client agent repeatedly submits personalized requests that fall into a unique semantic cluster (e.g., a frequent business traveller who always requests "extra legroom" and "quiet cabin"), the server agent MAY generate a client-specific patch that adds keys such as "seat_preference" and "cabin_zone" only for that client. This transforms standardized task flows into privately customized service responses without polluting the global schema.

Dynamic patch generation comprises the following three steps:

Intent Induction: Use an internal LLM or rule-based extractor to summarize the cluster's natural-language fragments into a candidate key_name, key_type, required flag, default_value, and semantic_description. The semantic_description MUST include mapping examples derived from real fragments in the cluster.
Conflict Detection: Before finalizing the patch, the server agent MUST check that the candidate key_name does not collide with existing keys in the base schema or active patches. If a collision occurs, the server agent SHOULD merge the candidate into the existing key by refining its semantic_description rather than creating a duplicate.
Patch Packaging: The approved candidate is packaged as a schema patch containing new_keys, modified_keys, or both. The patch MUST carry a patch_id, a parent_schema_id, a timestamp, and an expiration date.
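Conflict detection and patch packaging can be sketched together. The function build_patch, the member names "timestamp" and "expiration", the 30-day default lifetime, and the UUID-based patch_id format are illustrative assumptions; the text only requires that a patch carry a patch_id, a parent_schema_id, a timestamp, and an expiration date.

```python
import uuid
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def build_patch(
    base_schema: dict[str, Any],
    active_patch_keys: set[str],
    candidate: dict[str, Any],
    ttl_days: int = 30,                # assumption: patch lifetime
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    """Package a candidate key definition as a schema patch."""
    now = now or datetime.now(timezone.utc)
    existing = {k["key_name"] for k in base_schema["keys"]} | set(active_patch_keys)

    patch: dict[str, Any] = {
        "patch_id": f"patch-{uuid.uuid4()}",
        "parent_schema_id": base_schema["schema_id"],
        "timestamp": now.isoformat(),
        "expiration": (now + timedelta(days=ttl_days)).isoformat(),
        "new_keys": [],
        "modified_keys": [],
    }

    if candidate["key_name"] in existing:
        # Collision: refine the existing key's description instead of duplicating it.
        # A fuller implementation would merge the old and new descriptions.
        patch["modified_keys"].append({
            "key_name": candidate["key_name"],
            "semantic_description": candidate["semantic_description"],
        })
    else:
        # New keys start in the experimental state.
        patch["new_keys"].append({**candidate, "experimental": True})
    return patch
```

Note that the collision branch only touches semantic_description, which keeps the patch within the additive-or-refinement limits placed on base schema keys.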

Long-Term Evaluation Mechanism

To prevent schema bloat and the accumulation of erroneous keys, server agents that implement self-evolution MUST employ a long-term evaluation mechanism that continuously assesses the quality and necessity of every patch key.

Evaluation Metrics: For each key introduced through a dynamic patch, the server agent SHOULD track at minimum the following metrics over a configurable observation window (default 30 days):

usage_frequency: The ratio of payloads in which the key is present to total payloads for the scenario.
semantic_alignment_accuracy: The ratio of payloads where the key's value matches the intent expressed in the original natural-language request, as judged by an internal LLM or by manual audit sampling.
value_type_correctness: The ratio of payloads where the received value conforms to the declared key_type.
client_adoption_rate: The ratio of distinct client agents that have successfully adopted the key to those that received the patch.

Two-Phase Lifecycle: All new keys introduced via dynamic patches MUST begin in an experimental state and transition through two phases.

Trial Period and Stabilization Decision: The key remains experimental for a configurable duration (default 7 days). During this period, the server agent collects the metrics above. The key MUST be advertised with an "experimental: true" flag in the patch so that client agents know the key is provisional. Upon completion of the trial period, the server agent evaluates the aggregated metrics.
Schema Convergence Guard: The server agent MUST enforce a maximum limit on the number of active experimental keys per scenario (e.g., no more than 10). When the limit is reached, new triggers MUST be queued until existing experimental keys are either promoted or deprecated. This guard prevents runaway schema bloat and ensures that the protocol converges to a stable, high-signal key set over time.

The stabilization decision and the subsequent lifecycle transitions are as follows:

Promotion: If usage_frequency >= 15%, semantic_alignment_accuracy >= 80%, and value_type_correctness >= 90%, the key is promoted to stable. The "experimental" flag is removed, and the key becomes part of the long-term supported schema.
Deprecation: If usage_frequency < 5% OR semantic_alignment_accuracy < 60% OR value_type_correctness < 70%, the key is marked deprecated. Deprecated keys remain in the schema for a grace period (default 14 days) with a "deprecated: true" flag, after which they MAY be moved to a withdrawn state.
Withdrawal: A withdrawn key is removed from active patches. Server agents MUST still accept the key in incoming payloads for an additional backward-compatibility window (default 30 days) to avoid breaking legacy client agents, but SHOULD log a warning.
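The promotion and deprecation rules can be expressed as a small decision function over the trial-period metrics. The function name and the returned state labels are assumptions. The text does not specify what happens when a key meets neither the promotion nor the deprecation criteria; this sketch assumes such a key simply remains experimental.

```python
def stabilization_decision(
    usage_frequency: float,
    semantic_alignment_accuracy: float,
    value_type_correctness: float,
) -> str:
    """Return "stable", "deprecated", or "experimental" for ratios given in [0, 1]."""
    # Deprecation: any single metric below its floor is sufficient.
    if (
        usage_frequency < 0.05
        or semantic_alignment_accuracy < 0.60
        or value_type_correctness < 0.70
    ):
        return "deprecated"
    # Promotion: all three metrics must meet their targets.
    if (
        usage_frequency >= 0.15
        and semantic_alignment_accuracy >= 0.80
        and value_type_correctness >= 0.90
    ):
        return "stable"
    # Assumption: metrics between the two bands leave the key experimental.
    return "experimental"
```

Because the deprecation criteria are joined by OR and the promotion criteria by AND, the two outcomes cannot both apply to the same set of metrics, so the order of the checks does not affect the result.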

Security Considerations

Structured key-value payloads may contain sensitive personal information (e.g., dietary preferences, biometric retouching parameters, location data). Implementations MUST protect this data in transit and at rest using mechanisms appropriate to their threat model.

Schema negotiation and update messages MUST be integrity-protected to prevent man-in-the-middle attacks that could inject malicious keys or semantic descriptions designed to exfiltrate data or trigger unauthorized actions.

When the "other" key contains free-form natural language, server agents MUST apply the same input validation and sanitization practices as they would for any natural language input, preventing prompt injection or command injection attacks.

The OPTIONAL schema update mechanism MUST require authentication and authorization if it exposes new capabilities or modifies security-relevant keys (e.g., keys related to payment, identity, or access control).
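One way to integrity-protect a schema negotiation or update message at the application layer is a keyed MAC over a canonical serialization. The following sketch uses HMAC-SHA-256 from the Python standard library. It assumes a pre-shared key between the two agents and a sorted-key compact JSON serialization as the canonical form; neither is mandated by this protocol, and deployments relying on an authenticated transport such as TLS may not need it.

```python
import hashlib
import hmac
import json
from typing import Any


def canonical_bytes(message: dict[str, Any]) -> bytes:
    """Serialize deterministically so both sides compute the MAC over the same bytes."""
    return json.dumps(message, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(message: dict[str, Any], key: bytes) -> str:
    """Return a hex-encoded HMAC-SHA-256 tag for the message."""
    return hmac.new(key, canonical_bytes(message), hashlib.sha256).hexdigest()


def verify(message: dict[str, Any], tag: str, key: bytes) -> bool:
    """Check the tag in constant time; False means the message was altered."""
    return hmac.compare_digest(sign(message, key), tag)
```

A client agent would verify the tag before feeding any semantic_description into its LLM prompt, so that an injected or altered description is rejected before it can influence key mapping.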

IANA Considerations

This document has no IANA actions.

References

Normative References

Informative References

[A2A] Google, "Agent2Agent (A2A) Protocol".

[MCP] Anthropic, "Model Context Protocol (MCP)".

Example Messages

This appendix provides complete, non-normative examples of schema negotiation, structured payload exchange, and optional schema update messages.

Schema Negotiation Request and Response

Client agent request:

[Figure: Schema Negotiation Request]

Server agent response:

Figure: Schema Negotiation Response

    ... 'PEK'."
        },
        {
          "key_name": "destination",
          "key_type": "string",
          "required": true,
          "semantic_description": "Arrival city or airport code. Example: 'to Shanghai' -> 'SHA'."
        },
        {
          "key_name": "departure_date",
          "key_type": "string",
          "required": true,
          "semantic_description": "Departure date in ISO 8601 format (YYYY-MM-DD). Example: 'next Monday' -> '2026-05-04'."
        },
        {
          "key_name": "cabin_class",
          "key_type": "string",
          "required": false,
          "default_value": "economy",
          "semantic_description": "Cabin class. Acceptable values: economy, premium_economy, business, first. Example: 'business class' -> 'business'."
        },
        {
          "key_name": "passenger_count",
          "key_type": "integer",
          "required": false,
          "default_value": 1,
          "semantic_description": "Number of passengers. Range: 1-9. Example: 'two people' -> 2."
        },
        {
          "key_name": "other",
          "key_type": "string",
          "required": false,
          "semantic_description": "Natural-language descriptions that cannot be mapped to existing keys. Example: 'window seat' -> 'window seat'."
        }
      ]
    }

Structured Payload Exchange

Client agent request payload:

[Figure: Client Agent Request Payload]

Server agent response:

[Figure: Server Agent Response]

Photo Editing Schema Example

Figure: Photo Editing Schema Example

    ... 3."
        },
        {
          "key_name": "teeth_whitening",
          "key_type": "boolean",
          "required": false,
          "default_value": false,
          "semantic_description": "Whether teeth whitening is enabled. Example: 'make the teeth whiter' -> true."
        },
        {
          "key_name": "background_blur",
          "key_type": "boolean",
          "required": false,
          "default_value": false,
          "semantic_description": "Whether background blur is enabled. Example: 'blur the background' -> true."
        },
        {
          "key_name": "filter_style",
          "key_type": "string",
          "required": false,
          "default_value": "none",
          "semantic_description": "Filter style. Acceptable values: none, vintage, cinematic, warm, cool. Example: 'vintage style' -> 'vintage'."
        },
        {
          "key_name": "eye_enlargement",
          "key_type": "boolean",
          "required": false,
          "default_value": false,
          "semantic_description": "Whether eye enlargement is enabled. Note: this key is a boolean switch; use other for fine-grained adjustment requests."
        },
        {
          "key_name": "other",
          "key_type": "string",
          "required": false,
          "semantic_description": "Photo-editing requirements that cannot be mapped to existing keys. Example: 'make the eyes a little bigger' -> 'increase eye size proportionally'."
        }
      ]
    }