Internet-Draft          cats-req-service-segmentation             July 2025
Tran & Kim              Expires 2 January 2026                       [Page]
This document discusses additional CATS requirements that may arise when service segmentation is applied to related CATS use cases such as AR-VR and Distributed AI Inference.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 2 January 2026.

Copyright (c) 2025 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.
Service segmentation is a service deployment option that splits a service into smaller subtasks which can be executed in parallel or in sequence before the subtasks' execution results are aggregated to serve the service request [draft-li-cats-task-segmentation-framework]. It is widely considered as a way to improve the performance of several services, such as AR-VR and Distributed AI Inference, which are also key CATS use cases [draft-ietf-cats-usecases-requirements].

For example, a recent 3GPP Technical Report on 6G use cases and services [TR-22870-3GPP] describes an XR rendering service that can be implemented as a sequential pipeline of subtasks, including a render engine, engine adaptation, and rendering acceleration. In contrast, an example of parallel service segmentation is parallel Machine Learning (ML) model partitioning for inference [SplitPlace], [Gillis]. Specifically, an ML model layer can be divided into multiple smaller partitions, which are executed in parallel. In both sequential and parallel segmentation cases, a subtask may have multiple instances deployed across different computing sites.

This document analyzes these CATS service segmentation use case examples to discuss the impact of the service segmentation deployment method on CATS system design.
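The two segmentation styles can be sketched in a few lines. The subtask functions below are toy stand-ins for the XR rendering pipeline stages; all names are illustrative assumptions, not taken from any CATS document:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins for the XR rendering subtasks (illustrative only).
def render_engine(frame):
    return frame + ["render-engine"]

def engine_adaptation(frame):
    return frame + ["engine-adaptation"]

def rendering_acceleration(frame):
    return frame + ["rendering-acceleration"]

def run_sequential(request, subtasks):
    """Sequential segmentation: each subtask consumes the previous result."""
    result = request
    for subtask in subtasks:
        result = subtask(result)
    return result

def run_parallel(request_parts, subtasks):
    """Parallel segmentation: each partition is processed independently;
    an aggregation step would then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda pair: pair[0](pair[1]),
                             zip(subtasks, request_parts)))

pipeline = [render_engine, engine_adaptation, rendering_acceleration]
print(run_sequential([], pipeline))
```

In the sequential case the ordering constraint is explicit in the loop; in the parallel case the partitions carry no ordering constraint, only a final aggregation dependency.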
This document re-uses the CATS component terminology defined in [draft-ietf-cats-framework]. Additional definitions related to service segmentation are:

Service subtask: An offering that performs only a partial functionality of the original service. The complete functionality of the original service is achieved by aggregating the results of all its divided service subtasks. Subtask result aggregation may be performed either in parallel or sequentially.

Service subtask instance: When a service is segmented into multiple service subtasks, each service subtask might have multiple instances that perform the same partial functionality of the original service.
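To make the two terms concrete, a minimal sketch of how a segmented service could be modeled; the field names and structure are illustrative assumptions, not CATS protocol elements:

```python
from dataclasses import dataclass, field

@dataclass
class SubtaskInstance:
    site: str               # computing site hosting this instance
    compute_delay_ms: float # hypothetical metric for this instance

@dataclass
class ServiceSubtask:
    name: str
    instances: list = field(default_factory=list)

@dataclass
class SegmentedService:
    name: str
    mode: str               # "sequential" or "parallel" aggregation
    subtasks: list = field(default_factory=list)

# Example: the XR rendering pipeline, with two subtasks shown and
# made-up per-instance delays.
xr = SegmentedService("xr-rendering", "sequential", [
    ServiceSubtask("render-engine", [SubtaskInstance("site1", 8.0),
                                     SubtaskInstance("site2", 12.0)]),
    ServiceSubtask("engine-adaptation", [SubtaskInstance("site1", 5.0),
                                         SubtaskInstance("site3", 4.0)]),
])
```

Each subtask owns its own instance list, reflecting that instances of the same subtask perform identical partial functionality at different sites.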
[Figure 1: ASCII diagram. A Client issues an XR Rendering request through an AR-VR (XR) App Platform to a CATS Forwarder co-located with a C-PS and C-NMA, over an underlay infrastructure with various network latencies between different links. Four service sites, each behind its own CATS Forwarder and monitored by a C-SMA, host instances of the Render Engine, Engine Adaptation, and Rendering Acceleration subtasks. The supposed optimal combination is Render Engine at Site 1, Engine Adaptation at Site 3, and Rendering Acceleration at Site 4, so packets are forwarded in ORDER: Site 1 -> Site 3 -> Site 4.]
Figure 1 illustrates how a CATS system should perform optimal traffic steering for an XR rendering service deployed as a sequential pipeline of subtasks, including the render engine, engine adaptation, and rendering acceleration. This example is derived from the corresponding use case in [TR-22870-3GPP]. To return the rendered XR object to the client, the XR rendering request must be processed sequentially in the specified order by the three rendering subtasks.
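As a sketch of how a C-PS might select the site combination in Figure 1, the code below exhaustively scores every per-subtask instance placement against made-up compute and network latencies. All names and numbers are assumptions for illustration, not CATS protocol elements; under these numbers the minimum-latency combination matches the figure's supposed optimum (Site 1 -> Site 3 -> Site 4).

```python
from itertools import product

# Hypothetical per-instance compute delays (ms), keyed by subtask and site.
instances = {
    "render-engine":     {"site1": 8, "site2": 12},
    "engine-adaptation": {"site2": 6, "site3": 4},
    "rendering-accel":   {"site3": 9, "site4": 5},
}
# Hypothetical network latencies (ms) between points.
latency = {("ingress", "site1"): 2, ("ingress", "site2"): 5,
           ("site1", "site2"): 3, ("site1", "site3"): 4,
           ("site2", "site3"): 2, ("site2", "site4"): 6,
           ("site3", "site4"): 3,
           ("site3", "client"): 4, ("site4", "client"): 2}

def net(a, b):
    """Network latency between two points (0 if co-located)."""
    if a == b:
        return 0
    return latency.get((a, b), latency.get((b, a)))

def path_cost(order, sites):
    """End-to-end latency of one candidate combination for a sequential
    pipeline: hop-by-hop network delay plus each chosen instance's
    compute delay, ending with the hop back to the client."""
    total, prev = 0, "ingress"
    for subtask, site in zip(order, sites):
        total += net(prev, site) + instances[subtask][site]
        prev = site
    return total + net(prev, "client")

order = list(instances)
best = min(product(*(instances[s] for s in order)),
           key=lambda sites: path_cost(order, sites))
print(best, path_cost(order, best))  # ('site1', 'site3', 'site4') 28
```

Exhaustive search is only tractable for tiny examples; the point is that the steering objective for sequential segmentation is a path cost over an ordered chain of instance choices, not an independent per-service choice.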
[Figure 2: ASCII diagram. An original ML model consists of an Input feeding Layer 1 (L1), Layer 2 (L2), and Layer 3 (L3) in sequence. The model is split vertically into ML Model Slice 1 and ML Model Slice 2, each containing a Split L1, Split L2, and Split L3; the split input is fed to both slices.]
[Figure 3: ASCII diagram. A Client issues an ML Inference request through an ML App Platform, which divides the input according to the Slice 1 and Slice 2 input sizes, and merges the outputs from the corresponding slices before responding to the Client. The request reaches a CATS Forwarder co-located with a C-PS and C-NMA, over an underlay infrastructure with various network latencies between different links. Four service sites, each behind its own CATS Forwarder and monitored by a C-SMA, host instances of Model Slice 1 (Sites 1 and 2) and Model Slice 2 (Sites 3 and 4). The supposed optimal combination is Slice 1 at Site 1 and Slice 2 at Site 3, so packets are forwarded in PARALLEL to Sites 1 and 3.]
Figure 3 illustrates how a CATS system can perform optimal traffic steering for a machine learning (ML) inference service deployed as a parallel pipeline of subtasks, where each subtask corresponds to a vertically partitioned slice of the original ML model. Based on the ML model splitting use cases described in [SplitPlace] and [Gillis], Figure 3 shows how an ML model can be vertically partitioned into slices that are executed in parallel to reduce inference response time. The input inference data from the client should be partitioned according to the input dimensions expected by each model slice. These slices then process their respective inputs in parallel, and the resulting outputs are merged to produce the final inference result, which is returned to the client.
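A minimal sketch of this split, run-in-parallel, and merge flow, using toy arithmetic stand-ins for the two model slices; the slice functions, the split point, and the sum-based merge are all assumptions for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins for two vertically partitioned model slices; a real
# deployment would run actual partitioned layers at different sites.
def slice1(xs):
    return sum(x * 2 for x in xs)

def slice2(xs):
    return sum(x * 3 for x in xs)

def infer(inputs, split_at):
    """Partition the input by each slice's expected input dimensions,
    run the slices in parallel, then merge (here: sum) the outputs."""
    parts = [inputs[:split_at], inputs[split_at:]]
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(slice1, parts[0])
        f2 = pool.submit(slice2, parts[1])
        outputs = [f1.result(), f2.result()]
    return sum(outputs)  # merge step before responding to the client

print(infer([1, 2, 3, 4], split_at=2))  # 2*(1+2) + 3*(3+4) = 27
```

Note that the response cannot be produced until the slowest slice finishes, which is what makes per-slice placement a joint, rather than independent, steering decision.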
In the normal CATS scenario:

In the Service Segmentation CATS scenario:
As AR/VR and Distributed AI Inference are among the CATS-supported use cases listed in [draft-ietf-cats-usecases-requirements], the CATS system should also fully support scenarios where service segmentation is applied to these use cases.
This section outlines three CATS system design considerations that are not yet addressed in existing CATS WG documents, including the Problem and Requirement document ([draft-ietf-cats-usecases-requirements]), the Framework document ([draft-ietf-cats-framework]), and the Metric Definition document ([draft-ietf-cats-metric-definition]):
- Traffic Steering Objective:
- Traffic Steering Mechanism:
- CATS Metrics Aggregation:
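On the metrics aggregation point, one way to picture the problem is that per-subtask metrics must be combined into a single end-to-end estimate the C-PS can compare across candidate placements. The sketch below uses sum for sequential stages and max for parallel branches; these aggregation rules are assumptions for illustration, not defined by any CATS document:

```python
# A segmentation plan is either a number (one subtask's delay in ms)
# or a tuple (mode, children) with mode "seq" or "par".
def aggregate(node):
    """Fold a nested sequential/parallel plan into one delay estimate:
    sequential stages add up, parallel branches are dominated by the
    slowest one."""
    if isinstance(node, (int, float)):
        return node
    mode, children = node
    values = [aggregate(child) for child in children]
    return sum(values) if mode == "seq" else max(values)

# Sequential XR pipeline: per-subtask delays add up.
print(aggregate(("seq", [8, 4, 5])))             # 17
# Parallel ML slices followed by a merge step: slowest slice dominates.
print(aggregate(("seq", [("par", [8, 6]), 3])))  # 11
```

Other metrics would need different folds (e.g. available capacity might aggregate as a minimum along a sequential chain), which is precisely the kind of rule existing CATS documents do not yet specify.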