An Architectural Framework for Monitoring Packet Loss Caused by Network Congestion

Internet-Draft	An Architectural Framework for Monitorin	June 2026
He, et al.	Expires 7 December 2026

Abstract

Network congestion can degrade performance and increase uncertainty in service delivery, so real-time congestion monitoring is necessary. This document describes a comprehensive architectural framework for monitoring packet loss. The proposed scheme can determine the time and location of packet loss, accurately count discarded packets, identify which traffic flows the discarded packets belong to and which traffic flows cause microbursts, and obtain accurate packet loss ratio results. More importantly, the proposed scheme causes little or no interference to the network, and it is applicable to any data plane without modifying the forwarding chip or packet header, as existing measurement methods require.¶

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶

This Internet-Draft will expire on 7 December 2026.¶

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶

1. Introduction

With the large-scale deployment of 5G networks, emerging services including enhanced Mobile Broadband (eMBB) and Ultra-Reliable Low Latency Communication (uRLLC) have imposed stringent requirements on IP bearer network performance, demanding significantly reduced latency, minimized jitter, and near-zero packet loss rates [_GPP_TS_22.261]. At the same time, the development of Big Data and Artificial Intelligence (AI) calls for intelligent computing network infrastructure whose goal is a lossless network characterized by "high throughput, low latency, and zero packet loss" [Adithya_Gangidi24][Kun_Qian24]. However, the statistical multiplexing inherent in IP networks results in bursty traffic patterns, making network congestion inevitable. Such congestion degrades network performance and introduces uncertainty in service delivery, e.g., packet loss triggers retransmission, and increased delay reduces throughput. Numerous studies have long concentrated on congestion control mechanisms and related algorithms [RFC9293][RFC9743] to improve network performance.¶

Network congestion is roughly divided into two classes: long-lived congestion and short-lived congestion. Long-lived congestion is generally caused by persistent traffic growth and lasts from hours to days, so it is easy to observe through a Network Management System/Element Management System (NMS/EMS). Short-lived congestion, in contrast, is almost always caused by traffic bursts, among which microbursts are one of the major contributors. A microburst is a phenomenon where a device port receives a considerable amount of burst data in a very short time (i.e., milliseconds or even microseconds), resulting in an instantaneous burst rate much higher than the average rate and possibly exceeding the port bandwidth [Microburst][Shuhei_Yoshida21]. A microburst is prone to cause packet loss but is difficult to detect in time. Many investigations show that microbursts are the main culprit affecting latency-sensitive and loss-sensitive services. When a microburst occurs, the queuing time increases rapidly, and in severe cases packet loss may even occur, both of which are intolerable for applications such as Virtual Reality (VR).¶

To reduce the uncertainty in service delivery caused by network congestion, it is essential to monitor congestion-induced packet loss in real time. Network operators can then quickly locate the congested nodes and links and optimize paths for the affected traffic flows to avoid congestion. They can also evaluate the network congestion level to guide network planning, capacity expansion and optimization.¶

[I-D.he-ippm-congestion-loss-monitoring-problem] discusses the requirements for real-time monitoring of packet loss caused by congestion and presents the problems and challenges that existing monitoring and measurement techniques face in meeting them. This document describes an architectural framework for real-time monitoring of congestion-induced packet loss.¶

2. Conventions

2.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶

2.2. Terminology

Abbreviations used in this document:¶

AI: Artificial Intelligence¶

CLI: Command Line Interface¶

CPU: Central Processing Unit¶

MPLS: Multi-Protocol Label Switching¶

NTP: Network Time Protocol¶

PLR: Packet Loss Ratio¶

SNMP: Simple Network Management Protocol¶

SLA: Service Level Agreement¶

SLO: Service Level Objective¶

SRv6: Segment Routing over IPv6¶

VPN: Virtual Private Network¶

3. Architectural Framework for Real-time Monitoring of Packet Loss Caused by Congestion

To monitor congestion-induced packet loss effectively, this document proposes a comprehensive packet loss monitoring architectural framework [Xiaoming_He25]. The proposed framework is mainly composed of network devices and the collection and analysis system. All network devices need to report loss events caused by congestion, cache the packets discarded due to queue overflow, and upload them to the collection and analysis system in real time. A telemetry interface (e.g., YANG Push [RFC8641], gRPC [gRPC]) with a subscription mechanism is used to push loss data immediately when a loss event occurs, avoiding the inefficiency of the traditional SNMP polling mode. The collection and analysis system is required to count the total number of discarded packets reported, parse the service types of the discarded packets, count the number of discarded packets for every traffic flow contained in all loss events, and calculate the packet loss ratio (PLR) of the specified user flows. Furthermore, the real-time visibility of packet loss gained from the collection and analysis system can feed into the NMS/EMS so that network operators can quickly pinpoint the congested nodes and the affected traffic flows. With the same real-time visibility, the network controller can optimize paths in a timely manner for affected traffic flows that are sensitive to latency and loss, improving user Quality of Experience (QoE). Figure 1 illustrates the proposed framework for monitoring packet loss caused by congestion.¶

 +-------------+        +--------------------------+        +-------------+
 | Network     |<-------| Collection and Analysis  |------->| NMS/EMS     |
 | controller  |        | system                   |        |             |
 +-------------+        +-------------^------------+        +-------------+
       |                              |
       | Timely path optimization     |                Rapid troubleshooting
       | based on real-time           |                based on real-time
       | loss visibility              |                loss visibility
       |                    Packet loss data reporting
 +-----v--^-------------^-------------^-------------^-------------^-------+
 |        |             |             |             |             |       |
 |        |             |             |             |             |       |
 |    +---+--+      +---+--+      +---+--+      +---+--+      +---+--+    |
 |    | Node |------| Node |------| Node |------| Node |------| Node |    |
 |    +------+      +------+      +------+      +------+      +------+    |
 |                               IP Network                               |
 +------------------------------------------------------------------------+

Figure 1: Framework for monitoring packet loss caused by congestion

3.1. Network Devices

In IP networks, network devices such as routers and switches are mainly used to implement packet forwarding. Traditional network devices can only record the number of packets discarded by port or queue overflow, and no loss information is notified promptly when packet loss occurs. The operator can only log in to the device (e.g., through the CLI) to search for loss events. Network devices need to have the ability to detect congestion and packet loss in real time. Because the traditional query using the CPU on the main control engine consumes considerable processing resources, the network device must leverage built-in dedicated hardware to detect packet loss in real time. In addition, existing forwarding devices do not cache the packets that overflow a queue but simply drop them, so it is not clear which packets were dropped and which traffic flows contributed to the congestion or microburst. To capture the traffic flows related to the packet loss, a cache for the discarded packets is needed. The proposed in-device packet loss detection architecture is shown in Figure 2.¶

+------------------------------------------------------------------+
|                             Network device                       |
|                                                                  |
|  +---------------------------+    +--------------------------+   |
|  | Real-time packet loss     |--->| packet loss information  |   |
|  | detection module          |    | reporting module         |   |
|  +-------------|-------------+    +--------------------------+   |
|                |                                                 |
|                v                                                 |
|  +---------------------------+        +-----+                    |
|  | packet loss counter       |<-------|queue|  port1             |
|  +---------------------------+        +-----+                    |
|                                       +-----+                    |
|  +---------------------------+        |queue|  port2             |
|  |Cache module for discarded |<-------+-----+                    |
|  | Packets                   |        +-----+                    |
|  +-------------|-------------+        |queue|  port3             |
|                |                      +-----+                    |
|                v                         :                       |
|  +---------------------------+        +-----+                    |
|  | packet loss file Upload   |        |queue|  portN             |
|  | module                    |        +-----+                    |
|  +---------------------------+        +-----+                    |
|                                       |queue|  portM             |
|                                       +-----+                    |
+------------------------------------------------------------------+

Figure 2: In-Device Packet Loss Detection Architecture

The in-device packet loss detection architecture requires four new functional modules, which are described as follows.¶

Real-time packet loss detection module: Leverages the built-in dedicated hardware to query the queue overflow packet loss counter of every port at millisecond intervals, and records the location and time of loss occurrence.¶
Packet loss information reporting module: Sends loss information according to the subscription request, including the number of discarded packets, the timestamp of loss occurrence, and the location of packet loss such as device ID, port ID and queue ID.¶
Cache module for discarded packets: Caches packets dropped by queue overflow and, optionally, records the number of discarded packets, the time of loss occurrence, and the location of packet loss such as device ID, port ID and queue ID. Only one shared cache is needed for all ports and queues. To save buffer space, the cached packets should be cleaned immediately after uploading.¶
Packet loss file upload module: Packages the cached discarded packets as a file or compressed file and uploads it to the collection and analysis system according to the specified rule.¶
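
As an illustration only, the loss information listed above could be modelled as a small record. The following Python sketch is not part of the framework: the class name, the field names and the JSON encoding are assumptions made for this example, and a real deployment would use the encoding of the chosen telemetry interface.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LossReport:
    """Hypothetical loss record; field names are illustrative assumptions."""
    device_id: str         # location of packet loss: device
    port_id: str           # location of packet loss: port
    queue_id: int          # location of packet loss: queue
    timestamp_ms: int      # time of loss occurrence on the synchronized clock
    dropped_packets: int   # number of packets discarded in this loss event


def encode(report: LossReport) -> str:
    """Serialize a report as JSON (an assumed encoding, for illustration)."""
    return json.dumps(asdict(report), sort_keys=True)


def decode(text: str) -> LossReport:
    """Rebuild a report from its JSON form."""
    return LossReport(**json.loads(text))
```

A collector could call decode() on each received message and index the result by (device_id, port_id, queue_id).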

3.1.1. Cache for Discarded Packets

To analyze packet drops caused by queue overflow, a cache mechanism is essential for capturing discarded packets. However, since packet parsing and statistical analysis consume significant local resources (such as memory and computing power), these tasks are better handled by a remote central processing entity. Since packet headers typically contain all necessary service type and flow attribute information, truncating discarded packets to a fixed length (e.g., the first 64 bytes) provides sufficient data for analysis while dramatically reducing the cache requirement.¶

While a packet loss file is being uploaded and the discarded packets are being cleaned, a new loss event may occur, leaving no buffer available for the subsequently dropped packets. To avoid this situation, the cache should be divided in an appropriate proportion into two separate spaces: a primary space and a spare space. The primary space caches the discarded packets for each upload, and the spare space caches the packets discarded while the current packaging and uploading operation is in progress.¶
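
The primary/spare split and the header truncation can be sketched as follows. This is a minimal Python model under stated assumptions, not the device implementation: the class and method names are hypothetical, capacities are counted in packets rather than bytes, and promoting the spare space to primary after an upload is one possible design choice that the text does not mandate.

```python
from dataclasses import dataclass, field

TRUNCATE_LEN = 64  # bytes kept per discarded packet (example value from the text)


@dataclass
class DropCache:
    """Hypothetical two-space cache for truncated discarded packets."""
    primary_capacity: int              # max packets in the primary space
    spare_capacity: int                # max packets in the spare space
    primary: list = field(default_factory=list)
    spare: list = field(default_factory=list)
    uploading: bool = False            # True while the primary space is packaged
    overflowed: int = 0                # drops that could not be cached

    def store(self, packet: bytes) -> None:
        """Cache the first TRUNCATE_LEN bytes of a dropped packet."""
        head = packet[:TRUNCATE_LEN]
        if self.uploading:
            target, capacity = self.spare, self.spare_capacity
        else:
            target, capacity = self.primary, self.primary_capacity
        if len(target) < capacity:
            target.append(head)
        else:
            self.overflowed += 1

    def begin_upload(self) -> list:
        """Snapshot the primary space; later drops go to the spare space."""
        self.uploading = True
        return list(self.primary)

    def finish_upload(self) -> None:
        """Clean the uploaded packets and promote the spare-space content."""
        self.primary = self.spare
        self.spare = []
        self.uploading = False
```

The overflowed counter makes visible how often the chosen proportion between the two spaces was too small.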

3.1.2. Packet Loss File Upload

To support real-time uploading of the packet loss file, a file transfer protocol such as the Trivial File Transfer Protocol (TFTP) [RFC1350] should be used to transfer the file as soon as it is available. To minimize cache capacity, a smart uploading scheme for the packet loss file is proposed, which is described as follows.¶

S1 If there is no discarded packet in any cache, no packet loss file is uploaded, which saves processing resources.¶

S2 If there are discarded packets in the primary space or the spare space, and neither space has reached the utilization threshold (e.g., 90%), the packet loss file is uploaded according to the preset fixed cycle (e.g., 10s), which needs to meet the real-time requirements for packet parsing and statistics.¶

S3 Otherwise, when either space reaches the utilization threshold because of a large number of dropped packets, the packet loss file is uploaded immediately without waiting for the next uploading cycle.¶
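
Rules S1 to S3 can be condensed into a single decision function. The Python sketch below is only one reading of the rules: the function and parameter names are hypothetical, and the default cycle (10 s) and threshold (90%) are the example values quoted above.

```python
def should_upload(primary_used, primary_size, spare_used, spare_size,
                  seconds_since_last_upload, cycle_s=10.0, threshold=0.9):
    """Return True when a packet loss file should be uploaded now."""
    # S1: nothing cached, nothing to upload.
    if primary_used == 0 and spare_used == 0:
        return False
    # S3: either space has reached the utilization threshold.
    if (primary_used / primary_size >= threshold
            or spare_used / spare_size >= threshold):
        return True
    # S2: some packets are cached; wait for the fixed upload cycle.
    return seconds_since_last_upload >= cycle_s
```

Checking S3 before S2 ensures that a nearly full space triggers an upload even in the middle of a cycle.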

3.1.3. Telemetry Data Collection and Report

The local device is also required to collect real-time data on loss caused by congestion. To capture loss events in real time, the network device needs to leverage built-in dedicated hardware such as an Application Specific Integrated Circuit (ASIC) to read the packet loss counter of each port or queue at millisecond intervals, and send telemetry data about loss information according to the subscription request. To improve the real-time awareness of packet loss in scenarios such as traffic optimization and congestion discovery, the on-change update is preferable to the periodic update; that is, a telemetry update is sent immediately when the packet loss counter value changes. While supporting on-change updates, a dampening period should be configurable to limit the amount of data sent.¶
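
The interaction between on-change updates and a dampening period can be illustrated with a small simulation. The Python sketch below is an assumption-laden model rather than the behaviour of any particular telemetry stack: the function name and the sample format are hypothetical, and the counter is assumed to be read at regular intervals so that a change held back during the dampening period is sent at the first read after the period ends.

```python
def on_change_updates(samples, dampening_ms, initial_value=0):
    """Return the (time_ms, counter) updates that would be sent.

    samples: (time_ms, counter_value) pairs in time order, e.g. one per
    millisecond read of the queue overflow packet loss counter.
    """
    updates = []
    last_sent_value = initial_value
    last_sent_time = None
    for time_ms, value in samples:
        if value == last_sent_value:
            continue                      # nothing changed, nothing to send
        if (last_sent_time is not None
                and time_ms - last_sent_time < dampening_ms):
            continue                      # change held until dampening ends
        updates.append((time_ms, value))
        last_sent_value, last_sent_time = value, time_ms
    return updates
```

With a dampening period of 0 every change is reported; a longer period trades reporting delay for fewer updates during a sustained burst.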

In addition, to measure the Packet Loss Ratio (PLR) caused by congestion, the network device is required to collect the statistical data of the monitored traffic flows and send the corresponding telemetry data to the collection and analysis system periodically. The ingress device, such as an access router or a Provider Edge (PE) router, is required to configure a received-packet counter for the monitored traffic. The specified traffic flows may be identified as Layer 2 flows (e.g., based on source and/or destination Media Access Control (MAC) address, Virtual Local Area Network Identifier (VLAN ID), or Virtual eXtensible Local Area Network (VXLAN) Network Identifier (VNI)), as Layer 3 flows (e.g., identified by N-tuple and/or the Flow Label field of the IPv6 packet header), by the Layer 2/3 VPN ID carried in the SR-MPLS label stack or IPv6 Segment Routing Header (SRH), etc.¶

3.1.4. Time Synchronization

Global time synchronization is also needed for accurate PLR calculation. For instance, suppose the ingress device periodically reports the statistical data (packet counter value) of the received VPN traffic, with a timestamp, in telemetry data. During some report period, this VPN traffic encounters packet loss caused by a microburst, and the loss information is immediately reported carrying the timestamp of loss occurrence. Figure 3 depicts the timing relationship between the report times of the traffic telemetry data and the reported time of loss occurrence.¶

  |   Report period   |   Report period   |
--|-------------------|---------|---------|--------> Synchronization time
  ^                   ^         ^         ^
  |                   |         |         |
  |                   |         |         |
                      Tp        Tl       Tc

Figure 3: Loss Occurrence and Telemetry Data Report Period Timing

When the timestamp Tl of loss occurrence falls between the timestamps Tp and Tc carried by two consecutive traffic telemetry reports, the collection and analysis system can attribute the loss to that report period and correctly calculate the PLR of the specified VPN traffic for it.¶

The network device is required to support time synchronization techniques such as Network Time Protocol (NTP) or IEEE 1588, which are widely deployed in operators' networks. Generally, NTP can achieve a precision of 50 ms and IEEE 1588 can achieve a precision of microseconds. In the proposed scheme, the required time synchronization precision depends on the measurement period. For a normal measurement period of tens of seconds or even minutes, a synchronization precision of 50 ms, which is easy to achieve, is enough to satisfy the measurement requirement.¶

3.2. Collection and Analysis System

The proposed framework must handle packet loss information under stringent real-time requirements. Therefore, an independent collection and analysis system is better suited to monitoring the packet loss caused by congestion in real time. The proposed structure of the collection and analysis system is shown in Figure 4.¶

+--------------------------------------------------------------------------+
|                        Collection and analysis system                    |
|                                                                          |
|  +------------------+                  +-------------------------+       |
|  | PLR measurement  |<-----------------| Packet loss statistics  |       |
|  | module           |                  | module                  |       |
|  +--------^---------+                  +---^---------------^-----+       |
|           |                                |               |             |
|           |                                |               |             |
| +---------+--------------+     +----------------+    +-----------------+ |
| | Measured traffic flows |     | Packet parsing |    |Packet loss data | |
| | collection module      |     | module         |<---|collection module| |
| +------------------------+     +----------------+    +-----------------+ |
+--------------------------------------------------------------------------+

Figure 4: Internal Functional Modules of Collection and Analysis System

The proposed collection and analysis system mainly consists of five functional modules, which are described as follows.¶

Packet loss data collection module: Accepts the packet loss data from network devices, including the telemetry data of loss information reported and loss files uploaded, and stores them for a specified time; records the number of discarded packets, the timestamp and location ID carried in the telemetry data every time it is reported.¶
Measured traffic flows collection module: Accepts the telemetry data of the measured traffic flows reported from network ingress devices, and stores them for a specified time; records the number of received packets and the timestamp carried in the telemetry data every time it is reported.¶
Packet parsing module: Uses packet parsing tools to parse, in real time, the discarded packets contained in the uploaded packet loss files.¶
Packet loss statistics module: Based on the packet parsing results, counts the number of discarded packets belonging to each traffic flow; based on the packet loss information reported, counts the total number of discarded packets of each device, port and queue, and also records the time and location of loss occurrence.¶
PLR measurement module: Based on the statistical data of the measured traffic flows reported periodically and the number of the discarded packets of the measured traffic flows, calculates PLR of the measured traffic flows according to the requirements of network operators (e.g., periodic measurement).¶

3.2.1. Packet Parsing

The discarded packets should be parsed as soon as possible to meet the real-time requirement of packet loss statistics and measurement. To provide real-time visibility of packet loss statistics as well as online PLR measurement, the parsing time for the currently uploaded packet loss file should be as short as possible, e.g., 100 ms. The parsing of the discarded packets should at least cover the measured traffic mentioned above, such as Layer 2/3 flows, Layer 2/3 VPN traffic, etc.¶
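
As a rough illustration of parsing truncated packets into flows, the Python sketch below extracts a 5-tuple from the leading bytes of each discarded packet and counts drops per flow. It is deliberately narrow and rests on assumptions not made by this document: it handles only Ethernet II frames, with at most one 802.1Q tag, carrying IPv4 with TCP or UDP, and the function names are hypothetical. Tunnelled, MPLS or SRv6 traffic would need additional header handling.

```python
import struct
from collections import Counter


def flow_key(pkt: bytes):
    """Return (src_ip, dst_ip, proto, src_port, dst_port), or None."""
    if len(pkt) < 14:
        return None
    ethertype = struct.unpack_from("!H", pkt, 12)[0]
    offset = 14
    if ethertype == 0x8100:                  # single 802.1Q VLAN tag
        if len(pkt) < 18:
            return None
        ethertype = struct.unpack_from("!H", pkt, 16)[0]
        offset = 18
    if ethertype != 0x0800 or len(pkt) < offset + 20:
        return None                          # only IPv4 is handled here
    ihl = (pkt[offset] & 0x0F) * 4           # IPv4 header length in bytes
    if ihl < 20:
        return None                          # malformed header
    proto = pkt[offset + 9]
    src = ".".join(str(b) for b in pkt[offset + 12:offset + 16])
    dst = ".".join(str(b) for b in pkt[offset + 16:offset + 20])
    l4 = offset + ihl
    if proto in (6, 17) and len(pkt) >= l4 + 4:
        sport, dport = struct.unpack_from("!HH", pkt, l4)
    else:
        sport = dport = 0                    # ports unavailable
    return (src, dst, proto, sport, dport)


def count_drops_per_flow(packets):
    """Count discarded packets per flow key, skipping unparsable ones."""
    counts = Counter()
    for pkt in packets:
        key = flow_key(pkt)
        if key is not None:
            counts[key] += 1
    return counts
```

For these header layouts the first 64 bytes of a packet are enough to reach the transport ports, which is consistent with the truncation length used as an example earlier.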

3.2.2. PLR Measurement

The PLR measurement module obtains the number of received packets and the timestamps carried in the telemetry data of the measured traffic flow from the measured traffic flows collection module. It also obtains, from the packet loss statistics module, the number of discarded packets of the measured traffic flow and the timestamps carried in the loss information or in the packet loss file. By matching the timestamps of loss occurrence against the timestamps of the traffic telemetry data, the PLR measurement module can calculate the PLR of the measured traffic flow during a specified measurement period from the number of received packets and the number of discarded packets.¶

For example, the collection and analysis system receives the previous telemetry data of the measured traffic flow carrying the number N1 of received packets and the timestamp T1, as well as the current telemetry data carrying the number N2 of received packets and the timestamp T2. Meanwhile, it also obtains the number N3 of the discarded packets of the measured traffic flow and the timestamp T3 carried in the packet loss file. If the timestamp T3 is between timestamp T1 and T2, then the PLR of the measured traffic flow for the current measurement period (T2-T1) is accurately calculated as:¶

   PLR = N3 / (N2 - N1)          (1)¶
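
Equation (1) can be sketched in Python as follows. The function name and argument layout are hypothetical, and summing several loss events that fall within the same measurement period is an assumption that generalizes the single-event example above.

```python
def plr_for_period(n1, t1, n2, t2, loss_events):
    """Compute PLR for the measurement period (t1, t2].

    n1, n2: received-packet counter values reported at t1 and t2.
    loss_events: (timestamp, discarded_packets) pairs for the measured
    flow; only events with t1 < timestamp <= t2 are counted.
    """
    received = n2 - n1
    if t2 <= t1 or received <= 0:
        raise ValueError("need t2 > t1 and a growing packet counter")
    discarded = sum(n for t, n in loss_events if t1 < t <= t2)
    return discarded / received
```

For example, with N1 = 1000 at T1 = 0 s, N2 = 11000 at T2 = 10 s, and N3 = 50 packets discarded at T3 = 4.2 s, the function returns 50 / 10000 = 0.005, i.e., a PLR of 0.5%.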

4. Functional Requirements for Real-time Monitoring of Packet Loss Caused by Congestion

In summary, to monitor packet loss caused by congestion in real time and obtain accurate packet loss ratio results, the proposed architectural framework needs to meet the following functional requirements.¶

[REQ-1] Network device is REQUIRED to support detecting packet loss caused by congestion at least once every millisecond.¶

[REQ-2] Network device is REQUIRED to report packet loss events in real time, i.e., immediately upon detection. The reported telemetry data is REQUIRED to carry the timestamp of packet loss occurrence, the number of discarded packets, and the packet loss location such as device ID, port ID, and queue ID.¶

[REQ-3] Network device is REQUIRED to support the capability to subscribe to periodic updates, e.g., to collect the statistical data of the monitored traffic flows and send the corresponding telemetry data to the collection and analysis system periodically. The subscription period shall be configurable as part of the subscription request. For periodic subscription, network device is RECOMMENDED to support redundancy suppression, where a telemetry update is not generated unless the value of the subscribed data objects has changed.¶

[REQ-4] Network device is REQUIRED to support the capability to subscribe to updates on-change, i.e., whenever values of the subscribed data objects change. For example, a telemetry update is sent immediately when queue overflow packet loss counter value changes. For on-change subscription, network device is REQUIRED to support a dampening period that needs to pass before subsequent on-change updates are sent. The dampening period should be configurable as part of the subscription request.¶

[REQ-5] Network device is REQUIRED to cache all packets discarded because of queue overflow. For the purpose of packet loss statistics and analysis, network device is REQUIRED to record the time of packet loss occurrence, the number of discarded packets, and the packet loss location such as device ID, port ID, and queue ID. To reduce cache capacity, it is RECOMMENDED to truncate discarded packets to a fixed length (e.g., the first 64 bytes).¶

[REQ-6] Network device is REQUIRED to upload all discarded packets as a file or compressed file in real time.¶

[REQ-7] Network device is REQUIRED to support time synchronization for measuring packet loss ratio caused by congestion, and time synchronization precision SHOULD be within 50 ms.¶

[REQ-8] Collection and analysis system is REQUIRED to support parsing the headers of all discarded packets to determine the flow attributes of every discarded packet, and counting the number of discarded packets of each traffic flow in real time.¶

[REQ-9] Collection and analysis system is REQUIRED to support periodic measurement of PLR based on the total number of discarded packets divided by the total number of sent packets. Also, it is REQUIRED to support periodic measurement of PLR according to the number of the discarded packets divided by the number of the sent packets for the specified user traffic.¶

[REQ-10] Collection and analysis system is REQUIRED to support visualization of data analysis for the discarded packets in the form of tables and figures, which are easily understandable by the operators.¶

5. Use Cases

In this section we consider three typical application scenarios to demonstrate the advantages of the proposed architectural framework for real-time monitoring of packet loss caused by congestion.¶

5.1. Detecting Microbursts in Real Time

The real-time packet loss detection module uses the built-in dedicated hardware to read the queue overflow packet loss counter of every port at millisecond intervals and records the time and location of loss occurrence. Once the loss counter value changes, the telemetry data of packet loss is reported to the collection and analysis system, which immediately becomes aware of the loss. Based on the collected packet loss statistics, the operator (through an online analytical tool) can correlate the number of discarded packets with the time of loss occurrence and thus determine whether long-lived or short-lived congestion is causing the packet loss. For instance, if the number of lost packets keeps increasing for only a very short time (e.g., a few milliseconds to tens of milliseconds), a microburst is the likely cause.¶
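
One simple way to perform this correlation is to group reported loss timestamps into episodes and classify each episode by its duration. The Python sketch below is illustrative only: the function names are hypothetical, and the 5 ms grouping gap and the 100 ms boundary between short-lived and long-lived congestion are assumed values that an operator would tune.

```python
def loss_episodes(timestamps_ms, max_gap_ms=5):
    """Group loss timestamps into (start, end) episodes.

    Consecutive loss reports separated by at most max_gap_ms are treated
    as part of the same congestion episode.
    """
    episodes = []
    for t in sorted(timestamps_ms):
        if episodes and t - episodes[-1][1] <= max_gap_ms:
            episodes[-1][1] = t              # extend the current episode
        else:
            episodes.append([t, t])          # start a new episode
    return [(start, end) for start, end in episodes]


def classify(episode, short_lived_max_ms=100):
    """Label an episode by duration; the boundary is an assumed value."""
    start, end = episode
    if end - start <= short_lived_max_ms:
        return "short-lived"
    return "long-lived"
```

Episodes labelled short-lived are microburst candidates, and the per-flow counts obtained from the uploaded loss files can then show which flows contributed to them.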

At the same time, the uploaded loss files can be parsed to determine which traffic flows the discarded packets belong to and which traffic flows cause the microburst, so that action can be taken against the flows responsible. The network operator can therefore quickly pinpoint the congested node, improving the efficiency of fault diagnosis and root cause analysis. In addition, based on the congestion state and the trend of the packet loss statistics, timely actions can be taken, e.g., redirecting the affected traffic flows to a non-congested port or making dynamic traffic adjustments to alleviate congestion.¶

5.2. Congestion Evaluation

Congestion evaluation is of significant value for subsequent network planning, capacity expansion and optimization. The PLR is a classical indicator of network performance, but as measured today it cannot accurately reflect the network congestion level, because the overall network packet loss caused by congestion is not exactly known. As mentioned above, existing monitoring techniques are not specially designed to monitor packet loss caused by congestion. In the proposed scheme, the PLR caused by congestion can be accurately calculated as the total number of discarded packets divided by the total number of packets received by the network. No probe is required.¶

In addition, we can obtain the average frequency and duration of short-lived congestion occurrences on the entire network within a day, based on which we can evaluate the degree of traffic bursts and expand network capacity accordingly.¶

5.3. SLO Verification of User Services

The PLR is also a key indicator for SLA compliance and should be verified. In the proposed scheme, by configuring packet counters on the ingress devices for the specified user flows and parsing the discarded packets belonging to them in real time, we can measure tens of thousands of service traffic flows simultaneously. Because the proposed scheme leverages a separate entity to handle packet parsing and loss statistics, the number of concurrently measured flows is not limited by network resources (e.g., computing, storage or bandwidth). Also, the data plane does not need to be modified to adapt to different transport protocols and monitoring techniques as existing measurement methods require (e.g., the Alternate-Marking method defined in [RFC9343] for IPv6, [RFC9714] for MPLS, and [RFC9947] for SRv6).¶

6. IANA Considerations

This document has no IANA actions.¶

7. Security Considerations

The congestion-induced loss monitoring system introduces additional traffic to the network. During network congestion, the monitoring system itself must not exacerbate the situation. Mechanisms such as rate limiting and traffic prioritization for congestion-related monitoring data should be considered. Also, appropriate defense measures against Distributed Denial of Service (DDoS) attacks are necessary to protect the data plane and control plane.¶

This document does not specify security mechanisms, but highlights that any solution must consider trust boundaries regarding telemetry data subscriptions and telemetry data reporting, as well as the protection of potentially sensitive operational data. These aspects are expected to be addressed by solution proposals based on deployment requirements and threat models.¶

8. References

8.1. Normative References

[RFC2119]: Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8126]: Cotton, M., Leiba, B., and T. Narten, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 8126, DOI 10.17487/RFC8126, June 2017, <https://www.rfc-editor.org/info/rfc8126>.
[RFC8174]: Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

8.2. Informative References

[Adithya_Gangidi24]: Gangidi, A., Miao, R., and S. Zheng, "RDMA over Ethernet for Distributed AI Training at Meta Scale", In ACM SIGCOMM 2024 Conference, 2024, <https://doi.org/10.1145/3651890.3672233>.
[I-D.he-ippm-congestion-loss-monitoring-problem]: He, X. and X. Min, "Requirements and Problem Statement for Monitoring Packet Loss Caused by Network Congestion", Work in Progress, Internet-Draft, draft-he-ippm-congestion-loss-monitoring-problem-00, 5 June 2026, <https://datatracker.ietf.org/doc/html/draft-he-ippm-congestion-loss-monitoring-problem-00>.
[Kun_Qian24]: Qian, K., Xi, Q., and J. Cao, "Alibaba HPN: A Data Center Network for Large Language Model Training", In ACM SIGCOMM 2024 Conference, 2024, <https://doi.org/10.1145/3651890.3672265>.
[Microburst]: Huawei Technologies Co., Ltd, "What is a Microburst? How to Detect a Microburst", November 2020, <https://support.huawei.com/enterprise/en/doc/>.
[RFC1350]: Sollins, K., "The TFTP Protocol (Revision 2)", STD 33, RFC 1350, DOI 10.17487/RFC1350, July 1992, <https://www.rfc-editor.org/info/rfc1350>.
[RFC9293]: Eddy, W., Ed., "Transmission Control Protocol (TCP)", STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, <https://www.rfc-editor.org/info/rfc9293>.
[RFC9343]: Fioccola, G., Zhou, T., Cociglio, M., Qin, F., and R. Pang, "IPv6 Application of the Alternate-Marking Method", RFC 9343, DOI 10.17487/RFC9343, December 2022, <https://www.rfc-editor.org/info/rfc9343>.
[RFC9714]: Cheng, W., Ed., Min, X., Ed., Zhou, T., Dai, J., and Y. Peleg, "Encapsulation for MPLS Performance Measurement with the Alternate-Marking Method", RFC 9714, DOI 10.17487/RFC9714, February 2025, <https://www.rfc-editor.org/info/rfc9714>.
[RFC9743]: Duke, M., Ed. and G. Fairhurst, Ed., "Specifying New Congestion Control Algorithms", BCP 133, RFC 9743, DOI 10.17487/RFC9743, March 2025, <https://www.rfc-editor.org/info/rfc9743>.
[RFC9947]: Fioccola, G., Zhou, T., Mishra, G., Wang, X., Zhang, G., and M. Cociglio, "Application of the Alternate-Marking Method to the Segment Routing Header", RFC 9947, DOI 10.17487/RFC9947, March 2026, <https://www.rfc-editor.org/info/rfc9947>.
[Shuhei_Yoshida21]: Yoshida, S., Ukon, Y., and S. Ohteru, "FPGA-based network microburst analysis system with efficient packet capturing", Journal of Optical Communications and Networking, October 2021, <https://doi.org/10.1364/JOCN.422859>.
[Xiaoming_He25]: He, X., He, Z., and W. Li, "Framework for Real-Time Monitoring of Packet Loss Caused by Network Congestion", IEEE Transactions on Network and Service Management, December 2025, <https://doi.org/10.1109/TNSM.2025.3578056>.
[_GPP_TS_22.261]: 3GPP, "Service requirements for the 5G system; Stage 1 (Release 18)", 2024, <https://www.3gpp.org/ftp/specs/archive/22_series/22.261>.

Authors' Addresses

Xiaoming He

China Telecom

Email: hexm4@chinatelecom.cn

Zijing He

South China University of Technology

Email: katehe163@163.com

Cancan Huang

China Telecom

Email: huangcanc@chinatelecom.cn