Internet-Draft | ALVC Container | October 2025 |
Tansey & Tansey | Expires 16 April 2026 | [Page] |
This document specifies the Adaptive Layered Voice Container (ALVC), a codec-agnostic framing and metadata container that enables progressive voice delivery in constrained and lossy networks. ALVC defines a Base layer that is intelligible on its own at sub-kilobit rates, and optional Enhancement layers that improve quality when additional capacity is available. The container supports store-and-forward operation, progressive enhancement, unequal error protection signaling, and receiver behavior for seamless splice-and-improve playback. ALVC does not define a new speech coding algorithm; it multiplexes existing voice coders within a layered container.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 16 April 2026.¶
Copyright (c) 2025 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Low-power and highly constrained links (for example LPWANs) cannot sustain traditional conversational streaming. ALVC provides a simple container for layered audio so that a small Base layer is delivered first for intelligibility, with optional Enhancements sent later. The container is transport-agnostic and can be mapped to different networks; a companion document describes a SCHC mapping for LPWAN [ALVC-SCHC].¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 ([RFC2119], [RFC8174]) when, and only when, they appear in all capitals, as shown here.¶
ALVC multiplexes one Base stream and zero or more Enhancement streams across time-aligned windows (for example 20 ms or 40 ms). Each window is a self-contained set of frames that can be played as soon as the Base frame is available. Enhancements, when present, refine the audio for the same window.¶
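The window model can be illustrated with a short Python sketch. The Frame type, its field names, and the layer numbering (0 for Base, 1 and up for Enhancements) are hypothetical stand-ins and not the ALVC wire format. The sketch only shows how frames for the same time-aligned window are grouped, and that a window becomes playable once its Base frame is present.¶

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

WINDOW_MS = 20  # example window duration from the text (20 ms or 40 ms)


@dataclass(frozen=True)
class Frame:
    """Hypothetical in-memory view of one received ALVC frame."""
    window: int     # window index N (assumed to be carried in frame metadata)
    layer: int      # 0 = Base, 1 = Enhancement L1, 2 = L2, ... (assumed numbering)
    payload: bytes  # opaque codec payload


def window_span_ms(n: int, window_ms: int = WINDOW_MS) -> Tuple[int, int]:
    """Return the [start, end) playback interval of window n, in milliseconds."""
    return n * window_ms, (n + 1) * window_ms


def demux(frames: Iterable[Frame]) -> Dict[int, Dict[int, bytes]]:
    """Group frames by window index; each entry maps layer -> payload."""
    windows: Dict[int, Dict[int, bytes]] = defaultdict(dict)
    for f in frames:
        windows[f.window][f.layer] = f.payload
    return windows


def is_playable(layers: Dict[int, bytes]) -> bool:
    """A window can be played as soon as its Base frame (layer 0) is present."""
    return 0 in layers
```

Frames may arrive in any order; grouping by window index means an Enhancement that arrives before its Base is simply held until the Base frame makes the window playable.¶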
ALVC is codec-agnostic: the Base frame is typically produced by a very low bitrate coder (for example Codec2), while Enhancements may be produced by a higher-fidelity coder (for example LPCNet or Opus) configured to refine the same speech segment. The precise codec choices are outside the scope of this document.¶
Each ALVC frame carries structured metadata followed by a codec payload. Fields:¶
The Base layer for a window MUST be decodable on its own. Enhancement layers for a window MUST refine the audio of that same window but MUST NOT be required for basic intelligibility. Receivers MUST render the best available layer for each window as data arrives.¶
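One way to compute the "best available layer" for a window is sketched below. It rests on an assumption this document does not state: that Enhancement Lk refines the output of L(k-1), so only an unbroken run of valid layers starting at Base is usable. The function name and the layer numbering (0 for Base) are hypothetical.¶

```python
from typing import Dict, Optional


def best_available_layer(valid: Dict[int, bool]) -> Optional[int]:
    """Return the highest renderable layer for one window, or None.

    `valid` maps layer number (0 = Base) to whether that frame arrived and
    passed validation. Assumes (hypothetically) cumulative layers: Lk is only
    usable when Base and L1..L(k-1) are also usable.
    """
    if not valid.get(0, False):
        return None  # no valid Base frame: the window is not playable yet
    best = 0
    while valid.get(best + 1, False):
        best += 1  # extend the contiguous run of valid Enhancement layers
    return best
```

If a codec's Enhancement layers were independent of one another, the receiver would instead pick the highest valid layer; either way a missing or invalid Enhancement never hides the Base layer.¶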
Receivers maintain per-window state. When the Base frame for window N arrives, window N becomes immediately playable. If one or more Enhancements for window N arrive later, the receiver SHOULD splice in the improved audio without audible glitches, using a crossfade or a codec-specific switch. Missing or invalid Enhancements MUST NOT block Base playback. Implementations SHOULD expose progress to the application layer, for example as "Base-only", "Enhanced L1", or "Enhanced L2".¶
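A crossfade splice for one window might look like the following sketch. It assumes both renderings are already decoded to PCM at the same sample rate and time-aligned (any codec delay compensated); the function name and the choice of a linear ramp are illustrative. A linear fade suits two renderings of the same speech, which are strongly correlated, whereas equal-power fades are normally used for unrelated signals.¶

```python
from typing import List, Sequence


def splice(base: Sequence[float], enhanced: Sequence[float],
           start: int, fade_len: int) -> List[float]:
    """Switch one window's PCM from the Base rendering to the Enhanced one.

    Samples before `start` (e.g. already handed to the audio device) keep the
    Base rendering. The next `fade_len` samples ramp linearly from Base to
    Enhanced; the remaining samples are Enhanced only.
    """
    if len(base) != len(enhanced):
        raise ValueError("both renderings must cover the same window")
    if not 0 <= start <= len(base):
        raise ValueError("start must lie within the window")
    out = list(base[:start])
    for i in range(start, len(base)):
        k = i - start
        if k < fade_len:
            w = (k + 1) / (fade_len + 1)  # Enhanced weight, strictly between 0 and 1
            out.append((1.0 - w) * base[i] + w * enhanced[i])
        else:
            out.append(enhanced[i])
    return out
```

For example, splicing a silent Base rendering into a constant Enhanced rendering at sample 2 with a 3-sample fade yields 0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0. If the Enhancement arrives before the window is rendered at all, start is 0 and fade_len can be 0.¶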
Senders SHOULD prioritize timely delivery of Base frames for upcoming windows to sustain continuous intelligible playback, and then transmit missing Enhancements for already-playable windows, earliest window first. Senders MAY include parity or other forward error correction (FEC) data and indicate its presence with the parity_present flag. Transport-specific scheduling (for example SCHC fragmentation or channel hopping) is out of scope for this document; it is discussed in the SCHC mapping document [ALVC-SCHC].¶
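The sender priority order can be sketched as a function that picks the next (window, layer) pair to transmit. The base_lookahead horizon, the static playhead, and the rule that Base frames beyond the horizon go last are assumptions made for illustration; this document only fixes the relative order of upcoming Base frames and earliest-missing Enhancements.¶

```python
from typing import List, Optional, Set, Tuple

Unit = Tuple[int, int]  # (window index, layer); layer 0 = Base (assumed numbering)


def next_frame(num_windows: int, num_enh_layers: int, sent: Set[Unit],
               base_lookahead: int, playhead: int) -> Optional[Unit]:
    """Pick the next frame to send, or None when everything has been sent."""
    # 1. Base frames for upcoming windows keep playback continuous.
    end = min(num_windows, playhead + base_lookahead)
    for w in range(playhead, end):
        if (w, 0) not in sent:
            return (w, 0)
    # 2. Earliest missing Enhancement for a window whose Base is already sent.
    for w in range(num_windows):
        if (w, 0) not in sent:
            continue  # not playable yet; an Enhancement alone is not useful
        for layer in range(1, num_enh_layers + 1):
            if (w, layer) not in sent:
                return (w, layer)
    # 3. Any remaining Base frames beyond the lookahead horizon.
    for w in range(num_windows):
        if (w, 0) not in sent:
            return (w, 0)
    return None


def drain_order(num_windows: int, num_enh_layers: int,
                base_lookahead: int, playhead: int = 0) -> List[Unit]:
    """Full transmission order on an otherwise idle link (illustration only)."""
    sent: Set[Unit] = set()
    order: List[Unit] = []
    while True:
        unit = next_frame(num_windows, num_enh_layers, sent, base_lookahead, playhead)
        if unit is None:
            return order
        sent.add(unit)
        order.append(unit)
```

With three windows, one Enhancement layer, and a lookahead of two windows, the order is Base for windows 0 and 1, then L1 for windows 0 and 1, then Base and L1 for window 2. A real sender would advance the playhead with wall-clock time, which pulls later Base frames ahead of pending backfill.¶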
Example timing: a clip is encoded into 20 ms windows. The Base stream averages about 0.5 kb/s and Enhancement L1 about 1.0 kb/s. During constrained periods only Base is delivered; when capacity improves, the sender backfills L1, starting with the earliest windows that still lack it.¶
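The rates in this example translate into per-window and per-clip sizes as computed below. The 10 s clip length and the 800 b/s of spare capacity are hypothetical values chosen for illustration. The per-window figures are averages: actual codec frames are packed into whole bytes and carry ALVC metadata, so at these rates per-frame overhead is significant.¶

```python
def bits_per_window(rate_bps: float, window_ms: float) -> float:
    """Average payload bits per window for a stream at rate_bps."""
    return rate_bps * window_ms / 1000.0


def windows_in_clip(clip_s: float, window_ms: float) -> int:
    """Number of whole windows in a clip of clip_s seconds."""
    return int(round(clip_s * 1000.0 / window_ms))


def stream_bytes(rate_bps: float, clip_s: float) -> float:
    """Total bytes needed to deliver one stream for the whole clip."""
    return rate_bps * clip_s / 8.0


def backfill_seconds(enh_rate_bps: float, clip_s: float, spare_bps: float) -> float:
    """Time to backfill an Enhancement stream using only spare capacity."""
    return enh_rate_bps * clip_s / spare_bps


WINDOW_MS = 20
BASE_BPS = 500   # about 0.5 kb/s, from the example
L1_BPS = 1000    # about 1.0 kb/s, from the example
CLIP_S = 10      # hypothetical clip length
SPARE_BPS = 800  # hypothetical spare capacity once conditions improve

print(bits_per_window(BASE_BPS, WINDOW_MS))          # 10.0
print(bits_per_window(L1_BPS, WINDOW_MS))            # 20.0
print(windows_in_clip(CLIP_S, WINDOW_MS))            # 500
print(stream_bytes(BASE_BPS, CLIP_S), stream_bytes(L1_BPS, CLIP_S))  # 625.0 1250.0
print(backfill_seconds(L1_BPS, CLIP_S, SPARE_BPS))   # 12.5
```

So each Base window averages about 10 bits and each L1 window about 20 bits; delivering Base alone costs one third of the data needed for Base plus L1.¶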
ALVC frames SHOULD be protected end-to-end using an authenticated encryption scheme. Integrity failures in Enhancement frames MUST NOT affect Base playback; receivers discard such frames. Metadata should be kept to the minimum that receivers need.¶
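The receiver rule for integrity failures can be sketched with the Python standard library. As a stand-in for an authenticated encryption scheme such as AES-GCM, the sketch uses a truncated HMAC-SHA256 tag over the frame metadata and payload; this provides integrity only, not confidentiality. The tag length, the length-prefixed encoding, and all function names are hypothetical.¶

```python
import hashlib
import hmac
from typing import Dict, Optional

TAG_LEN = 16  # truncated HMAC-SHA256 tag length (assumption)


def _tag(key: bytes, header: bytes, payload: bytes) -> bytes:
    # Length-prefix the header so (header, payload) boundaries are unambiguous.
    msg = len(header).to_bytes(2, "big") + header + payload
    return hmac.new(key, msg, hashlib.sha256).digest()[:TAG_LEN]


def protect(key: bytes, header: bytes, payload: bytes) -> bytes:
    """Sender side: append an integrity tag to the payload."""
    return payload + _tag(key, header, payload)


def verify(key: bytes, header: bytes, protected: bytes) -> Optional[bytes]:
    """Return the payload if the tag checks out, else None."""
    if len(protected) < TAG_LEN:
        return None
    payload, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    if not hmac.compare_digest(tag, _tag(key, header, payload)):
        return None
    return payload


def accept(windows: Dict[int, Dict[int, bytes]], key: bytes, window: int,
           layer: int, header: bytes, protected: bytes) -> bool:
    """Store a verified frame; a failed frame is discarded on its own."""
    payload = verify(key, header, protected)
    if payload is None:
        return False  # only this frame is dropped; other layers are untouched
    windows.setdefault(window, {})[layer] = payload
    return True
```

Because each frame is verified independently, a corrupted L1 frame for a window leaves that window's Base frame stored and playable.¶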
This document does not create any new IANA registries. If a public registry of ALVC codec identifiers is later desired, it can be defined in a follow-up document.¶
Added BCP 14 requirements language; clarified container scope and receiver behavior; added an explicit field list and examples; aligned text with the SCHC mapping companion document; ASCII cleanup.¶