Opus audio: high-level format overview

Opus is an open, royalty-free lossy audio compression format developed by the Xiph.Org Foundation and standardized by the IETF as RFC 6716 in 2012. It was designed for a broad range of applications, including real-time communication and videoconferencing. The Opus codec supports both constant and variable bitrate encoding over a wide range of bitrates, from 6 kbit/s to 510 kbit/s. It allows flexible frame sizes from 2.5 ms to 60 ms and supports five sampling rates from 8 kHz to 48 kHz. An Opus stream can carry up to 255 audio channels and supports channel coupling, which makes it adaptable to different audio scenarios. One of the key advantages of Opus is its low latency, which allows smooth and responsive audio transmission in interactive applications such as online gaming or videoconferencing.

This article provides a high-level overview of the Opus audio bitstream syntax. We will use Virinext Bitstream Analyzer to analyze Opus files. You can download the evaluation version from the Download page. To purchase a license, see the Buy license page.

High-level bitstream overview

At a high level, Opus audio is a sequence of packets containing compressed audio frames. A packet is a contiguous sequence of bytes that is processed as a single unit. A single packet may contain one or more audio frames, as long as they share a common set of parameters.

Virinext Bitstream Analyzer with opened OPUS audio file

Opus codec parameters are stored in the identification header, which also identifies a stream as Opus audio. The ID Header starts with the 8 bytes “OpusHead”. The ID Header packets are commonly stored in the decoder configuration parameters at the container level.

OPUS ID Header packet
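
As a rough illustration, the fixed part of the ID Header can be read with a few lines of Python. This is a minimal sketch, not the analyzer's implementation: the field layout (version, channel count, pre-skip, input sample rate, output gain, and mapping family, all little-endian) is taken from RFC 7845 rather than from this article, and the names OpusIdHeader and parse_opus_head are made up for the example.

```python
import struct
from typing import NamedTuple


class OpusIdHeader(NamedTuple):
    """Fixed 19-byte part of the "OpusHead" packet (layout per RFC 7845)."""
    version: int
    channel_count: int      # number of output channels
    pre_skip: int           # samples (at 48 kHz) to discard at the start
    input_sample_rate: int  # original input rate in Hz, informational only
    output_gain_q8: int     # signed gain in Q7.8 dB
    mapping_family: int     # channel mapping family


def parse_opus_head(data: bytes) -> OpusIdHeader:
    """Parse the fixed fields that follow the 8-byte "OpusHead" signature."""
    if len(data) < 19 or data[:8] != b"OpusHead":
        raise ValueError("not an Opus identification header")
    # B = uint8, H = uint16, I = uint32, h = int16; "<" means little-endian.
    return OpusIdHeader(*struct.unpack_from("<BBHIhB", data, 8))
```

For mapping families other than 0, a channel mapping table follows these 19 bytes; the sketch above does not parse it.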

Each packet starts with a one-byte TOC (table-of-contents) header that describes the modes and configuration used in that packet. The TOC byte contains a configuration number (“config”), a stereo flag (“s”), and a frame count code (“c”). The “config” field selects one of 32 possible combinations of operating mode, audio bandwidth, and frame size. The one-bit stereo flag “s” signals whether the packet is mono (0) or stereo (1). The two-bit count code “c” (values 0 to 3) describes the number of frames per packet as follows:

  • 0: 1 frame in the packet
  • 1: 2 frames in the packet, each with equal compressed size
  • 2: 2 frames in the packet, with different compressed sizes
  • 3: an arbitrary number of frames in the packet

The compressed frame data follows the TOC byte.
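
The TOC byte layout described above can be sketched in Python as follows. The bit positions (“config” in the top five bits, then “s”, then “c” in the lowest two bits) and the mapping from “config” to mode, bandwidth, and frame size follow RFC 6716; the article itself does not list them. The names Toc, describe_config, and parse_toc are illustrative only.

```python
from typing import NamedTuple, Tuple


class Toc(NamedTuple):
    config: int      # configuration number, 0..31
    stereo: bool     # the "s" bit: False = mono, True = stereo
    count_code: int  # the "c" field, 0..3
    mode: str        # "SILK", "Hybrid" or "CELT"
    bandwidth: str   # NB, MB, WB, SWB or FB
    frame_ms: float  # frame duration in milliseconds


def describe_config(config: int) -> Tuple[str, str, float]:
    """Map a config number to (mode, bandwidth, frame size in ms)."""
    if not 0 <= config <= 31:
        raise ValueError("config must be in 0..31")
    if config < 12:   # SILK-only: NB, MB, WB with 10/20/40/60 ms frames
        return "SILK", ("NB", "MB", "WB")[config // 4], (10, 20, 40, 60)[config % 4]
    if config < 16:   # Hybrid: SWB, FB with 10/20 ms frames
        return "Hybrid", ("SWB", "FB")[(config - 12) // 2], (10, 20)[config % 2]
    # CELT-only: NB, WB, SWB, FB with 2.5/5/10/20 ms frames
    return "CELT", ("NB", "WB", "SWB", "FB")[(config - 16) // 4], (2.5, 5, 10, 20)[config % 4]


def parse_toc(packet: bytes) -> Toc:
    """Decode the first byte of an Opus packet."""
    if not packet:
        raise ValueError("empty packet")
    toc = packet[0]
    config = toc >> 3                # top five bits
    stereo = bool((toc >> 2) & 1)    # one bit
    count_code = toc & 0x03          # lowest two bits
    return Toc(config, stereo, count_code, *describe_config(config))
```

For example, parse_toc(bytes([0xFC])) reports config 31, stereo, and count code 0, which is a packet holding a single 20 ms fullband CELT frame.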

OPUS compressed frame

The following sections show where some of the audio parameters are located in encoded Opus audio elementary streams.

Sampling rate

The “original input sample rate” field of the ID Header describes the sample rate of the original audio input before encoding. This field does not determine the sample rate for playback of the encoded data. Opus supports several internal audio bandwidths (4, 6, 8, 12, and 20 kHz), and each packet in an Opus stream can use a different one. Regardless of the audio bandwidth used during encoding, the reference decoder can decode any stream at a sample rate of 8, 12, 16, 24, or 48 kHz.

During playback of encoded audio, an Opus player selects the playback sample rate according to the following procedure:

  • If the hardware supports 48 kHz playback, decode at 48 kHz,
  • else if the hardware’s highest available sample rate is a supported rate, decode at this sample rate,
  • else if the hardware’s highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,
  • else decode at 48 kHz and resample.
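
The selection procedure above can be written as a small Python function. This is only a sketch of the listed steps: the function name choose_decode_rate and the idea of passing the hardware capabilities as a list of rates in Hz are assumptions made for the example.

```python
# Sample rates (Hz) at which the reference decoder can produce output.
OPUS_RATES = (8000, 12000, 16000, 24000, 48000)


def choose_decode_rate(hardware_rates):
    """Return (decode_rate_hz, needs_resampling) for the given hardware rates."""
    if not hardware_rates:
        raise ValueError("hardware_rates must not be empty")
    # Step 1: prefer 48 kHz whenever the hardware can play it.
    if 48000 in hardware_rates:
        return 48000, False
    highest = max(hardware_rates)
    # Step 2: the highest hardware rate is itself a supported decode rate.
    if highest in OPUS_RATES:
        return highest, False
    # Step 3: decode at the next supported rate above it, then resample down.
    if highest < 48000:
        return min(r for r in OPUS_RATES if r > highest), True
    # Step 4: hardware only offers rates above 48 kHz; decode at 48 kHz and resample.
    return 48000, True
```

With hardware limited to 44.1 kHz, for instance, this returns (48000, True): the stream is decoded at 48 kHz and resampled for output.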

The “original input sample rate” field is also used by the muxer to pass the sample rate of the original input stream as metadata. This is useful when the user requires the output sample rate to match the input sample rate.

Channel count

The number of output channels is given by the “channel count” field of the ID Header. This can differ from the number of encoded channels, which can change on a packet-by-packet basis and is signaled by the stereo flag (“s”) in the TOC byte.

Channel layout

The “channel mapping family” field in the ID Header indicates the order and meaning of the output channels. Three mapping families are currently defined, and new families can be added in the future:

  • Family 0 (RTP mapping): The allowed channel count is 1 or 2. Mono has a single channel, and stereo has two channels (left, then right).
  • Family 1 (Vorbis channel order): The allowed channel count ranges from 1 to 8. The meaning of each channel depends on the channel count and follows the Vorbis mapping specification.
  • Family 255 (no defined channel meaning): The allowed channel count ranges from 1 to 255. In this family, the channels are unidentified.

The channel mapping families from 2 to 254 are reserved.
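
A parser can use these rules to sanity-check an ID Header. The following Python sketch encodes only the limits listed above; the names CHANNEL_LIMITS and check_channel_count are hypothetical, and rejecting reserved families with an error is a choice made for this example rather than required behavior.

```python
# Allowed (min, max) output channel counts per defined mapping family.
CHANNEL_LIMITS = {
    0: (1, 2),      # RTP mapping: mono or stereo
    1: (1, 8),      # Vorbis channel order
    255: (1, 255),  # no defined channel meaning
}


def check_channel_count(family: int, channels: int) -> bool:
    """Return True if the channel count is allowed for the mapping family."""
    if family not in CHANNEL_LIMITS:
        # Families 2..254 are reserved in this overview.
        raise ValueError(f"channel mapping family {family} is reserved")
    low, high = CHANNEL_LIMITS[family]
    return low <= channels <= high
```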

Conclusion

Opus audio has gained popularity over the years and is now recognized as a reliable and efficient audio compression format. When it comes to analyzing Opus audio files, the Virinext Bitstream Analyzer can be a valuable tool. It offers analysis for various encoding standards. Whether you need in-depth analysis or a high-level overview, the Virinext Bitstream Analyzer is the right tool to meet your requirements.