The AVC video compression standard was published in 2003. In 2020, 17 years later, it is still one of the most widely used video compression standards. This article is a high-level overview of the AVC (H.264) bitstream syntax and describes the main syntax elements of AVC video. To analyze AVC video we will use Virinext Bitstream Analyzer. You can download the evaluation version on the download page. To acquire a license, see the Buy license page.
High level bitstream overview
At a high level, AVC video is a sequence of NAL Units (Network Abstraction Layer Units). The first byte of each NAL Unit is a header byte that indicates the NAL Unit type. The main types of NAL Units are:
- Sequence Parameter Set and Picture Parameter Set – contain the basic video coding parameters;
- Slice – stores a coded frame or part of a frame;
- Supplemental Enhancement Information messages – contain various types of metadata;
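- Access unit delimiter – marks the boundary between access units: it is placed at the start of a new access unit, so it also indicates that the previous frame or field has ended;
- End Of Stream – signals the end of the coded stream;
- End Of Sequence – signals the end of a coded video sequence, so the NAL Units that follow can be decoded independently of all previous NAL Units;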
- Filler Data – contains padding bytes for bitrate smoothing.
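As a minimal sketch of how the header byte maps to the types listed above, the snippet below splits the byte into its three fields and looks up the type name. The field names (forbidden_zero_bit, nal_ref_idc, nal_unit_type) and the numeric type values follow the H.264 specification; the function names and the table, which is limited to the types discussed in this article, are illustrative.

```python
# Names for the nal_unit_type values discussed in this article (not a complete table).
NAL_TYPE_NAMES = {
    1: "Slice (non-IDR)",
    5: "Slice (IDR)",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "Access unit delimiter",
    10: "End of sequence",
    11: "End of stream",
    12: "Filler data",
}


def parse_nal_header(header_byte: int) -> dict:
    """Split the one-byte NAL Unit header into its three bit fields."""
    return {
        "forbidden_zero_bit": (header_byte >> 7) & 0x01,  # 1 bit, must be 0
        "nal_ref_idc": (header_byte >> 5) & 0x03,         # 2 bits, 0 = not used for reference
        "nal_unit_type": header_byte & 0x1F,              # 5 bits
    }


def nal_type_name(header_byte: int) -> str:
    """Return a readable name for the NAL Unit type stored in the header byte."""
    nal_unit_type = parse_nal_header(header_byte)["nal_unit_type"]
    return NAL_TYPE_NAMES.get(nal_unit_type, "Other")


if __name__ == "__main__":
    for byte in (0x67, 0x68, 0x65, 0x41, 0x06):
        print(hex(byte), parse_nal_header(byte), nal_type_name(byte))
```

For example, the header bytes 0x67 and 0x68, which often open a stream, decode to an SPS and a PPS respectively.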
In the simplest case, AVC video can consist of just one SPS and one PPS, which describe the stream parameters, and a few Slices, which store the compressed frames.
The core AVC specification doesn't mandate a single way of storing a sequence of NAL Units as a byte stream. There are two main ways of doing it:
- Annex B of the specification describes a byte-stream format in which every NAL Unit is preceded by a start code prefix (0x000001, optionally with an extra leading zero byte, i.e. 0x00000001). Because the start code bytes could otherwise appear in the middle of a NAL Unit, an emulation prevention byte 0x03 is inserted inside the NAL Unit after any two consecutive zero bytes that are followed by a byte from 0x00 to 0x03; for example, 0x000000 becomes 0x00000300;
- The AVCC format, in which every NAL Unit is prefixed by its size, stored in a fixed number of bytes. In this approach the SPS and PPS are stored separately from the coded frames.
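The two storage formats can be sketched as follows. This is a simplified illustration rather than a production parser: the function names are hypothetical, split_avcc assumes a 4-byte size prefix unless told otherwise (the real prefix length is signalled outside the NAL Unit data), and no error handling is done for truncated input.

```python
START_CODE = b"\x00\x00\x01"


def split_annex_b(stream: bytes) -> list:
    """Split an Annex B byte stream into NAL Units, dropping the start codes."""
    nal_units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes before the next start code belong to that start code
        # (or are padding), so they are stripped from the NAL Unit.
        nal_units.append(stream[begin:end].rstrip(b"\x00"))
        pos = nxt
    return nal_units


def add_emulation_prevention(payload: bytes) -> bytes:
    """Insert 0x03 after two zero bytes when the next byte is 0x00..0x03."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(nal: bytes) -> bytes:
    """Drop every 0x03 byte that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in nal:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def split_avcc(data: bytes, length_size: int = 4) -> list:
    """Split size-prefixed (AVCC-style) data into NAL Units."""
    nal_units = []
    pos = 0
    while pos + length_size <= len(data):
        size = int.from_bytes(data[pos:pos + length_size], "big")
        pos += length_size
        nal_units.append(data[pos:pos + size])
        pos += size
    return nal_units
```

Searching only for the 3-byte start code also handles the 4-byte form, because the extra leading zero is stripped from the end of the preceding NAL Unit.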
Sequence Parameter Set and Picture Parameter Set
The Sequence Parameter Set contains coding parameters that apply to a series of consecutive coded video pictures. These parameters include the profile and level, the frame size, the maximum number of reference frames, etc.
The Picture Parameter Set contains coding parameters that apply to the decoding of one or more individual pictures within a coded video sequence. These parameters describe the entropy coding method, the deblocking filter and other picture-related parameters.
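To give a feel for how these parameters are laid out, here is a minimal sketch that reads only the first few SPS fields: profile_idc, the constraint flags byte, level_idc and the Exp-Golomb coded seq_parameter_set_id. The field names come from the H.264 specification; the BitReader class, the function name and the partial profile table are illustrative. The input is assumed to be a complete SPS NAL Unit (header byte included) with emulation prevention bytes already removed, and the special signalling of level 1b is ignored.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read_bits(self, count: int) -> int:
        value = 0
        for _ in range(count):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def read_ue(self) -> int:
        """Unsigned Exp-Golomb code: N zero bits, a 1 bit, then N suffix bits."""
        leading_zeros = 0
        while self.read_bits(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)


# A few common profile_idc values (not a complete table).
PROFILE_NAMES = {66: "Baseline", 77: "Main", 88: "Extended", 100: "High"}


def parse_sps_start(nal: bytes) -> dict:
    """Parse the fixed-position fields at the start of an SPS NAL Unit."""
    reader = BitReader(nal)
    reader.read_bits(8)  # skip the NAL Unit header byte
    profile_idc = reader.read_bits(8)
    constraint_flags = reader.read_bits(8)
    level_idc = reader.read_bits(8)
    return {
        "profile_idc": profile_idc,
        "profile": PROFILE_NAMES.get(profile_idc, "Other"),
        "constraint_flags": constraint_flags,
        "level_idc": level_idc,
        "level": f"{level_idc // 10}.{level_idc % 10}",
        "seq_parameter_set_id": reader.read_ue(),
    }


if __name__ == "__main__":
    print(parse_sps_start(bytes([0x67, 0x64, 0x00, 0x28, 0xAC])))
```

Most of the remaining SPS and PPS fields are Exp-Golomb coded as well, which is why a bit-level reader is needed even for this small subset.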
Screenshots below illustrate parameters stored in SPS and PPS.
Slice
A Slice NAL Unit contains data from one coded frame or field. A Slice can contain a full frame or just part of one, so a frame can be stored as a sequence of Slices.
There are the following slice types:
- I-slice – slice that uses intra prediction only
- P-slice – slice that can use inter prediction with at most one reference picture per predicted block
- B-slice – slice that can use inter prediction with up to two reference pictures per predicted block
An I-slice exploits spatial redundancy only. P-slices and B-slices can use intra prediction as well as inter prediction, which exploits temporal redundancy between frames.
There is one special slice type called the IDR-slice (instantaneous decoding refresh). An IDR-slice signals that no slice after it references any slice before it, so decoding can start from an IDR-slice without access to any previous slices.
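As a rough sketch of how a parser can tell these slice types apart: the IDR property is carried by the NAL Unit type (5 for IDR, 1 for non-IDR), while the I/P/B type is the second Exp-Golomb coded field of the slice header, after first_mb_in_slice. The field names and numeric values follow the H.264 specification; the helper names are illustrative, and the input is assumed to be a slice NAL Unit (header byte included) with emulation prevention bytes already removed.

```python
# slice_type values 5..9 repeat 0..4 and additionally state that every
# slice of the picture has the same type, hence the modulo 5 below.
SLICE_TYPE_NAMES = {0: "P", 1: "B", 2: "I", 3: "SP", 4: "SI"}


def get_bit(data: bytes, bit_pos: int) -> int:
    """Return the bit at the given position, counting MSB-first."""
    return (data[bit_pos >> 3] >> (7 - (bit_pos & 7))) & 1


def read_ue(data: bytes, bit_pos: int) -> tuple:
    """Decode one unsigned Exp-Golomb value; return (value, next bit position)."""
    zeros = 0
    while get_bit(data, bit_pos) == 0:
        zeros += 1
        bit_pos += 1
    bit_pos += 1  # skip the terminating 1 bit
    suffix = 0
    for _ in range(zeros):
        suffix = (suffix << 1) | get_bit(data, bit_pos)
        bit_pos += 1
    return (1 << zeros) - 1 + suffix, bit_pos


def slice_info(nal: bytes) -> dict:
    """Read the NAL Unit type and the first two slice header fields."""
    nal_unit_type = nal[0] & 0x1F
    if nal_unit_type not in (1, 5):
        raise ValueError("not a slice NAL Unit")
    bit_pos = 8  # the slice header starts right after the NAL Unit header byte
    first_mb_in_slice, bit_pos = read_ue(nal, bit_pos)
    slice_type, bit_pos = read_ue(nal, bit_pos)
    return {
        "is_idr": nal_unit_type == 5,
        "first_mb_in_slice": first_mb_in_slice,
        "slice_type": SLICE_TYPE_NAMES[slice_type % 5],
    }


if __name__ == "__main__":
    print(slice_info(bytes([0x65, 0x88])))  # IDR I-slice
    print(slice_info(bytes([0x41, 0x9A])))  # non-IDR P-slice
```

A first_mb_in_slice value of 0 means the slice starts at the first macroblock of the picture, so a non-zero value indicates that the frame is split into several Slices.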
SEI Messages
SEI Messages can contain various types of metadata and don't affect the core decoding process. They also provide a way to store custom user metadata.
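The following sketch shows how the messages inside one SEI NAL Unit can be separated. Each message starts with a payload type and a payload size, both coded as a run of 0xFF bytes (each adding 255) followed by one final byte. The function name is illustrative, and the input is assumed to be the SEI data after the NAL Unit header byte, with emulation prevention bytes removed and well-formed content (no bounds checking is done).

```python
def parse_sei_messages(rbsp: bytes) -> list:
    """Return a list of (payload_type, payload_bytes) tuples."""
    # The data ends with a trailing-bits byte 0x80, which is not a message.
    end = len(rbsp) - 1 if rbsp.endswith(b"\x80") else len(rbsp)
    messages = []
    pos = 0
    while pos < end:
        payload_type = 0
        while rbsp[pos] == 0xFF:   # each 0xFF byte adds 255
            payload_type += 255
            pos += 1
        payload_type += rbsp[pos]  # the last byte adds its own value
        pos += 1

        payload_size = 0
        while rbsp[pos] == 0xFF:
            payload_size += 255
            pos += 1
        payload_size += rbsp[pos]
        pos += 1

        messages.append((payload_type, rbsp[pos:pos + payload_size]))
        pos += payload_size
    return messages


if __name__ == "__main__":
    # Payload type 5 is "user data unregistered": a 16-byte UUID
    # followed by arbitrary user bytes (here a zero UUID and b"hello").
    sei = bytes([5, 21]) + bytes(16) + b"hello" + b"\x80"
    for payload_type, payload in parse_sei_messages(sei):
        print(payload_type, payload[:16].hex(), payload[16:])
```

Payload type 5 (user data unregistered) is the usual carrier for custom user metadata; a decoder that does not recognise the UUID can simply skip the message.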
Conclusion
There are other NAL Unit types besides the SPS, PPS, Slices and SEI described here. All AVC syntax elements can be examined with Virinext Bitstream Analyzer, a GUI tool for both in-depth and high-level analysis of many encoding standards, including AVC (H.264) video compression.