At a high level, video consists of a sequence of individual images or video frames. Uncompressed video requires a substantial amount of storage space. For instance, storing 1 second of video at a 1920×1080 resolution and 50 frames per second (50p) in the I420 format requires approximately 1920 * 1080 * 1.5 (bytes per pixel in I420) * 50 = 155.52 MB.
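The arithmetic above can be checked with a short helper. This is an illustrative sketch (the function name is our own, not part of any library):

```cpp
#include <cstdint>

// Bytes needed for one second of uncompressed I420 video.
// I420 stores one byte of luma per pixel plus quarter-resolution
// Cb and Cr planes, i.e. 1.5 bytes per pixel on average.
std::uint64_t i420BytesPerSecond(std::uint64_t width,
                                 std::uint64_t height,
                                 std::uint64_t fps) {
    return width * height * 3 / 2 * fps;
}
```

For 1920×1080 at 50 fps this yields 155,520,000 bytes, i.e. the 155.52 MB quoted above.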
To reduce storage demands, video compression is used. Video encoding algorithms reduce the size of a video sequence by converting it from a series of independent frames into a compressed bitstream format, as specified by the appropriate video encoding standard.
One of the most widely used video compression standards is AVC/H.264, initially published in 2003. Even 21 years later, in 2024, AVC/H.264 remains one of the most popular video compression standards worldwide. At a high level, AVC video is organized as a sequence of NAL Units (Network Abstraction Layer Units). The first byte of each NAL Unit is a header byte that indicates the type of NAL unit.
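Because the header is a single byte, extracting its fields is straightforward. The sketch below follows the bit layout defined in the H.264 specification (section 7.3.1); the struct and function names are illustrative assumptions:

```cpp
#include <cstdint>

// Fields of the one-byte AVC NAL unit header:
//   forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits)
struct NalHeader {
    std::uint8_t forbiddenZeroBit; // must be 0 in a valid bitstream
    std::uint8_t nalRefIdc;        // non-zero if the unit is used as a reference
    std::uint8_t nalUnitType;      // e.g. 1 = non-IDR slice, 5 = IDR slice,
                                   //      7 = SPS, 8 = PPS, 9 = access unit delimiter
};

NalHeader parseNalHeader(std::uint8_t b) {
    return NalHeader{
        static_cast<std::uint8_t>((b >> 7) & 0x01),
        static_cast<std::uint8_t>((b >> 5) & 0x03),
        static_cast<std::uint8_t>(b & 0x1F)
    };
}
```

For example, the byte 0x65, commonly seen at the start of an IDR slice, decodes to nal_ref_idc = 3 and nal_unit_type = 5.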
The structure of AVC/H.264 video is described in detail in the post “The structure of AVC (H264) video”. This article focuses on how video frames are stored within an AVC/H.264 bitstream. To analyze AVC video, we will use the Virinext Bitstream Analyzer, available as an evaluation version on the Download page. For licensing information, please refer to the Buy License page.
AVC/H.264 Slice NAL Unit
As noted above, AVC video is organized as a sequence of NAL units. Encoded video frames are stored within Slice NAL units, each of which can contain either a full frame or only part of one. This means that a single frame may be represented as a sequence of slices.
The following slice types are used:
- I-slice: Contains only intra-predicted data (spatial prediction within the slice).
- P-slice: Contains inter-predicted data; each block references at most one previously decoded frame.
- B-slice: Contains inter-predicted data; each block may reference up to two different previously decoded frames.
I-slices exploit only spatial redundancy, while P-slices and B-slices may use both intra-prediction and inter-prediction, addressing both spatial and temporal redundancy between frames.
A special slice type, the IDR-slice (Instantaneous Decoding Refresh), signals a reset in frame referencing: no slice that follows an IDR-slice references any frame that precedes it, so decoding can start from the IDR-slice without access to prior frames.
Slice NAL units are stored independently in the bitstream. As a result, bitstream errors within one slice – such as data loss or corruption – do not directly affect other slices. However, decoding errors can still propagate: other slices and frames that use the corrupted data for prediction will be reconstructed incorrectly.
Since slices are stored independently at the bitstream level, they also enable multithreaded processing on both the encoding and decoding sides: when a frame consists of multiple independent slices, they can be encoded and decoded in parallel by separate threads, improving throughput.
AVC/H.264 frame boundary detection in encoded bitstream
The Access Unit Delimiter (AUD) NAL Unit is a type of NAL unit used to indicate the start of a new video frame in an encoded bitstream. When present, an AUD signals that the following data belongs to a new frame. However, the AUD is optional and may not appear in the bitstream at all. In such cases, frames are separated by comparing slice-header fields of the previous and current slices according to the following rules.
If any of these conditions are met, the new slice belongs to a new frame; otherwise, both the last and new slices are part of the same frame.
1. lastSlice::pic_parameter_set_id != newSlice::pic_parameter_set_id
2. lastSlice::frame_num != newSlice::frame_num
3. lastSlice::field_pic_flag != newSlice::field_pic_flag
4. lastSlice::field_pic_flag == 1 && newSlice::field_pic_flag == 1 && lastSlice::bottom_field_flag != newSlice::bottom_field_flag
5. lastSlice::nal_ref_idc != newSlice::nal_ref_idc && (lastSlice::nal_ref_idc == 0 || newSlice::nal_ref_idc == 0)
6. (lastSlice::m_nalUnitType == AVC::NAL_SLICE_IDR) != (newSlice::m_nalUnitType == AVC::NAL_SLICE_IDR)
7. (lastSlice::m_nalUnitType == AVC::NAL_SLICE_IDR) && (newSlice::m_nalUnitType == AVC::NAL_SLICE_IDR) && (lastSlice::idr_pic_id != newSlice::idr_pic_id)
For cases when SPS::pic_order_cnt_type == 0:
8.1. lastSlice::pic_order_cnt_lsb != newSlice::pic_order_cnt_lsb
8.2. lastSlice::delta_pic_order_cnt_bottom != newSlice::delta_pic_order_cnt_bottom
For cases when SPS::pic_order_cnt_type == 1:
9.1. lastSlice::delta_pic_order_cnt[0] != newSlice::delta_pic_order_cnt[0]
9.2. lastSlice::delta_pic_order_cnt[1] != newSlice::delta_pic_order_cnt[1]
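The rules above can be collected into a single comparison function. The sketch below is illustrative: the `SliceHeader` struct and its field names are our own assumptions modeled on the H.264 syntax element names, not the analyzer's actual types.

```cpp
// Minimal subset of slice-header fields needed for frame-boundary detection.
struct SliceHeader {
    unsigned picParameterSetId = 0;
    unsigned frameNum = 0;
    bool     fieldPicFlag = false;
    bool     bottomFieldFlag = false;
    unsigned nalRefIdc = 0;
    bool     isIdr = false;                // nal_unit_type == 5
    unsigned idrPicId = 0;
    unsigned picOrderCntLsb = 0;           // used when pic_order_cnt_type == 0
    int      deltaPicOrderCntBottom = 0;   // used when pic_order_cnt_type == 0
    int      deltaPicOrderCnt[2] = {0, 0}; // used when pic_order_cnt_type == 1
};

// Returns true if `cur` starts a new frame relative to `last`,
// applying rules 1-9 from the list above in order.
bool startsNewFrame(const SliceHeader& last, const SliceHeader& cur,
                    unsigned picOrderCntType) {
    if (last.picParameterSetId != cur.picParameterSetId) return true;   // rule 1
    if (last.frameNum != cur.frameNum) return true;                     // rule 2
    if (last.fieldPicFlag != cur.fieldPicFlag) return true;             // rule 3
    if (last.fieldPicFlag && cur.fieldPicFlag &&
        last.bottomFieldFlag != cur.bottomFieldFlag) return true;       // rule 4
    if (last.nalRefIdc != cur.nalRefIdc &&
        (last.nalRefIdc == 0 || cur.nalRefIdc == 0)) return true;       // rule 5
    if (last.isIdr != cur.isIdr) return true;                           // rule 6
    if (last.isIdr && cur.isIdr &&
        last.idrPicId != cur.idrPicId) return true;                     // rule 7
    if (picOrderCntType == 0) {
        if (last.picOrderCntLsb != cur.picOrderCntLsb) return true;     // rule 8.1
        if (last.deltaPicOrderCntBottom !=
            cur.deltaPicOrderCntBottom) return true;                    // rule 8.2
    }
    if (picOrderCntType == 1) {
        if (last.deltaPicOrderCnt[0] != cur.deltaPicOrderCnt[0])
            return true;                                                // rule 9.1
        if (last.deltaPicOrderCnt[1] != cur.deltaPicOrderCnt[1])
            return true;                                                // rule 9.2
    }
    return false; // same frame
}
```

A demuxer or analyzer would call this for every consecutive pair of slices when no AUD units are present, starting a new frame whenever it returns true.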
Conclusion
In this article, we explored the details of separating video frames within an AVC/H.264 compressed bitstream. We discussed high-level concepts of storing frames in Slice NAL Units, including how to determine when slices belong to different frames versus when they are part of the same frame.
To analyze AVC/H.264 encoded video, the Virinext Bitstream Analyzer can be used. This GUI tool offers both detailed and high-level analysis of encoded video and audio bitstreams. Supporting various encoding standards, including AVC/H.264, the Virinext Bitstream Analyzer is a valuable analysis tool for researchers and developers. If you are interested in exploring further, we invite you to try the free evaluation version.