H.264 - Advanced video coding - Complete overview

H.264, also known as Advanced Video Coding or MPEG-4 Part 10, is a video compression standard based on block-oriented, motion-compensated coding.

Karan Singh

16 May 2022

What is H.264

H.264, also known as MPEG-4 Part 10 or Advanced Video Coding (AVC), compresses digital video so that it takes up far less storage space. This makes it easier to stream, play, and transmit videos over the internet. H.264 defines various profiles (sets of coding tools) and levels (limits on resolution and bitrate), which are covered in more depth later in the article. H.264 compression supports digital video up to 8K Ultra HD.

The H.264 codec was jointly developed by MPEG (the Moving Picture Experts Group) and the ITU (International Telecommunication Union). Many well-known vendors ship their own implementations of H.264, such as the Apple codec, the MainConcept codec, and x264.

How does the H.264 codec work?

Video codecs based on the H.264 standard compress digital video streams so that they need only about half the bandwidth or storage space of the MPEG-2 (H.262) standard. H.264 is a lossy format, so some information is discarded, but at that halved bitrate it can deliver visual quality comparable to MPEG-2.

The H.264 video encoder carries out three important processes - prediction, transform, and encoding - to give a compressed H.264 bitstream. The decoder then carries out complementary processes - decoding, inverse transform, and reconstruction - to produce the decoded video stream.

Let’s look at the H.264 encoder processes in detail:

  • Prediction: The encoder processes a frame of video in units of a macroblock (16x16 displayed pixels). It forms a prediction of the macroblock from previously coded data, either from the current frame (intra prediction) or from previously coded and transmitted frames (inter prediction). The encoder subtracts the prediction from the current macroblock to form a residual.
  • Transformation and quantization: A block of residual samples is transformed using a 4x4 or 8x8 integer transform, an approximate form of the Discrete Cosine Transform (DCT). The transform outputs a set of coefficients, each of which is a weighting value for a standard basis pattern. When combined, the weighted basis patterns re-create the block of residual samples. The output is then quantized, i.e., each coefficient is divided by an integer value and rounded. Quantization reduces precision so that most or all of the coefficients in a typical block become zero, leaving only a few non-zero coefficients.
  • Bitstream encoding: All the values produced so far now need to be encoded. They include the quantized coefficients, the information the decoder needs to re-create the prediction, information about the structure of the compressed data and the compression tools used, and information about the complete video sequence. These values and parameters are converted into binary codes using variable-length coding or arithmetic coding, which produces a compact binary representation of the information. The bitstream is then stored or transmitted as required.
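The three encoder steps can be sketched on a single 4x4 block in Python. This is a minimal illustration, not a conforming encoder: the sample values and the step size of 16 are made up, the quantizer is a plain division (the standard folds per-coefficient scaling into quantization), and the entropy-coding step is represented only by the Exp-Golomb code that H.264 uses for many header fields, not by the CAVLC or CABAC schemes used for residual data.

```python
# Forward 4x4 core transform matrix of H.264, an integer approximation of the DCT.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def residual(current, prediction):
    """Prediction step: subtract the predicted block from the current block."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prediction)]


def forward_transform(block):
    """Transform step: Y = CF * X * CF^T, using integer arithmetic only."""
    return matmul(matmul(CF, block), transpose(CF))


def quantize(coeffs, step):
    """Simplified quantizer (assumption): divide by one step, round toward zero."""
    return [[int(c / step) for c in row] for row in coeffs]


def exp_golomb_ue(value):
    """Unsigned Exp-Golomb code for a non-negative integer, as a bit string."""
    code = bin(value + 1)[2:]
    return "0" * (len(code) - 1) + code


# Hypothetical block: a flat prediction of 100 and slightly brighter samples.
current = [[103, 103, 104, 104] for _ in range(4)]
prediction = [[100] * 4 for _ in range(4)]

res = residual(current, prediction)
coeffs = forward_transform(res)
levels = quantize(coeffs, step=16)
nonzero = sum(1 for row in levels for v in row if v != 0)

print(coeffs[0])         # [56, -12, 0, 4]
print(levels[0])         # [3, 0, 0, 0]
print(nonzero)           # 1
print(exp_golomb_ue(3))  # 00100
```

After quantization only one of the sixteen coefficients is non-zero, which is what makes the following entropy-coding step effective.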

On the other side, the decoder works by complementing the steps of the encoder, in the following way:

  • Bitstream decoding: The decoder receives the compressed bitstream, decodes each of the syntax elements, and extracts the information described above (quantized coefficients, prediction information, and so on). This information is then used to reverse the coding process and re-create a sequence of video images.
  • Rescaling and inverse transform: The quantized coefficients are rescaled by multiplying each one by an integer value to restore its original scale. An inverse transform then combines the weighted basis patterns to re-create each block of residual data, and these blocks are combined to form a residual macroblock.
  • Reconstruction: For each macroblock, the decoder forms a prediction identical to the one created by the encoder. The decoder adds this prediction to the decoded residual to reconstruct a decoded macroblock, which can then be displayed as part of a video frame.
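The decoder steps can be sketched the same way for one 4x4 block. This is an illustration under stated assumptions, not the standard's procedure: the quantized levels, step size, and prediction are made-up inputs, and the inverse transform is the mathematically exact floating-point inverse of the 4x4 core transform, whereas a real decoder uses an integer inverse transform with the scaling folded into the rescaling step.

```python
# Forward 4x4 core transform matrix of H.264; its rows are mutually orthogonal.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]
# Squared length of each row of CF, needed to undo the transform exactly.
ROW_NORMS = [4, 10, 4, 10]


def rescale(levels, step):
    """Rescaling step: multiply each quantized level by the step size."""
    return [[v * step for v in row] for row in levels]


def inverse_transform(coeffs):
    """Exact inverse of Y = CF * X * CF^T (assumption: float, not integer)."""
    scaled = [[coeffs[k][l] / (ROW_NORMS[k] * ROW_NORMS[l]) for l in range(4)]
              for k in range(4)]
    return [[sum(CF[k][i] * scaled[k][l] * CF[l][j]
                 for k in range(4) for l in range(4))
             for j in range(4)]
            for i in range(4)]


def reconstruct(prediction, residual):
    """Reconstruction step: add prediction and residual, clip to 8-bit range."""
    return [[min(255, max(0, p + round(r))) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]


# Hypothetical inputs: only the DC coefficient survived quantization.
levels = [[3, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
prediction = [[100] * 4 for _ in range(4)]

decoded_residual = inverse_transform(rescale(levels, step=16))
decoded_block = reconstruct(prediction, decoded_residual)
print(decoded_block[0])  # [103, 103, 103, 103]
```

Because the small coefficients were quantized to zero, the decoded block is flat. Any fine detail in the source block is lost, which is where the lossy nature of the codec shows up.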

H.264 / AVC Overview

H.264 is often referred to as AVC, short for Advanced Video Coding, and it is also known as MPEG-4 Part 10. The H.264 encoding standard was created jointly by ISO/IEC and ITU-T, two international standards bodies. It was first published in 2003 in a document titled “Recommendation H.264: Advanced Video Coding”.

Here are the key features of H.264, followed by an overview of its profiles, levels, and applications.

Features

H.264 compression reduces the size of a video stream to about half of what MPEG-2 needs, while keeping visual quality comparable. Beyond raw compression, several features make the format robust during transmission:

  • Slice structure coding: A slice is a sequence of macroblocks within one slice group. Slices provide distinct resynchronization points within the video data, and no intra-frame prediction takes place across slice boundaries. As a result, a lost packet affects only the slice it belongs to, which keeps visual degradation to a minimum.
  • Flexible Macroblock Ordering (FMO): This is a strategy for changing the order in which macroblocks are grouped and transmitted. Spreading neighbouring macroblocks across different slice groups improves error robustness during video transmission, because a lost slice can be concealed using the neighbours that did arrive.
  • Data partitioning: This feature separates header and motion information, intra-coded data, and inter-coded data by distributing the syntax elements across different network abstraction layer (NAL) units, so that the most important data can be protected more strongly.
  • Intra-coding: Intra-coded macroblocks do not depend on earlier frames, so they limit the effect of packet loss on motion compensation and stop errors from propagating further.
  • Switching pictures: This feature allows predictive coding even when the reference signal differs, for example when a decoder switches between streams. It can be used for adaptive error resilience, especially in wireless environments.
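Several of these features show up in the NAL unit header, the single byte that starts every NAL unit. The sketch below splits that byte into its three fields. The field layout (1, 2, and 5 bits) and the type numbers follow the H.264 specification; the function name and the returned dictionary are illustrative choices, and the type table lists only a few common values.

```python
# A few nal_unit_type values; types 2-4 are the data partitions described above.
NAL_UNIT_TYPES = {
    1: "coded slice (non-IDR)",
    2: "slice data partition A",
    3: "slice data partition B",
    4: "slice data partition C",
    5: "coded slice (IDR)",
    6: "supplemental enhancement information",
    7: "sequence parameter set",
    8: "picture parameter set",
}


def parse_nal_header(first_byte):
    """Split the first byte of a NAL unit into its three header fields."""
    nal_unit_type = first_byte & 0x1F
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,  # must be 0 in a valid stream
        "nal_ref_idc": (first_byte >> 5) & 0x3,         # 0 means not used for reference
        "nal_unit_type": nal_unit_type,
        "description": NAL_UNIT_TYPES.get(nal_unit_type, "other"),
    }


for byte in (0x67, 0x68, 0x65, 0x41):
    header = parse_nal_header(byte)
    print(hex(byte), header["nal_ref_idc"], header["description"])
```

A receiver can use `nal_ref_idc` and `nal_unit_type` to decide which units matter most, for instance by prioritizing parameter sets and partition A data when bandwidth is short.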

Let’s look at what profiles and levels are in the context of H.264 compression.

Profiles and levels

Levels and Profiles are two important parameters to consider when talking about H.264 compression.

In terms of profiles, H.264 supports the following encoding profiles, among others:

  • Baseline
  • Main
  • Extended
  • High (and its variants, such as High 10).

Each profile is a subset of the full set of encoding techniques available in H.264.

The Baseline profile is simpler to implement because it supports only a limited subset of the techniques. It therefore requires less code and less processing power, which is why it is aimed at low-power handheld devices. The Main profile supports most of the encoding techniques, so decoding it requires more code, more memory, and more processing power. The Main profile is supported by devices such as set-top boxes. The High profile supports nearly all of H.264's encoding techniques; it is designed for HD and Full HD video and is supported by software decoders as well.
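A stream announces its profile in the sequence parameter set (SPS): the byte after the NAL header carries `profile_idc`, followed by a byte of constraint flags and a byte with `level_idc`. The sketch below reads those bytes. The `profile_idc` values are the ones defined by the standard; the function name and the sample byte strings are illustrative assumptions, and the special encoding of level 1b is not handled.

```python
# profile_idc values for the profiles discussed above.
PROFILE_NAMES = {
    66: "Baseline",
    77: "Main",
    88: "Extended",
    100: "High",
    110: "High 10",
}


def describe_sps_prefix(sps):
    """Return (profile name, level string) from the first bytes of an SPS NAL unit."""
    if len(sps) < 4 or (sps[0] & 0x1F) != 7:
        raise ValueError("not a sequence parameter set NAL unit")
    profile_idc = sps[1]
    level_idc = sps[3]  # ten times the level number, e.g. 31 for level 3.1
    profile = PROFILE_NAMES.get(profile_idc, f"unknown ({profile_idc})")
    return profile, f"{level_idc // 10}.{level_idc % 10}"


# Hypothetical SPS prefixes, written as hex for readability.
print(describe_sps_prefix(bytes.fromhex("6742e01e")))  # ('Baseline', '3.0')
print(describe_sps_prefix(bytes.fromhex("674d401f")))  # ('Main', '3.1')
print(describe_sps_prefix(bytes.fromhex("67640028")))  # ('High', '4.0')
```

A player can run a check like this before decoding to reject streams whose profile it does not implement.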

Which profile is best depends entirely on your target devices. If you are targeting only a desktop software player such as Flash Player, you can use the High profile. If you need to reach older or low-power handheld devices (early iPhones, for example), you will need to encode the video using the Baseline profile.

While profiles govern code complexity and processing power, levels deal with bandwidth, maximum resolution, and memory on the decoder side. Each device supports a maximum level that depends on the resolution it can handle and the memory available to it. Lower levels mean lower resolutions, lower bit rates, and less memory for storing frames. Essentially, a level in H.264 specifies the maximum video resolution and data rate that a device can support. For instance, the iPad 2's specifications indicate that the device can play Main profile video up to level 3.1. Other devices support different levels.
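Level limits are expressed in macroblocks, so checking whether a video fits a level is simple arithmetic. The sketch below covers three levels, using frame-size and macroblock-rate limits recalled from the level table in Annex A of the standard; verify them against the specification before relying on them. Real levels also cap bitrate and decoded picture buffer size, which this simplified check ignores.

```python
import math

# level: (max macroblocks per frame, max macroblocks per second)
LEVEL_LIMITS = {
    "3.0": (1620, 40500),
    "3.1": (3600, 108000),
    "4.0": (8192, 245760),
}


def macroblocks_per_frame(width, height):
    """Number of 16x16 macroblocks needed to cover a frame."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def fits_level(width, height, fps, level):
    """True if frame size and macroblock rate stay within the level's limits."""
    max_frame_mbs, max_mbs_per_second = LEVEL_LIMITS[level]
    frame_mbs = macroblocks_per_frame(width, height)
    return frame_mbs <= max_frame_mbs and frame_mbs * fps <= max_mbs_per_second


print(macroblocks_per_frame(1280, 720))     # 3600
print(fits_level(1280, 720, 30, "3.1"))     # True
print(fits_level(1920, 1080, 30, "3.1"))    # False
print(fits_level(1920, 1080, 30, "4.0"))    # True
```

Under these limits, a device capped at level 3.1 can handle 720p at 30 frames per second but not 1080p.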

Applications of H.264

H.264 compression is well suited to A/V distribution from one source to multiple destinations, and it is useful wherever signals must be transmitted over long distances by wire. Because it reduces bandwidth substantially while preserving good video quality, it has become a go-to video compression format for the digital world. Its applications are wide-ranging, from outside broadcast vans and education to transportation, drones, environmental monitoring, and more.

The H.264 encoder can select from a variety of compression tools, which makes it suitable for use cases ranging from low-delay, low-bitrate mobile transmission to HD and Full HD consumer TV, and even professional television production. The H.264 standard provides integrated support for transmission or storage, including a packetized compressed format and features that minimize the effect of transmission errors. Some of the applications where H.264 compression has proven its worth include:

  • High-definition disc formats (HD DVD and Blu-ray)
  • High-definition TV broadcasting
  • Mobile TV broadcasting
  • Internet video streaming
  • Video conferencing, and more.

Why H.264 compression?

H.264 video compression aims to provide good video quality at bit rates much lower than those of earlier video compression formats, without making the design impractically complex or reducing the robustness of the bitstream. This also makes H.264 a flexible format, as it can be applied to a wide range of use cases.

Comparison with other video codecs

There are various other compression standards available. H.264 is most often compared with its predecessor MPEG-2 (discussed above) and with the newer H.265, VP9, and AV1. Let’s briefly go over what the newer codecs are and how to choose the best one for your use case.

  • H.265/HEVC: H.265, or HEVC (High-Efficiency Video Coding), is the successor to AVC and delivers roughly 20-40% better compression efficiency at the same or better video quality. Like AVC, it supports resolutions up to 8K Ultra High Definition, but it produces smaller files, which makes it more efficient for streaming and transmission. HEVC is designed with more advanced coding tools, parallel processing tools, and other important extensions.
  • AV1: Developed by AOM (the Alliance for Open Media), AV1 is a next-generation video coding format. It aims to improve on HEVC's compression efficiency by about 30%. It is designed with hardware implementation in mind so that it can deliver high-quality real-time video that scales across devices, although software encoding is generally more computationally demanding than with H.264. AV1 uses much more advanced algorithms and is intended for use in WebRTC and HTML5 web video together with the Opus audio codec.
  • VP9: Developed by Google, VP9 is a royalty-free alternative to H.265. It is supported across Google's platforms, including the Chrome browser, Android phones, and YouTube. VP9 offers better video quality than H.264 at the same bitrate, which makes it effective for streaming and transmitting 4K video online.
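To make the percentages above concrete, here is a small arithmetic sketch in Python. The 5,000 kbit/s H.264 bitrate and the 10-minute duration are made-up inputs. The 20-40% range is the figure quoted for H.265 above, and real savings depend on the content and the encoder settings.

```python
def stream_size_mb(bitrate_kbps, duration_s):
    """Size in megabytes (10**6 bytes) of a constant-bitrate stream."""
    bits = bitrate_kbps * 1000 * duration_s
    return bits // 8 // 10**6


def reduced_bitrate_kbps(bitrate_kbps, saving_percent):
    """Bitrate after applying a compression saving given in whole percent."""
    return bitrate_kbps * (100 - saving_percent) // 100


H264_KBPS = 5000   # hypothetical H.264 bitrate
DURATION_S = 600   # hypothetical 10-minute clip

h264_mb = stream_size_mb(H264_KBPS, DURATION_S)
hevc_small_saving_mb = stream_size_mb(reduced_bitrate_kbps(H264_KBPS, 20), DURATION_S)
hevc_large_saving_mb = stream_size_mb(reduced_bitrate_kbps(H264_KBPS, 40), DURATION_S)

print(h264_mb, hevc_small_saving_mb, hevc_large_saving_mb)  # 375 300 225
```

Under these assumptions, a 375 MB H.264 file would shrink to somewhere between 225 MB and 300 MB with H.265 at similar quality.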

As for how H.264 compares with H.265, AV1, and VP9, keep in mind that H.264 is the oldest of the four. Codecs have evolved over the years to tackle more complicated and specific challenges. However, there are still ample use cases for each codec, old and new, depending on the device and bandwidth being used.

Benefits of H.264

To summarize, the benefits of H.264 include:

  • Support for resolutions up to and including 8K Ultra High-Definition.
  • High-quality video at roughly half the bitrate of MPEG-2.
  • The H.264 standard is revised regularly to keep it ready for new challenges. At the time of writing, it is on its 26th version, released in June 2019. This version saw changes to the content color volume, sphere rotation, and content light level information, among others.
  • H.264 uses significantly less storage and bandwidth, which is essential for transmitting video easily over IP networks.

Conclusion

Codec selection is a challenge that every media producer faces, and the best way to handle it is to be well informed. If you understand the various compression formats and the use cases each one suits, you will make the right decision. H.264 is one such compression and decompression format: it is widely used, and it has redefined the way videos are shared and transmitted digitally.

