What is VP9?

VP9 is an open-source video codec developed by Google after its acquisition of On2 Technologies in February 2010. It was the second codec Google released, after VP8, and it became publicly available in June 2013.

In terms of usage, the largest distributor of VP9-encoded content has been YouTube. In 2016, Netflix also announced that it would use VP9 on its platform. Since then, many new use cases have emerged where VP9 has proven to be a viable solution.

VP8 vs VP9

VP9 is available in WebRTC alongside VP8, and it delivers better quality than VP8 at the same bitrate. VP8 needs less CPU time to compress video, but its output is not as good as that of VP9. Migrating from VP8 to VP9 is a good choice if you want to reduce the required bitrate while keeping or improving the quality of the received video, provided the extra encoding cost is acceptable.

How does the VP9 codec work?

The VP9 codec works much like H.265 (HEVC) and supports parallel processing. At comparable quality, it aims for roughly half the bitrate of previous-generation codecs such as VP8 and H.264. The lower bitrate makes the compressed stream better suited to transmission over the internet, and it gives a better streaming experience on constrained devices like tablets or smartphones. Here is a brief overview of what goes on behind the scenes with VP9.

Picture partitioning

VP9 first divides each frame into superblocks, which are blocks of 64x64 pixels. Superblocks are processed in raster order: from left to right, and top to bottom. This is similar to other codecs. Superblocks can be further divided into smaller blocks, as small as 4x4, using a recursive quadtree similar to the one used in HEVC. However, unlike HEVC, a block can also be split only horizontally or only vertically, in addition to the four-way split.
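To make the traversal concrete, here is a minimal Python sketch of the superblock raster order and of a single partition step. The function names and the mode labels ("none", "horz", "vert", "split") are made up for illustration and are not taken from any VP9 library.

```python
SUPERBLOCK = 64  # superblock edge length in pixels, as described above


def superblock_origins(width, height, sb=SUPERBLOCK):
    """Return the (x, y) pixel origin of every superblock in raster order."""
    return [(x, y) for y in range(0, height, sb) for x in range(0, width, sb)]


def partition(x, y, size, mode):
    """Apply one partition step to a square block.

    Returns a list of (x, y, width, height) sub-blocks.
    `mode` is a hypothetical label: "none", "horz", "vert" or "split".
    """
    half = size // 2
    if mode == "none":
        return [(x, y, size, size)]
    if mode == "horz":   # split by a horizontal line: two wide blocks
        return [(x, y, size, half), (x, y + half, size, half)]
    if mode == "vert":   # split by a vertical line: two tall blocks
        return [(x, y, half, size), (x + half, y, half, size)]
    if mode == "split":  # four-way quadtree split, may be applied again
        return [(x, y, half, half), (x + half, y, half, half),
                (x, y + half, half, half), (x + half, y + half, half, half)]
    raise ValueError(f"unknown partition mode: {mode}")


if __name__ == "__main__":
    print(superblock_origins(130, 70))
    print(partition(0, 0, 64, "horz"))
```

Only the "split" case produces square blocks again, which is why it is the one that keeps the quadtree recursion going down towards the smallest block sizes.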

VP9 also supports tiles, where the picture is divided into a grid of tiles along superblock boundaries. Unlike HEVC, these tiles are always spaced as evenly as possible, and the number of tiles is always a power of two. Tiles must be at least 256 pixels and at most 4096 pixels wide, and there can be no more than four tile rows. The tiles are scanned in raster order, and so are the superblocks within each tile. As a result, the coding order of superblocks depends on the tile structure.
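The width limits above translate into a small calculation. The following Python sketch assumes that the number of tile columns is expressed as a power of two (2^log2_cols) and that tile boundaries are rounded the way the VP9 bitstream specification is understood to do it. The function names are illustrative and do not come from a real library.

```python
SB = 64                          # superblock size in pixels
MIN_TILE_WIDTH_SB = 256 // SB    # 256-pixel minimum = 4 superblocks
MAX_TILE_WIDTH_SB = 4096 // SB   # 4096-pixel maximum = 64 superblocks


def tile_col_log2_range(frame_width):
    """Return (min_log2, max_log2) for the number of tile columns."""
    sb_cols = (frame_width + SB - 1) // SB  # frame width in superblocks
    min_log2 = 0
    # Add columns until no tile is wider than the maximum.
    while (MAX_TILE_WIDTH_SB << min_log2) < sb_cols:
        min_log2 += 1
    max_log2 = 1
    # Add columns while every tile still reaches the minimum width.
    while (sb_cols >> max_log2) >= MIN_TILE_WIDTH_SB:
        max_log2 += 1
    return min_log2, max_log2 - 1


def tile_col_starts(frame_width, log2_cols):
    """Return tile column boundaries, in superblocks, spaced as evenly as possible."""
    sb_cols = (frame_width + SB - 1) // SB
    return [(i * sb_cols) >> log2_cols for i in range((1 << log2_cols) + 1)]


if __name__ == "__main__":
    print(tile_col_log2_range(1920))   # (0, 2): between 1 and 4 tile columns
    print(tile_col_starts(1920, 2))    # [0, 7, 15, 22, 30]
```

For a 1920-pixel-wide frame (30 superblocks), this gives at most four tile columns, each 7 or 8 superblocks wide.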

At the start of every tile except the last one, a byte count is transmitted that indicates how many bytes are used to code that tile. This allows a multithreaded decoder to skip ahead and start a separate decoding thread for each tile, which keeps decoding fast.
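The sketch below shows how a decoder could use such byte counts to locate every tile without decoding any of them. It assumes that each size field is a 4-byte big-endian integer placed directly before the tile it describes, and that the last tile simply takes the remaining bytes. Both the layout and the function name are assumptions made for illustration.

```python
import struct


def split_tiles(tile_data, num_tiles):
    """Split a buffer of concatenated tiles into one bytes object per tile."""
    tiles = []
    pos = 0
    for i in range(num_tiles):
        if i < num_tiles - 1:
            # Assumed layout: 32-bit big-endian byte count before the tile.
            (size,) = struct.unpack_from(">I", tile_data, pos)
            pos += 4
        else:
            # The last tile has no size field and uses whatever is left.
            size = len(tile_data) - pos
        if pos + size > len(tile_data):
            raise ValueError("tile size exceeds available data")
        tiles.append(tile_data[pos:pos + size])
        pos += size
    return tiles


if __name__ == "__main__":
    data = b"\x00\x00\x00\x03abc" + b"\x00\x00\x00\x02de" + b"fgh"
    print(split_tiles(data, 3))  # [b'abc', b'de', b'fgh']
```

Once the tiles are separated like this, each one can be handed to its own worker thread.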

Bitstream coding

The generated bitstream is usually stored in a container, either WebM or IVF. IVF is very simple, and WebM is essentially a subset of MKV (Matroska). Using a container is important, because otherwise it is difficult to seek to a particular frame without fully decoding the preceding frames.
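To illustrate why a container helps with seeking, here is a rough Python sketch that builds a frame index from an IVF buffer without decoding any video. It assumes the commonly documented IVF layout: a header starting with the signature "DKIF" whose length is stored at byte 6, followed by frames that each carry a 12-byte header (4-byte little-endian size, 8-byte little-endian timestamp). The function name is hypothetical.

```python
import struct


def ivf_frame_index(buf):
    """Return a list of (payload_offset, payload_size, timestamp) per frame."""
    if buf[:4] != b"DKIF":
        raise ValueError("not an IVF stream")
    # Header length is stored in the header itself (normally 32 bytes).
    (header_len,) = struct.unpack_from("<H", buf, 6)
    index = []
    pos = header_len
    while pos + 12 <= len(buf):
        size, timestamp = struct.unpack_from("<IQ", buf, pos)
        index.append((pos + 12, size, timestamp))
        pos += 12 + size  # jump straight to the next frame header
    return index


if __name__ == "__main__":
    header = b"DKIF" + (0).to_bytes(2, "little") + (32).to_bytes(2, "little")
    header += b"VP90" + bytes(20)
    frame0 = (3).to_bytes(4, "little") + (0).to_bytes(8, "little") + b"abc"
    frame1 = (2).to_bytes(4, "little") + (1).to_bytes(8, "little") + b"de"
    print(ivf_frame_index(header + frame0 + frame1))  # [(44, 3, 0), (59, 2, 1)]
```

The index only tells a player where each frame starts. Decoding still has to begin at a frame that does not depend on earlier ones.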

Like VP8, VP9 compresses the bitstream using an 8-bit arithmetic coding engine known as the bool-coder. Each frame is coded in three sections, as follows:

  • Uncompressed header - contains information such as picture size and loop filter strength, and takes a dozen bytes or so.
  • Compressed header - transmits the probabilities used for the whole frame.
  • Compressed frame data - contains the data required to reconstruct the frame, including block partition sizes, motion vectors, transform coefficients, and more.
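Because the uncompressed header is plain bits, its first fields can be read with a few lines of code. The sketch below reads only the first few bits of a frame. The field names and their order (frame marker, profile bits, show_existing_frame, frame_type, show_frame, error_resilient_mode) follow the VP9 bitstream specification as it is understood here and are not spelled out in this article, so treat them as assumptions. Everything after these bits, such as picture size and loop filter settings, is deliberately left out.

```python
class BitReader:
    """Reads bits most-significant-bit first from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, n):
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_frame_start(data):
    """Parse the first fields of the uncompressed header (assumed layout)."""
    r = BitReader(data)
    if r.read(2) != 2:
        raise ValueError("unexpected frame marker")
    profile_low = r.read(1)
    profile_high = r.read(1)
    profile = profile_low | (profile_high << 1)
    if profile == 3:
        r.read(1)  # reserved bit, assumed present only for profile 3
    info = {"profile": profile, "show_existing_frame": bool(r.read(1))}
    if info["show_existing_frame"]:
        info["frame_to_show"] = r.read(3)
        return info
    info["key_frame"] = r.read(1) == 0  # frame_type 0 is assumed to mean key frame
    info["show_frame"] = bool(r.read(1))
    info["error_resilient"] = bool(r.read(1))
    return info


if __name__ == "__main__":
    print(parse_frame_start(bytes([0x82])))
```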

Unlike VP8, VP9 does not use data partitioning: all data types are interleaved in superblock coding order. This makes things easier for hardware designers.

Intra prediction

VP9’s intra prediction is similar to that of HEVC, and follows the transform block partitions. As a result, intra prediction operations are always square. For instance, a 16x8 block with 8x8 transforms will result in two 8x8 prediction operations.
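The 16x8 example can be reproduced with a short Python sketch. It simply tiles the block with squares of the transform size. The function name is hypothetical.

```python
def intra_prediction_ops(block_w, block_h, tx_size):
    """List the square prediction operations (x, y, w, h) covering a block.

    One operation is issued per transform block, in raster order.
    """
    return [(x, y, tx_size, tx_size)
            for y in range(0, block_h, tx_size)
            for x in range(0, block_w, tx_size)]


if __name__ == "__main__":
    # A 16x8 block with 8x8 transforms gives two 8x8 operations.
    print(intra_prediction_ops(16, 8, 8))  # [(0, 0, 8, 8), (8, 0, 8, 8)]
```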

VP9 has 10 intra prediction modes, 8 of which are directional. They operate on two 1D arrays that contain the reconstructed pixels of the neighboring blocks above and to the left.

The arrays are arranged so that the above array is twice as long as the current block’s width, and the left array is as long as the block’s height. However, for intra prediction of blocks larger than 4x4, the second half of the above array is filled by extending the last pixel of the first half.

https://i.imgur.com/0n7jgj4.png
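As a rough illustration of the two edge arrays, the Python sketch below builds them for one block and applies a DC prediction, which is assumed here to be one of the two non-directional modes. The helper names are made up, the pixel values are arbitrary, and the rounding used for the DC value is an assumption.

```python
def build_edges(above_row, left_col, width, height):
    """Build the above (2 * width) and left (height) edge arrays.

    The second half of the above array is filled by repeating the last
    pixel of the first half, as described for blocks larger than 4x4.
    """
    above = list(above_row[:width])
    above += [above[-1]] * width
    left = list(left_col[:height])
    return above, left


def dc_predict(above, left, width, height):
    """Fill the block with the rounded average of the neighboring pixels."""
    count = width + height
    total = sum(above[:width]) + sum(left[:height])
    dc = (total + count // 2) // count
    return [[dc] * width for _ in range(height)]


if __name__ == "__main__":
    above, left = build_edges([10, 20, 30, 40, 50, 60, 70, 80], [20] * 8, 8, 8)
    print(above)                               # 16 entries, last 8 are all 80
    print(dc_predict(above, left, 8, 8)[0])    # [33, 33, 33, 33, 33, 33, 33, 33]
```

The directional modes would read along these same arrays at an angle, which is why the above array needs to reach past the block's own width.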

Inter prediction

For inter prediction, VP9 uses 1/8th-pixel motion compensation, which is twice the precision of the quarter-pixel motion vectors used by most other standards. In most cases, motion compensation is unidirectional, with one motion vector per block. VP9 also supports compound prediction, the equivalent of bi-prediction in other standards, where there are two motion vectors for each block and the two resulting prediction samples are averaged together.
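Two small pieces of arithmetic follow from this: splitting a 1/8th-pixel motion vector component into whole-pixel and fractional parts, and averaging two predictions. The Python sketch below shows both. The function names are illustrative, and rounding the average upwards on ties is an assumption.

```python
def split_mv(mv_eighths):
    """Split a motion vector component given in 1/8th-pixel units.

    Returns (whole_pixels, eighths), where eighths is in the range 0..7.
    """
    return mv_eighths >> 3, mv_eighths & 7


def compound_average(pred_a, pred_b):
    """Average two prediction blocks sample by sample, with rounding."""
    return [[(a + b + 1) >> 1 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(pred_a, pred_b)]


if __name__ == "__main__":
    print(split_mv(19))   # (2, 3): two whole pixels plus 3/8
    print(split_mv(-5))   # (-1, 3): -1 + 3/8 = -5/8
    print(compound_average([[10, 20]], [[11, 23]]))  # [[11, 22]]
```

The fractional part selects which interpolation filter phase is used, while the whole-pixel part selects the position in the reference frame.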

Segmentation

Another feature that makes VP9 stand out is segmentation. When it is enabled, the bitstream codes a segment ID for each block, which is an integer between 0 and 7. Each of these eight segments can have any of the following four features enabled:

  • Skip - blocks with this feature active are assumed to have no residual signal. This is useful for static backgrounds.
  • Alternate quantizer - blocks with this feature may use a different quantization scale. This is useful for regions that require greater detail than the rest of the picture.
  • Ref - blocks with this feature are assumed to point to a particular reference frame.
  • AltLf - blocks with this feature use a different loop filter (smoothing) strength. This is useful for areas that would otherwise look too rough and blocky.
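One way to picture segmentation is as a table of eight entries that each block indexes with its segment ID. The Python sketch below is a simplified model of that lookup. The class, the field names, and the numeric values are hypothetical, and the alternate quantizer and loop filter values are treated as plain replacements for the frame-level values, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

NUM_SEGMENTS = 8  # segment IDs run from 0 to 7


@dataclass
class Segment:
    skip: bool = False                # Skip: no residual signal
    alt_q: Optional[int] = None       # Alternate quantizer
    ref_frame: Optional[int] = None   # Ref: fixed reference frame
    alt_lf: Optional[int] = None      # AltLf: alternate loop filter strength


def block_params(segments, segment_id, base_q, base_lf):
    """Resolve the effective coding parameters for one block."""
    if not 0 <= segment_id < NUM_SEGMENTS:
        raise ValueError("segment ID must be between 0 and 7")
    seg = segments[segment_id]
    return {
        "skip_residual": seg.skip,
        "quantizer": base_q if seg.alt_q is None else seg.alt_q,
        "loop_filter": base_lf if seg.alt_lf is None else seg.alt_lf,
        "forced_ref": seg.ref_frame,
    }


if __name__ == "__main__":
    segments = [Segment() for _ in range(NUM_SEGMENTS)]
    segments[1] = Segment(skip=True)            # e.g. static background
    segments[2] = Segment(alt_q=40, alt_lf=10)  # e.g. a detailed region
    print(block_params(segments, 2, base_q=60, base_lf=20))
```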

All of these features and operations contribute to the efficiency of the VP9 codec. Owing to its strengths, VP9 is supported by Netflix, YouTube, and many tech giants such as Sony, Panasonic, Samsung, Qualcomm, and NVIDIA.

Comparison with other video codecs

VP9 vs H.264

The H.264 codec compresses the large amount of information in video files so that they can be streamed online. The HD images that H.264 typically works with are 1280x720 pixels (720p) or 1920x1080 pixels (1080p). 4K, on the other hand, has a resolution of 3840x2160 pixels, which is four times as many pixels as 1080p. Such a drastic increase in detail demands more efficient compression in order to transmit, store, and use the data. In that context, VP9 is about twice as efficient as H.264, needing roughly half the data to stream 4K content at comparable quality.
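The jump in resolution is easy to quantify. This short Python sketch computes the pixel counts for the three resolutions mentioned above. The dictionary and function names are just for illustration.

```python
RESOLUTIONS = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}


def pixel_count(name):
    """Return the number of pixels per frame for a named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


if __name__ == "__main__":
    for name in RESOLUTIONS:
        print(name, pixel_count(name))
    # 4K carries 4 times the pixels of 1080p and 9 times those of 720p.
    print(pixel_count("4K") // pixel_count("1080p"))  # 4
    print(pixel_count("4K") // pixel_count("720p"))   # 9
```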

VP9 vs H.265

When comparing VP9 with H.265, it is important to note that there are many technical similarities between the two. Even their primary goal is the same: to halve the bitrate needed to stream HD video, and to compress 4K video well enough that it becomes accessible to people with regular internet bandwidth. That said, the biggest difference between the two is that VP9 is an open-source codec that anyone can use, whereas H.265 requires a license to be purchased before use. In terms of compression efficiency, the two codecs are by and large comparable.

VP9 vs AV1

The primary practical difference between AV1 and VP9 is encoding cost. AV1 is considerably more expensive to encode, so it tends to be worthwhile only for videos with views in the mid-to-high millions, where the bandwidth savings repay that cost. VP9, by contrast, is worth considering even for videos with only a few thousand views. Further, since VP9 is free and enjoys widespread support, it remains the more practical choice for the near future.

Conclusion

In conclusion, the VP9 codec has proved extremely useful for streaming 4K video smoothly, even with limited bandwidth. Because it is open source, anyone can get started with VP9 and compress their 4K videos efficiently.