MPEG-4 AVC/H.264 Video Codecs Comparison 2010 - Appendixes
MSU Graphics & Media Lab (Video Group)
Appendix 1. Subjective Comparison and Psycho-visual Enhancements
Brief Description
In this work, we analyzed video codecs not only with the help of objective metrics but also through subjective comparisons based on the SAMVIQ (Subjective Assessment Method for Video Quality evaluation) methodology. This new method was created by the EBU (European Broadcasting Union).
Another task in this part of the comparison was to analyze the quality of psycho-visual enhancement during encoding. We tested five video codecs, and one codec was tested with two almost identical presets: one without psycho-visual enhancement and another with psycho-visual enhancement.
SAMVIQ Description
During testing, each expert can play any sequence from the test set and give it a mark, and can also play the reference video. Marks are in the range of 0 to 100. More information about the methodology can be found in the publication "SAMVIQ: A new EBU methodology for video quality evaluations in multimedia" (Kozamernik, F., Steinmann, V., Sunna, P. and Wyckens, E., SMPTE Journal, April 2005).
Subjective Assessment Description
Forty-two experts participated in this subjective assessment. The experts represent a variety of individuals: males and females of ages 18 to 40 who are PC users or video and graphics experts. Each user watched five video groups, with eight videos in each group. Each video group contains six encoded videos and one reference video (the viewers were unaware of which type they were viewing), as well as a reference video specified as such. Users gave each video a mark of 0 to 100, and the marks for all users and sequences were averaged. Some extreme results were discarded before obtaining the final result.
Video Sequences
Five different video sequences were used in this comparison.
"Battle"
This sequence is a fragment from the beginning of the movie Terminator 2. The compression of this sequence is the most difficult among all of the sequences in the analysis. This difficulty is mainly due to three reasons: continual brightness variation (resulting from explosions and laser flashes as seen in the picture above), very fast motion and frequent scene changes. These characteristics often cause codecs to compress frames as I-frames.
"Football"
This sequence is a part of sports telecast. The sequence has a high contrast level, strong motion and rich colors.
"Italy"
This sequence has many scenes with fading transitions between them. This fading can cause encoding difficulties. The sequence contains many sharp details.
"Quadbike"
This sequence has slow motion with an almost static camera and few scene changes.
"Simpsons"
This sequence is a part of the Simpsons animated movie; it has high contrast and different types of motion.
Video Codecs
Five codecs were used in the comparison.
Results
Full results can be found in the professional version of this report: H.264 Comparison Report Pro 2010 - Enterprise edition. This version contains results for only one sequence.
"Football" Sequence
For the "Football" sequence, x264 provides the highest quality. x264 with psycho-visual enhancement provides very similar (almost indistinguishable) results. The codecs can be rated by visual quality as follows:
- x264 with psycho-visual enhancement
- x264
- MainConcept
- DivX H.264
- Elecard
- DivX ASP
Appendix 2. x264 Over-years Comparison
The quality of an H.264 codec over several years can be compared for a given video sequence. The x264 encoder was chosen for this task because it is present in almost every MSU VIDEO MPEG-4 AVC/H.264 codec comparison and produces good results compared with other encoders. The following figures show the position of the x264 codec relative to other codecs for the “Battle” sequence. For all years except 2005, x264 shows the best results. For 2006–2009, the results use Y-SSIM as the quality metric; for 2005, Y-SSIM was not the main metric. In light of these results, x264 could be a good reference encoder for analyzing the overall progress of H.264 encoders over time. The next figure shows the RD curves for the “Battle” sequence using x264 encoders from different years. The best encoder is this year’s x264; the worst is the 2005 version.
Using SSIM, the codecs can be ranked as follows:
- x264 (2010)
- x264 (2009)
- x264 (2007)
- x264 (2006)
- x264 (2005)
Interestingly, if Y-PSNR is used as the quality metric, x264 does not exhibit the kind of progress shown in the previous figure. This difference arises because, in past years, the x264 developers specified presets optimized for the SSIM metric for use in testing. Also, note that no encoder (among the different x264 versions) produces the best results simultaneously for both SSIM and PSNR.
The bitrate handling mechanism of the x264 encoder is quite good in every version, as the following figure indicates. The results for previous x264 versions (0.98 of the target bitrate) could be explained by a different interpretation of kbps (1,024 versus 1,000 bits per second).
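The 0.98 figure is consistent with simple arithmetic. The sketch below is only an illustration: it assumes the encoder treats 1 kbps as 1,000 bits per second while the measurement divides by 1,024 (the direction of the mismatch is an assumption, and the function name is hypothetical).

```python
# Hypothetical illustration of the kbps mismatch; not taken from any encoder source.
BITS_PER_KBIT_DECIMAL = 1000  # assumed convention on the encoder side
BITS_PER_KBIT_BINARY = 1024   # assumed convention on the measurement side


def apparent_bitrate_ratio(target_kbps: float) -> float:
    """Return measured/target bitrate when the two sides disagree on what 1 kbps means."""
    produced_bits_per_second = target_kbps * BITS_PER_KBIT_DECIMAL
    measured_kbps = produced_bits_per_second / BITS_PER_KBIT_BINARY
    return measured_kbps / target_kbps


ratio = apparent_bitrate_ratio(1000.0)
print(round(ratio, 2))  # 1000 / 1024 rounds to 0.98
```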
The per-frame analysis presented in the following figure shows that the main encoding mechanism did not change significantly.
Appendix 3. Theora vs. x264 Single-thread Comparison
Multithreaded encoding is not supported by the Theora encoder, but CPUs with four cores are used for the tests. This situation leads to an incorrect speed comparison between Theora and other codecs. To eliminate this disparity, we have turned on the x264 encoder’s single-thread mode and compared the results with those of Theora. The overall speed/quality trade-off for the “Movies” and “HDTV” use cases is depicted in the following figures.
The main conclusion is that even in single-thread mode, Theora is slightly slower and produces much poorer quality than x264.
This sequence is a fragment from the Ice Age 3 animated movie. This movie has low-contrast portions and high-contrast portions, and it has many types of motion: camera panning, slow motion and very fast motion. Also, it has a scene with colors that differ completely from those of other scenes. Small black letterboxes appear at the top and bottom of the video.
This sequence is a fragment from the Indiana Jones movie. Compression of this sequence is difficult for two main reasons: the presence of low-contrast scenes and the high level of motion in different scenes. Also, several scenes have very different types of motion, ranging from almost static scenes with talking people to scenes with strong motion (for example, the scene where stones fall).
This sequence is a fragment from the Enemy of the State movie. This sequence includes outdoor scenes with strong motion at the beginning when the bicyclist runs, as well as scenes with low motion and indoor scenes with normal motion. This sequence has scenes with different lighting conditions.
This sequence is a fragment from the Up animated movie. The sequence contains low-contrast scenes with almost static brightness and high-colored scenes, and it contains many scenes with a few frames that include quick scene changes.
This sequence contains scenes with the camera panning over a landscape view of the Amazon; also, some scene changes take place with the camera panning again. The video frames consist of many edges and high-contrast, sharp details, such as leaves and branches. This sequence was downloaded from the Microsoft website:
http://www.microsoft.com/windows/windowsmedia/musicandvideo/hdvideo/contentshowcase.aspx
This sequence is a part of the Iron Man 2 movie trailer. It has low-brightness scenes at the beginning, followed by a scene with very fast motion and scenes with slow camera panning. Some frames are very blurry. Also, the sequence contains scenes with flashing lights, which could be a big problem for some encoders. Black letterboxes appear at the top and bottom of the video.
This close-up sequence is similar to “Mobile&Calendar” and includes a moving calendar with text and a detailed photo of the Vasa ship. It also includes a moving train with colorful toys. The background has two types of wallpaper: one is brown with details, and the other is yellow with drawn figures. The sequence is very detailed and is normally demanding. The main potential compression difficulty is the many small, sharp details on the calendar and on the background.
This sequence is a fragment of the “Troy” movie and contains three parts with sharp scene changes. The video includes medium scene motion and slow camera motion. In terms of compression, this sequence is difficult to compress because of the many small details.
Relative bitrate/relative time charts show how the average bitrate for a fixed output quality depends on the relative encoding time.
The Y-axis shows the ratio of the bitrate of the codec under test to that of the reference codec for a fixed quality.
A lower value (which is plotted higher on the graph) indicates a better-performing codec.
For example, a value of 0.7 means that the codec under test can encode the sequence under test in a file that is 30% smaller than that encoded by the reference codec.
The X-axis shows the relative encoding time for the codec under test. Larger values indicate a slower codec.
For example, a value of 2.5 means that the codec under test works 2.5 times slower, on average, than the reference codec.
As a result of these advantages, relative bitrate/relative time graphs are used frequently in this report since they assist in the evaluation of the codecs in the test set, especially when the number of codecs is large.
A more detailed description of the preparation of these graphs is given below.
The first step in computing the average bitrate ratio for a fixed quality is inversion of the axes of the bitrate/quality graph
(see next figure). All further computations are performed using the inverted graph.
The second step is choosing the interval on the quality axis over which the averaging is performed.
Averaging is performed only over those segments for which there are results for both codecs.
This limitation is due to the difficulty of developing extrapolation methods for classic RD curves; nevertheless,
for interpolation of RD curves, even linear methods are acceptable.
The final step is calculation of the area under the curves in the chosen interpolation segment and determination of
their ratio (see next figure). This result is the average bitrate ratio for a fixed quality for the two codecs.
If more than two codecs are considered, then one of them is defined as the reference codec and the quality of the others is compared
to that of the reference.
To compute the relative processing time of two codecs for a particular video sequence,
the encoding time is calculated for both codecs (the encoding times are summed for all bitrates)
and the ratio is taken. For three or more codecs, one codec is chosen as the reference and the ratio of
each other codec's encoding time to that of the reference is calculated.
For multiple sequences, each codec is assigned the arithmetic mean of its relative encoding times over all sequences.
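A minimal sketch of this computation follows. The data layout and function name are hypothetical, and the ratio is taken as codec time divided by reference time so that larger values mean a slower codec, as on the X-axis described earlier.

```python
from statistics import mean
from typing import Dict, List

# times[codec][sequence] -> encoding times in seconds, one entry per target bitrate
# (this layout is an assumption made for the example)
Times = Dict[str, Dict[str, List[float]]]


def relative_encoding_time(times: Times, codec: str, reference: str) -> float:
    """Arithmetic mean over sequences of (codec total time / reference total time)."""
    per_sequence = []
    for sequence, reference_times in times[reference].items():
        codec_total = sum(times[codec][sequence])  # summed over all bitrates
        reference_total = sum(reference_times)
        per_sequence.append(codec_total / reference_total)
    return mean(per_sequence)


# Made-up timings for two sequences and two bitrates each.
example_times: Times = {
    "reference": {"A": [10.0, 20.0], "B": [5.0, 5.0]},
    "tested": {"A": [30.0, 45.0], "B": [10.0, 15.0]},
}
relative_time = relative_encoding_time(example_times, "tested", "reference")
```

Here both sequences give a ratio of 2.5, so the tested codec would be plotted at 2.5 on the X-axis.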
Appendix 4. Test Set of Video Sequences
Movie Sequences
"Ice Age"
"Indiana Jones"
"State Enemy"
"Up"
HDTV Sequences
"Amazon"
"Iron Man"
"Mobile Calendar"
"Troy"
Appendix 5. Tested Codecs and Presets
Codecs
DivX AVC/H.264 Video Encoder
Elecard AVC Video Encoder 8-bit edition
Intel® MediaSDK AVC/H.264
MainConcept AVC/H.264 Video Encoder Console Application
Microsoft Expression Encoder 3
Theora encoder
x264
Presets
The table below lists the settings used in this comparison for all of the codecs.
Appendix 6. Figures Explanation
The main charts in this comparison are classical RD curves (quality/bitrate graphs) and relative bitrate/relative time charts.
Additionally, bitrate handling charts (ratio of real and target bitrates) and per-frame quality charts were also used.
RD curves
These charts show how codec quality varies with bitrate or file size. For this metric, a higher curve presumably indicates better quality.
Relative Bitrate/Relative Time Charts
Graph Example
The following figures show a case where these graphs can be useful.
In the top left graph, it is apparent that the “Green” codec encodes with significantly
better quality than the “Black” codec. On the other hand, the top right graph shows that the “Green” codec is slightly slower.
Relative bitrate/relative time graphs can be useful in precisely these situations: it is clearly visible in the bottom graph
that one of the codecs is slower but yields higher visual quality, and that the other codec is faster but yields lower visual quality.
Bitrates Ratio with the Same Quality
Relative Codec Encoding Time Computation
Appendix 7. Objective Quality Metrics Description
SSIM (Structural SIMilarity)
Brief Description
The SSIM metric was published by Wang et al. (Zhou Wang, Alan Conrad Bovik, Hamid Rahim Sheikh and Eero P.
Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity,”
IEEE Transactions on Image Processing, Vol. 13, No. 4, April 2004).
The SSIM author homepage is found at the following URL: http://www.cns.nyu.edu/~lcv/ssim
The scheme of SSIM calculation can be presented as follows. The main idea that underlies the structural similarity (SSIM) index is comparison of the distortion of three image components: luminance, contrast and structure.
The final formula combines these comparisons. In it, L is the dynamic range of the pixel values (255 for 8-bit grayscale images), and K1, K2 << 1.
The values K1 = 0.01 and K2 = 0.03 were used for the comparison presented in this report, and a matrix filled with the value “1” in each position was used as the filter for the result map.
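For reference, the standard form of the index from the Wang et al. paper cited above is reproduced here. The formula image did not survive in this text, so this is presumably, but not certainly, the expression originally shown. The symbols \mu, \sigma, C_1 and C_2 follow the paper's notation: means, variances and covariance of the two compared windows x and y, and the stabilizing constants.

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + C_1)\,(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)},
\qquad C_1 = (K_1 L)^2, \quad C_2 = (K_2 L)^2
```

With K1 = 0.01, K2 = 0.03 and L = 255, the constants evaluate to C1 = 6.5025 and C2 = 58.5225.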
For the implementation used in this comparison, one SSIM value corresponds to two sequences. The value is in the range [-1, 1], with higher values being more desirable (a value of 1 corresponds to identical frames). One of the advantages of the SSIM metric is that it better represents human visual perception than does PSNR. SSIM is more complex, however, and takes more time to calculate.
Examples
The following is an example of an SSIM result for an original and processed (compressed with lossy compression) image. The resulting value of 0.9 demonstrates that the two images are very similar.
The following are more examples of how various types of distortion influence the SSIM value.
The SSIM values for the Y-plane for these images are given below.
PSNR (Peak Signal-to-Noise Ratio)
Brief Description
This metric, which is often used in actual practice, is called the peak signal-to-noise ratio, or PSNR. Generally, this metric has the same form as the mean square error (MSE), but it is more convenient to use because of the logarithmic scale. It still has the same disadvantages as the MSE metric, however.
In MSU Video Quality Measurement Tool, the PSNR can be calculated for all YUV and RGB components and for the L component of the LUV color space. The PSNR value is quick and easy to calculate, but it does not always agree with human visual perception.
A maximum deviation of 255 is used for the PSNR for the RGB and YUV color components because, in YUV files, there is 1 byte for each color component. The maximum possible difference, therefore, is 255. For the LUV color space, the maximum deviation is 100.
The values of the PSNR in the LUV color space are in the range [0, 100]; the value 100 means that the frames are identical.
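The relation between PSNR and MSE can be illustrated with a short sketch using the usual definition, PSNR = 10 log10(MAX^2 / MSE). It is not the MSU Video Quality Measurement Tool implementation. The function name is hypothetical, and returning 100 for identical frames is an assumption that generalizes the LUV statement above to other components.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], processed: Sequence[float],
         max_value: float = 255.0, cap: float = 100.0) -> float:
    """PSNR in dB for one color component, given as flat sequences of samples.

    max_value is the maximum deviation (255 for 8-bit RGB/YUV components).
    cap is the value returned for identical frames (assumed, see lead-in).
    """
    if len(reference) != len(processed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, processed)) / len(reference)
    if mse == 0:
        return cap  # identical frames: the logarithm would be infinite
    return min(cap, 10.0 * math.log10(max_value ** 2 / mse))


# Every sample is off by one level, so MSE = 1 and PSNR = 20 * log10(255).
off_by_one = psnr([10, 20, 30, 40], [11, 21, 31, 41])
identical = psnr([10, 20, 30, 40], [10, 20, 30, 40])
```

Because the value depends only on the MSE, two distortions with the same MSE get the same PSNR regardless of how visible they are, which is the disadvantage mentioned above.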
Examples
PSNR visualization uses different colors for better visual representation.
Appendix 8. VP8, x264 and XviD comparison
This appendix includes results for the comparison of the VP8 encoder with x264 and XviD.
The VP8 encoder did not meet our speed requirements for this comparison, but we have included this codec because of many user requests. We would like to express our gratitude to the VP8 developers for providing the codec and presets and for their help with preset tuning.
RD-curves
Movies
HDTV
Bitrate handling
Movies
Bitrate handling for the VP8 encoder for movies is very good. Here are the results for Movies.
HDTV
Bitrate handling for the VP8 encoder for HDTV is quite good, except for the Troy sequence at low bitrates. Here are the results for HDTV.
Speed/Quality trade-off
Movies
Comparing VP8 to XviD, VP8 is 5-25 times slower with 10-30% better quality (lower bitrate for the same quality). Compared with x264, VP8 also shows 5-25 times lower encoding speed, with 20-30% lower quality on average. For example, the x264 High-Speed preset is, on average, both faster and higher in quality than any of the VP8 presets.
HDTV
Comparing VP8 to XviD, VP8 is 5-20 times slower with 10-20% better quality (lower bitrate for the same quality). Compared with x264, VP8 shows 5-20 times lower encoding speed with almost the same quality, excluding the x264 High-Quality preset.
Presets
Name | Encoding parameters |
---|---|
good1 | First pass: -p 2 --pass=1 --fpf=tmp.fpf --threads=4 --good --cpu-used=1 --end-usage=0 --auto-alt-ref=1 -v --minsection-pct=5 --maxsection-pct=800 --lag-in-frames=16 --kf-min-dist=0 --kf-max-dist=999999 --token-parts=2 --static-thresh=0 --min-q=0 --max-q=63 Second pass: |
good2 | The same as good1, but --cpu-used=2 |
good3 | --cpu-used=3 |
good4 | --cpu-used=5 |
good5 | --cpu-used=5 |
best | The same as good1, but --best instead of --good and without --cpu-used |
Comments from VP8 Developers
We've been following the MSU tests since they began and respect the group's work. One issue we noticed in the test is that most input sequences were previously compressed using other codecs. These sequences have an inherent bias against VP8 in recompression tests. As pointed out by other developers, H.264 and MPEG-like encoders have slight advantages in reproducing some of their own typical artifacts, which helps their objective measurement numbers but not necessarily visual quality. This is reflected by relatively better results for VP8 on the only uncompressed input sequence, "mobile calendar."
Even with this limitation, VP8 delivered respectable results against other encoders, especially considering this is the first time VP8 has been included in the test and VP8 has not been specifically optimized for SSIM as some other codecs have.
To date, WebM developers have focused on the VP8 decoder performance and are only starting to optimize the encoder for speed. The WebM project has only been underway for three weeks, and we believe that our encoder speed will improve significantly in the near future.