MSU Subjective Comparison of Modern Video Codecs
MSU Graphics & Media Lab (Video Group)
Part 2. Results of subjective comparison
Contents
Results of the assessment
Analysis of the subjective results
The average subjective mark of a video sequence is called the MOS (Mean Opinion Score). It is obtained by simple averaging of the subjective scores:

$$MOS_k = \frac{1}{experts\_num} \sum_{i=1}^{experts\_num} mark_{i,k}$$

where
$k$ is the number of the sequence for which the MOS is calculated;
$mark_{i,k}$ is the mark given by the $i$-th expert to the $k$-th sequence;
$experts\_num$ is the overall number of experts.
To illustrate how widely the individual marks are spread around each MOS, the lower and upper bounds of the 95% confidence intervals were computed.
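The MOS and its confidence interval can be sketched in a few lines of Python. The text does not say how the 95% interval was computed, so this sketch assumes the usual normal approximation (MOS ± 1.96 · s / √n, with s the sample standard deviation). The function name `mos_with_ci` and the marks are hypothetical.

```python
import math
from statistics import mean, stdev

def mos_with_ci(marks, z_crit=1.96):
    """Return (MOS, lower, upper) for the expert marks of one sequence.

    Assumption: normal-approximation interval, MOS +/- z_crit * s / sqrt(n).
    """
    n = len(marks)
    if n < 2:
        raise ValueError("need at least two marks to estimate the spread")
    mos = mean(marks)
    half_width = z_crit * stdev(marks) / math.sqrt(n)
    return mos, mos - half_width, mos + half_width

# Hypothetical marks given by eight experts to one sequence.
marks = [4, 5, 3, 4, 4, 5, 3, 4]
mos, lower, upper = mos_with_ci(marks)
print(f"MOS = {mos:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

With a small number of experts, a Student t quantile in place of 1.96 would give a slightly wider interval.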
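To estimate the probability that experts were able to distinguish two codecs on a given sequence, we ran a z-test for each pair of codecs and bitrates, using the following formula.
Let $X_1$ and $X_2$ be the subjective scores for two sequences. Then

$$z = \frac{MOS_1 - MOS_2}{\sqrt{\left(\sigma_1^2 + \sigma_2^2\right) / experts\_num}}$$

where
$MOS_1$ and $MOS_2$ are the MOS for the first and second sequences;
$\sigma_1^2$ and $\sigma_2^2$ are the variances of the subjective marks;
$experts\_num$ is the total number of experts.
The probability is then

$$P = \Phi(|z|) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{|z|} e^{-t^2/2}\, dt$$

where $\Phi$ is the standard normal cumulative distribution function.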
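As a rough illustration of this z-test, the Python sketch below computes the statistic from two lists of expert marks and converts it to a probability with the standard normal CDF. Evaluating the CDF at |z| is an assumption, chosen because the probabilities reported in the tables of this section lie between about 0.5 and 1. The use of the sample variance, the function name `distinguish_probability`, and the marks are also assumptions made for the example.

```python
import math
from statistics import mean, variance

def distinguish_probability(marks_a, marks_b):
    """Probability that two sequences were told apart (assumed Phi(|z|))."""
    n = len(marks_a)
    if n != len(marks_b) or n < 2:
        raise ValueError("both sequences need the same number (>= 2) of marks")
    diff = abs(mean(marks_a) - mean(marks_b))
    var_sum = variance(marks_a) + variance(marks_b)
    if var_sum == 0:
        # All experts agreed on both sequences: z is undefined or infinite.
        return 1.0 if diff > 0 else 0.5
    z = diff / math.sqrt(var_sum / n)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical marks from the same six experts for two encodes.
codec_a = [4, 5, 4, 4, 5, 3]
codec_b = [3, 4, 4, 3, 4, 3]
print(f"P = {distinguish_probability(codec_a, codec_b):.2f}")
```

Under this reading, two sequences with identical MOS give 0.5 (a coin toss), and values near 1 mean the experts separated them reliably.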
Objective metrics
For all sequences PSNR, VQM and SSIM were measured with MSU Video Quality Measurement Tool [7].
PSNR is the most popular metric. It carries the same information as the mean squared error, but is more convenient to use because of its logarithmic scale.
There are many examples where PSNR does not reflect subjective quality.
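The relation between PSNR and the mean squared error can be shown with a minimal Python sketch. It assumes 8-bit samples (peak value 255) passed as flat lists. It is not the implementation used by the MSU tool, whose choice of color planes and averaging over frames is not described here. The function name `psnr` is hypothetical.

```python
import math

def psnr(reference, distorted, peak=255):
    """PSNR in dB between two equally sized sample lists (assumed 8-bit)."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    # Logarithmic rescaling of the MSE relative to the peak signal value.
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical luma samples: every value is off by exactly one level.
original = [100, 120, 140, 160]
decoded = [101, 119, 141, 159]
print(f"PSNR = {psnr(original, decoded):.2f} dB")
```

Because of the logarithm, halving the MSE always adds about 3 dB, whatever the starting quality.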
VQM [3] and SSIM [4] are relatively new metrics that aim to reflect subjective opinion.
To compare how well the objective metrics predict subjective quality, their results must be mapped onto a common scale. Following the procedure described in [1], the results of each metric were mapped to the subjective data scale using a fitting function with two parameters:

$$O_{fitted} = f(O;\, g, d)$$

where
$O$ is the objective data;
$O_{fitted}$ is the fitted objective data;
$g$ and $d$ are the parameters.
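Parameters $g$ and $d$ were selected to minimize the sum of squared differences between $O_{fitted}$ and the subjective data:

$$(g, d) = \arg\min_{g,\, d} \sum_{j} \left(O_{fitted,j} - S_j\right)^2$$

where $S$ is the subjective data and $j$ runs over the measured points.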
The results of the fitting process can be regarded as the objective metric's prediction of subjective opinion.
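The least-squares fit can be illustrated with a short Python sketch. The exact fitting function is not reproduced in this text, so the sketch assumes a logistic curve with slope `g` and midpoint `d`, mapped onto a hypothetical 1 to 5 mark scale, and searches for the parameters with a plain shrinking grid search. All names, ranges and data here are assumptions for illustration, not the procedure of [1].

```python
import math

def logistic(o, g, d, s_min=1.0, s_max=5.0):
    """Assumed fitting function: maps a metric value onto [s_min, s_max]."""
    return s_min + (s_max - s_min) / (1.0 + math.exp(-g * (o - d)))

def sum_sq_error(objective, subjective, g, d):
    """Sum of squared differences between fitted and subjective data."""
    return sum((logistic(o, g, d) - s) ** 2 for o, s in zip(objective, subjective))

def fit_g_d(objective, subjective, g_range=(0.01, 2.0), steps=40, rounds=6):
    """Grid search for (g, d), halving the search window every round."""
    g_lo, g_hi = g_range
    d_lo, d_hi = min(objective), max(objective)
    best = (math.inf, g_lo, d_lo)
    for _ in range(rounds):
        for i in range(steps + 1):
            g = g_lo + (g_hi - g_lo) * i / steps
            for j in range(steps + 1):
                d = d_lo + (d_hi - d_lo) * j / steps
                err = sum_sq_error(objective, subjective, g, d)
                if err < best[0]:
                    best = (err, g, d)
        # Re-centre a window of half the size on the best point so far.
        _, g_best, d_best = best
        g_half, d_half = (g_hi - g_lo) / 4, (d_hi - d_lo) / 4
        g_lo, g_hi = g_best - g_half, g_best + g_half
        d_lo, d_hi = d_best - d_half, d_best + d_half
    return best[1], best[2]

# Synthetic example: PSNR values in dB and marks generated from known parameters.
psnr_db = [26, 28, 30, 32, 34, 36, 38, 40]
marks = [logistic(o, 0.3, 32.0) for o in psnr_db]
g, d = fit_g_d(psnr_db, marks)
print(f"g = {g:.3f}, d = {d:.2f}")
```

A grid search is used only to keep the sketch dependency-free. A nonlinear least-squares routine from a numerical library would normally be the more practical choice.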
MOS+PSNR/bitrate graphs
The following graphs show the subjective data for each sequence, their 95% confidence intervals, and the MOS values predicted by PSNR(3) (after fitting).
Battle
Picture 7. Battle
The "Battle" sequence is the most difficult one for the codecs. PSNR is wrong at a number of points: for instance, for x264 at 690 kbps and XviD at 1024 kbps the PSNR values contradict the subjective scores. x264 is the clear leader at both bitrates, followed by DivX, WMV and XviD.
The z-test table below gives the probability that the experts distinguished two sequences.
| Battle | Ref. | DivX 1024 | DivX 690 | WMV 1024 | WMV 690 | x264 1024 | x264 690 | XviD 1024 | XviD 690 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ref. | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| DivX 1024 | 1 | 1 | 1 | 1 | 1 | 0.87 | 1 | 1 | 1 |
| DivX 690 | 1 | 1 | 1 | 0.97 | 0.94 | 1 | 0.95 | 0.89 | 1 |
| WMV 1024 | 1 | 1 | 0.97 | 1 | 1 | 1 | 0.53 | 1 | 1 |
| WMV 690 | 1 | 1 | 0.94 | 1 | 1 | 1 | 1 | 0.65 | 1 |
| x264 1024 | 1 | 0.87 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| x264 690 | 1 | 1 | 0.95 | 0.53 | 1 | 1 | 1 | 1 | 1 |
| XviD 1024 | 1 | 1 | 0.89 | 1 | 0.65 | 1 | 1 | 1 | 1 |
| XviD 690 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Rancho
Picture 8. Rancho
All codecs performed almost equally well on the "Rancho" sequence: the differences between the subjective ratings are small. x264 at 1024 kbps is still the best, with a mark equal to that of the uncompressed sequence.
| Rancho | Ref. | DivX 1024 | DivX 690 | WMV 1024 | WMV 690 | x264 1024 | x264 690 | XviD 1024 | XviD 690 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ref. | 1 | 0.94 | 1 | 0.96 | 1 | 0.51 | 0.91 | 0.83 | 1 |
| DivX 1024 | 0.94 | 1 | 0.94 | 0.59 | 0.97 | 0.96 | 0.59 | 0.76 | 0.95 |
| DivX 690 | 1 | 0.94 | 1 | 0.92 | 0.61 | 1 | 0.97 | 0.99 | 0.56 |
| WMV 1024 | 0.96 | 0.59 | 0.92 | 1 | 0.96 | 0.98 | 0.68 | 0.83 | 0.93 |
| WMV 690 | 1 | 0.97 | 0.61 | 0.96 | 1 | 1 | 0.98 | 1 | 0.54 |
| x264 1024 | 0.51 | 0.96 | 1 | 0.98 | 1 | 1 | 0.94 | 0.87 | 1 |
| x264 690 | 0.91 | 0.59 | 0.97 | 0.68 | 0.98 | 0.94 | 1 | 0.69 | 0.97 |
| XviD 1024 | 0.83 | 0.76 | 0.99 | 0.83 | 1 | 0.87 | 0.69 | 1 | 0.99 |
| XviD 690 | 1 | 0.95 | 0.56 | 0.93 | 0.54 | 1 | 0.97 | 0.99 | 1 |
Matrix sc.1
Picture 9. Matrix sc.1
XviD at 1024 kbps is the leader on this sequence, but its advantage is small. PSNR was adequate for this sequence except for x264 at 1024 kbps.
| Matrix sc.1 | Ref. | DivX 1024 | DivX 690 | WMV 1024 | WMV 690 | x264 1024 | x264 690 | XviD 1024 | XviD 690 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ref. | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| DivX 1024 | 1 | 1 | 1 | 0.71 | 1 | 0.6 | 0.99 | 0.85 | 1 |
| DivX 690 | 1 | 1 | 1 | 1 | 0.74 | 1 | 0.88 | 1 | 0.7 |
| WMV 1024 | 1 | 0.71 | 1 | 1 | 1 | 0.79 | 0.97 | 0.95 | 1 |
| WMV 690 | 1 | 1 | 0.74 | 1 | 1 | 1 | 0.71 | 1 | 0.88 |
| x264 1024 | 1 | 0.6 | 1 | 0.79 | 1 | 1 | 0.99 | 0.78 | 1 |
| x264 690 | 1 | 0.99 | 0.88 | 0.97 | 0.71 | 0.99 | 1 | 1 | 0.95 |
| XviD 1024 | 1 | 0.85 | 1 | 0.95 | 1 | 0.78 | 1 | 1 | 1 |
| XviD 690 | 1 | 1 | 0.7 | 1 | 0.88 | 1 | 0.95 | 1 | 1 |
Matrix sc.2
Picture 10. Matrix sc.2
x264 is the best again. The PSNR values for DivX, WMV, x264 and XviD are close to each other even though the subjective scores differ.
| Matrix sc.2 | Ref. | DivX 1024 | DivX 690 | WMV 1024 | WMV 690 | x264 1024 | x264 690 | XviD 1024 | XviD 690 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ref. | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| DivX 1024 | 1 | 1 | 1 | 0.82 | 1 | 0.75 | 0.74 | 0.8 | 1 |
| DivX 690 | 1 | 1 | 1 | 0.98 | 0.98 | 1 | 0.99 | 0.98 | 0.94 |
| WMV 1024 | 1 | 0.82 | 0.98 | 1 | 1 | 0.94 | 0.6 | 0.52 | 1 |
| WMV 690 | 1 | 1 | 0.98 | 1 | 1 | 1 | 1 | 1 | 0.69 |
| x264 1024 | 1 | 0.75 | 1 | 0.94 | 1 | 1 | 0.9 | 0.93 | 1 |
| x264 690 | 1 | 0.74 | 0.99 | 0.6 | 1 | 0.9 | 1 | 0.58 | 1 |
| XviD 1024 | 1 | 0.8 | 0.98 | 0.52 | 1 | 0.93 | 0.58 | 1 | 1 |
| XviD 690 | 1 | 1 | 0.94 | 1 | 0.69 | 1 | 1 | 1 | 1 |
MOS+PSNR graphs, grouped by bitrate
To make it easier to evaluate the codecs at each bitrate separately, we provide the same graphs as in the previous section, except that the MOS results are grouped by bitrate.
Picture 11. Battle
Picture 12. Rancho
Picture 13. Matrix sc.1
Picture 14. Matrix sc.2