Video Distortion Method for VMAF Quality Values Increasing

Video quality measurement plays an important role in many applications. Full-reference quality metrics, which are usually used in video codec comparisons, are expected to reflect any changes in videos. In this article, we consider different colour corrections of compressed videos that increase the values of the full-reference metric VMAF while barely decreasing another widely used metric, SSIM. The proposed video contrast enhancement approach shows that the metric is inapplicable to video codec comparisons, as it may be used for cheating in such comparisons by tuning videos to improve the metric's values.




1 Introduction

At present, video content occupies a significant share of network traffic, and this share is expected to grow to 71% in 2021 [1]. Therefore, the quality of encoded video is becoming increasingly important, and there is growing interest in methods for assessing video quality. New video codec standards keep appearing, and the existing standards are being improved. Choosing one video encoding solution over another requires appropriate tools for video quality assessment. The only fully adequate method is subjective evaluation (mean opinion score, MOS), but it is extremely costly in both time and money, so all other “objective” methods are being improved in an attempt to approach this ground truth.

Methods for evaluating video quality can be divided into three categories [9]: full-reference, reduced-reference and no-reference. Full-reference metrics are the most common, as their results are easily interpreted, usually as an assessment of the degree of distortion in the video and of its visibility to the observer. The only drawback of this approach compared to the others is that it needs the original video for comparison with the encoded one, and the original is not always available.

One widely used full-reference metric that is gaining popularity for video quality assessment is Video Multimethod Assessment Fusion (VMAF) [5], announced by Netflix. It is an open-source, learning-based solution. Its main idea is to combine multiple elementary video quality features and to train an SVM on subjective data to produce the final per-frame score. The scheme of this metric [7] is shown in Fig. 1.

Figure 1: The scheme of the VMAF algorithm.

Despite increasing attention to this metric, many video quality analysis projects, such as the MSU Annual Video Codec Comparison [2], still use SSIM, and some even use PSNR. At the same time, many users of these comparisons ask for VMAF-type metrics to be included. The main obstacles to a full transition to VMAF are that the metric is not universal and that its results are not fully adequate on some types of video [4].

The main objective of this article is to find video transformations that improve the VMAF score without reducing SSIM. In other words, we look for distortions that change the visual quality of the video, which should decrease the value of any full-reference metric, yet increase the value of VMAF. Such transformations are a significant obstacle to using VMAF for all types of video and point to the need to modify the original VMAF algorithm. The transformations considered here are based on colour and contrast adjustments.

2 Study Method

Two approaches to colour adjustment were tested to find the best strategy for increasing VMAF scores. In the first, the distortions were applied to the videos before encoding. In the second, the colours were adjusted after encoding. In general, there was no significant difference between these options, because colour enhancement increases VMAF even when the compression step is omitted. Therefore, below we describe only the first case, with adjustment before compression. We keep the compression step because this work considers VMAF tuning in the context of video codec comparisons.

We chose four videos with Full HD resolution and high bitrate to find colour transformations that may influence VMAF scores. Three of them (Crowd run, Red kayak and Speed bag) were taken from an open video collection [6], and one was taken from the MSU video collection used for selecting test video sets for the annual video codec comparison [2]. The videos differ in spatial and temporal complexity [8] and in content. The descriptions and sources of the first three videos can be found on the site [6]. The remaining sequence, Bay timelapse, contains a scene with water and grass, showing the grass and the waves on the water. It was filmed in flat colours and required post-processing.

Three versions of VMAF were tested: 0.6.1, 0.6.2 and 0.6.3. The implementations of all three versions from the MSU Video Quality Measurement Tool [3] were used. The results did not differ much, so the following plots are presented for the latest version (0.6.3).

3 Proposed Tuning Algorithm

For colour and brightness adjustment, two image processing algorithms were chosen: unsharp mask and histogram equalization. The implementations of these algorithms from the scikit-image library were used. In this library, unsharp mask has two parameters that influence image levels: radius (the radius of the Gaussian blur) and amount (how much contrast is added at the edges). For histogram equalization, the clipping limit parameter was analysed. To find optimal configurations of the equalization parameters, the multi-objective optimization algorithm NSGA-II [10] was used. Only the limits of the parameters were given to the genetic algorithm, which was then applied to find the best parameters for each test video.
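The role of the two unsharp mask parameters can be seen on a one-dimensional signal. The sketch below is not the scikit-image implementation. It is a minimal pure-Python illustration of the commonly documented formula enhanced = original + amount * (original - blurred), and it assumes that radius acts as the standard deviation of the Gaussian blur. The helper names and the sample row are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    half = max(1, int(3 * sigma + 0.5))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Gaussian blur; samples outside the signal repeat the border value."""
    kernel = gaussian_kernel(sigma)
    half = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), last)
            acc += w * signal[j]
        out.append(acc)
    return out


def unsharp_mask(signal, radius, amount):
    """enhanced = original + amount * (original - blurred), clipped to [0, 1]."""
    blurred = blur(signal, radius)
    return [min(1.0, max(0.0, s + amount * (s - b)))
            for s, b in zip(signal, blurred)]


# Hypothetical row of pixels: an edge between two mid-grey levels in [0, 1].
row = [0.4] * 8 + [0.6] * 8
sharp = unsharp_mask(row, radius=1.0, amount=1.0)
```

In this sketch the flat regions far from the edge stay unchanged, while the samples next to the edge undershoot on the dark side and overshoot on the bright side, which is the added edge contrast that the amount parameter controls.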

SSIM and VMAF scores were calculated for each video processed with the considered colour enhancement algorithms under different parameters. As mentioned before, after colour correction the videos were compressed with the medium preset of the x264 encoder at 3 Mbps. Then the difference between the metric scores of the processed videos and those of the original video was calculated to compare how the colour corrections influenced the quality scores. Fig. 2 shows this difference for the SSIM metric on the Bay timelapse video sequence for different parameter values of the unsharp mask algorithm. The corresponding differences for the VMAF metric are presented in Fig. 3.
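The compression and comparison step can be sketched as follows. The text states only the encoder (x264), the preset (medium) and the bitrate (3 Mbps), so the use of the ffmpeg command-line tool with libx264 is an assumption of this sketch, and the file names are hypothetical. The function only builds the command and does not run it.

```python
def x264_encode_command(src, dst, bitrate="3M", preset="medium"):
    """Build (without running) an ffmpeg command that encodes with libx264."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", src,             # colour-corrected or original input
        "-c:v", "libx264",     # x264 encoder
        "-preset", preset,     # "medium" preset, as in the text
        "-b:v", bitrate,       # 3 Mbps target bitrate, as in the text
        "-an",                 # drop audio, only the picture is measured
        dst,
    ]


def score_difference(processed, original):
    """Metric score of the processed video minus that of the original video."""
    return processed - original


# Hypothetical file names, for illustration only.
command = x264_encode_command("bay_unsharp.y4m", "bay_unsharp.mp4")

# Example values quoted in the text for one histogram equalization setting.
vmaf_gain = score_difference(74.0, 68.0)
ssim_change = score_difference(0.86, 0.88)
```

The command list could be passed to `subprocess.run(command, check=True)` on a machine where ffmpeg is installed. A positive difference means the metric rates the colour-adjusted video higher than the unadjusted one.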

Figure 2: SSIM scores for different parameters of unsharp mask on Bay timelapse video sequence.
Figure 3: VMAF scores for different parameters of unsharp mask on Bay timelapse video sequence.

In these plots, higher values mean that the objective quality of the colour-adjusted video was better according to the metric. VMAF shows better scores for a high radius and a medium amount of unsharp mask, while SSIM becomes worse for a high radius and a high amount. The optimal values of the algorithm parameters can be estimated from the difference between these plots. For the other colour adjustment algorithm (histogram equalization), one parameter was optimized, and the results are presented in Fig. 4 together with the results of unsharp mask.
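To illustrate what a clipping limit does in histogram equalization, here is a simplified global variant in pure Python. It is a sketch under stated assumptions rather than the scikit-image routine: libraries usually apply the clipping per tile in adaptive equalization (scikit-image exposes a `clip_limit` argument on `exposure.equalize_adapthist`), and the text does not say which function was used. The function name and the sample data are hypothetical.

```python
def clipped_equalize(pixels, clip_limit, levels=256):
    """Global histogram equalization with a clipped histogram.

    pixels: flat list of integer grey levels in [0, levels - 1].
    clip_limit: largest share of all pixels that one histogram bin may keep
    (1.0 disables clipping).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Clip every bin and spread the removed counts evenly over all bins.
    cap = max(1.0, clip_limit * len(pixels))
    excess = sum(max(0.0, h - cap) for h in hist)
    clipped = [min(h, cap) + excess / levels for h in hist]

    # The normalised cumulative histogram is the grey-level mapping.
    total = sum(clipped)
    mapping, running = [], 0.0
    for h in clipped:
        running += h
        mapping.append(round((levels - 1) * running / total))
    return [mapping[p] for p in pixels]


# Hypothetical low-contrast data: grey levels 100..119, ten pixels each.
flat = [level for level in range(100, 120) for _ in range(10)]
strong = clipped_equalize(flat, clip_limit=1.0)   # no clipping
mild = clipped_equalize(flat, clip_limit=0.01)    # heavy clipping
```

In this sketch a lower clipping limit flattens the histogram before the mapping is built, so the contrast stretch is weaker. This is the trade-off that a search over the clipping limit explores.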

Figure 4: Comparison of VMAF and SSIM scores for different configurations of unsharp mask and histogram equalization on Bay timelapse video sequence. The results in the second quadrant, where SSIM values did not change and VMAF values increased, are the ones of interest.

According to these results, for some configurations of histogram equalization VMAF becomes significantly better (from 68 to 74) while SSIM does not change much (a decrease from 0.88 to 0.86). The results differ slightly for other videos. On the Crowd run video sequence, VMAF was not increased by unsharp mask (Fig. 5(a)) and was increased only a little by histogram equalization. For the Red kayak and Speed bag videos, unsharp mask could significantly increase VMAF while only slightly decreasing SSIM (Fig. 5(b) and Fig. 5(c)).

(a) Colour tuning results for Crowd run video sequence.
(b) Colour tuning results for Red kayak video sequence.
(c) Colour tuning results for Speed bag video sequence.
Figure 5: Comparison of VMAF and SSIM scores for different configurations of unsharp mask and histogram equalization on tested video sequences. The results in the second quadrant, where SSIM values did not change and VMAF values increased, are the ones of interest.
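The selection of interesting configurations can be sketched as a two-objective filter over score differences. This is not NSGA-II itself, which searches the parameter space with a genetic algorithm. It is an exhaustive non-dominated filter over an already evaluated set, plus a simple "VMAF up, SSIM almost unchanged" test. The field names, the tolerance and all candidate values except the first (taken from the 68 to 74 and 0.88 to 0.86 example in the text) are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on both objectives and better on one."""
    return (a["d_vmaf"] >= b["d_vmaf"] and a["d_ssim"] >= b["d_ssim"]
            and (a["d_vmaf"] > b["d_vmaf"] or a["d_ssim"] > b["d_ssim"]))


def pareto_front(points):
    """Configurations that no other configuration dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def vmaf_up_ssim_stable(points, ssim_tolerance):
    """Configurations with a VMAF gain and an SSIM loss within the tolerance."""
    return [p for p in points
            if p["d_vmaf"] > 0 and p["d_ssim"] >= -ssim_tolerance]


candidates = [
    {"name": "A", "d_vmaf": 6.0, "d_ssim": -0.02},    # example from the text
    {"name": "B", "d_vmaf": 2.0, "d_ssim": -0.001},   # hypothetical
    {"name": "C", "d_vmaf": 1.0, "d_ssim": -0.03},    # hypothetical
    {"name": "D", "d_vmaf": -1.0, "d_ssim": 0.0},     # hypothetical
]

front = pareto_front(candidates)
interesting = vmaf_up_ssim_stable(candidates, ssim_tolerance=0.025)
```

How much SSIM loss still counts as "not changed" is a judgment call. The tolerance is left as a parameter here because the text does not give a threshold.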

4 Results

The following frames from the test videos demonstrate colour corrections that increased VMAF and had almost no influence on SSIM. For Bay timelapse, unsharp mask (Fig. 6(b)) increased VMAF without a significant decrease in SSIM.

(a) Without colour correction
(b) After unsharp
Figure 6: Frame 5 from Bay timelapse video sequence with and without colour correction.

For the Crowd run sequence, histogram equalization also increased VMAF. This video has higher contrast, and the decrease in SSIM was more significant.

(a) Without colour correction
(b) After histogram equalization
Figure 7: Frame 1 from Crowd run video sequence with and without colour correction.

Red kayak looked better according to VMAF after unsharp mask was applied.

For Speed bag, unsharp mask made it possible to increase VMAF greatly without influencing SSIM.

5 Conclusion

Full-reference video quality metrics are used to show the difference between original and distorted streams and are expected to take worse values whenever any transformation is applied to the original video. However, it is sometimes possible to deceive objective metrics. In this article, we described a way to increase the values of the popular full-reference metric VMAF. If the video has low contrast, VMAF can be increased by colour adjustments without influencing SSIM. Otherwise, a high-contrast video can also be tuned for VMAF, but with a slight worsening of SSIM.

Although VMAF has become popular and important, particularly for video codec developers and customers, there are still a number of issues in its application. This is why SSIM is used as the main objective quality metric in many competitions, as well as in the MSU Video Codec Comparisons.

We wanted to draw attention to this problem and hope to see progress in this area, which is likely since the metric is being actively developed. Our further research will involve a subjective comparison of the proposed colour adjustments with the original videos and the development of novel approaches to metric tuning.

6 Acknowledgments

This work was partially supported by the Russian Foundation for Basic Research under Grant 19-01-00785a.

References


  • [1] Cisco Visual Networking Index: Forecast and Methodology. 2016-2021.
  • [2] HEVC Video Codec Comparison 2018 (Thirteenth MSU Video Codec Comparison)
  • [3] MSU Quality Measurement Tool: Download Page
  • [4] Perceptual Video Quality Metrics: Are they Ready for the Real World? Available online:
  • [5] VMAF: Perceptual video quality assessment based on multi-method fusion, Netflix, Inc., 2017
  • [6] Video Test Media [derf’s collection]
  • [7] C. G. Bampis, Z. Li, and A. C. Bovik, “Spatiotemporal feature integration and model fusion for full reference video quality assessment,” in IEEE Transactions on Circuits and Systems for Video Technology, 2018.
  • [8] C. Chen, S. Inguva, A. Rankin, and A. Kokaram, “A subjective study for the design of multi-resolution ABR video streams with the VP9 codec,” in Electronic Imaging, 2016(2), pp. 1-5.
  • [9] S. Chikkerur, V. Sundaram, M. Reisslein, and L. J. Karam, “Objective video quality assessment methods: A classification, review, and performance comparison,” in IEEE Transactions on Broadcasting, 57(2), pp. 165–182, 2011.
  • [10] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” in IEEE Transactions on Evolutionary Computation, 6(2), pp. 182–197, 2002.