Deviation Based Pooling Strategies For Full Reference Image Quality Assessment

04/26/2015 ∙ by Hossein Ziaei Nafchi, et al. ∙ Ecole De Technologie Superieure (ETS) ∙ McGill University

The state-of-the-art pooling strategies for perceptual image quality assessment (IQA) are based on the mean and the weighted mean. They are robust pooling strategies which usually provide moderate to high performance for different IQAs. Recently, standard deviation (SD) pooling was also proposed. Although this deviation pooling provides very high performance for a few IQAs, its performance is lower than that of mean pooling for many other IQAs. In this paper, we propose to use the mean absolute deviation (MAD) and show that it is a more robust and accurate pooling strategy for a wider range of IQAs. In fact, MAD pooling has the advantages of both mean pooling and SD pooling. The joint computation and use of the MAD and SD pooling strategies is also considered in this paper. Experimental results provide useful information on the choice of the proper deviation pooling strategy for different IQA models.

1 Introduction

Automatic image quality assessment (IQA) plays a significant role in many image processing applications. IQA is commonly used for monitoring, benchmarking, image restoration and parameter optimization [14, 18]. Full reference IQAs, which fall within the scope of this paper, evaluate the perceptual quality of a distorted image with respect to its reference image. IQAs mimic the average quality predictions of human observers. This is a non-trivial task because images may suffer from various types and degrees of distortions.

Among IQAs, the mean squared error (MSE) is widely used because of its simplicity. However, in many situations, it does not correlate with human perception of image fidelity and quality [13]. Popular and/or high-performance IQAs include SSIM [14], MS-SSIM [17], VIF [11], VSNR [2], MAD (most apparent distortion) [4], FSIM [23], GS [7], GMSD [20], VSI [21], and others [5].

Usually, IQAs measure local similarity and produce a similarity score by a pooling strategy. This local quality measurement can be performed on the image, on different representations of the original image, or on their combination. For example, SSIM and MS-SSIM use statistics of smoothed source and distorted images. FSIM [23] uses a phase-derived map and another gradient-based map for quality assessment. GS [7] is a contrast and structure variant metric that utilizes a specialized gradient magnitude and the contrast of the image. GMSD [20] also utilizes gradient magnitude. Many of the available full reference IQAs follow this top-down architecture [5]. While average pooling [14, 17, 23, 7] and weighted average pooling [8, 15, 21] are widely used in the literature, GMSD uses standard deviation pooling [20]. For the purpose of this paper, we briefly review the FSIM [23] and GMSD [20] indices.

The feature similarity (FSIM) index [23] uses phase congruency (PC) [3] as its main feature, and the image gradient magnitude as its secondary feature. Phase congruency similarity and gradient magnitude similarity are calculated and then combined as $S_L(\mathbf{x})$, where $\mathbf{x}$ is an image pixel. For the purpose of pooling, the maximum of the phase congruency values of the reference and distorted images, $PC_m(\mathbf{x})$, is computed first. FSIM is then computed by the following mean pooling:

\[ \mathrm{FSIM} = \frac{\sum_{\mathbf{x}} S_L(\mathbf{x}) \, PC_m(\mathbf{x})}{\sum_{\mathbf{x}} PC_m(\mathbf{x})} \tag{1} \]

FSIM is among the leading indices in the literature; however, the high computation cost of the phase congruency makes it an inefficient index.

GMSD uses the Prewitt operator to calculate the gradient magnitudes of the reference and distorted images, $m_r$ and $m_d$. From these, a gradient magnitude similarity (GMS) is calculated by:

\[ \mathrm{GMS}(\mathbf{x}) = \frac{2\, m_r(\mathbf{x})\, m_d(\mathbf{x}) + c}{m_r^2(\mathbf{x}) + m_d^2(\mathbf{x}) + c} \tag{2} \]

where $c$ is a positive constant that supplies numerical stability. The GMSD is then calculated by a deviation pooling strategy, which is the standard deviation of the GMS values. GMSD provides high performance for different datasets and is very efficient.
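The GMS map and its standard deviation pooling can be sketched in a few lines of NumPy. This is a minimal illustration rather than the reference GMSD implementation: the function names are hypothetical, the Prewitt filters are applied on the valid region only, the default `c=170.0` is an assumed value for 8-bit intensity ranges, and any preprocessing (such as downsampling) is left out.

```python
import numpy as np

def prewitt_gradient_magnitude(img):
    """Gradient magnitude from 3x3 Prewitt filters (valid region only)."""
    img = np.asarray(img, dtype=np.float64)
    # Horizontal derivative: right column minus left column over three rows.
    gx = (img[:-2, 2:] + img[1:-1, 2:] + img[2:, 2:]
          - img[:-2, :-2] - img[1:-1, :-2] - img[2:, :-2]) / 3.0
    # Vertical derivative: bottom row minus top row over three columns.
    gy = (img[2:, :-2] + img[2:, 1:-1] + img[2:, 2:]
          - img[:-2, :-2] - img[:-2, 1:-1] - img[:-2, 2:]) / 3.0
    return np.sqrt(gx ** 2 + gy ** 2)

def gms_map(ref, dist, c=170.0):
    """Local gradient magnitude similarity; c is an assumed stability constant."""
    m_r = prewitt_gradient_magnitude(ref)
    m_d = prewitt_gradient_magnitude(dist)
    return (2.0 * m_r * m_d + c) / (m_r ** 2 + m_d ** 2 + c)

def gmsd(ref, dist, c=170.0):
    """Standard deviation pooling of the GMS map (lower means more similar)."""
    return float(np.std(gms_map(ref, dist, c)))

# Illustrative usage on synthetic data (not real images).
rng = np.random.default_rng(0)
reference = rng.uniform(0, 255, size=(32, 32))
distorted = reference + rng.normal(0, 20, size=(32, 32))
print("GMSD of identical images:", gmsd(reference, reference))
print("GMSD of noisy image:", gmsd(reference, distorted))
```

Identical inputs give a constant GMS map, so the pooled deviation is zero; any distortion that affects local structures unevenly raises it.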

Image gradients are sensitive to image distortions; different local structures in a distorted image suffer from different degrees of degradation [20]. This motivated the authors of [20] to explore the standard deviation of the gradient-based local similarity map for overall image quality prediction. The standard deviation is the square root of the mean of the squares of the individual deviations. A problem with the standard deviation is that larger deviations are overemphasized in the process of squaring the deviations, since taking the square root is not a complete reversal.
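The overemphasis of large deviations can be made concrete with a small numerical sketch. The local similarity values below are hypothetical (seven well-preserved regions and one strongly degraded one), and the helper names are illustrative only. The code compares how much of the total the single largest deviation contributes when deviations are squared versus when their absolute values are taken.

```python
import numpy as np

def sd_pool(ls):
    """Population standard deviation of a local similarity map."""
    ls = np.asarray(ls, dtype=np.float64)
    return float(np.sqrt(np.mean((ls - ls.mean()) ** 2)))

def mad_pool(ls):
    """Mean absolute deviation around the mean."""
    ls = np.asarray(ls, dtype=np.float64)
    return float(np.mean(np.abs(ls - ls.mean())))

def largest_deviation_share(ls):
    """Share of the largest deviation in the absolute and squared sums."""
    ls = np.asarray(ls, dtype=np.float64)
    dev = np.abs(ls - ls.mean())
    abs_share = dev.max() / dev.sum()
    sq_share = (dev ** 2).max() / (dev ** 2).sum()
    return float(abs_share), float(sq_share)

# Hypothetical similarity values: one region is heavily degraded.
ls_map = np.array([0.20, 0.97, 0.99, 0.98, 0.97, 0.99, 0.98, 0.98])
abs_share, sq_share = largest_deviation_share(ls_map)
print(f"SD  = {sd_pool(ls_map):.4f}, MAD = {mad_pool(ls_map):.4f}")
print(f"largest deviation share: absolute = {abs_share:.2f}, squared = {sq_share:.2f}")
```

In this example the degraded region dominates the squared sum far more than the absolute sum, which is the effect the paragraph above describes.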

In this paper, the underappreciated mean absolute deviation pooling is proposed for IQAs, as it is more tolerant of large deviations. We show that the mean absolute deviation is a faster and more reliable pooling strategy than the standard deviation for different IQAs. Also, for some IQAs, a combination of these two pooling strategies results in better-performing indices. At the same time, the joint calculation of the standard deviation and the mean absolute deviation is still efficient.

2 Deviation pooling strategies

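Deviation pooling for IQAs is rarely used in the literature; the exception is the standard deviation used in GMSD [20]. Deviation is the variation of data values compared to a measure of central tendency (MCT) such as the mean, median, or mode. A deviation can be seen as the Minkowski distance of order $\rho$ between a vector $\mathbf{x}$ and an MCT:

\[ D^{(\rho)} = \Big( \sum_{i=1}^{N} \left| x_i - \mathrm{MCT} \right|^{\rho} \Big)^{1/\rho} \tag{3} \]

where $\rho \geq 1$ indicates the type of deviation and $N$ is the number of values in $\mathbf{x}$. For the purpose of this paper, it can be seen as:

\[ D^{(\rho)} = \Big( \frac{1}{N} \sum_{i=1}^{N} \left| x_i - \mathrm{MCT} \right|^{\rho} \Big)^{1/\rho} \tag{4} \]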

Since the above equation includes an MCT, it differs from both Minkowski pooling [16] and the Minkowski metric [1]. In the following, the standard deviation pooling strategy introduced in GMSD is revisited. We then suggest using the mean absolute deviation pooling and show that these two pooling strategies can be jointly calculated.
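A generalized deviation of this kind can be sketched as a single function. The sketch assumes the normalized form, that is, the root of order `rho` of the mean of the absolute deviations from the MCT raised to the power `rho`; the function name, the `rho` and `mct` parameter names, and the restriction to mean and median are illustrative choices, not taken from the authors' code.

```python
import numpy as np

def deviation_pool(x, rho=1.0, mct="mean"):
    """Generalized deviation of x around a measure of central tendency.

    rho=1 gives the mean absolute deviation, rho=2 the (population)
    standard deviation when mct="mean".
    """
    x = np.asarray(x, dtype=np.float64).ravel()
    if rho < 1:
        raise ValueError("rho must be >= 1")
    if mct == "mean":
        center = x.mean()
    elif mct == "median":
        center = np.median(x)
    else:
        raise ValueError("mct must be 'mean' or 'median'")
    return float(np.mean(np.abs(x - center) ** rho) ** (1.0 / rho))

# Hypothetical local similarity values.
values = np.array([0.91, 0.98, 0.75, 0.99, 0.88])
print("rho=1:", deviation_pool(values, rho=1))
print("rho=2:", deviation_pool(values, rho=2))
```

Keeping the order and the MCT as parameters makes it straightforward to compare pooling variants on the same local similarity map.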

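2.1 Standard deviation (SD) pooling

Standard deviation is a simple and very common statistical measure of the spread of scores within a set of data. Let $\overline{\mathrm{LS}}$ denote the mean of an arbitrary local similarity (LS) map computed by an IQA model, with values $\mathrm{LS}_i$, $i = 1, \dots, N$. Its standard deviation can be computed by:

\[ \mathrm{LS}_{\mathrm{SD}} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{LS}_i - \overline{\mathrm{LS}} \right)^2 } \tag{5} \]

which is equal to equation (4) when the order $\rho$ is 2 and the MCT is the mean.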

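2.2 Mean absolute deviation (MAD) pooling

Given the mean $\overline{\mathrm{LS}}$, the mean absolute deviation $\mathrm{LS}_{\mathrm{MAD}}$ of an LS map is calculated by:

\[ \mathrm{LS}_{\mathrm{MAD}} = \frac{1}{N} \sum_{i=1}^{N} \left| \mathrm{LS}_i - \overline{\mathrm{LS}} \right| \tag{6} \]

which is also equal to equation (4) when the order $\rho$ is 1 and the MCT is the mean. In the experimental results section, the performance of the MAD pooling based indices is evaluated.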

2.3 Double deviation pooling

We also found that a combination of the SD and MAD pooling strategies provides higher performance than either SD pooling or MAD pooling alone for some IQAs. Furthermore, these two poolings can be computed at the same time using the following formulas:

(7)
(8)
(9)

The joint computation of the SD and MAD pooled values is likewise computationally efficient. We call this combination the double deviation pooling strategy and compute it by:

(10)

where a weighting parameter adjusts the relative importance of the SD and MAD pooled values. We are able to combine these two pooling strategies because they share almost the same statistical characteristics and values. The usefulness of this combination is verified in the experiments. It should be noted that the median absolute deviation does not provide satisfactory performance; hence it is not evaluated in this paper.
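One way to realize the joint computation is to form the deviations from the mean once and reuse them for both pooled values. The sketch below does this and then blends the two values. The convex combination and the name `alpha` are assumptions made for illustration; the exact combination rule of the paper is not reproduced here.

```python
import numpy as np

def double_deviation_pool(ls, alpha=0.5):
    """Jointly compute SD and MAD of an LS map and blend them.

    alpha is a hypothetical weight in [0, 1]: alpha=1 keeps only the SD
    value, alpha=0 keeps only the MAD value (assumed convention).
    Returns (combined, sd, mad).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ls = np.asarray(ls, dtype=np.float64).ravel()
    abs_dev = np.abs(ls - ls.mean())        # shared intermediate result
    mad = float(abs_dev.mean())             # order-1 deviation
    sd = float(np.sqrt(np.mean(abs_dev ** 2)))  # order-2 deviation
    combined = alpha * sd + (1.0 - alpha) * mad
    return combined, sd, mad

# Hypothetical local similarity values.
ls_map = np.array([0.20, 0.90, 1.00, 0.95])
combined, sd, mad = double_deviation_pool(ls_map, alpha=0.5)
print(f"SD = {sd:.4f}, MAD = {mad:.4f}, combined = {combined:.4f}")
```

Because the mean subtraction and absolute value are shared, the extra cost of obtaining both pooled values over a single one is a second reduction over the same array.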

3 Experimental results

In the experiments, four standard datasets are used. The LIVE dataset [12] contains 29 reference images and 779 distorted images of five categories. The TID 2008 [10] dataset contains 25 reference images and 1700 distorted images. For each reference image, 17 types of distortions of 4 degrees are available. CSIQ [4] is another dataset that consists of 30 reference images; each is distorted using six different types of distortions at four to five levels of distortion. We also used the TID 2013 [9] dataset, which contains 25 reference images and 3000 distorted images. For each reference image, 24 types of distortions of 5 degrees are available. Also, three popular evaluation metrics were used in the experiments: the Spearman rank-order correlation coefficient (SRC), the Pearson linear correlation coefficient (PCC), and the root mean square error (RMSE).
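The three evaluation metrics can be sketched with NumPy alone. The function names are illustrative. IQA studies often apply a nonlinear (for example, logistic) mapping between objective and subjective scores before computing PCC and RMSE; that step is omitted here, so this is only a simplified sketch of the metrics themselves.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=np.float64)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=np.float64)
    ranks[order] = np.arange(1, len(values) + 1)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]

def pcc(a, b):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1])

def src(a, b):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pcc(average_ranks(a), average_ranks(b))

def rmse(a, b):
    """Root mean square error between two score vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Hypothetical objective scores and subjective ratings.
objective = [1.0, 2.0, 3.0, 4.0, 5.0]
subjective = [1.0, 4.0, 9.0, 16.0, 25.0]
print("SRC:", src(objective, subjective))
print("PCC:", pcc(objective, subjective))
print("RMSE:", rmse(objective, subjective))
```

SRC depends only on the ordering of the scores, so a monotonic but nonlinear relation still yields a perfect rank correlation while PCC falls below one.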

MSE, SSIM [14], GS [7, 6], FSIM [23, 22], GMSD [20, 19], and VSI [21] were used in the experiments with two to three deviation pooling strategies (code: https://dl.dropboxusercontent.com/u/74505502/DeviationPoolings.m). Tables 1 and 2 provide a performance comparison between different indices using different pooling strategies. In reference to Fig. 1, if double deviation pooling provides higher performance than the others, its performance is added to Tables 1 and 2. Also, some of the state-of-the-art indices are added to the end of Tables 1 and 2. In Figs. 1 and 2, the two extreme values of the weighting parameter correspond to pure SD pooling and pure MAD pooling. In fact, Figs. 1 and 2 show the SRC and PCC performance variations of different indices utilizing the double deviation pooling.

Figure 1: The impact of the weighting parameter on the weighted average SRC (Table 1) performance of the double deviation pooling strategy.
Figure 2: The impact of the weighting parameter on the weighted average PCC (Table 1) performance of the double deviation pooling strategy. Note the MSE curve, which is nonlinear.
Figure 3: A comparison of SD and MAD pooling for the contrast distortion type. (a) Original image taken from TID 2008 [10]; (b)-(c) contrast-distorted versions of (a). (b) Subjective score = 7.1290; SD pooling: GS = 0.0030, GMS = 0.0225; MAD pooling: GS = 0.0013, GMS = 0.0075. (c) Subjective score = 5.0645; SD pooling: GS = 0.0027, GMS = 0.0127; MAD pooling: GS = 0.0020, GMS = 0.0111. Note that higher subjective scores and lower GS/GMS scores indicate higher quality. Clearly, MAD pooling of GS/GMS provides better judgment than SD pooling of GS/GMS.
Dataset sizes: LIVE (779 images), CSIQ (886 images), TID2008 (1700 images). "W. avg" is the weighted average over the three datasets; "Dist." columns are the avg, min and std of SRC over the distortion types. Rows marked (SD), (MAD) and (double deviation) use the corresponding deviation pooling; unmarked rows use the original pooling of each index.

| IQA | LIVE SRC | LIVE PCC | LIVE RMSE | CSIQ SRC | CSIQ PCC | CSIQ RMSE | TID2008 SRC | TID2008 PCC | TID2008 RMSE | W. avg SRC | W. avg PCC | Dist. SRC avg | Dist. SRC min | Dist. SRC std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MSE | 0.8756 | 0.8723 | 13.3597 | 0.8058 | 0.7512 | 0.1733 | 0.5531 | 0.5734 | 1.0994 | 0.6943 | 0.6894 | 0.8473 | 0.5815 | 0.1078 |
| MSE (SD) [20] | 0.8771 | 0.5707 | N/A | 0.8344 | 0.6448 | 0.2007 | 0.5801 | 0.5593 | 1.1124 | 0.7158 | 0.5845 | 0.8290 | 0.2624 | 0.1658 |
| MSE (MAD) | 0.8746 | 0.8716 | 13.3944 | 0.8239 | 0.5594 | 0.2176 | 0.6712 | 0.6473 | 1.0229 | 0.7585 | 0.6761 | 0.8422 | 0.5045 | 0.1300 |
| SSIM [14] | 0.9479 | 0.9449 | 8.9455 | 0.8756 | 0.8613 | 0.1334 | 0.7749 | 0.7732 | 0.8511 | 0.8415 | 0.8361 | 0.8644 | 0.5246 | 0.1069 |
| SSIM (SD) [20] | 0.9174 | 0.9032 | 11.7261 | 0.8169 | 0.8094 | 0.1542 | 0.7560 | 0.7374 | 0.9063 | 0.8094 | 0.7948 | 0.8154 | 0.0083 | 0.2209 |
| SSIM (MAD) | 0.9166 | 0.9017 | 11.8156 | 0.8388 | 0.8316 | 0.1458 | 0.7775 | 0.7619 | 0.8691 | 0.8258 | 0.8126 | 0.8255 | 0.2059 | 0.1836 |
| GS [7] | 0.9561 | 0.9512 | 8.4327 | 0.9108 | 0.8964 | 0.1164 | 0.8504 | 0.8422 | 0.7235 | 0.8908 | 0.8817 | 0.8915 | 0.6691 | 0.0923 |
| GS (SD) | 0.9464 | 0.9409 | 9.2559 | 0.9308 | 0.9241 | 0.1003 | 0.8466 | 0.8307 | 0.7470 | 0.8919 | 0.8808 | 0.8872 | 0.6146 | 0.1108 |
| GS (MAD) | 0.9538 | 0.9486 | 8.6494 | 0.9295 | 0.9198 | 0.1030 | 0.8823 | 0.8720 | 0.6568 | 0.9113 | 0.9023 | 0.8899 | 0.6523 | 0.1025 |
| FSIM [23] | 0.9634 | 0.9597 | 7.6780 | 0.9242 | 0.9120 | 0.1077 | 0.8805 | 0.8738 | 0.6525 | 0.9112 | 0.9038 | 0.8881 | 0.6481 | 0.0934 |
| FSIM (SD) [20] | 0.9602 | 0.9579 | 7.8442 | 0.9566 | 0.9534 | 0.0792 | 0.8914 | 0.8762 | 0.6467 | 0.9245 | 0.9154 | 0.8843 | 0.4412 | 0.1268 |
| FSIM (MAD) | 0.9609 | 0.9580 | 7.8370 | 0.9525 | 0.9460 | 0.0851 | 0.8783 | 0.8634 | 0.6770 | 0.9170 | 0.9071 | 0.8916 | 0.6195 | 0.1043 |
| FSIM (double deviation) | 0.9611 | 0.9584 | 7.8003 | 0.9555 | 0.9507 | 0.0814 | 0.8935 | 0.8775 | 0.6436 | 0.9255 | 0.9155 | 0.8882 | 0.5066 | 0.1164 |
| GMS | 0.9595 | 0.9556 | 8.0489 | 0.9290 | 0.9127 | 0.1073 | 0.8477 | 0.8366 | 0.7351 | 0.8950 | 0.8842 | 0.8924 | 0.6301 | 0.0962 |
| GMSD [20] | 0.9603 | 0.9603 | 7.6214 | 0.9570 | 0.9541 | 0.0786 | 0.8907 | 0.8788 | 0.6404 | 0.9243 | 0.9175 | 0.8849 | 0.4659 | 0.1246 |
| GMS (MAD) | 0.9627 | 0.9618 | 7.4802 | 0.9532 | 0.9457 | 0.0853 | 0.8837 | 0.8711 | 0.6589 | 0.9203 | 0.9118 | 0.8935 | 0.6224 | 0.1034 |
| GMS (double deviation) | 0.9619 | 0.9614 | 7.5149 | 0.9559 | 0.9509 | 0.0813 | 0.8961 | 0.8830 | 0.6300 | 0.9271 | 0.9190 | 0.8889 | 0.5204 | 0.1160 |
| VSI [21] | 0.9524 | 0.9482 | 8.6817 | 0.9423 | 0.9279 | 0.0979 | 0.8979 | 0.8762 | 0.6466 | 0.9222 | 0.9065 | 0.8987 | 0.6295 | 0.1036 |
| VSI (SD) [20] | 0.9546 | 0.9519 | 8.3757 | 0.9569 | 0.9532 | 0.0801 | 0.8775 | 0.8585 | 0.6881 | 0.9163 | 0.9048 | 0.8957 | 0.5393 | 0.1080 |
| VSI (MAD) | 0.9577 | 0.9540 | 8.1952 | 0.9546 | 0.9449 | 0.0859 | 0.9048 | 0.8866 | 0.6207 | 0.9302 | 0.9175 | 0.9027 | 0.6312 | 0.0949 |
| VSI (double deviation) | 0.9575 | 0.9539 | 8.1958 | 0.9553 | 0.9462 | 0.0849 | 0.9048 | 0.8872 | 0.6193 | 0.9303 | 0.9182 | 0.9029 | 0.6275 | 0.0948 |
| MS-SSIM [17] | 0.9513 | 0.9489 | 8.6188 | 0.9133 | 0.8991 | 0.1149 | 0.8542 | 0.8451 | 0.7173 | 0.8922 | 0.8834 | 0.8796 | 0.6381 | 0.0993 |
| VIF [11] | 0.9636 | 0.9604 | 7.6137 | 0.9195 | 0.9277 | 0.0980 | 0.7491 | 0.8084 | 0.7899 | 0.8436 | 0.8750 | 0.8949 | 0.5102 | 0.0987 |
| IWSSIM [15] | 0.9567 | 0.9522 | 8.3472 | 0.9213 | 0.9144 | 0.1063 | 0.8559 | 0.8579 | 0.6895 | 0.8965 | 0.8946 | 0.8708 | 0.6301 | 0.1063 |
| MAD [4] | 0.9669 | 0.9675 | 6.9073 | 0.9466 | 0.9500 | 0.0820 | 0.8340 | 0.8290 | 0.7505 | 0.8944 | 0.8929 | 0.8387 | 0.0650 | 0.2152 |

Table 1: Performance comparison of the different quality indices with and without deviation pooling strategies on LIVE [12], CSIQ [4] and TID 2008 [10] datasets.
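The weighted averages in Table 1 appear to be weighted by the number of images in each dataset. The short sketch below checks this reading against two tabulated rows; the weighting scheme is an inference from the numbers, and the dataset sizes are those printed in the table header.

```python
# Dataset sizes as printed in the header of Table 1.
DATASET_SIZES = {"LIVE": 779, "CSIQ": 886, "TID2008": 1700}

def weighted_average(scores, sizes=DATASET_SIZES):
    """Average of per-dataset scores, weighted by the number of images."""
    total = sum(sizes[name] for name in scores)
    return sum(scores[name] * sizes[name] for name in scores) / total

# SRC values of the MSE and GMSD rows of Table 1.
mse_src = {"LIVE": 0.8756, "CSIQ": 0.8058, "TID2008": 0.5531}
gmsd_src = {"LIVE": 0.9603, "CSIQ": 0.9570, "TID2008": 0.8907}

print(round(weighted_average(mse_src), 4))   # 0.6943, as tabulated
print(round(weighted_average(gmsd_src), 4))  # 0.9243, as tabulated
```

Because TID2008 holds roughly half of all images, performance on that dataset dominates the weighted averages.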
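"Dist." columns are the avg, min and std of SRC over the distortion types. Rows marked (SD), (MAD) and (double deviation) use the corresponding deviation pooling; unmarked rows use the original pooling of each index.

| IQA | TID2013 SRC | TID2013 PCC | Dist. SRC avg | Dist. SRC min | Dist. SRC std |
|---|---|---|---|---|---|
| MSE | 0.6394 | 0.4785 | 0.7951 | 0.0766 | 0.2355 |
| MSE (SD) | 0.6164 | 0.5118 | 0.7899 | 0.0952 | 0.2282 |
| MSE (MAD) | 0.6891 | 0.7079 | 0.7897 | 0.0852 | 0.2413 |
| SSIM [14] | 0.7417 | 0.7895 | 0.8075 | 0.3775 | 0.1521 |
| SSIM (SD) | 0.7292 | 0.7602 | 0.7993 | 0.0045 | 0.2257 |
| SSIM (MAD) | 0.7463 | 0.7806 | 0.8041 | 0.1471 | 0.1989 |
| GS [7] | 0.7946 | 0.8464 | 0.8351 | 0.3578 | 0.1511 |
| GS (SD) | 0.7801 | 0.8232 | 0.8284 | 0.3139 | 0.1731 |
| GS (MAD) | 0.8081 | 0.8613 | 0.8332 | 0.3344 | 0.1584 |
| FSIM [23] | 0.8015 | 0.8589 | 0.8219 | 0.2748 | 0.1662 |
| FSIM (SD) [20] | 0.8077 | 0.8602 | 0.8263 | 0.2126 | 0.1940 |
| FSIM (MAD) | 0.8051 | 0.8547 | 0.8327 | 0.2691 | 0.1706 |
| FSIM (double deviation) | 0.8118 | 0.8614 | 0.8293 | 0.2304 | 0.1852 |
| GMS | 0.7884 | 0.8395 | 0.8370 | 0.3700 | 0.1500 |
| GMSD [20] | 0.8044 | 0.8590 | 0.8300 | 0.2948 | 0.1815 |
| GMS (MAD) | 0.8084 | 0.8608 | 0.8369 | 0.3450 | 0.1597 |
| GMS (double deviation) | 0.8111 | 0.8658 | 0.8332 | 0.3132 | 0.1729 |
| VSI [21] | 0.8965 | 0.9000 | 0.8514 | 0.1713 | 0.1787 |
| VSI (SD) | 0.8556 | 0.8651 | 0.8633 | 0.3935 | 0.1274 |
| VSI (MAD) | 0.8853 | 0.8965 | 0.8675 | 0.4564 | 0.1193 |
| VSI (double deviation) | 0.8830 | 0.8951 | 0.8680 | 0.4542 | 0.1183 |
| MS-SSIM [17] | 0.7859 | 0.8329 | 0.8109 | 0.4099 | 0.1506 |
| VIF [11] | 0.6769 | 0.7720 | 0.8267 | 0.3099 | 0.1464 |
| IWSSIM [15] | 0.7779 | 0.8319 | 0.7978 | 0.3717 | 0.1601 |
| MAD [4] | 0.7807 | 0.8267 | 0.7556 | 0.0575 | 0.2644 |

Table 2: Performance comparison of the different quality indices with and without deviation pooling strategies on TID 2013 dataset [9].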

From Tables 1 and 2, it can be seen that MSE with MAD pooling performs better than the original MSE, while MSE with SD pooling shows the lowest performance among them. The original SSIM outperforms its deviation-based versions. MAD pooling has higher performance than SD pooling for SSIM. For SSIM, the deviation poolings show very low performance for some of the distortions, while MAD pooling is still more robust than SD pooling.

For GS, MAD pooling outperforms the others. SD pooling for GS does not provide high performance because GS uses image contrast, and we had already observed that SD pooling is not a good choice for MSE. While FSIM with SD pooling shows overall higher performance than FSIM with MAD pooling, its performance for some distortion types is low. For FSIM, double deviation pooling provides higher overall performance than SD pooling. At the same time, double deviation pooling shows better quality prediction on distortion types than SD pooling. The overall performance of GMSD and of GMS with MAD pooling is competitive; however, GMS with MAD pooling shows better quality prediction on the distortion types. The overall performance of GMS with double deviation pooling and its performance for distortion types are simultaneously higher than those of GMSD.

VSI is a high-performance similarity index that was recently proposed in [21]. Using MAD pooling, its performance improved considerably on the first three datasets for all of the measures used in this paper. For the TID 2013 dataset, however, its overall performance decreased by 1.2493% for SRC and by 0.3889% for PCC. In turn, it has 1.8910% better average prediction on distortion types. Also, the minimum quality prediction of VSI improved from 0.1713 to 0.4564. These advantages show that MAD pooling is a good choice for VSI. SD pooling shows the lowest performance for VSI.
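The percentages quoted above are relative changes between the VSI rows of Table 2 for the original pooling and for MAD pooling. A minimal check, with an illustrative helper name, is:

```python
def relative_change_percent(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

# Values from Table 2: original VSI versus VSI with MAD pooling.
src_change = relative_change_percent(0.8965, 0.8853)   # overall SRC
pcc_change = relative_change_percent(0.9000, 0.8965)   # overall PCC
avg_change = relative_change_percent(0.8514, 0.8675)   # avg SRC over distortions

print(f"SRC: {src_change:.4f}%")
print(f"PCC: {pcc_change:.4f}%")
print(f"avg distortion SRC: {avg_change:.4f}%")
```

The first two changes are negative (a decrease) and the third is positive, matching the magnitudes reported in the text.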

Overall, MAD pooling is more robust than SD pooling, especially for the assessment of individual distortion types. The low min SRC values for SD pooling in Table 1 show its unreliability in comparison to MAD pooling. In general, higher orders in equation (4) result in a worse and less stable assessment for distortion types. In other words, the std value in the last column of Table 1 increases as the order increases. It is worth noting that this trend may not always hold.

Fig. 3 shows an example in which SD pooling fails in assessment, while MAD pooling provides a correct assessment for both the GS and GMS indices. Fig. 4 shows the run time of the three pooling strategies used in this paper. Our experiments were performed on a Core i7 3.4 GHz CPU with 16 GB of RAM, running MATLAB 2013b on Windows 7. MAD pooling is the second fastest after the mean, while the joint calculation of SD and MAD is still efficient. Therefore, GMS with MAD pooling is even faster than the highly efficient GMSD. GMS with double deviation pooling is slightly slower than GMSD; however, its improved performance over GMS is noticeable.

Figure 4: Run time versus the local similarity (LS) size of the mean pooling and three deviation pooling strategies used in this paper.

4 Conclusion

Deviation pooling strategies for full reference image quality assessment were analyzed. The mean absolute deviation (MAD) pooling and the standard deviation (SD) pooling strategies were compared on the basis of their effectiveness, robustness and efficiency. The computation of MAD is faster than that of SD, and this may be of high interest for designing more efficient indices. While neither fully outperformed the other, MAD pooling shows a clear advantage in robustness over SD pooling. Furthermore, for some of the image quality assessment models, a combination of these two pooling strategies results in better-performing indices. Considering the experimental results, we highly recommend the use of MAD pooling for different image quality assessment purposes.

References

  • [1] A. Bovik, editor. Handbook of Image and Video Processing. Academic Press, 2000.
  • [2] D. Chandler and S. Hemami. VSNR: A Wavelet-Based Visual Signal-to-Noise Ratio for Natural Images. IEEE Transactions on Image Processing, 16(9):2284–2298, Sept 2007.
  • [3] P. Kovesi. Image features from phase congruency. Videre: Journal of Computer Vision Research, 1:1–26, 1999.
  • [4] E. C. Larson and D. M. Chandler. Most apparent distortion: full-reference image quality assessment and the role of strategy. Journal of Electronic Imaging, 19(1):011006, 2010.
  • [5] W. Lin and C.-C. J. Kuo. Perceptual visual quality metrics: A survey. Journal of Visual Communication and Image Representation, 22(4):297–312, 2011.
  • [6] A. Liu, W. Lin, and M. Narwaria. Gradient similarity index. Online: http://www.ntu.edu.sg/home/wslin/GSM.zip.
  • [7] A. Liu, W. Lin, and M. Narwaria. Image quality assessment based on gradient similarity. IEEE Transactions on Image Processing, 21(4):1500–1512, April 2012.
  • [8] A. Moorthy and A. Bovik. Visual importance pooling for image quality assessment. IEEE Journal of Selected Topics in Signal Processing, 3(2):193–201, April 2009.
  • [9] N. Ponomarenko, O. Ieremeiev, V. Lukin, K. Egiazarian, L. Jin, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, and C.-C. Kuo. Color image database TID2013: Peculiarities and preliminary results. In 2013 4th European Workshop on Visual Information Processing (EUVIP), pages 106–111, June 2013.
  • [10] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, and F. Battisti. TID2008 - A Database for Evaluation of Full-Reference Visual Quality Assessment Metrics. Advances of Modern Radioelectronics, 10:30–45, 2009.
  • [11] H. Sheikh and A. Bovik. Image information and visual quality. IEEE Transactions on Image Processing, 15(2):430–444, Feb 2006.
  • [12] H. Sheikh, Z. Wang, L. Cormack, and A. Bovik. Live image quality assessment database release 2. Online: http://live.ece.utexas.edu/research/quality.
  • [13] Z. Wang and A. Bovik. Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures. IEEE Signal Processing Magazine, 26(1):98–117, Jan 2009.
  • [14] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, April 2004.
  • [15] Z. Wang and Q. Li. Information content weighting for perceptual image quality assessment. IEEE Transactions on Image Processing, 20(5):1185–1198, May 2011.
  • [16] Z. Wang and X. Shang. Spatial pooling strategies for perceptual image quality assessment. In IEEE International Conference on Image Processing, pages 2945–2948, Oct 2006.
  • [17] Z. Wang, E. Simoncelli, and A. Bovik. Multiscale structural similarity for image quality assessment. In Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, volume 2, pages 1398–1402, Nov 2003.
  • [18] S. Winkler. Analysis of public image and video databases for quality assessment. IEEE Journal of Selected Topics in Signal Processing, 6(6):616–625, Oct 2012.
  • [19] W. Xue, L. Zhang, X. Mou, and A. Bovik. Gradient magnitude similarity deviation. Online: http://www4.comp.polyu.edu.hk/~cslzhang/IQA/GMSD.
  • [20] W. Xue, L. Zhang, X. Mou, and A. Bovik. Gradient Magnitude Similarity Deviation: A Highly Efficient Perceptual Image Quality Index. IEEE Transactions on Image Processing, 23(2):684–695, Feb 2014.
  • [21] L. Zhang, Y. Shen, and H. Li. VSI: A Visual Saliency-Induced Index for Perceptual Image Quality Assessment. IEEE Transactions on Image Processing, 23(10):4270–4281, Oct 2014.
  • [22] L. Zhang, D. Zhang, X. Mou, and D. Zhang. Feature similarity index. Online: http://www4.comp.polyu.edu.hk/~cslzhang/IQA/FSIM.
  • [23] L. Zhang, D. Zhang, X. Mou, and D. Zhang. FSIM: A feature similarity index for image quality assessment. IEEE Transactions on Image Processing, 20(8):2378–2386, Aug 2011.