1 Introduction
Automatic image quality assessment (IQA) plays a significant role in many image processing applications. IQA is commonly used for monitoring, benchmarking, image restoration and parameter optimization [14, 18]. Full reference IQAs, which fall within the scope of this paper, evaluate the perceptual quality of a distorted image with respect to its reference image. IQAs mimic the average quality predictions of human observers. This is a nontrivial task because images may suffer from various types and degrees of distortions.
Among IQAs, the mean squared error (MSE) is widely used because of its simplicity. However, in many situations, it does not correlate well with human perception of image fidelity and quality [13]. Popular and/or high-performance IQAs include SSIM [14], MS-SSIM [17], VIF [11], VSNR [2], MAD [4], FSIM [23], GS [7], GMSD [20], VSI [21], and others [5].
Usually, IQAs measure local similarity and produce a similarity score by a pooling strategy. This local quality measurement can be performed on the image, on different representations of the original image, or on their combination. For example, SSIM and MS-SSIM use statistics of the smoothed source and distorted images. FSIM [23] uses a phase-derived map and a gradient-based map for quality assessment. GS [7] is a contrast- and structure-variant metric that uses a specialized gradient magnitude and the image contrast. GMSD [20] also uses the gradient magnitude. Many of the available full reference IQAs follow this top-down architecture [5]. While average pooling [14, 17, 23, 7] and weighted average pooling [8, 15, 21] are widely used in the literature, GMSD uses standard deviation pooling [20]. For the purpose of this paper, we briefly review the FSIM [23] and GMSD [20] indices.
The feature similarity (FSIM) index [23] uses phase congruency (PC) [3] as its main feature, and the image gradient magnitude as its secondary feature. The phase congruency similarity $S_{PC}(x)$ and the gradient magnitude similarity $S_G(x)$ are calculated and then combined as $S_L(x)$, where $x$ is an image pixel. For the purpose of pooling, the maximum of the PC values of the reference and distorted images, $PC_m(x)$, is first computed. FSIM is then computed by the following mean pooling:
$$\mathrm{FSIM} = \frac{\sum_{x} S_L(x)\, PC_m(x)}{\sum_{x} PC_m(x)} \qquad (1)$$
FSIM is among the leading indices in the literature; however, the high computational cost of the phase congruency makes it an inefficient index.
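The pooling step of FSIM can be illustrated with a minimal Python sketch. It assumes the combined similarity values and the two phase congruency maps are already available as flat lists; the function name `fsim_pool` and its argument names are hypothetical, and the costly phase congruency computation itself is not shown.

```python
def fsim_pool(s_l, pc_ref, pc_dist):
    """Pool a similarity map, weighting each pixel by the larger PC value.

    s_l     -- combined local similarity values, one per pixel
    pc_ref  -- phase congruency of the reference image, one per pixel
    pc_dist -- phase congruency of the distorted image, one per pixel
    """
    # Per-pixel weight: maximum of the two phase congruency values.
    pc_m = [max(a, b) for a, b in zip(pc_ref, pc_dist)]
    # Weighted mean of the similarity values.
    return sum(s * w for s, w in zip(s_l, pc_m)) / sum(pc_m)


# A pixel with high phase congruency dominates the pooled score.
score = fsim_pool([1.0, 0.0], [0.9, 0.1], [0.5, 0.1])
```

Pixels with strong phase congruency contribute most to the score, which is why this pooling behaves like a saliency-weighted mean.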
GMSD uses the Prewitt operator to calculate the gradient magnitudes of the reference and distorted images, $m_r$ and $m_d$. From these, a gradient magnitude similarity (GMS) map is calculated by:
$$\mathrm{GMS}(x) = \frac{2\, m_r(x)\, m_d(x) + c}{m_r^2(x) + m_d^2(x) + c} \qquad (2)$$
where $c$ is a positive constant that supplies numerical stability. The GMSD is then calculated by a deviation pooling strategy, namely the standard deviation of the GMS values. GMSD provides high performance on different datasets and is very efficient.
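The GMS map and its standard deviation pooling can be sketched in plain Python as follows. This is only an illustration under stated assumptions: images are small 2-D lists of gray values, only interior pixels are processed, the 1/3 scaling of the Prewitt kernels and the default value of `c` are assumptions, and any pre-processing used in [20] is omitted. All function names are hypothetical.

```python
import math

# 3x3 Prewitt kernels; the 1/3 scaling is an assumption of this sketch.
PREWITT_X = [[1 / 3, 0.0, -1 / 3]] * 3
PREWITT_Y = [[1 / 3] * 3, [0.0] * 3, [-1 / 3] * 3]


def gradient_magnitude(img):
    """Gradient magnitude at the interior pixels of a 2-D list (flattened)."""
    height, width = len(img), len(img[0])
    magnitudes = []
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    pixel = img[row - 1 + i][col - 1 + j]
                    gx += PREWITT_X[i][j] * pixel
                    gy += PREWITT_Y[i][j] * pixel
            magnitudes.append(math.hypot(gx, gy))
    return magnitudes


def gms_map(ref, dist, c=170.0):
    """Gradient magnitude similarity per interior pixel (c is assumed)."""
    m_r = gradient_magnitude(ref)
    m_d = gradient_magnitude(dist)
    return [(2 * a * b + c) / (a * a + b * b + c) for a, b in zip(m_r, m_d)]


def gmsd(ref, dist, c=170.0):
    """Standard deviation pooling of the GMS map."""
    gms = gms_map(ref, dist, c)
    mean = sum(gms) / len(gms)
    return math.sqrt(sum((g - mean) ** 2 for g in gms) / len(gms))


# Horizontal ramp as reference; one pixel is perturbed in the distorted copy.
reference = [[10.0 * col for col in range(5)] for _ in range(5)]
distorted = [row[:] for row in reference]
distorted[2][2] += 50.0
```

For identical images every GMS value equals 1, so the pooled deviation is zero; a local distortion makes the GMS values spread out and the score grows.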
Image gradients are sensitive to image distortions; different local structures in a distorted image suffer from different degrees of degradation [20]. This motivated the authors of [20] to explore the standard deviation of the gradient-based local similarity map for overall image quality prediction. The standard deviation is the square root of the mean of the squared individual deviations. A problem with the standard deviation is that larger deviations are overemphasized by the squaring, and taking the square root afterwards does not completely reverse this effect.
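The overemphasis of large deviations can be made concrete with a small numerical sketch. The local similarity values below are made up for illustration, and the helper names are hypothetical. The code measures which fraction of the total (absolute or squared) deviation is contributed by the single largest deviation.

```python
def deviations_from_mean(values):
    """Absolute deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return [abs(v - mean) for v in values]


def largest_share(values, rho):
    """Fraction of the summed |deviation|**rho due to the largest deviation."""
    powered = [d ** rho for d in deviations_from_mean(values)]
    return max(powered) / sum(powered)


# Four similar local scores and one strongly degraded region (made-up data).
local_similarity = [0.90, 0.91, 0.89, 0.90, 0.40]

share_abs = largest_share(local_similarity, 1)  # absolute deviations
share_sq = largest_share(local_similarity, 2)   # squared deviations
```

With these values the outlier accounts for about half of the summed absolute deviation but about 80% of the summed squared deviation, which is the effect described above.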
In this paper, the underestimated mean absolute deviation pooling is proposed for IQAs, as it is more tolerant of large deviations. We show that the mean absolute deviation is a faster and more reliable pooling strategy than the standard deviation for different IQAs. Also, for some IQAs, a combination of these two pooling strategies results in better-performing indices. At the same time, the joint calculation of the standard deviation and the mean absolute deviation remains efficient.
2 Deviation pooling strategies
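Deviation pooling for IQAs is rarely used in the literature; the exception is the standard deviation used in GMSD [20]. Deviation is the variation of data values with respect to a measure of central tendency (MCT) such as the mean, median, or mode. A deviation can be seen as the Minkowski distance of order $\rho$ between a vector $\mathbf{x}$ of $N$ values and an MCT:
$$D^{\rho}(\mathbf{x}) = \Big(\sum_{i=1}^{N} |x_i - \mathrm{MCT}|^{\rho}\Big)^{1/\rho} \qquad (3)$$
where $\rho$ indicates the type of deviation. For the purpose of this paper, it can be seen as:
$$D^{\rho}(\mathbf{x}) = \Big(\frac{1}{N}\sum_{i=1}^{N} |x_i - \mathrm{MCT}|^{\rho}\Big)^{1/\rho} \qquad (4)$$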
Since the above equation includes an MCT, it differs from both the Minkowski pooling [16] and the Minkowski metric [1]. In the following, the standard deviation pooling strategy introduced in GMSD is revisited. We then suggest using the mean absolute deviation pooling and show that these two pooling strategies can be jointly calculated.
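A minimal sketch of the normalized deviation around an MCT, written as a single Python function. The name `deviation`, the parameter name `rho` for the order of the deviation, and the default choice of the mean as MCT are assumptions of this sketch.

```python
import statistics


def deviation(values, rho=2.0, mct=None):
    """Normalized Minkowski-type deviation of order rho around an MCT.

    values -- sequence of numbers (for example a flattened similarity map)
    rho    -- order of the deviation (1: mean absolute, 2: standard)
    mct    -- measure of central tendency; the mean is used when omitted
    """
    if mct is None:
        mct = sum(values) / len(values)
    total = sum(abs(v - mct) ** rho for v in values)
    return (total / len(values)) ** (1.0 / rho)


scores = [0.2, 0.5, 0.9, 1.0]
mean_abs_dev = deviation(scores, rho=1)
std_dev = deviation(scores, rho=2)
# Any other MCT can be passed in explicitly, e.g. the median.
around_median = deviation(scores, rho=1, mct=statistics.median(scores))
```

For a fixed MCT the value does not decrease as the order grows, so an order of 1 always gives a result no larger than an order of 2.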
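2.1 Standard deviation (SD) pooling
Standard deviation is a simple and very common statistical measure of the spread of scores within a set of data. Let $\overline{LS}$ denote the mean of an arbitrary local similarity (LS) map computed by an IQA model, and let $LS_i$ be the $i$-th of its $N$ values. Its standard deviation $LS_{SD}$ can be computed by:
$$LS_{SD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \big(LS_i - \overline{LS}\big)^2} \qquad (5)$$
which is equal to equation (4) when $\rho = 2$ and the MCT is the mean.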
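2.2 Mean absolute deviation (MAD) pooling
Given the mean $\overline{LS}$, the mean absolute deviation $LS_{MAD}$ of an LS map with $N$ values $LS_i$ is calculated by:
$$LS_{MAD} = \frac{1}{N}\sum_{i=1}^{N} \big|LS_i - \overline{LS}\big| \qquad (6)$$
which is also equal to equation (4), with the mean as the MCT, when $\rho = 1$. In the experimental results section, the performance of the MAD pooling based indices is evaluated.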
2.3 Double deviation (DD) pooling
We also found that a combination of the SD and MAD pooling strategies provides higher performance than either $LS_{SD}$ or $LS_{MAD}$ for some IQAs. Furthermore, these two poolings can be computed at the same time using the following formulas:
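$$D_i = \big|LS_i - \overline{LS}\big| \qquad (7)$$
$$LS_{SD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} D_i^2} \qquad (8)$$
$$LS_{MAD} = \frac{1}{N}\sum_{i=1}^{N} D_i \qquad (9)$$
where $D_i$ is the absolute deviation of the $i$-th LS value from the mean $\overline{LS}$. The joint computation of $LS_{SD}$ and $LS_{MAD}$ is likewise computationally efficient. We call this combination the double deviation (DD) pooling strategy and compute it by: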
$$LS_{DD} = \alpha\, LS_{SD} + (1 - \alpha)\, LS_{MAD} \qquad (10)$$
where $\alpha$ adjusts the relative importance of the $LS_{SD}$ and $LS_{MAD}$ indices. We are able to combine these two pooling strategies because they share almost the same statistical characteristics and values. The usefulness of this combination is verified in the experiments. It should be noted that the median absolute deviation does not provide satisfactory performance; hence, it is not evaluated in this paper.
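The joint computation can be sketched in a few lines of Python: the absolute deviations from the mean are computed once and reused for both poolings. The function name `double_deviation` is hypothetical, and the weighting convention (the weight `alpha` on the SD term and `1 - alpha` on the MAD term) is an assumption of this sketch.

```python
import math


def double_deviation(ls_map, alpha=0.5):
    """Return (SD, MAD, DD) pooling values of a flattened LS map.

    alpha weights the SD term and (1 - alpha) the MAD term; this
    convention is an assumption made for illustration.
    """
    n = len(ls_map)
    mean = sum(ls_map) / n
    # Shared intermediate result: absolute deviations from the mean.
    abs_dev = [abs(v - mean) for v in ls_map]
    ls_mad = sum(abs_dev) / n
    ls_sd = math.sqrt(sum(d * d for d in abs_dev) / n)
    ls_dd = alpha * ls_sd + (1.0 - alpha) * ls_mad
    return ls_sd, ls_mad, ls_dd


sd_value, mad_value, dd_value = double_deviation([0.95, 0.90, 0.99, 0.60])
```

Because the mean and the deviations are shared, the combined pooling costs little more than either pooling alone, and the DD value always lies between the MAD and SD values when `alpha` is in [0, 1].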
3 Experimental results
In the experiments, four standard datasets are used. The LIVE dataset [12] contains 29 reference images and 779 distorted images of five categories. The TID2008 dataset [10] contains 25 reference images and 1700 distorted images; for each reference image, 17 types of distortions at 4 levels are available. CSIQ [4] is another dataset that consists of 30 reference images; each is distorted using six different types of distortions at four to five levels of distortion. We also used the TID2013 dataset [9], which contains 25 reference images and 3000 distorted images; for each reference image, 24 types of distortions at 5 levels are available. Three popular evaluation metrics were used in the experiments: the Spearman rank-order correlation coefficient (SRC), the Pearson linear correlation coefficient (PCC), and the root mean square error (RMSE).
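The three evaluation metrics can be sketched in plain Python as follows. This is a minimal illustration with hypothetical function names; ties in the Spearman ranks are handled by average ranks, and any nonlinear mapping of objective scores to subjective scores applied before PCC and RMSE, which is common in IQA evaluation, is omitted.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(predicted, target):
    """Root mean square error between two equal-length lists."""
    n = len(predicted)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / n)
```

SRC depends only on the ordering of the scores, so any monotonically increasing relation between objective and subjective scores yields an SRC of 1 even when PCC is below 1.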
MSE, SSIM [14], GS [7, 6], FSIM [23, 22], GMSD [20, 19], and VSI [21] were used in the experiments with two to three deviation pooling strategies (code: https://dl.dropboxusercontent.com/u/74505502/DeviationPoolings.m). Tables 1 and 2 provide a performance comparison between different indices using different pooling strategies. With reference to Fig. 1, if DD pooling provides higher performance than the others, its performance is added to Tables 1 and 2. Also, some state-of-the-art indices are added at the end of Tables 1 and 2. In Figs. 1 and 2, the two end points of the $\alpha$ range correspond to the pure SD and pure MAD poolings. In fact, Figs. 1 and 2 show the SRC and PCC performance variations of different indices utilizing the DD pooling.
Table 1: Performance comparison of IQA indices with different pooling strategies on LIVE (779 images), CSIQ (886 images) and TID2008 (1700 images). The last three columns report the SRC over individual distortion types.

| IQA | LIVE SRC | LIVE PCC | LIVE RMSE | CSIQ SRC | CSIQ PCC | CSIQ RMSE | TID2008 SRC | TID2008 PCC | TID2008 RMSE | Weighted avg SRC | Weighted avg PCC | SRC (distortions) avg | SRC (distortions) min | SRC (distortions) std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MSE | 0.8756 | 0.8723 | 13.3597 | 0.8058 | 0.7512 | 0.1733 | 0.5531 | 0.5734 | 1.0994 | 0.6943 | 0.6894 | 0.8473 | 0.5815 | 0.1078 |
| MSE$_{SD}$ [20] | 0.8771 | 0.5707 | N/A | 0.8344 | 0.6448 | 0.2007 | 0.5801 | 0.5593 | 1.1124 | 0.7158 | 0.5845 | 0.8290 | 0.2624 | 0.1658 |
| MSE$_{MAD}$ | 0.8746 | 0.8716 | 13.3944 | 0.8239 | 0.5594 | 0.2176 | 0.6712 | 0.6473 | 1.0229 | 0.7585 | 0.6761 | 0.8422 | 0.5045 | 0.1300 |
| SSIM [14] | 0.9479 | 0.9449 | 8.9455 | 0.8756 | 0.8613 | 0.1334 | 0.7749 | 0.7732 | 0.8511 | 0.8415 | 0.8361 | 0.8644 | 0.5246 | 0.1069 |
| SSIM$_{SD}$ [20] | 0.9174 | 0.9032 | 11.7261 | 0.8169 | 0.8094 | 0.1542 | 0.7560 | 0.7374 | 0.9063 | 0.8094 | 0.7948 | 0.8154 | 0.0083 | 0.2209 |
| SSIM$_{MAD}$ | 0.9166 | 0.9017 | 11.8156 | 0.8388 | 0.8316 | 0.1458 | 0.7775 | 0.7619 | 0.8691 | 0.8258 | 0.8126 | 0.8255 | 0.2059 | 0.1836 |
| GS [7] | 0.9561 | 0.9512 | 8.4327 | 0.9108 | 0.8964 | 0.1164 | 0.8504 | 0.8422 | 0.7235 | 0.8908 | 0.8817 | 0.8915 | 0.6691 | 0.0923 |
| GS$_{SD}$ | 0.9464 | 0.9409 | 9.2559 | 0.9308 | 0.9241 | 0.1003 | 0.8466 | 0.8307 | 0.7470 | 0.8919 | 0.8808 | 0.8872 | 0.6146 | 0.1108 |
| GS$_{MAD}$ | 0.9538 | 0.9486 | 8.6494 | 0.9295 | 0.9198 | 0.1030 | 0.8823 | 0.8720 | 0.6568 | 0.9113 | 0.9023 | 0.8899 | 0.6523 | 0.1025 |
| FSIM [23] | 0.9634 | 0.9597 | 7.6780 | 0.9242 | 0.9120 | 0.1077 | 0.8805 | 0.8738 | 0.6525 | 0.9112 | 0.9038 | 0.8881 | 0.6481 | 0.0934 |
| FSIM$_{SD}$ [20] | 0.9602 | 0.9579 | 7.8442 | 0.9566 | 0.9534 | 0.0792 | 0.8914 | 0.8762 | 0.6467 | 0.9245 | 0.9154 | 0.8843 | 0.4412 | 0.1268 |
| FSIM$_{MAD}$ | 0.9609 | 0.9580 | 7.8370 | 0.9525 | 0.9460 | 0.0851 | 0.8783 | 0.8634 | 0.6770 | 0.9170 | 0.9071 | 0.8916 | 0.6195 | 0.1043 |
| FSIM$_{DD}$ () | 0.9611 | 0.9584 | 7.8003 | 0.9555 | 0.9507 | 0.0814 | 0.8935 | 0.8775 | 0.6436 | 0.9255 | 0.9155 | 0.8882 | 0.5066 | 0.1164 |
| GMS | 0.9595 | 0.9556 | 8.0489 | 0.9290 | 0.9127 | 0.1073 | 0.8477 | 0.8366 | 0.7351 | 0.8950 | 0.8842 | 0.8924 | 0.6301 | 0.0962 |
| GMSD [20] | 0.9603 | 0.9603 | 7.6214 | 0.9570 | 0.9541 | 0.0786 | 0.8907 | 0.8788 | 0.6404 | 0.9243 | 0.9175 | 0.8849 | 0.4659 | 0.1246 |
| GMS$_{MAD}$ | 0.9627 | 0.9618 | 7.4802 | 0.9532 | 0.9457 | 0.0853 | 0.8837 | 0.8711 | 0.6589 | 0.9203 | 0.9118 | 0.8935 | 0.6224 | 0.1034 |
| GMS$_{DD}$ () | 0.9619 | 0.9614 | 7.5149 | 0.9559 | 0.9509 | 0.0813 | 0.8961 | 0.8830 | 0.6300 | 0.9271 | 0.9190 | 0.8889 | 0.5204 | 0.1160 |
| VSI [21] | 0.9524 | 0.9482 | 8.6817 | 0.9423 | 0.9279 | 0.0979 | 0.8979 | 0.8762 | 0.6466 | 0.9222 | 0.9065 | 0.8987 | 0.6295 | 0.1036 |
| VSI$_{SD}$ [20] | 0.9546 | 0.9519 | 8.3757 | 0.9569 | 0.9532 | 0.0801 | 0.8775 | 0.8585 | 0.6881 | 0.9163 | 0.9048 | 0.8957 | 0.5393 | 0.1080 |
| VSI$_{MAD}$ | 0.9577 | 0.9540 | 8.1952 | 0.9546 | 0.9449 | 0.0859 | 0.9048 | 0.8866 | 0.6207 | 0.9302 | 0.9175 | 0.9027 | 0.6312 | 0.0949 |
| VSI$_{DD}$ () | 0.9575 | 0.9539 | 8.1958 | 0.9553 | 0.9462 | 0.0849 | 0.9048 | 0.8872 | 0.6193 | 0.9303 | 0.9182 | 0.9029 | 0.6275 | 0.0948 |
| MS-SSIM [17] | 0.9513 | 0.9489 | 8.6188 | 0.9133 | 0.8991 | 0.1149 | 0.8542 | 0.8451 | 0.7173 | 0.8922 | 0.8834 | 0.8796 | 0.6381 | 0.0993 |
| VIF [11] | 0.9636 | 0.9604 | 7.6137 | 0.9195 | 0.9277 | 0.0980 | 0.7491 | 0.8084 | 0.7899 | 0.8436 | 0.8750 | 0.8949 | 0.5102 | 0.0987 |
| IW-SSIM [15] | 0.9567 | 0.9522 | 8.3472 | 0.9213 | 0.9144 | 0.1063 | 0.8559 | 0.8579 | 0.6895 | 0.8965 | 0.8946 | 0.8708 | 0.6301 | 0.1063 |
| MAD [4] | 0.9669 | 0.9675 | 6.9073 | 0.9466 | 0.9500 | 0.0820 | 0.8340 | 0.8290 | 0.7505 | 0.8944 | 0.8929 | 0.8387 | 0.0650 | 0.2152 |
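The "Weighted avg" columns of Table 1 appear to be consistent with weighting each dataset score by the number of images listed for that dataset (779, 886 and 1700). This reading is an inference from the reported numbers rather than something stated in the text; the short Python sketch below, with a hypothetical helper name, reproduces two of the reported SRC values under that assumption.

```python
# Image counts as listed for Table 1 (LIVE, CSIQ, TID2008).
DATASET_SIZES = [779, 886, 1700]


def weighted_average(scores, weights=DATASET_SIZES):
    """Average of per-dataset scores, weighted by dataset size (assumed)."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Per-dataset SRC values of the MSE and VSI [21] rows of Table 1.
mse_src = weighted_average([0.8756, 0.8058, 0.5531])
vsi_src = weighted_average([0.9524, 0.9423, 0.8979])
```

Rounded to four decimals, both values agree with the weighted average SRC entries of the corresponding rows.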
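Table 2: Performance comparison of IQA indices with different pooling strategies on TID2013. The last three columns report the SRC over individual distortion types.

| IQA | TID2013 SRC | TID2013 PCC | SRC (distortions) avg | SRC (distortions) min | SRC (distortions) std |
|---|---|---|---|---|---|
| MSE | 0.6394 | 0.4785 | 0.7951 | 0.0766 | 0.2355 |
| MSE$_{SD}$ | 0.6164 | 0.5118 | 0.7899 | 0.0952 | 0.2282 |
| MSE$_{MAD}$ | 0.6891 | 0.7079 | 0.7897 | 0.0852 | 0.2413 |
| SSIM [14] | 0.7417 | 0.7895 | 0.8075 | 0.3775 | 0.1521 |
| SSIM$_{SD}$ | 0.7292 | 0.7602 | 0.7993 | 0.0045 | 0.2257 |
| SSIM$_{MAD}$ | 0.7463 | 0.7806 | 0.8041 | 0.1471 | 0.1989 |
| GS [7] | 0.7946 | 0.8464 | 0.8351 | 0.3578 | 0.1511 |
| GS$_{SD}$ | 0.7801 | 0.8232 | 0.8284 | 0.3139 | 0.1731 |
| GS$_{MAD}$ | 0.8081 | 0.8613 | 0.8332 | 0.3344 | 0.1584 |
| FSIM [23] | 0.8015 | 0.8589 | 0.8219 | 0.2748 | 0.1662 |
| FSIM$_{SD}$ [20] | 0.8077 | 0.8602 | 0.8263 | 0.2126 | 0.1940 |
| FSIM$_{MAD}$ | 0.8051 | 0.8547 | 0.8327 | 0.2691 | 0.1706 |
| FSIM$_{DD}$ () | 0.8118 | 0.8614 | 0.8293 | 0.2304 | 0.1852 |
| GMS | 0.7884 | 0.8395 | 0.8370 | 0.3700 | 0.1500 |
| GMSD [20] | 0.8044 | 0.8590 | 0.8300 | 0.2948 | 0.1815 |
| GMS$_{MAD}$ | 0.8084 | 0.8608 | 0.8369 | 0.3450 | 0.1597 |
| GMS$_{DD}$ () | 0.8111 | 0.8658 | 0.8332 | 0.3132 | 0.1729 |
| VSI [21] | 0.8965 | 0.9000 | 0.8514 | 0.1713 | 0.1787 |
| VSI$_{SD}$ | 0.8556 | 0.8651 | 0.8633 | 0.3935 | 0.1274 |
| VSI$_{MAD}$ | 0.8853 | 0.8965 | 0.8675 | 0.4564 | 0.1193 |
| VSI$_{DD}$ () | 0.8830 | 0.8951 | 0.8680 | 0.4542 | 0.1183 |
| MS-SSIM [17] | 0.7859 | 0.8329 | 0.8109 | 0.4099 | 0.1506 |
| VIF [11] | 0.6769 | 0.7720 | 0.8267 | 0.3099 | 0.1464 |
| IW-SSIM [15] | 0.7779 | 0.8319 | 0.7978 | 0.3717 | 0.1601 |
| MAD [4] | 0.7807 | 0.8267 | 0.7556 | 0.0575 | 0.2644 |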
From Tables 1 and 2, it can be seen that MSE$_{MAD}$ performs better than MSE, while MSE$_{SD}$ shows the lowest performance among them. The original SSIM outperforms its deviation-based versions. The MAD pooling has higher performance than the SD pooling for SSIM. For SSIM, the deviation poolings show very low performance for some of the distortions, although the MAD pooling is still more robust than the SD pooling.
For GS, the MAD pooling outperforms the others. SD pooling for GS does not provide high performance because GS uses image contrast, and we had already observed that SD pooling is not a good choice for MSE. While FSIM$_{SD}$ shows overall higher performance than FSIM$_{MAD}$, its performance for some distortion types is low. For FSIM, DD pooling with the selected $\alpha$ provides higher overall performance than SD pooling. At the same time, DD pooling shows better quality prediction on distortion types than SD pooling. The overall performance of the GMSD and GMS$_{MAD}$ indices is competitive; however, GMS$_{MAD}$ shows better quality prediction on the distortion types. Both the overall performance of GMS$_{DD}$ and its performance for distortion types are higher than those of GMSD.
VSI is a high-performance similarity index that was recently proposed in [21]. Using MAD pooling, its performance improved considerably on the first three datasets for all of the measures used in this paper. For the TID2013 dataset, however, its overall performance decreased by 1.2493% for SRC and by 0.3889% for PCC. In turn, it has a 1.8910% better average prediction on distortion types. Also, the minimum quality prediction of VSI improved from 0.1713 to 0.4564. These advantages show that MAD pooling is a good choice for VSI. SD pooling shows the lowest performance for VSI.
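The percentages quoted above are relative changes between the VSI [21] row and the third VSI row of Table 2. A minimal Python sketch of the arithmetic, with a hypothetical helper name, is:

```python
def relative_change_percent(old, new):
    """Change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


# Values taken from the VSI rows of Table 2 (TID2013).
src_change = relative_change_percent(0.8965, 0.8853)  # overall SRC
pcc_change = relative_change_percent(0.9000, 0.8965)  # overall PCC
avg_change = relative_change_percent(0.8514, 0.8675)  # avg SRC over distortions
```

The first two values are negative (a decrease) and the third is positive (an improvement); their magnitudes match the percentages given in the text.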
Overall, MAD pooling is more robust than SD pooling, especially for the assessment of individual distortion types. The low min SRC values for SD pooling in Table 1 show its unreliability in comparison to MAD pooling. In general, higher orders of $\rho$ in equation (4) result in a worse and less stable assessment for distortion types. In other words, the std value in the last column of Table 1 increases as the value of $\rho$ increases. It is worth noting that this trend may not always hold.
Fig. 3 shows an example in which SD pooling fails in assessment, while MAD pooling provides a true assessment for both the GS and GMS indices. Fig. 4 shows the run time of the three pooling strategies used in this paper. Our experiments were performed on a Core i7 3.4 GHz CPU with 16 GB of RAM, running MATLAB 2013b on Windows 7. MAD is the second fastest after the mean, while the joint calculation of SD and MAD is still efficient. Therefore, GMS$_{MAD}$ is even faster than the highly efficient GMSD. GMS$_{DD}$ is slightly slower than GMSD; however, its improved performance over GMSD is noticeable.
4 Conclusion
Deviation pooling strategies for full reference image quality assessment were analyzed. The mean absolute deviation (MAD) pooling and the standard deviation (SD) pooling strategies were compared on the basis of their effectiveness, robustness and efficiency. The computation of MAD is faster than that of SD, which may be of high interest for designing more efficient indices. While neither fully outperformed the other, MAD pooling shows a clear advantage in robustness over SD pooling. Furthermore, for some of the image quality assessment models, a combination of these two pooling strategies results in better-performing indices. Considering the experimental results, we highly recommend the use of MAD pooling for different image quality assessment purposes.
References
 [1] A. Bovik, editor. Handbook of Image and Video Processing. Academic Press, 2000.
 [2] D. Chandler and S. Hemami. VSNR: A Wavelet-Based Visual Signal-to-Noise Ratio for Natural Images. IEEE Transactions on Image Processing, 16(9):2284–2298, Sept 2007.

 [3] P. Kovesi. Image features from phase congruency. Videre: Journal of Computer Vision Research, 1:1–26, 1999.
 [4] E. C. Larson and D. M. Chandler. Most apparent distortion: full-reference image quality assessment and the role of strategy. Journal of Electronic Imaging, 19(1):011006, 2010.
 [5] W. Lin and C.-C. J. Kuo. Perceptual visual quality metrics: A survey. Journal of Visual Communication and Image Representation, 22(4):297–312, 2011.
 [6] A. Liu, W. Lin, and M. Narwaria. Gradient similarity index. Online: http://www.ntu.edu.sg/home/wslin/GSM.zip.
 [7] A. Liu, W. Lin, and M. Narwaria. Image quality assessment based on gradient similarity. IEEE Transactions on Image Processing, 21(4):1500–1512, April 2012.
 [8] A. Moorthy and A. Bovik. Visual importance pooling for image quality assessment. IEEE Journal of Selected Topics in Signal Processing, 3(2):193–201, April 2009.
 [9] N. Ponomarenko, O. Ieremeiev, V. Lukin, K. Egiazarian, L. Jin, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, and C.-C. Kuo. Color image database TID2013: Peculiarities and preliminary results. In 2013 4th European Workshop on Visual Information Processing (EUVIP), pages 106–111, June 2013.
 [10] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, and F. Battisti. TID2008 - A Database for Evaluation of Full-Reference Visual Quality Assessment Metrics. Advances of Modern Radioelectronics, 10:30–45, 2009.
 [11] H. Sheikh and A. Bovik. Image information and visual quality. IEEE Transactions on Image Processing, 15(2):430–444, Feb 2006.
 [12] H. Sheikh, Z. Wang, L. Cormack, and A. Bovik. Live image quality assessment database release 2. Online: http://live.ece.utexas.edu/research/quality.
 [13] Z. Wang and A. Bovik. Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures. IEEE Signal Processing Magazine, 26(1):98–117, Jan 2009.
 [14] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, April 2004.
 [15] Z. Wang and Q. Li. Information content weighting for perceptual image quality assessment. IEEE Transactions on Image Processing, 20(5):1185–1198, May 2011.
 [16] Z. Wang and X. Shang. Spatial pooling strategies for perceptual image quality assessment. In IEEE International Conference on Image Processing, pages 2945–2948, Oct 2006.
 [17] Z. Wang, E. Simoncelli, and A. Bovik. Multiscale structural similarity for image quality assessment. In Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, volume 2, pages 1398–1402, Nov 2003.
 [18] S. Winkler. Analysis of public image and video databases for quality assessment. IEEE Journal of Selected Topics in Signal Processing, 6(6):616–625, Oct 2012.
 [19] W. Xue, L. Zhang, X. Mou, and A. Bovik. Gradient magnitude similarity deviation. Online: http://www4.comp.polyu.edu.hk/~cslzhang/IQA/GMSD.
 [20] W. Xue, L. Zhang, X. Mou, and A. Bovik. Gradient Magnitude Similarity Deviation: A Highly Efficient Perceptual Image Quality Index. IEEE Transactions on Image Processing, 23(2):684–695, Feb 2014.
 [21] L. Zhang, Y. Shen, and H. Li. VSI: A Visual Saliency-Induced Index for Perceptual Image Quality Assessment. IEEE Transactions on Image Processing, 23(10):4270–4281, Oct 2014.
 [22] L. Zhang, D. Zhang, X. Mou, and D. Zhang. Feature similarity index. Online: http://www4.comp.polyu.edu.hk/~cslzhang/IQA/FSIM.
 [23] L. Zhang, D. Zhang, X. Mou, and D. Zhang. FSIM: A feature similarity index for image quality assessment. IEEE Transactions on Image Processing, 20(8):2378–2386, Aug 2011.