Recently, online platforms have been dominated by images because of advances in capturing, storage, streaming, and display technologies. In order for these platforms to support the uploaded content, images should be formatted. The formatting of images can be considered as an optimization problem whose cost function is an image quality assessment algorithm. These quality assessment algorithms are grouped into three main categories: full-reference, reduced-reference, and no-reference. The design of these algorithms usually relies on visual system characteristics because their objective is to estimate subjective quality. The characteristics of the human visual system include frequency sensitivity, luminance sensitivity, and masking effects WL+95 . Sensitivity of a visual system with respect to spatial frequency characteristics is considered under frequency sensitivity, just-noticeable intensity difference is studied under luminance sensitivity, and decreasing visibility of a signal in the presence of other signals is considered under masking. In Hegazy2014 , the authors do not investigate these characteristics individually; instead, they introduce a quality assessment algorithm (COHERENSI) that captures perceptually correlated information from the phase and harmonic analysis of error signals.
The harmonic analysis in Hegazy2014 is based on the gradients of the error signals. Consecutive Fourier transforms are applied to measure the chaotic behavior in gradient representations. In this manuscript, we directly focus on the error signals without calculating their gradients. Instead of applying consecutive transforms, we apply the transform only once to stay in the Fourier domain and focus solely on the magnitude information. If we reconstruct images without their phase information, they are usually unrecognizable. This is because phase information includes the location of the image features that are critical for reconstructing the original image in the spatial domain. However, in this study, we do not need to reconstruct the images in the spatial domain. We do not utilize the phase information in the Fourier domain, but we use the location information while obtaining the pixel-wise difference of reference and compared images in the spatial domain. We analyze magnitude spectrums over each color channel in the RGB color space for multiple scales and use frequency-based weights to align quality scores. The main contributions of this manuscript compared to the baseline study are sixfold.
We analyzed the magnitude spectrums of error signals obtained from natural images and showed that there is a general relationship between the magnitude spectrums of error images and degradation levels.
We eliminated the requirement for fine-tuned parameters that were utilized for fusion of phase and harmonic analysis as well as the scaling fraction parameter used in the multi-scale calculation.
We extended the baseline spectral analysis method with multi-scale and multi-channel error representations along with frequency-based weights; the extended method significantly outperforms the majority of the compared methods in all benchmark categories.
We enlarged the test set from images to images in the TID 2013 database and added the Multiply Distorted LIVE database to the validation, which includes simultaneously applied distortions.
We increased the number of validation metrics from two to five along with statistical significance tests for correlation metrics. We analyzed the distribution of scores with scatter plots and histogram difference metrics. Based on the overall validation, we showed that SUMMER is consistently among the top performing methods.
We measured the capability of quality assessment algorithms to (i) distinguish statistically different and similar pairs and to (ii) identify the higher quality image and the lower quality image. We showed that SUMMER significantly outperforms all other top performing algorithms in task (i) and all but one algorithm in task (ii).
We briefly discuss the related work in Section 2 and describe the baseline spectral analysis method in Section 3. We extend the baseline method with multiple scales, multiple color channels, and frequency-based weights in Section 4. We describe the experimental setup in Section 5 and report the results in Section 6. Finally, we conclude our work in Section 7.
2 Related Work
An intuitive approach to assess the quality of an image is to measure fidelity, which can be performed by comparing the image with its distortion-free version, if available. Mean Square Error (MSE) and Peak Signal-to-Noise Ratio (PSNR) are commonly utilized examples of fidelity-based full-reference methods. Wang et al. Wang2004 showed that human perception is more consistent with structural similarity as opposed to MSE and PSNR. Structural similarity methods such as SSIM Wang2004 were shown to be more correlated with human error perception. Spatial domain-based single-scale structural similarity was further extended to multi-scale (MS-SSIM) Wang2003 , complex domain (CW-SSIM) Sampat2009 , and information-weighted (IW-SSIM) Wang2011 versions. Instead of focusing on the structural similarity, Ponomarenko et al. developed a series of quality estimators EA+06 ; PS+07 ; Ponomarenko2011 that are based on extending fidelity with visual system characteristics.
Daly Daly1992 introduced a visual model denoted as visual difference predictor (VDP), which is based on amplitude nonlinearity, contrast sensitivity, and a hierarchy of detection mechanisms. Through these mechanisms, VDP tries to measure visible differences that are caused by physical differences. Zhang and Li Zhang12 modeled suppression mechanisms by spectral residual (SR-SIM), which is calculated in the frequency domain. Damera et al. Venkata2000 proposed a degradation model denoted as NQM, which mimics the human visual system by considering contrast sensitivity, local luminance, contrast interaction between spatial frequencies, and contrast masking effects. Chandler and Hemami Chandler2007 formulated visual masking and summation through wavelet-based models to weight the SNR map. Other methods based on the frequency domain were also used to analyze human visual system properties including Sampat2009 ; NiL+08 ; NaL+12 . Zhang et al. developed FSIM, which mimics low-level feature perception through phase congruency and gradient magnitude. The majority of existing methods including FSIM measure quality using grayscale images or intensity channels. Intensity channels are usually preferred over chroma because human visual system is more sensitive to changes in intensity compared to color Lambrecht2001 . However, color channels still include information that is not part of intensity channels. FSIM was extended by introducing color information through pixel-wise fidelity over chroma channels. Temel and AlRegib utilized color information in the proposed methods PerSIM temel_15_persim and CSV temel_16_csv . To introduce color information into quality assessment, PerSIM uses pixel-wise fidelity whereas CSV utilizes color difference equations and color name distances.
The aforementioned quality estimators are based on handcrafting quality attributes. However, data-driven approaches can also be used to obtain quality estimators Tang2011 ; Mittal2012 ; Moorthy2010 ; Saad2012 ; Temel_UNIQUE ; Kang2014 . Tang et al. Tang2011 proposed measuring quality through features based on natural image statistics, distortion texture statistics, and blur/noise statistics. These statistical features are mapped to quality scores by support vector regression. Mittal et al. Mittal2012 introduced BRISQUE, which is based on natural scene statistics in the spatial domain that are regressed to obtain quality estimates. Natural scene statistics-based methods were extended to the frequency domain as in Moorthy2010 ; Saad2012 . Temel et al. Temel_UNIQUE introduced an unsupervised approach, in which a linear decoder architecture is used to obtain quality-aware sparse representations, whereas Kang et al. Kang2014 proposed a supervised approach based on Convolutional Neural Networks.
In this manuscript, we follow an alternative approach by performing a frequency domain analysis of error representations. Qadri et al. QT+11 developed a full-reference method based on harmonic analysis for blockiness artifacts and a reduced-reference method based on harmonic gain and loss. In contrast to QT+11 , we focus on error images instead of compared images, and our approach is not limited to blockiness artifacts but generalizes to numerous distortions including compression artifacts, image noise, color artifacts, communication errors, blur, and global and local distortions.
3 Spectral Analysis of Error Representations
Spatial frequency masking is a characteristic that was observed in various biological visual systems AD81 . Albrecht and De Valois AD81 showed that striate cortex cells tuned to a spatial fundamental frequency respond to harmonics only if the fundamental frequency component exists simultaneously. In addition to the visual system, the auditory system also possesses similar masking characteristics. Alphei et al. AP+87 observed masking of temporal harmonics in the auditory cortex. Because changes in harmonics have the potential to interfere with the masking mechanisms, this interference can affect perceived quality. Based on the observations related to sensory systems, we hypothesize that changes in frequency domain characteristics can be correlated to changes in perception, specifically perceived quality. To test our hypothesis, we developed a full-reference image quality assessment algorithm and validated its perceived quality assessment capability in multiple databases. The core of the proposed method is based on the spectral analysis of error representations.
Van der Schaaf and Van Hateren VANDERSCHAAF1996 analyzed the power spectrum of natural images and showed that even though total power and its spatial frequency dependency vary considerably between images, they still follow a common characteristic for natural images. In Torralba2003 , Torralba and Oliva used the average power spectrum of images to extract information related to naturalness and openness of the images, semantic category of the scene, recognition of objects in the image, and depth of the scene. Since the power spectrum of an image depends on the context of the scene, directly extracting information related to quality may not be feasible. This is because changes in the context can affect the power spectrum more than the image quality in certain conditions. However, if we calculate the power spectrum of the error signal, we can limit the effect of context and focus more on measuring the degradations. To test the relationship between the magnitude spectrum of error and the level of distortion, we conducted an experiment over 3,000 images in the TID 2013 database tid13 . We obtained the error images by taking the pixel-wise difference between reference and compared images. In each of the five distortion levels, there are 600 images (25 x 24), which corresponds to 25 reference images distorted with 24 degradation types. We obtained the magnitude information by taking the DFT of the error images and calculating the log magnitude of the transformed images. Fourier images were shifted to display the low frequency components in the central region. Finally, we averaged the magnitude spectrums of the images corresponding to different challenge levels and quantized them to obtain the average magnitude spectrums in Fig. 1. We calculated the mean value of the spectrums and divided them by the mean value of the level 1 spectrum to show the relative change in the mean values.
In these figures, the center of the image corresponds to low frequencies and corners correspond to high frequencies. Pixels are color coded according to the provided color legend based on the intensity levels of each spectrum component. The minimum distortion level is 1 and the maximum level is 5. As distortion level increases, degradations spread over the spectrum and intensify, which corresponds to an increase in the mean value of the spectrums. Therefore, the mean magnitude spectrum provides information related to the distortion level. The average spectrum analysis shows that the magnitude spectrum of error signals can be used to quantify degradations. As observed in Fig. 1, there is a spectral pattern followed by each distortion level when they are averaged over multiple images. To understand the structure of magnitude spectrums for individual natural images, we analyzed sample images from the TID 2013 database as shown in Fig. 2. As sample images, we used images of parrots with an out of focus natural scene in the background, a flower with a house in the background, and a sailboat with another sailboat behind in an ocean. As distortion, we included blur, quantization, and spatially correlated noise with a level of 1 (min), 3 (mid), and 5 (max). We show distorted images in one row and corresponding power spectrums in the following row. On the lower right side, we show the mean opinion scores for the distorted images and the mean values for the normalized magnitude spectrums.
In case of blur degradation, high frequency components are filtered out and images are smoother. As the degradation level increases, additional lower frequency components get filtered out and error becomes more centralized in the spectrum as observed in Fig. 2(d-f). There is a sharp horizontal line and a vertical line, which correspond to the regular patterns in the images. In the quantization degradation, pixels with similar color and texture characteristics converge to similar values because of the loss of details in the quantization stage. Therefore, the error spectrum is more concentrated as the degradation levels increase as observed in Fig. 2(j-l). In the noise degradation, pixels are corrupted with a spatially correlated degradation, which leads to local pointy distortions all over the images. Because of the spatial correlation, the magnitude spectrum is more symmetric and continuous as observed in Fig. 2(p-r). The shapes of the spectrums are different from each other and from the rhombus shape we obtained for the average spectrums. Therefore, it is not straightforward to pursue a shape-based measurement that is correlated with quality. However, we can pursue a global measurement to quantify the general behavior, which is the mean of the spectrum in this study. To obtain a distortion score, we calculate the mean of the log magnitude spectrum as
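$$S = \mu\left(\log\left(\left|\mathcal{F}(E)\right|\right)\right)$$
where $E$ is the error map, $|\cdot|$ is the absolute value operator, $\log$ is the logarithm, $\mathcal{F}$ is the 2-D discrete Fourier transform, and $\mu$ corresponds to the 2-D mean pooling operation.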
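The distortion score described above (mean pooling of the log magnitude of the Fourier-transformed error map) can be sketched in a few lines. This is a minimal illustration for a single channel at a single scale, not the released SUMMER code; the function name `spectral_score` and the small constant `eps` are our assumptions, the latter added only to keep the logarithm finite.

```python
import numpy as np


def spectral_score(reference, compared, eps=1e-12):
    """Mean log-magnitude spectrum of the pixel-wise error map (one channel)."""
    ref = np.asarray(reference, dtype=np.float64)
    cmp_ = np.asarray(compared, dtype=np.float64)
    if ref.shape != cmp_.shape or ref.ndim != 2:
        raise ValueError("expected two 2-D arrays of the same shape")
    error = ref - cmp_                      # pixel-wise error map
    magnitude = np.abs(np.fft.fft2(error))  # magnitude of the 2-D DFT
    # eps is an assumption of this sketch: it avoids log(0) for empty bins.
    return float(np.mean(np.log(magnitude + eps)))
```

Because the Fourier transform is linear, scaling the error map by a constant factor shifts this score by the logarithm of that factor, which is consistent with the observation that stronger degradations raise the mean of the spectrum.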
4 Multi-Scale and Multi-Channel Spectral Analysis
Spectral analysis of error signals is calculated over a single scale in the original resolution of the images. However, multi-scale representations and transforms can be considered as partial visual system models because neural responses in a visual cortex include scale-space orientation decomposition. Multiple scale and resolution approaches enable visual representations that can support different abstraction levels, which are commonly used in the image quality assessment literature Wang2003 ; Sampat2009 ; Wang2011 ; EA+06 ; PS+07 ; Ponomarenko2011 ; Zhang12 ; Venkata2000 ; Chandler2007 ; temel_15_persim ; Mittal2012 . To extend the single-scale baseline method to multi-scale, we perform a spectral analysis over multiple resolutions. Specifically, we downsample error maps by factors of , where is the scale index varying from to . The number of scales can be adjusted based on image characteristics. If the proposed quality assessment algorithm is used for higher resolution images, baseline scale or number of scales can be increased.
Color information is also overlooked in the baseline spectral analysis method. To naively utilize color information, we perform a multi-scale spectral analysis over each color channel in the RGB color space. In the RGB color space, color and intensity information is mixed. Therefore, we can utilize the same operator over each channel. However, in more perceptually correlated color spaces, color channels are more decorrelated and spectral analysis should not be performed over these channels in an identical fashion. Nevertheless, in the algorithm development process, we switched RGB with CIEXYZ, CIELa*b*, YCbCr, and HSV color spaces and they all underperformed.
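To make the multi-scale, multi-channel extension concrete, the following sketch applies the same spectral operator to every RGB channel at several resolutions. It is an illustration under stated assumptions rather than the authors' implementation: the downsampling factors `(1, 2, 4)` are hypothetical placeholders, plain decimation stands in for whatever downsampling filter is actually used, and the per-channel, per-scale scores are returned without pooling because the pooling rule is not specified in this passage.

```python
import numpy as np


def channel_scale_scores(reference, compared, factors=(1, 2, 4), eps=1e-12):
    """Return an array of shape (channels, len(factors)) of spectral scores."""
    ref = np.asarray(reference, dtype=np.float64)
    cmp_ = np.asarray(compared, dtype=np.float64)
    if ref.shape != cmp_.shape or ref.ndim != 3:
        raise ValueError("expected two H x W x C arrays of the same shape")
    error = ref - cmp_  # pixel-wise error, computed once per channel
    scores = np.empty((error.shape[2], len(factors)))
    for c in range(error.shape[2]):
        for k, f in enumerate(factors):
            # Assumption: keep every f-th pixel; the factors are hypothetical.
            err = error[::f, ::f, c]
            magnitude = np.abs(np.fft.fft2(err))
            # eps (our addition) keeps the logarithm finite.
            scores[c, k] = np.mean(np.log(magnitude + eps))
    return scores
```

The two loops are independent of each other, so channels and scales could be processed in parallel if needed.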
When we measure quality solely based on error maps, quality estimation ranges and monotonic behaviors vary with the distortion type. These variations lead to misalignments that degrade the overall quality estimation performance. To eliminate these misalignments, we calculate frequency-based weights from the reference image and the compared image through the 2-D discrete Fourier transform, the absolute value, the logarithm, and 2-D mean pooling. Instead of calculating frequency-based weights over all scales, we only compute them for the two smallest scales. This is because frequency-based weights calculated over high resolution images are sensitive to minor changes that do not necessarily correspond to perceived degradations. We multiply the multi-scale weights to obtain a single weight (w), which is further multiplied with the spectral analysis-based score (S) as shown in Fig. 3. We set the maximum objective score to five and divide it by the cube root of one plus the weighted score (wS) to obtain the final quality score. As the compared image becomes more similar to the reference image, the final score converges to five.
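The final pooling step can be written compactly as a sketch, assuming the weight w and the spectral score S are already available. The function name `final_quality_score` is ours for illustration, and the non-negativity check is an assumption of this sketch, since a real cube root of one plus the weighted score is only guaranteed when the weighted score is not below minus one.

```python
def final_quality_score(w, s, max_score=5.0):
    """Map the weighted spectral score w*S to a quality score.

    Follows the description in the text: the maximum score (five) is divided
    by the cube root of one plus the weighted score.
    """
    weighted = w * s
    if weighted < 0:
        # Assumption of this sketch: the weighted score is non-negative.
        raise ValueError("weighted score is assumed to be non-negative")
    return max_score / (1.0 + weighted) ** (1.0 / 3.0)
```

With a weighted score of zero the function returns the maximum score, and the score decreases monotonically as the weighted score grows.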
5 Experimental Setup
To validate the performance of image quality estimators, we need to use comprehensive databases that include a high variety of distortion types. To satisfy this requirement, we utilize the TID 2013 (TID13) database tid13 , which is one of the most comprehensive image quality assessment databases with reference images in the literature in terms of distortion types. In addition to the TID13 database, we utilize the LIVE database Sheikh2006b , which is one of the most commonly used image quality databases. Even though TID13 and LIVE cover a wide range of distortion types, they do not include simultaneously applied distortions. To consider simultaneous distortions in the validation, we also utilize the LIVE Multiply Distorted (MULTI) database Jayaraman2012 . LIVE database experiments were conducted in an office environment with normal indoor illumination levels in which subjects viewed a 21 inch CRT monitor that displayed mostly pixel images from an approximate viewing distance of 2-2.5 screen heights. The illumination conditions of the test protocol are not explicitly stated by the authors in Wang2004 . MULTI database experiments were performed in a workspace environment under normal illumination levels in which subjects viewed a monitor that displayed pixel images from an approximate distance of 4 times the screen height. TID13 database experiments were conducted in laboratory conditions as well as over the internet, in which case subjects were recommended to use a convenient distance to their monitors. We utilize databases with different setups because it is not possible to restrict users in real life and we need to develop generic visual quality estimators that operate in diverse platforms and conditions.
There are distortion types in the LIVE database, types in the MULTI database, and types in the TID13 database. Even though specific distortion types are different from each other, they can be grouped into common categories according to their high-level characteristics. In this study, individual distortion types are grouped into seven main categories as follows: The Compression category includes JPEG, JPEG 2000, and lossy compression of noisy images. The Noise category includes additive Gaussian noise, additive noise in chroma channels, impulse noise, spatially correlated noise, masked noise, high frequency noise, quantization noise, image denoising artifacts, multiplicative Gaussian noise, comfort noise, lossy compression of noisy images, and white noise. The Communication category includes Rayleigh fast-fading channel error and JPEG and JPEG 2000 transmission errors. The Blur category includes Gaussian blur and sparse sampling and reconstruction error. The Color category contains color saturation change, color quantization with dither, and chromatic aberrations. The Global category includes intensity shift and contrast change. The Local category includes non-eccentricity pattern and local block-wise distortion of different intensity. The number of images in each category is summarized in Table 1.
5.2 Validation Setup
We utilize the Pearson linear correlation coefficient (PLCC) to measure linearity, the Spearman rank-order correlation coefficient (SRCC) and the Kendall rank-order correlation coefficient (KRCC) to measure monotonicity, root mean squared error (RMSE) to measure accuracy, and outlier ratio (OR) to measure consistency of quality estimates ITU-T ; Kendal1945 . We regress quality scores before computing validation metrics as in Sheikh2006b , which can be formulated as
$$\hat{x} = \beta_1\left(\frac{1}{2} - \frac{1}{1 + e^{\beta_2 (x - \beta_3)}}\right) + \beta_4 x + \beta_5$$
where $x$ is the objective score, $\hat{x}$ is the regressed objective score, and the $\beta$s are the parameters that are tuned based on the relationship between objective and subjective scores. We utilized the fitnlm function in MATLAB and initialized regression coefficients to [0.0, 0.1, 0.0, 0.0, 0.0], from which a nonlinear model started its search for optimal coefficients. Reported performances of existing methods can vary from the literature because of the differences in regression curves and initialization coefficients.
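As a rough illustration of the validation pipeline, the sketch below implements a five-parameter logistic mapping together with PLCC, SRCC, and RMSE using NumPy only. It assumes the logistic form commonly associated with Sheikh2006b, does not reproduce the MATLAB fitnlm fit (a nonlinear least-squares routine such as `scipy.optimize.curve_fit` would be needed for that), and ranks ties by order of appearance instead of averaging them; all function names are ours.

```python
import numpy as np


def logistic5(x, beta):
    """Five-parameter logistic mapping (assumed form) of objective scores."""
    b1, b2, b3, b4, b5 = beta
    x = np.asarray(x, dtype=np.float64)
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (x - b3)))) + b4 * x + b5


def plcc(a, b):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(a, b)[0, 1])


def _ranks(values):
    # Ordinal ranks; ties are not averaged (a simplification of this sketch).
    values = np.asarray(values, dtype=np.float64)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=np.float64)
    ranks[order] = np.arange(len(values))
    return ranks


def srcc(a, b):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(_ranks(a), _ranks(b))


def rmse(a, b):
    """Root mean squared error between two score vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.sqrt(np.mean((a - b) ** 2)))
```

With the initialization [0.0, 0.1, 0.0, 0.0, 0.0] the mapping returns zero for every input, so the optimizer starts from a flat curve and has to learn both the logistic and the linear terms.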
We use the statistical tests suggested in ITU-T Rec. P.1401 ITU-T to evaluate the significance of the difference between correlation coefficients. There are two main hypotheses in a statistical significance test. The first one ($H_0$) claims that there is no significant difference between compared correlation coefficients and the second one ($H_1$) claims that there is a significant difference between compared correlation coefficients. In order to verify whether $H_0$ is true or not, at first, we assume that ITU-T . If the significance value is below the two-tailed t-distribution value, $H_0$ is true, otherwise $H_1$ is true. We use the tabulated t-distribution values for the significance level of the two-tailed test.
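A common way to compare two correlation coefficients is through the Fisher z-transform, and the sketch below follows that idea. It is meant as an illustration of this kind of test and may differ in detail from the procedure in the recommendation. The critical value `t_crit=1.96` (large-sample, two-tailed, 5% level) and the return convention (0 for similar, 1 if the first coefficient is significantly higher, -1 if it is significantly lower) are our assumptions.

```python
import math


def compare_correlations(r1, r2, n1, n2, t_crit=1.96):
    """Compare two correlation coefficients via the Fisher z-transform."""
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than three points")
    z1 = math.atanh(r1)  # Fisher z-transform: 0.5 * ln((1 + r) / (1 - r))
    z2 = math.atanh(r2)
    # Standard deviation of the difference of two independent z values.
    sigma = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    statistic = (z1 - z2) / sigma
    if abs(statistic) < t_crit:
        return 0  # no significant difference
    return 1 if statistic > 0 else -1
```

For small sample sizes the critical value should be looked up in a t-distribution table instead of relying on the large-sample default.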
To analyze the distribution of subjective scores versus objective scores of best performing quality estimators, we provide scatter plots, whose x-axis corresponds to quality estimates and whose y-axis corresponds to mean opinion scores (MOS) or differential mean opinion scores (DMOS). An ideal quality estimator leads to a scatter plot in which scores are located on a linear curve. Moreover, to further analyze the difference between subjective and objective scores, we calculate the difference between normalized histograms of subjective scores and regressed quality estimates as in temel_16_csv . We utilize common histogram difference metrics including Earth Mover’s Distance (EMD), Kullback-Leibler (KL) divergence, Jensen-Shannon (JS) divergence, histogram intersection (HI), and L2 norm.
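The histogram comparison can be sketched with textbook definitions of the five measures. The exact variants used in temel_16_csv may differ, for example in binning or in whether the intersection is reported as a similarity or as a difference. The function name, the constant `eps`, and the unit spacing between bins assumed for the EMD are our choices.

```python
import numpy as np


def histogram_distances(p, q, eps=1e-12):
    """Compare two 1-D histograms after normalizing each to sum to one."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    if p.shape != q.shape or p.ndim != 1:
        raise ValueError("expected two 1-D histograms with the same bins")
    p = p / p.sum()
    q = q / q.sum()

    def kl(a, b):
        # eps (our addition) guards against division by zero and log(0).
        return float(np.sum(a * np.log((a + eps) / (b + eps))))

    mixture = 0.5 * (p + q)
    return {
        # 1-D EMD with unit bin spacing: summed absolute CDF difference.
        "EMD": float(np.sum(np.abs(np.cumsum(p - q)))),
        "KL": kl(p, q),
        "JS": 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture),
        "HI": float(np.sum(np.minimum(p, q))),  # 1 means identical
        "L2": float(np.linalg.norm(p - q)),
    }
```

For identical histograms every distance is zero and the intersection equals one, which gives a quick sanity check before the function is applied to subjective and regressed objective scores.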
[Table 2: overall performance of image quality estimators. Row groups: Average Computation Time per Image; Outlier Ratio (OR); Root Mean Square Error (RMSE); Pearson Linear Correlation Coefficient (PLCC); Spearman’s Rank Correlation Coefficient (SRCC); Kendall’s Rank Correlation Coefficient (KRCC).]
[Source Codes] PerSIM, CSV, UNIQUE, COHERENSI, SUMMER: https://ghassanalregib.com/publications/, PSNR-HA, PSNR-HMA: http://www.ponomarenko.info/psnrhma.m, SSIM, MS-SSIM, BRISQUE, BIQI, BLIINDS2: http://live.ece.utexas.edu/research/quality/index.htm, CW-SSIM: https://www.mathworks.com/matlabcentral/fileexchange/43017-complex-wavelet-structural-similarity-index-cw-ssim, IW-SSIM: https://ece.uwaterloo.ca/~z70wang/research/iwssim/, SR-SIM: https://github.com/Netflix/vmaf/blob/master/matlab/strred/SR_SIM.m, FSIM, FSIMc: http://www4.comp.polyu.edu.hk/~cslzhang/IQA/FSIM/FSIM.htm.
We report the overall performance in Section 6.1 and distortion-based performance in Section 6.2. In Section 6.3, we analyze the distributional difference between subjective and objective scores as well as their scatter plot characteristics. We analyze the classification performance of top quality estimators in Section 6.4. Finally, in Section 6.5, we report the average computation time of quality estimators and discuss possible approaches to accelerate the execution.
The performance of quality estimators including SUMMER over three databases is summarized in Table 2. We highlighted top two methods with a bold typeset and a light blue background. In case of performance equivalence, we include all equivalent methods. Out of total categories, highlighted methods include SUMMER in categories, SR-SIM and UNIQUE in categories, CSV in categories, PerSIM and PSNR-HA in category. We also measure the statistical significance of the difference between the performance of SUMMER and benchmarked methods in terms of correlation. We report the results of these statistical significance tests under correlation values of existing methods. A corresponds to statistically similar performance, implies that compared method is statistically inferior, and means that compared method is statistically superior. SUMMER statistically outperforms all the quality estimators in at least two categories and none of these methods statistically outperform SUMMER in any category. We observe that SUMMER statistically outperforms COHERENSI in all correlation categories.
To illustrate the relative performance of image quality estimators, we computed weighted averages of their performance in terms of Pearson and Spearman correlations as shown in Fig. 4, in which the x-axis corresponds to the Pearson correlation and the y-axis corresponds to the Spearman correlation. Weighted averages were obtained by calculating the estimated quality in each database and weighting database performance values with the number of images in each database divided by the total number of images in the validation. It was not possible to distinguish markers clearly when all quality estimators were shown in a single scatter plot. Therefore, we separated them into two scatter plots as upper quadrant and lower quadrant. Fig. 4(a) includes PSNR, SSIM, CW-SSIM, BRISQUE, BIQI, BLIINDS2, and COHERENSI whereas Fig. 4(b) includes PSNR-HA, PSNR-HMA, MS-SSIM, IW-SSIM, SR-SIM, FSIM, FSIMc, PerSIM, CSV, UNIQUE, and SUMMER. COHERENSI is close to the center of the lower quadrant whereas SUMMER is on the top right of the upper quadrant.
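The database-size weighting described above amounts to a weighted mean, sketched below. The function name is ours, and any values passed to it (per-database correlations and image counts) are to be supplied by the reader; none are taken from the tables.

```python
def weighted_average(values, counts):
    """Average per-database values, weighting each by its image count."""
    if len(values) != len(counts) or len(values) == 0:
        raise ValueError("values and counts must be non-empty and equal length")
    total = sum(counts)
    if total <= 0:
        raise ValueError("total number of images must be positive")
    # Each weight is the database image count divided by the total count.
    return sum(v * c for v, c in zip(values, counts)) / total
```

Because larger databases receive proportionally larger weights, a method's averaged correlation is dominated by its performance on the largest database in the validation.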
6.2 Distortion Categories
[Table 3: distortion category-based performance of image quality estimators. Row groups: Outlier Ratio (OR); Root Mean Square Error (RMSE); Pearson Linear Correlation Coefficient (PLCC); Spearman’s Rank Correlation Coefficient (SRCC); Kendall’s Rank Correlation Coefficient (KRCC).]
We report the distortion category-based performance of image quality estimators in Table 3. Distortion-based algorithmic performances were obtained from weighted averages of performances over different distortions in which weights were proportional to the number of images in each category. SUMMER is the best performing method in terms of all performance metrics in noise category. It is also the best method in all other categories other than color and local distortion in terms of at least one performance metric. There are main distortion categories and performance metrics. When we analyze these main distortion categories and performance metrics, there are main categories. In each category, we highlighted top two methods with a bold typeset and a light blue background. In case of performance equivalence, we include all equivalent methods. Out of total categories, highlighted methods include SUMMER in categories, SR-SIM in categories, UNIQUE in categories, CSV and PSNR-HA in categories, PerSIM and PSNR-HMA in categories, COHERENSI in categories, FSIMc, FSIM, SSIM, and PSNR category.
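[Table 4: difference between normalized histograms of subjective scores and quality estimates. Row groups: Earth Mover’s Distance (EMD); Kullback-Leibler Divergence (KL); Jensen-Shannon Divergence (JS); Histogram Intersection (HI); L2 Norm (L2).]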
6.3 Distributional Difference and Scatter Plots
To analyze the difference between subjective and objective scores, we calculated the difference between normalized histograms of subjective scores and quality estimates as reported in Table 4. Distribution of subjective scores is most similar to SUMMER and CSV in the LIVE and the TID13 databases and to UNIQUE and CSV in the MULTI database. To further analyze the relationship between subjective and objective scores, we selected quality estimators that are highlighted in at least two different databases in Table 2, which include IW-SSIM, SR-SIM, CSV, UNIQUE, and SUMMER. We show the scatter plots of top quality estimators in Fig. 5. In the provided scatter plots, the x-axis corresponds to the quality estimates and the y-axis corresponds to the mean opinion scores (MOS) or differential mean opinion scores (DMOS). We plot the mapping function that is learned by the regression formulation as a red curve in the scatter plots. Moreover, we also plot two curves that are one standard deviation away with dashed lines and two curves that are two standard deviations away with dotted lines. The scores of an ideal quality estimator should be located on a linear curve with low deviation. In the LIVE database, IW-SSIM and SR-SIM have a steeper decrease close to the maximum quality score. UNIQUE scores are spread out and they decrease more linearly whereas SUMMER and CSV decrease monotonically. In the MULTI database, all methods follow a monotonically decreasing behavior. Even though certain algorithms including UNIQUE and CSV cover the majority of objective scores between their minimum and maximum values continuously, we can observe that SUMMER scores are more clustered around certain values. In the TID13 database, IW-SSIM and SR-SIM follow a monotonically increasing behavior along with a steeper increase close to the maximum quality score. SUMMER also monotonically increases but its steep increase is around high scores rather than the maximum score. CSV and UNIQUE follow a relatively linear behavior compared to other methods. CSV has a limited quality score range utilization whereas UNIQUE is spread all over its score range.
6.4 Classification Performance
Previously, we tested the estimation performance of the image quality assessment algorithms. In this section, we test the classification performance of these assessment algorithms by utilizing the techniques introduced in Krasula2016 . The first analysis measures the capability of quality assessment algorithms to distinguish statistically different and similar pairs (Different versus Similar test). Absolute difference of the predicted scores should be larger for significantly different image pairs to achieve a high performance. We report the performance in terms of the area under the Receiver Operating Characteristic curve (AUC). The second analysis is performed on the significantly different pairs to measure the capability of the algorithms to identify the higher quality image and the lower quality image (Better versus Worse test). In addition to AUC, we also report the result for the second analysis in terms of classification percentage. We also provide statistical significance results corresponding to all reported metrics. In Fig. 6, we report the performance values of five top-performing quality estimators in the top row and statistical significance comparison in the bottom row. These results correspond to performances over all the databases combined. We observe that the proposed method SUMMER significantly outperforms compared methods in almost all of the categories, the exception being UNIQUE in one category of the Better versus Worse test.
6.5 Computation Time
We measured the time required to obtain objective quality scores of all the images in the validation databases and computed the average processing time per image. In our analysis, we do not include the quality estimators that require an off-line training process. The computer used for these measurements has a 3.50 GHz Intel(R) Core(TM) i7-3770K CPU and 32 GB of RAM. The average time per image for PSNR, SSIM, MS-SSIM, SR-SIM, and COHERENSI is less than or equal to 0.05 seconds in each case, followed by SUMMER with seconds. The average time required to obtain a quality score with other methods varies between to times of the time required by SUMMER. In its current implementation, spectral analysis and frequency-based weight extraction are performed over each color channel and scale sequentially. Therefore, we can reduce the computation time with a more efficient implementation that supports parallel processing of color channels and scales.
We analyzed the magnitude spectrum of error signals and extended this analysis with color channel utilization, multi-resolution representation, and frequency-based weight extraction to obtain the quality assessment algorithm SUMMER. Based on our experiments, the proposed algorithm significantly outperforms the majority of the compared image quality assessment algorithms. As shown in the validation, the relationship between objective and subjective scores is monotonic rather than linear. Therefore, to utilize SUMMER in practice for any kind of stimuli without regression, we need to enhance the algorithm to provide higher linearity and lower deviation. Color channel utilization contributed to the performance enhancement of SUMMER, but we need to design bio-inspired algorithms that rely on visual system characteristics rather than depending solely on color channel values. In this study, we utilized the mean value of magnitude spectrums to estimate the objective quality. However, in addition to global statistics, we need to investigate the shape-based characteristics of error spectrums. With the proposed framework, a shape-based spectral signature can be obtained not only to estimate the quality but also to identify the distortion types.
- (1) S. Westen, R. Lagendijk, J. Biemond, Perceptual image quality based on a multiple channel HVS model, in: International Conference on Acoustics, Speech, and Signal Processing, Detroit, MI, USA, 1995.
- (2) T. Hegazy, G. AlRegib, COHERENSI: A new full-reference IQA index using error spectrum chaos, in: IEEE Global Conference on Signal and Information Processing, 2014, pp. 965–969.
- (3) Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: From error visibility to structural similarity, IEEE Transactions on Image Processing 13 (4) (2004) 600–12.
- (4) Z. Wang, E. P. Simoncelli, A. C. Bovik, Multi-scale structural similarity for image quality assessment, the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers 2 (2004) 9–13.
- (5) M. Sampat, Z. Wang, S. Gupta, A. Bovik, M. Markey, Complex wavelet structural similarity: A new image similarity index, IEEE Transactions on Image Processing 18 (11) (2009) 2385–2401.
- (6) Z. Wang, Q. Li, Information content weighting for perceptual image quality assessment, IEEE Transactions on Image Processing 20 (5) (2011) 1185–98.
- (7) K. Egiazarian, J. Astola, N. Ponomarenko, V. Lukin, F. Battisti, M. Carli, New full-reference quality metrics based on HVS, in: International Workshop on Video Processing and Quality Metrics, Scottsdale, AZ, USA, 2006.
- (8) N. Ponomarenko, F. Silvestri, K. Egiazarian, M. Carli, J. Astola, V. Lukin, On between-coefficient contrast masking of DCT basis functions, in: International Workshop on Video Processing and Quality Metrics, Scottsdale, AZ, USA, 2007.
- (9) N. Ponomarenko, O. Eremeev, L. V., K. Egiazarian, M. Carli, Modified image visual quality metrics for contrast change and mean shift accounting.
- (10) S. Daly, Digital images and human vision, MIT Press, Cambridge, MA, USA, 1993, Ch. The Visible Differences Predictor: An Algorithm for the Assessment of Image Fidelity, pp. 179–206.
- (11) L. Zhang, H. Li, SR-SIM: A Fast and high performance IQA index based on spectral residual, in: IEEE International Conference on Image Processing, 2012, pp. 1473–1476.
- (12) N. Damera-Venkata, T. D. Kite, W. S. Geisler, B. L. Evans, A. C. Bovik, Image quality assessment based on a degradation model, IEEE Transactions on Image Processing 9 (4) (2000) 636–50.
- (13) D. M. Chandler, S. S. Hemami, VSNR: A Wavelet-based visual signal-to-noise ratio for natural images, IEEE Transactions on Image Processing 16 (9) (2007) 2284–98.
- (14) A. Ninassi, O. Le Meur, P. Le Callet, D. Barba, On the performance of human visual system based image quality assessment metric using wavelet domain, in: Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 6806, 2008.
- (15) M. Narwaria, W. Lin, I. V. McLoughlin, S. Emmanuel, L. T. Chia, Fourier transform-based scalable image quality measure, IEEE Transactions on Image Processing 21 (8) (2012) 3364–3377.
- (16) C. J. V. Lambrecht, Vision models and applications to image and video processing, Kluwer Academic Publishers, 2001.
- (17) D. Temel, G. AlRegib, PerSIM: Multi-resolution image quality assessment in the perceptually uniform color domain, in: IEEE International Conference on Image Processing, 2015, pp. 1682–1686.
- (18) D. Temel, G. AlRegib, CSV: Image quality assessment based on color, structure and visual system, Signal Processing: Image Communication 48 (2016) 92–103.
- (20) A. Mittal, A. K. Moorthy, A. C. Bovik, No-reference image quality assessment in the spatial domain, IEEE Transactions on Image Processing 21 (12) (2012) 4695–708.
- (21) A. K. Moorthy, A. C. Bovik, A two-step framework for constructing blind image quality indices, IEEE Signal Processing Letters 17 (5) (2010) 513–516.
- (22) M. A. Saad, A. C. Bovik, C. Charrier, Blind image quality assessment: A Natural scene statistics approach in the DCT domain, IEEE Transactions on Image Processing 21 (8) (2012) 3339–3352.
- (23) D. Temel, M. Prabhushankar, G. AlRegib, UNIQUE: Unsupervised image quality estimation, IEEE Signal Processing Letters 23 (10) (2016) 1414–1418.
- (24) L. Kang, P. Ye, Y. Li, D. Doermann, Convolutional neural networks for no-reference image quality assessment, IEEE Conference on Computer Vision and Pattern Recognition (2014) 1733–1740.
- (25) M. Qadri, K. Tan, M. Ghanbari, The impact of spatial masking in image quality meters, Global Journal of Computer Science and Technology 11 (2011) 69–75.
- (26) D. Albrecht, R. De Valois, Striate cortex responses to periodic patterns with and without the fundamental harmonics, Journal of Physiology 319 (1981) 497–514.
- (27) H. Alphei, D. Püschel, A. Kohlrausch, Temporal and spectral masking effects of harmonic complex tones, in: Audio Engineering Society Convention 82, 1987.
- (28) A. van der Schaaf, J. van Hateren, Modelling the power spectra of natural images: Statistics and information, Vision Research 36 (17) (1996) 2759–2770.
- (29) A. Torralba, A. Oliva, Statistics of natural image categories, Network: Computation in Neural Systems 14 (3) (2003) 391–412, PMID: 12938764.
- (30) N. Ponomarenko, L. Jin, O. Ieremeiev, V. Lukin, K. Egiazarian, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, C.-C. J. Kuo, Image database TID2013: Peculiarities, results and perspectives, Signal Processing: Image Communication 30 (2015) 57–77.
- (31) H. R. Sheikh, M. F. Sabir, A. C. Bovik, A statistical evaluation of recent full reference image quality assessment algorithms, IEEE Transactions on Image Processing 15 (11) (2006) 3440–3451.
- (32) D. Jayaraman, A. Mittal, A. K. Moorthy, A. C. Bovik, Objective quality assessment of multiply distorted images, in: Asilomar Conference on Signals, Systems and Computers, 2012, pp. 1693–1697.
- (33) Telecommunication Standardization Sector of International Telecommunication Union (ITU-T), Methods, metrics and procedures for statistical evaluation, qualification and comparison of objective quality prediction models.
- (34) M. Kendall, The advanced theory of statistics, in: Charles Griffin & Company Limited, London, UK, 1945.
- (35) L. Zhang, L. Zhang, X. Mou, D. Zhang, FSIM: A Feature similarity index for image quality assessment, IEEE Transactions on Image Processing 20 (8) (2011) 2378–86.
- (36) L. Krasula, K. Fliegel, P. L. Callet, M. Klíma, On the accuracy of objective image and video quality models: New methodology for performance evaluation, in: International Conference on Quality of Multimedia Experience, 2016, pp. 1–6.