Interest in the quality assessment of images grows with the volume of visual communications and the advance of imaging technologies, pushing the development of effective techniques for predicting the average image quality judged by human users, hereinafter referred to as the subjective quality.
The subjective quality is measured by averaging the scores assigned by a panel of human observers following specific protocols and is usually rated on the so-called Mean Opinion Score (MOS) scale, or on the Differential MOS (DMOS) scale, defined as the MOS difference between the reference (original) and the test (impaired) images.
However, the measurement of subjective quality is impractical for routine and large-scale image quality assessment (IQA). Therefore, the subjective quality is usually predicted by means of objective IQA algorithms, founded on abstract models of the human observer.
IQA methods operate in three basic modes. In the Full Reference (FR) mode they quantify the differences between the reference and test images. In the Reduced Reference (RR) mode, the comparison is limited to partial representations of the images. Finally, in the No Reference (NR) mode, selected features extracted from the test image alone are compared to ideal targets [1, 2, 3].
Surveys of several IQA methods are available in the literature (e.g., ). Here, to place the presented method in a precise conceptual frame, a concise classification of IQA methods is provided on the basis of the underlying modeling of the human observer.
Psycho-physical models quantify the differences between the original and test images by emulating the signal processing of the early stages of the Human Visual System (HVS) [5, 6], assumed to be typical for all subjects. The generalization capability of these methods is limited by the influence of higher-level mental factors, such as emotion, education, past experience, etc. [7, 8, 9].
Cognitive models look at image quality scoring as a computational process taking place in the human mind and assume that the subjective quality degradation is strongly correlated with some measure of visual information loss. The Peak Signal to Noise Ratio (PSNR) method is perhaps the simplest cognitive method: it identifies the information loss with a reduction of the signal to noise ratio (SNR), where the difference between the original and the test images is interpreted as noise.
The Structural Similarity (SSIM) method measures the so-called structural information, defined through the local normalized Mean Square Error (MSE) between the test and the reference images . Variants of the SSIM include Multi-Scale analysis (MS-SSIM)  or image gradient analysis, adopted in the Feature Similarity (FSIM)  and in the Gradient Magnitude Similarity Deviation (GMSD)  methods.
The Visual Information (VIF) method measures the loss of Shannon mutual information , while the Virtual Cognitive Method (VICOM) measures the loss of the Fisher information about the localization of patterns .
The generalization capability of cognitive models depends upon the coverage of adopted information measure. For instance, the PSNR is sensitive to any kind of impairment, whereas the VIF  is specialized for impairments modeled by linear distortion plus additive noise.
Behavioral models are automatic learning machines trained by samples to give DMOS estimates directly from the images [20, 21] or from a set of selected features . If behavioral models are applied to known impairments, they may assume a very simple form [23, 24, 25], but in general they must include an automatic classification of the type of impairment. In these cases, generalization is a critical issue. Neural nets can tightly fit the DMOS of specific image data sets using many adjustable weights, but the effects of different impairment factors are hard to predict.
I-A Shortcomings of existing methods
One typical problem of most FR IQA methods is their unequal sensitivity with respect to the covered impairments [16, 17]. To compensate for this drawback, some methods combine values of the same metric calculated at different resolutions , relying on the fact that blur effects vanish when the resolution shrinks, whereas noise and artifacts are only attenuated. Another compensation technique exploits the inhomogeneous spatial distribution of errors generated by smoothing and additive effects, measured by the variance of the local MSE.
A more systematic solution to this problem lies in the combination of different metrics specialized for different impairments. This approach was proposed in , where two metrics were used to measure the amounts of blur and additive noise. This objective was also pursued in  for video. In  a categorical index was paired to the positional Fisher information. Recently, a similar approach was followed in .
Another issue overlooked by most existing IQA methods is that the relationship between metrics and the empirical DMOS scale is strongly non-linear [17, 28]. As a matter of fact, the Spearman Rank Order Correlation Coefficient (SROCC) , often employed to compare the performance of different metrics, is insensitive to linearity issues. On the other hand, the linearity of the DMOS estimates is essential in applications, since the quality must be quantified at the end on the DMOS scale.
To circumvent this problem, it is customary to linearize the metrics versus the empirical DMOS scale using ad hoc non-linear parametric (logistic) functions , as suggested also by the Video Quality Expert Group . However, unequal sensitivity of metrics to different impairments cannot be compensated by linearization and produces irreducible DMOS fitting errors.
Moreover, the linearization involves the non-linear optimization of several parameters that critically depend on the empirical data set. The generalization capability across different data sets degrades linearly with the number of adjustable parameters [32, 33], regardless of the training procedure , and may lead to under-fitting or over-fitting effects, strong statistical scattering of the sample fitting parameters, and local minima issues.
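The insensitivity of the SROCC to monotonic non-linearity, as opposed to the LCC, is easy to verify numerically. The sketch below uses illustrative synthetic numbers, not data from any of the cited databases.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

# Hypothetical metric values related to DMOS by a monotonic but
# strongly non-linear law (illustrative only).
metric = np.linspace(0.1, 1.0, 20)
dmos = 100.0 * metric ** 4  # monotonic, non-linear mapping

srocc = spearmanr(metric, dmos).correlation
lcc = pearsonr(metric, dmos)[0]

# SROCC is blind to the non-linearity (it is exactly 1 here),
# while the linear correlation coefficient is visibly lower.
print(srocc, lcc)
```

This is why a high SROCC alone does not guarantee usable DMOS estimates: linearity on the DMOS scale must be checked separately.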
I-B The proposed approach
The objective of the method presented in this paper is to solve the problems of unequal sensitivity to different impairments and of the a posteriori parametric linearization. Generally speaking, the approach followed here consists of the combination of different metrics, tailored to different impairments. Specifically, it stems from the observation that most existing IQA methods treat image detail loss and spurious details in the same way. In other words, they do not distinguish between impairments caused by the loss of visual structures (depriving errors) and those caused by the appearance of artifacts (meddling errors). However, this is at odds with common evidence, since detail losses and spurious details have very different visual appearances.
Therefore, it is reasonable to measure their effects on subjective quality using different metrics. Previous attempts in this direction appeared in [37, 38], where point-wise increments and decrements of the gradient magnitude were discriminated. In  detail losses and additive impairments were separated using a restored version of the test image as a watershed.
Herein, this discrimination is operated through a Least Squares (LS) decomposition of the gradient field of the test image into two components: a component linearly predicted from the gradient field of the original image, and the residual, unpredictable gradient. The detail loss is then identified by the attenuation of the predicted gradient within a small observation window. Likewise, the presence of spurious detail is identified by the gradient residual observed within a small window. This modeling is suggested by the orthogonality of the residual gradient with respect to the predicted gradients .
The second step is the definition of two metrics, representing the perceptual impact of detail losses and of spurious details respectively, each one as linear as possible with respect to the empirical DMOS scores. To this purpose, the contributions to the overall perceptual impact coming from detail losses are quantified by the loss of the detail positional information [41, 17], proportional to the square root of the gradient energy attenuation within a detail window. The perceptual impact of spurious details is quantified by a logarithmic measure of the ratio between the original gradient energy and the residual gradient energy, pooled over the image.
The metrics are considered as the coordinates of a two-dimensional (2-D) space following the VICOM  scheme. The VICOM is structured as a set of computational layers, starting from the extraction of gradients from the reference and the test images, up to the computation of multiple features that form the coordinates of the virtual cognitive state space. Each point of this space is then mapped to a corresponding DMOS estimate by a parametric function trained on experimental data . For this reason, the method presented here is referred to as Detail VICOM (D-VICOM). As a reference for the reader, a block scheme of the D-VICOM is displayed later in Fig. 6.
At this point, the visual diversity of detail losses and spurious details is again invoked to argue that their individual perceptual impacts will contribute independently to determine the overall subjective quality loss. It follows that the mapping function from the state space to the DMOS estimate boils down to a simple affine transformation.
The effectiveness of this new model was preliminarily tested on the LIVE DBR2  database, where the method exhibited superior performance with respect to competing methods. Then, it was verified on two other independent databases (TID2008  and CSIQ ) for the classes of impairment common to the three databases.
The outstanding fitting and cross-fitting of the D-VICOM estimates, up to an affine transformation of the DMOS scale, reveal that these heterogeneous experimental data are highly consistent under the D-VICOM paradigm. This consistency allowed us to align and fuse these data into a unique large database, called the SUPERQUARTET.
Rather unexpectedly, it is also concluded that it is possible to define a database-independent D-VICOM (ID-VICOM) quality estimator, specifying only conventional DMOS values of an original image and of one of its noisy versions.
This paper is organized as follows. In Sect. II the decomposition of the gradient of the test image is described, and the results are illustrated by means of visual examples. In Sect. III the metrics are defined and their affine combination is proposed for DMOS estimation.
In Sect. IV the DMOS estimates are statistically tested using the LIVE DBR2  and then verified with the data contained in the TID2008  and the CSIQ . The performance of the linear D-VICOM and of some competing methods after logistic linearization is compared on the SUPERQUARTET. In Sect. V it is shown that the D-VICOM can be quickly calibrated without training. In Sect. VI the computational costs are discussed and in Sect. VII the merits of the D-VICOM are finally underlined.
II Detail analysis
The detail analysis is performed on the luminance components of an impaired test image and of the reference image, which are gray-scale functions of the generic pixel position .
Image gradients were theoretically supported as a relevant feature for IQA in  and adopted by recent methods . For the proposed analysis, it is convenient to consider the Gaussian smoothed complex gradients and , extracted by the following operator of scale and unit energy 
where denotes -D convolution and is the principal phase angle of the complex argument. For pixel or slightly more, the frequency response of (1) well approximates the Contrast Sensitivity Function (CSF) of the HVS front end [39, 45], at nominal viewing distance . The operator (1) summarizes the horizontal and vertical filters commonly used for gradient approximation  and is steerable, i.e., has the same frequency response for any pattern orientation .
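A minimal sketch of a Gaussian-smoothed complex gradient operator is given below, using scipy's derivative-of-Gaussian filtering. The unit-energy normalization is estimated from the operator's impulse response; the scale value and the boundary mode are placeholder assumptions, since the paper's constants are not reproduced here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smoothed_complex_gradient(img, sigma=1.3):
    """Gaussian-smoothed complex gradient of a gray-scale image:
    real part = horizontal derivative, imaginary part = vertical
    derivative (sigma=1.3 is an assumed placeholder scale)."""
    # estimate the energy of the derivative-of-Gaussian filter from
    # its impulse response; both orientations have equal energy
    n = 8 * int(np.ceil(sigma)) + 1
    impulse = np.zeros((n, n))
    impulse[n // 2, n // 2] = 1.0
    hx = gaussian_filter(impulse, sigma, order=(0, 1))
    norm = 1.0 / np.sqrt(2.0 * np.sum(hx ** 2))  # unit-energy scaling
    gx = gaussian_filter(img, sigma, order=(0, 1), mode="nearest")
    gy = gaussian_filter(img, sigma, order=(1, 0), mode="nearest")
    return norm * (gx + 1j * gy)
```

For a pure luminance ramp the operator responds with a spatially constant gradient, consistent with the steerability property mentioned in the text.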
The test image gradient is decomposed for any as the sum of a linearly distorted (smoothed) version of the original gradient (identified as the distorted detail) and of an unpredictable gradient error (considered as a spurious detail):
The distorted is further modeled by a linear combination of the original gradient and of a pair of its directionally filtered versions
where , and are real valued coefficients depending on the position , and and are calculated as
The impulse responses and are second order normalized Gaussian derivatives (i.e., Hermite-Gauss) functions of the same scale as , defined as
The coefficients for are identified by a regularized LS system , which minimizes the local error energy
within a spot centered on the analyzed point . To minimize the interference from adjacent points, the squared errors are weighted by a Gaussian window of spread , scaled so that . In particular, the smallest  pixel compatible with the number of estimated parameters within the spot was adopted. (For  the effective spot has a diameter larger than four pixels, so more than  effective real-valued equations are available for each  to estimate the three real coefficients plus the local error variance.)
The overall cost function is
where the penalty term was chosen for gray-level images to regularize  the sample if the original gradient magnitude is small, without significantly biasing the LS parameter estimate. The LS solution is computationally efficient and its statistical properties are well known [40, 48] (see also the Appendix).
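The windowed, regularized LS identification can be sketched in a few lines. For brevity, a single regressor (the reference gradient itself) replaces the three-term directional model of the paper, the complex LS problem with real coefficients is solved by stacking real and imaginary parts, and the ridge penalty weight is a placeholder.

```python
import numpy as np

def local_ls_coefficient(g_ref, g_test, w, lam=1e-3):
    """Weighted, ridge-regularized LS fit of the test gradient in a
    window as a real multiple of the reference gradient (sketch: the
    paper uses three regressors; lam is a placeholder penalty)."""
    sw = np.sqrt(w).ravel()
    # stacking real and imaginary parts turns a complex LS problem
    # with a real coefficient into an ordinary real LS problem
    A = np.concatenate([sw * g_ref.real.ravel(), sw * g_ref.imag.ravel()])
    b = np.concatenate([sw * g_test.real.ravel(), sw * g_test.imag.ravel()])
    a = (A @ b) / (A @ A + lam)      # regularized scalar LS solution
    residual = g_test - a * g_ref    # unpredictable (spurious) gradient
    return a, residual
```

For an exactly attenuated test gradient the estimated coefficient recovers the attenuation and the residual vanishes, mirroring the decomposition of the text.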
To put into evidence how the proposed decomposition is actually correlated with the visual findings, Fig. 1 displays the gradient attenuation map , an indicator of detail loss (the constant  was inserted for display regularization purposes), and the residual gradient magnitude map , an indicator of the presence of spurious detail, for blurred, noisy, JPEG and JPEG2000 compressed images.
Maps are enhanced to reveal details well below the visibility threshold. The noisy image is characterized by random gradient residuals and, conversely, the blurred image is characterized by diffuse gradient attenuation and negligible residuals. The coded images, that are affected by both lost and spurious details, exhibit mostly complementary patterns of gradient attenuation and gradient residuals.
III-A Detail loss metric
The perceptual impact of detail loss was not directly analyzed in the past. The closest problem considered in the literature was the FR rating of noiseless images blurred by optical devices, characterized by their Modulation Transfer Function [49, 50, 51].
In this work the impact of the detail loss on the subjective quality rating is analyzed from a different viewpoint. In  it was stressed that the subjective degradation of an image should be strictly correlated with the accuracy of pattern localization, since it plays an essential role for distance estimation in binocular vision. Since the HVS perception of spatial displacements is basically linear , it is argued that the subjective degradation should be proportional to the increase of the position uncertainty of the observed distorted patterns with respect to the original ones.
Now, accepting that the performance of vision mechanisms is near optimal, it follows that the visual localization accuracy of a detail is measured by the Fisher information (FI) about its position. In [41, 17] the positional FI of a portion of image extracted by a window centered on
, in the presence of additive Gaussian white noise of variance, was calculated as
where is the true image gradient.
It is deduced that the positional accuracies of the original and test images, respectively characterized by the gradients and , are given by
for the same noise variance .
Using the smoothed gradients, the loss of positional accuracy due to detail loss only is estimated as
and the coefficient , valid for , takes into account the worst-case spurious noise prediction in (7), as described in the Appendix. The same window is used for (7), (9) and (10), so that the effective area involved in the detail loss evaluation is about four times the prediction spot, due to the convolutional spread.
In (11) is clipped within the admissible interval to increase its robustness to estimation errors.
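A sketch of the clipped, window-pooled detail-loss indicator follows: the square root of the predicted-gradient energy attenuation, clipped to the admissible interval [0, 1]. The window size and the regularizing eps are placeholder choices, not the paper's values.

```python
import numpy as np
from scipy.ndimage import convolve

def detail_loss_map(g_ref, g_pred, win=5, eps=1e-6):
    """Local detail-loss indicator: 1 - sqrt(energy attenuation) of the
    predicted gradient within a small window, clipped to [0, 1].
    Window size and eps are assumed placeholders."""
    w = np.ones((win, win)) / win ** 2
    e_ref = convolve(np.abs(g_ref) ** 2, w, mode="nearest")
    e_pred = convolve(np.abs(g_pred) ** 2, w, mode="nearest")
    # 0 = no detail loss, 1 = total loss of local detail
    return np.clip(1.0 - np.sqrt(e_pred / (e_ref + eps)), 0.0, 1.0)
```

A predicted gradient attenuated to one quarter of the reference amplitude yields a loss of 0.75 everywhere, while an unimpaired image yields a loss near zero.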
However, expression (11) has to be refined considering that:
the FI (8) is defined using the ideal gradient, which amplifies the spectral components of the image proportionally to the spatial frequency. However, the magnitude response of the smoothed gradient operator (1) decays at high spatial frequencies. Since the detail loss is mostly characterized by the attenuation of high frequency components, the gradient loss is actually under-estimated by the smoothed gradients (12) and (13). Therefore, the FI calculated from the smoothed gradient is under-estimated in spots characterized by a rich high frequency content. In this work, the compensation of the FI estimate was obtained in a simple way by raising the magnitude of (12) and (13) to an exponent . The optimal value of is hard to predict and must be empirically identified. Empirical DMOS values of the three independent databases employed in this work are well explained by setting (see Fig. 4);
the pooling should be extended only to the set of points where the gradient prediction by (7) is reliable. In particular, the spots characterized by severe ill-conditioning due to unbalanced energy of the LS equations  are excised from the pooling set. Specifically, the spots almost exactly centered on strong edges are discarded, according to the heuristic rule
To enhance the independence between metrics, full weight should be given only to the points where spurious details are negligible, indicating that the model (4) is really accurate, while the weight should be reduced everywhere detail loss overlaps with spurious details.
Then, the proposed estimate of the perceptual impact due to details loss only is
where is a small regularizing constant, and
is a non-critical weighting factor. Finally, the DMOS component attributed to pure detail loss is estimated as
III-B Spurious detail metric
The spurious details are substantially meaningless for the observer, so that their localization accuracy should not be as relevant as in the case of detail loss. It is rather plausible that the annoyance produced by spurious details is similar to that caused by random noise. Since for images affected by additive noise the DMOS is quite well estimated by the PSNR logarithmic index, it is assumed that the perceptual impact caused by spurious details is logarithmically related to the ratio between the average energy of the original details and the average energy of the spurious details, defined as
Imposing that the subjective quality estimate is zero for diverging noise, let us define the quantity
where the constant accounts for gradient noise variance which starts to impair the quality for white noise corrupted images , and the positive constant sets the working point of the logarithmic law. It has been found that the empirical DMOS values are well explained by the non-critical value (see Fig. 4).
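The logarithmic law just described can be sketched as follows. The two constants (the gradient-noise floor and the working point of the logarithm) are placeholder values, not those identified in the paper, and the pooling is a plain image-wide mean.

```python
import numpy as np

def spurious_detail_metric(g_ref, g_res, c_wp=1.0, c_noise=1e-4):
    """Logarithmic spurious-detail impact: grows with the pooled
    residual gradient energy relative to the original gradient energy,
    and is exactly zero when the residual vanishes. c_wp (working
    point) and c_noise (noise floor) are assumed placeholders."""
    e_ref = np.mean(np.abs(g_ref) ** 2)   # original detail energy
    e_res = np.mean(np.abs(g_res) ** 2)   # residual (spurious) energy
    return np.log10(1.0 + c_wp * e_res / (e_ref + c_noise))
```

The `1 +` inside the logarithm keeps the metric finite and pins it to zero for identical test and reference images, as required in the text.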
Finally, the proposed estimate of the perceptual impact attributed to spurious details only is calculated as (it is worth noting that  is always finite)
where is imposed for identical test and reference images. Thus the estimate of the DMOS component attributed to spurious details is
III-C Metric combination
Following the VICOM scheme , the overall DMOS prediction, referred to as , is calculated by combining  and  through a general two-dimensional parametric function , where  is the fitting coefficient vector.
The basic assumption of this paper is that detail losses and spurious details are distinctly perceived, as is evident from common experience. In a Taylor series expansion of (24) around any point , this hypothesis is expressed by
for , leading to the decoupled form
In addition, if the marginal metrics and are affine functions of and , respectively, the approximation (24) boils down to
where the constant accounts for possible non-zero DMOS scores assigned to the original images during experimental sessions, while and compensate for the different DMOS sensitivity with respect to and . The last point will be discussed in detail in Sect. IV-E.
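The decoupled affine combination can be fitted by ordinary LS in a few lines. The sketch below uses assumed variable names (L for the detail-loss metric, S for the spurious-detail metric); the coefficients correspond to the offset and the two sensitivities discussed above.

```python
import numpy as np

def fit_affine_dvicom(L, S, dmos):
    """Ordinary LS fit of DMOS ~ d0 + dL * L + dS * S, the decoupled
    affine combination of the two metrics (variable names assumed)."""
    A = np.column_stack([np.ones_like(L), L, S])
    coef, *_ = np.linalg.lstsq(A, dmos, rcond=None)
    return coef  # [offset d0, loss sensitivity dL, spurious sensitivity dS]
```

On synthetic data generated by an exact affine law, the fit recovers the three coefficients, which is the sense in which no a posteriori logistic linearization is needed.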
The form of (27) is deemed valid for impairments coming from a mixture of detail loss and spurious detail addition over small spots and distributed all over the test image, as it happens for instance in coding and error correction applications. Other impairments, such as luminance and contrast changes, chromatic aberrations, strong and isolated artifacts and low-frequency, correlated noise, are not directly covered by the present D-VICOM model.
The D-VICOM decomposition scheme is loosely related to the Most Apparent Distortion (MAD) scheme, based on a HVS model . The basic difference is that the MAD uses two distinct metrics for high and low quality regions (instead of lost and spurious details), according to a visibility map.
IV Experiments and results
The linearity and the uniformity of the scatterplot of the IQA metric versus the empirical DMOS (see, for instance, Fig. 3 in the sequel) of different databases is the ultimate requirement in applications. To achieve this goal, IQA metrics generally need a linearization through a parametric function . The linearized scatterplot should be tightly concentrated around the diagonal.
The MSE and the LCC are good indicators of the scatterplot concentration, provided that outliers [29] of the empirical DMOS have been properly excised and the fitting parameter estimates are not critically influenced by a specific image content . On the other hand, the proper training of the D-VICOM (27) requires the presence in the data set of a class of mixtures ranging from essentially spurious detail addition to essentially lost details.
For these reasons, the statistical analysis of the proposed DMOS estimator was performed on large image databases, containing test images affected by different impairments. Specifically, the LIVE DBR2 , the TID2008  and the CSIQ  databases were chosen. These independent databases contain annotated DMOS values obtained through different experimental settings and share four common impairment classes: additive white noise, Gaussian blur, JPEG and JPEG2000 coding.
The statistical performance of the D-VICOM and of other competing estimators is illustrated in the sequel, listing the number of adjustable parameters of the DMOS fitting function. Except for the classical VICOM , linearization was performed by a five-parameter monotonic logistic function [30, 31], trained by the BRLS modified Newton algorithm  over several runs, to minimize the risk of being trapped in local minima.
Ordinary LS fitting was used to directly optimize the RMSE. Since we are mainly interested in the linearization parameters, it is worth noting that ordinary LS fitting coefficients are unbiased under the reasonable assumption of zero-mean empirical DMOS errors, even in the case of different variance across test images .
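For reference, a typical five-parameter monotonic logistic linearization can be sketched with scipy's `curve_fit`. The functional form below is the VQEG-style logistic-plus-linear law commonly used in the IQA literature; whether it coincides exactly with the form of [30, 31] is an assumption.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic5(x, b1, b2, b3, b4, b5):
    """VQEG-style five-parameter monotonic logistic (form assumed)."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (x - b3)))) + b4 * x + b5

def linearize(metric, dmos, p0):
    """Fit the logistic to map a raw metric onto the DMOS scale."""
    popt, _ = curve_fit(logistic5, metric, dmos, p0=p0, maxfev=20000)
    return logistic5(metric, *popt), popt
```

Note that this is exactly the five-parameter non-linear optimization whose data-set dependence the D-VICOM is designed to avoid.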
All metrics were computed by on-line available software [42, 54, 55, 44, 56]. A preliminary statistical analysis was performed on the full LIVE DBR2 , which includes impairment classes  all covered by the D-VICOM model. Then, the statistical analysis was focused on the classes of impairment common to all three databases.
IV-A LIVE DBR2 analysis
Following the VICOM scheme , the virtual cognitive state of each distorted image is defined by the pair . The two-dimensional mapping of the cognitive states constitutes the cognitive chart, depicted in Fig. 2 for the LIVE DBR2 . The relationship between the cognitive state associated to each test image and the DMOS estimated by (28) can be read on the superimposed iso-metric DMOS lines, which are rectilinear. They provide an immediate visualization of the quality level of the whole data set.
As expected, blurred images, mostly characterized by lost details, and noisy images, mostly characterized by spurious details, cluster in the proximity of the coordinate axes. This empirical verification supports the fundamental assumption made in this paper about the distinct perception of detail loss and spurious details.
Nevertheless, at very high noise levels, gray-scale saturation induces detail loss, and, for very strong blur, halos generate spurious details. A nice surprise is that the metrics and remain admirably complementary, even in these extreme regions.
JPEG and JPEG2000 images cluster into distinct chart regions, pointing out the different management of detail losses and artifacts operated by these compression techniques . Fast fading states reflect a prevalence of spurious details for light impairments and of detail loss for strong impairments.
In Fig. 3 the scatterplot of the D-VICOM is compared to the scatterplot of the seemingly closest uni-dimensional competitor, i.e., the MAD estimator , before and after logistic linearization. The linearity of the D-VICOM is comparable to the linearity of the MAD after logistic transformation. The DMOS scale goes beyond the conventional upper limit of , because the LIVE DBR2 scores were realigned among different impairments [30, 42].
Fig. 4 puts into evidence the negligible marginal sensitivity of the D-VICOM RMSE with respect to , and around the values (, and ) adopted as constants in our model.
IV-B Performance on equivalent subsets of the TID2008 and CSIQ databases
To validate the previous results, the same analysis was extended to similar subsets of the TID2008 and CSIQ databases. In particular, the five image subsets of TID2008 (white Gaussian noise, Gaussian blur, JPEG and JPEG2000 coding, and JPEG2000 transmission errors) already used in , and four subsets (white Gaussian noise, Gaussian blur, JPEG and JPEG2000) of the CSIQ were chosen. Fitting results are summarized in Tables II and III. TID2008 MOS scores were translated into DMOS scores for uniformity.
For these subsets, the accuracy of the D-VICOM is still the best. This is also visible from the comparison between the scatterplots of the D-VICOM and of the linearized MAD (the seemingly best-performing competitor) reported in Fig. 5.
IV-C Statistical analysis
Returning to Tables II and III, the rightmost column contains the LCC obtained by applying the regression coefficients computed for the largest and most complete data set in our analysis (the LIVE DBR2) to the target subset, in order to assess the generalization capabilities of the IQA metrics. Considering that the LCC is insensitive to affine metric and/or DMOS transformations (the SROCC magnitude of uni-dimensional metrics is invariant under monotonic transformations; the SROCC is a robust version of the LCC  and is therefore equally insensitive to true outliers, non-linear fitting and not-well-explained scores), the results support the generalization power of good IQA metrics and, at the same time, show that the D-VICOM is uniformly the best-performing subjective quality predictor in this cross-validation exercise, achieving very close results between the two fitting scenarios.
Moderate non-Gaussianity and local additive effects of the DMOS errors are expected features , due to the low number of human observers involved in subjective tests and the generally non-robust  protocols for score averaging and outlier rejection. Therefore it is useful to check the leave-one-out cross-validation (LOOCV) RMSE, i.e., the square root of the Prediction Sum of Squares (PRESS) , scaled by the subset size, since it measures the expected RMSE for a test image added to the data set. The LOOCV RMSE is analytically computable for the D-VICOM, depending on the projection (hat) matrix [29, 48] of the LS system matrix generated by (27), and yields  for the LIVE DBR2,  for the TID2008 subset and  for the CSIQ subset. All these values are extremely close to the corresponding fitting RMSE, confirming the completeness and the DMOS stability of the tried subsets, as well as the good conditioning of the LS fittings.
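The analytic LOOCV RMSE for an ordinary LS fit follows directly from the PRESS identity: each leave-one-out residual equals the ordinary residual divided by one minus the corresponding hat-matrix diagonal. A minimal sketch:

```python
import numpy as np

def loocv_rmse(A, y):
    """Analytic leave-one-out RMSE for an ordinary LS fit y ~ A @ c,
    via the PRESS statistic e_i / (1 - h_ii), where h_ii are the
    diagonal entries of the hat matrix A (A'A)^-1 A'."""
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ c
    H = A @ np.linalg.pinv(A.T @ A) @ A.T  # projection (hat) matrix
    press = np.sum((resid / (1.0 - np.diag(H))) ** 2)
    return np.sqrt(press / len(y))
```

The identity is exact for LS regression, so the analytic value matches a brute-force leave-one-out loop to machine precision.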
For the other metrics, the empirical LOOCV RMSE computation is unpractical, but, assuming Gaussianity and homoscedasticity of the DMOS errors within each subset, it can be estimated for large samples by the Akaike Information Criterion (AIC) [32, 34], which can be simplified for the generic th IQA metric as
for a subset of size  and  adjustable parameters (the RMSE is a further nuisance parameter for the LS regression). AIC application does not require nested models , is summable across databases for overall evaluation, and admits a significance test through the analysis of the (raw) AIC weights  . In particular,  (i.e., AIC scores differing by more than  nats) indicates high confidence () in the model with the smallest AIC .
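The simplified AIC for an LS-fitted metric and the associated raw Akaike weights can be sketched as follows; counting the error variance as an additional nuisance parameter is the convention assumed here.

```python
import numpy as np

def aic_ls(rmse, n, k):
    """Large-sample AIC for an LS fit on n samples with k adjustable
    parameters (plus the error variance as a nuisance parameter):
    AIC = n * ln(RMSE^2) + 2 * (k + 1)."""
    return n * np.log(rmse ** 2) + 2 * (k + 1)

def akaike_weights(aics):
    """Raw Akaike weights: relative confidence in each candidate model."""
    d = np.asarray(aics, dtype=float) - np.min(aics)
    w = np.exp(-0.5 * d)
    return w / w.sum()
```

With illustrative numbers, a model whose AIC is more than 10 nats smaller than its competitor's collects essentially all of the weight, which is the high-confidence rule quoted in the text.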
For the three analyzed subsets, the AIC and  scores are listed in Table IV. The AIC significance test is passed by the D-VICOM in all cases. In the same table, the residual kurtosis and the th percentile () of the residual magnitudes provide additional insight into the fitting error properties.
However, the kurtosis and  values do not indicate clear differences or worrisome issues across metrics and subsets, with a moderate non-Gaussianity of the residuals and an acceptable .
On the other hand, the small typical LCC () between the residual vectors of the two best metrics here (the D-VICOM and the MAD) suggests that there is a substantial excess variance not explained by the models. A comparison of the LS error variances by an F-test for independent Gaussian variables  is therefore useful.
In this case, the one-sided confidence level about the significance of the D-VICOM RMSE improvement is attained whenever the RMSE of a competing IQA metric exceeds for the LIVE DBR2 database, for the TID2008 subset and for the CSIQ subset. This confidence level is always attained by the D-VICOM, except against the classical VICOM for the LIVE DBR2 and the MAD for the CSIQ subset.
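The variance comparison can be sketched with scipy's F distribution. The degrees of freedom are taken as sample size minus number of fitted parameters; treating the two residual vectors as independent Gaussian samples is the stated assumption.

```python
import numpy as np
from scipy.stats import f as f_dist

def rmse_improvement_pvalue(rmse_a, rmse_b, n_a, n_b, k_a, k_b):
    """One-sided F-test p-value for 'metric B's error variance exceeds
    metric A's', assuming independent Gaussian residuals (sketch)."""
    F = (rmse_b ** 2) / (rmse_a ** 2)     # variance ratio statistic
    dof_a, dof_b = n_a - k_a, n_b - k_b   # residual degrees of freedom
    return f_dist.sf(F, dof_b, dof_a)     # survival function = p-value
```

Equal RMSEs give a p-value of 0.5 (no evidence either way), while a clearly larger competing RMSE drives the p-value toward zero, i.e., a significant improvement.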
IV-D Uncovered impairments
To meet the possible, legitimate curiosity of the reader, Tables V and VI report the statistical performance indexes for the whole TID2008 and CSIQ archives, which contain impairments not explicitly modeled by the D-VICOM scheme. The average accuracy of the D-VICOM is still acceptable, even if theoretically unsupported.
IV-E An invariance property
Consistency of the human perception of quality among different people is relevant for IQA, as discussed in . The analysis performed so far neglected the relative perceptual sensitivity of the DMOS to lost and spurious details. In the sequel, an invariance property is deduced from the following equivalence argument: a subject has to indicate the blur strength that yields the same subjective quality loss as a fixed amount of noise added to the same image. There is no apparent reason why the response should change among different people with normal vision capability under the same viewing conditions, at least in an average sense.
In a formal setting, it is assumed that two generic databases and , characterized by the D-VICOM coefficient sets (27) and , share the same original images and that a generic blurred image contained in and has DMOS and respectively. Since blur essentially does not add spurious details, .
Moreover, let us imagine that in both databases there exists an equivalent image contaminated by additive noise and characterized by the same DMOS values and . Since white noise essentially does not introduce detail loss, .
Inserting these values into (27) yields
Simplifying, it turns out that the ratio
should not change across databases. On the basis of this equivalence argument, the D-VICOM is rewritten as
where is kept as a constant. This argument can be extended to the actual cases where images are different among databases, stating that the D-VICOM estimator (33) can be tuned to any empirical database by adjusting only the pair , i.e., the estimator offset and slope. This result confirms at a glance the excellent D-VICOM LCC cross-validation results of Sect. IV-B.
Statistical identification started from a coarse guess obtained from the regression coefficients (27) of each quartet of distortions common to the LIVE DBR2, TID2008 and CSIQ databases. Then an iterative LS optimization of the offset/slope pairs of each quartet in (33), and of a unique shared constant on the joint database (seven parameters in total), was performed by minimizing the sum of the squared DMOS residuals of the three subsets (CSIQ DMOS values were up-scaled to balance the residual energies). The cost function achieved a broad minimum, so that the D-VICOM estimator (33) assumes the operative form
characterized by only two parameters, offset and slope, to be tuned to the desired DMOS scale. Table VII shows that the statistical performance of the two-parameter D-VICOM remains substantially unchanged, as expected.
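The joint identification described above, one shared weighting constant plus a per-database offset/slope pair, can be sketched as an alternating least-squares procedure. All names (gamma for the shared constant, L and A for the detail-loss and spurious-detail metrics) and the synthetic data are illustrative assumptions, not the paper's actual notation or databases:

```python
import numpy as np

# Sketch: one shared constant (gamma) and a per-database (offset, slope)
# pair, estimated by minimizing the total squared DMOS residual.
rng = np.random.default_rng(1)
true_gamma = 0.2
dbs = []
for a, b in [(0.0, 60.0), (5.0, 40.0), (-3.0, 55.0)]:  # per-database scales
    L = rng.uniform(0.0, 1.0, 150)   # detail-loss metric values
    A = rng.uniform(0.0, 1.0, 150)   # spurious-detail metric values
    d = a + b * (L + true_gamma * A) + rng.normal(0.0, 0.5, 150)
    dbs.append((L, A, d))

def residual_energy(gamma):
    # For fixed gamma, the LS-optimal per-database offset/slope is just a
    # linear regression of DMOS on the combined metric L + gamma*A.
    total = 0.0
    for L, A, d in dbs:
        X = np.column_stack([np.ones_like(L), L + gamma * A])
        _, res, *_ = np.linalg.lstsq(X, d, rcond=None)
        total += float(res[0])
    return total

# One-dimensional grid search for the single shared parameter.
grid = np.linspace(0.0, 1.0, 201)
gamma_hat = float(grid[np.argmin([residual_energy(g) for g in grid])])
```

A coarse grid suffices here because, as the text notes, the cost function exhibits a broad minimum in the shared parameter.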
To explore the impairment coverage of (34), Table VIII lists the statistical indexes of the D-VICOM, the MAD, and the GMSD (i.e., the most robust metric from Table VI) on the image subsets of the TID2008, trained on the five-impairment subset previously used, deemed safe for ordinary LS fitting.
The RMSE varies strongly among subsets and metrics, which complicates the suitability check of IQA metrics on specific impairment sets. However, AIC minimization, under the proper statistical assumptions, remains useful as a quick significance test . Assuming in this scenario zero-mean Gaussian fitting errors, with a different variance for each metric/subset pair and given the number of images per subset and of fitting parameters per metric, an upper bound to the AIC for the wider set is:
The AIC verification effort is still combinatorial, but from Table VIII the D-VICOM (34) emerges as the best overall choice as regards the fitting RMSE (and therefore the AIC). The exception is low-frequency impairments (quantization noise, block distortions, contrast and intensity changes), which are barely observable by the gradient operator (1); on the other hand, the D-VICOM exhibits a surprising robustness to impulse distortions.
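As a concrete illustration of this criterion under the stated zero-mean Gaussian error assumption, the sketch below evaluates the standard Akaike form AIC = n ln(RSS/n) + 2k; the residual vectors and parameter counts are hypothetical, not values from the tables:

```python
import numpy as np

# Illustration: the AIC trades fitting error against the number of tuned
# parameters, so a marginally smaller RMSE need not win once the extra
# parameters are penalized. Residuals below are synthetic.
def aic_gaussian(residuals, n_params):
    """Akaike criterion for an LS fit with i.i.d. zero-mean Gaussian errors."""
    n = residuals.size
    rss = float(np.sum(residuals ** 2))
    return n * np.log(rss / n) + 2 * n_params

rng = np.random.default_rng(2)
res_two_param = rng.normal(0.0, 1.00, 300)   # e.g., a two-parameter estimator
res_five_param = rng.normal(0.0, 0.98, 300)  # slightly better fit, more parameters
a2 = aic_gaussian(res_two_param, 2)
a5 = aic_gaussian(res_five_param, 5)
```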
[Table: columns report RMSE, SROCC and LCC for the LIVE DBR2 (full), the TID2008 (5 dist. subset) and the CSIQ (4 dist. subset); rows cover the impairment subsets 1) additive Gaussian noise, 2) additive noise in color components, 3) spatially correlated noise, 5) high frequency noise, 12) JPEG transmission errors, 13) JPEG2000 transmission errors, 14) non-eccentricity pattern noise, 15) local block-wise distortions, 16) mean shift (intensity shift), plus the training set. Numerical entries not recovered.]
IV-F The SUPERQUARTET
The empirical DMOS values of the TID2008 and of the CSIQ databases were mapped onto the LIVE DBR2 DMOS scale using the slope/offset pairs estimated in the previous step, forming a unique large database called the SUPERQUARTET, viewed as the LIVE DBR2 populated by compatible samples migrated from the TID2008 and the CSIQ databases, after proper realignment of the DMOS values to cope with protocol differences. (A similar realignment of different databases was already performed in  for video sequences, on a space of seven features. The entire SUPERQUARTET realigned DMOS database, containing impaired images from the LIVE DBR2, the TID2008 and the CSIQ, is freely available upon request from the contact author.)
The D-VICOM and a set of popular FR quality estimators, linearized by a logistic function, were applied to the SUPERQUARTET. Since the DMOS scale is unique (i.e., the LIVE DBR2 one), all estimators were tuned using a single set of parameters (five for the linearization of uni-dimensional IQA estimators and one offset/slope pair for the D-VICOM). Table IX lists the RMSE, the SROCC, the LCC and the AIC scores. In Fig. 7 the cognitive chart of the SUPERQUARTET is displayed; the state clusters of all databases overlap remarkably. In Fig. 8 the scatterplots of the SUPERQUARTET DMOS values versus the optimal estimates of the D-VICOM and of the MAD, the seemingly best competitor from Table IX, are shown.
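For reference, the three statistical indexes reported throughout the tables (RMSE, Pearson linear correlation LCC, and Spearman rank-order correlation SROCC) can be computed as follows; the prediction/DMOS pairs are hypothetical placeholders:

```python
import numpy as np

# Sketch of the statistical performance indexes used in the comparisons.
def rmse(pred, dmos):
    return float(np.sqrt(np.mean((pred - dmos) ** 2)))

def lcc(pred, dmos):
    # Pearson linear correlation coefficient.
    return float(np.corrcoef(pred, dmos)[0, 1])

def srocc(pred, dmos):
    # Spearman correlation = Pearson correlation of the ranks (no ties here).
    r1 = np.argsort(np.argsort(pred))
    r2 = np.argsort(np.argsort(dmos))
    return float(np.corrcoef(r1, r2)[0, 1])

dmos = np.array([10., 25., 40., 55., 70.])   # hypothetical subjective scores
pred = np.array([12., 22., 43., 52., 71.])   # hypothetical estimator output
scores = (rmse(pred, dmos), srocc(pred, dmos), lcc(pred, dmos))
```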
In turn, the excellent cross-fitting of the LIVE DBR2, the TID2008 and the CSIQ data, up to a simple affine transformation, demonstrates the remarkable statistical homogeneity of these independent experiments under the D-VICOM paradigm.
In conclusion, the following quality estimator
is proposed as a conventional, calibration-free metric, adjusted to the LIVE DBR2 DMOS scale for the covered impairments. This estimator, hereinafter called ID-VICOM, when applied to the entire LIVE DBR2, yielded RMSE, SROCC and LCC scores nearly identical to the top performance of the three-parameter D-VICOM in Table I, and comparable scores on the joint TID+CSIQ realigned quartets.
V Direct DMOS scale setting
The LIVE DBR2 DMOS scale adopted for the ID-VICOM might, however, not be adequate in some applications. Any desired scale can be set by conveniently specifying another offset and slope pair, using a single test image. To this purpose, the offset of the DMOS scale is fixed to the desired value for perfect images (the sample DMOS is usually affected by equivocation between original and slightly distorted images), so that
The slope can be determined from a single image affected by additive white noise of known variance. In fact, the diagram of Fig. 9, which plots the relevant metric versus the noise variance for the LIVE DBR2 noisy images, shows that the metric value is reliably linked to the noise variance; moreover, the companion term is calculated from the noise-free image, so both are obtained from (22) and (23). Assigning now a specific DMOS value to this image, the corresponding value is read on the cognitive chart of Fig. 7 and the slope is finally determined from (36).
Such a strong generalization power of the ID-VICOM (36), starting from the quality of a single image, is surprising at first glance, but it is a necessary consequence of the following concurrent facts:
the pronounced linearity of the metrics with the target impairments;
the statistical stability of the additive white noise variance estimate through ;
the precision of the locus traced by images affected by white noise in the cognitive chart.
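The two-point scale setting just outlined reduces to passing an affine map through the quality of a perfect image and of one noisy image. A minimal sketch, with hypothetical raw metric values standing in for those read off the cognitive chart:

```python
# Sketch: the affine DMOS mapping is fixed by the DMOS assigned to a
# perfect image and to one white-noise-corrupted image of known variance.
# All numeric values below are hypothetical placeholders.
def set_scale(dmos_perfect, metric_perfect, dmos_noisy, metric_noisy):
    """Return the (offset, slope) pair passing through the two points."""
    slope = (dmos_noisy - dmos_perfect) / (metric_noisy - metric_perfect)
    offset = dmos_perfect - slope * metric_perfect
    return offset, slope

offset, slope = set_scale(dmos_perfect=0.0, metric_perfect=0.0,
                          dmos_noisy=40.0, metric_noisy=0.8)
```

The reliability of the procedure rests on the three facts listed above, not on the arithmetic itself, which is a plain two-point line fit.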
VI Computational budget
The major computational burden of the D-VICOM lies in the gradient extraction and in the solution of (7), which amounts to real-valued 2-D fast correlations/convolutions across the image, plus the solution of a linear system of normal equations  of size three at each image point. With a negligible performance cost, it is also possible to solve (7) on a point grid decimated by two on each axis and to linearly interpolate the computed coefficient sets at the remaining points, thus substantially reducing the prediction cost.
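The decimated-grid shortcut can be sketched as follows; a smooth synthetic function stands in for the per-point LS coefficient maps, and the grid size is arbitrary:

```python
import numpy as np

# Sketch: evaluate a smooth per-pixel quantity only on a grid decimated by
# two on each axis, then linearly interpolate back to full resolution,
# saving about three quarters of the expensive per-point solves.
def expensive_map(y, x):
    # Placeholder for the per-point normal-equation solution.
    return np.sin(0.05 * y) * np.cos(0.05 * x)

H, W = 65, 65
yy, xx = np.mgrid[0:H, 0:W]
full = expensive_map(yy, xx)                        # reference: every point
coarse = expensive_map(yy[::2, ::2], xx[::2, ::2])  # about 1/4 of the points

def upsample2(c, H, W):
    """Separable linear interpolation from the decimated grid to (H, W)."""
    yc, xc = np.arange(0, H, 2), np.arange(0, W, 2)
    tmp = np.array([np.interp(np.arange(W), xc, row) for row in c])
    return np.column_stack([np.interp(np.arange(H), yc, tmp[:, j])
                            for j in range(W)])

approx = upsample2(coarse, H, W)
max_err = float(np.max(np.abs(approx - full)))      # small for smooth maps
```

The approximation error stays negligible as long as the coefficient maps vary smoothly across the image, which is the implicit assumption behind the decimation.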
The average computational time of the D-VICOM MATLAB code, running on a PC equipped with an Intel Core i7-3370K CPU, was  s ( s for the fast version; the basic and fast versions of the D-VICOM are freely available upon request from the contact author) per image. For the sake of comparison, the MAD ran at a comparable  s, the VIF at  s, the FSIM at  s, the MS-SSIM at  s and the GMSD at only  ms.
VII Conclusion
The intricate problem of forcing a unique metric to linearly fit subjective scores caused by heterogeneous impairments was circumvented by decomposing it into two easier problems: determining partial metrics linearly related to two well-discernible error causes, namely detail loss and spurious detail addition.
On the basis of the approximate linearity of the metrics and of the assumed perceptual properties, it is shown that these metrics can be fitted to subjective DMOS ratings by adjusting just the two parameters of an affine transformation. This makes it possible to simply merge empirical results coming from different databases, for the class of impairments covered by the D-VICOM model.
Last but not least, it is possible to determine a conventional ID-VICOM index, after fixing the DMOS value for one original image and a noisy version of the same image.
In conclusion, the main merits of the D-VICOM estimator are:
a priori connotation of the impairments covered by the method;
outstanding accuracy over a wide quality range;
simple and accurate merging of empirical DMOS from different databases;
analytic diagnostic capabilities through the maps of the gradient attenuation, of the gradient residuals and of the cognitive states;
choice of fast calibration based on a single sample image.
Let us finally remark that the D-VICOM quality predictor does not rely on the natural image hypothesis, which underpins many existing methods. Moreover, the D-VICOM could be extended to other classes of impairments by adding appropriate metrics. In particular, the D-VICOM is suitable for the detail displacement measurements required in video and 3-D quality assessment.
Appendix: Noise effects compensation in (7)
The LS system (7) which computes the estimate of at each point has the general form:
where  is a diagonal matrix holding the analysis window weights,  and  are matrices built from the real and imaginary parts of the (filtered) original gradients, and  and  are vectors built from the real and imaginary parts of the test image gradients, herein assumed corrupted by additive noise vectors having zero-mean, i.i.d. entries of variance .
Under the above hypotheses, and neglecting the small influence of the regularizing constant, the predicted target can be written in terms of a three-column orthogonal matrix with the same span as the system matrix. Therefore, some noise is added to the true predicted gradient, with variance
which unduly increases the estimated in (13).
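This variance inflation can be checked numerically for the unweighted case: projecting i.i.d. noise of variance sigma^2 onto a 3-dimensional column span retains an expected energy of 3 sigma^2, since the orthogonal projector has trace 3. A Monte Carlo sketch, with a random orthonormal basis standing in for the actual (weighted) system matrix:

```python
import numpy as np

# Monte Carlo check (illustrative, not the paper's exact weighted system):
# noise projected onto a 3-D span keeps an expected energy of 3*sigma^2.
rng = np.random.default_rng(3)
n, sigma = 49, 0.7                             # e.g., a 7x7 analysis window
Q, _ = np.linalg.qr(rng.normal(size=(n, 3)))   # 3-column orthonormal basis

energies = []
for _ in range(2000):
    noise = rng.normal(0.0, sigma, n)
    proj = Q @ (Q.T @ noise)          # noise component entering the prediction
    energies.append(float(proj @ proj))

mean_energy = float(np.mean(energies))  # close to 3 * sigma**2
```

This is the spurious energy that the correction described below subtracts from the predicted gradient.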
Under the same assumptions, the expected residual energy of (37) is
The exact calculation of  at each point would require a costly full QR solution of (7). Following a worst-case approach and setting , as done throughout the paper, a bound holds by the Courant-Fischer min-max theorem , in terms of the largest and the second largest samples of the analysis window. Combining this bound with (40), we get
whose estimate is subtracted from the energy of the predicted gradient in (13).
This worst-case assumption is justified by the fact that most of the Gaussian-windowed gradient energy is concentrated in a few equations of (37). The efficacy of the proposed correction is demonstrated by the statistical stability of  for the white-noise-corrupted images in Figs. 2 and 7, at the cost of a negligible positive bias.
-  A. K. Moorthy and A. C. Bovik, “Blind image quality assessment: From natural scene statistics to perceptual quality,” IEEE Transactions on Image Processing, vol. 20, no. 12, pp. 3350–3364, Dec 2011.
-  A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image quality assessment in the spatial domain,” IEEE Transactions on Image Processing, vol. 21, no. 12, pp. 4695–4708, Dec 2012.
-  M. A. Saad, A. C. Bovik, and C. Charrier, “Blind image quality assessment: A natural scene statistics approach in the DCT domain,” IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3339–3352, Aug 2012.
-  D. M. Chandler, “Seven challenges in image quality assessment: Past, present, and future research,” ISRN Signal Processing, vol. 2013, pp. 1–53, 2013.
-  S. Winkler, “Issues in vision modeling for perceptual video quality assessment,” Signal Processing, vol. 78, no. 2, pp. 231–252, Feb. 1999.
-  D. M. Chandler and S. Hemami, “VSNR: a wavelet-based visual signal-to-noise ratio for natural images,” IEEE Trans. on Image Processing, vol. 16, no. 9, pp. 2284 –2298, Sept. 2007.
-  D. M. Chandler, K. H. Lim, and S. S. Hemami, “Effects of spatial correlations and global precedence on the visual fidelity of distorted images,” in Proc. SPIE, vol. 6057, 2006, pp. 60 570F–60 570F–15. [Online]. Available: http://dx.doi.org/10.1117/12.655442
-  P. Cavanagh, “Visual cognition,” Vision Research, vol. 51, no. 13, pp. 1538–1551, 2011, Vision Research 50th Anniversary Issue: Part 2. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0042698911000381
-  E. Vu and D. Chandler, “Visual fixation patterns when judging image quality: Effects of distortion type, amount, and subject experience,” in Image Analysis and Interpretation, 2008. SSIAI 2008. IEEE Southwest Symposium on, Mar. 2008, pp. 73–76.
-  D. Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. The MIT Press, 2010.
-  E. C. Larson and D. M. Chandler, “Most apparent distortion: full-reference image quality assessment and the role of strategy,” J. Electron. Imaging, vol. 9, no. 1, pp. 011 006/1–21, Jan. 2010.
-  Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.
-  Z. Wang, E. Simoncelli, and A. Bovik, “Multi-scale structural similarity for image quality assessment,” in Proc. of the 37th IEEE Asilomar Conference on Signal, Systems and Computer, vol. 2. IEEE Comp. Soc., Nov. 2003, pp. 1398–1402.
-  L. Zhang, D. Zhang, M. Xuanqin, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Trans. on Image Processing, vol. 20, no. 8, pp. 2378–2386, Aug. 2011.
-  W. Xue, L. Zhang, X. Mou, and A. Bovik, “Gradient magnitude similarity deviation: A highly efficient perceptual image quality index,” IEEE Trans. on Image Processing, vol. 23, no. 2, pp. 684–695, Feb. 2014.
-  H. R. Sheikh and A. C. Bovik, “Image information and visual quality,” IEEE Trans. on Image Processing, vol. 15, no. 2, pp. 430–444, Feb. 2006.
-  L. Capodiferro, G. Jacovitti, and E. D. Di Claudio, “Two-dimensional approach to full-reference image quality assessment based on positional structural information,” IEEE Trans. on Image Processing, vol. 21, no. 2, pp. 505–516, Feb. 2012.
-  E. P. Simoncelli and B. A. Olshausen, “Natural image statistics and neural representation,” Annual Review of Neuroscience, vol. 24, no. 1, pp. 1193–1216, 2001, PMID: 11520932. [Online]. Available: http://dx.doi.org/10.1146/annurev.neuro.24.1.1193
-  G. Zhai, X. Wu, X. Yang, W. Lin, and W. Zhang, “A psychovisual quality metric in free-energy principle,” IEEE Trans. on Image Processing, vol. 21, no. 1, pp. 41–52, Jan 2012.
-  M. H. Pinson and S. Wolf, “A new standardized method for objectively measuring video quality,” IEEE Trans. on Broadcasting, vol. 50, no. 3, pp. 312–322, Sep. 2004.
-  T. J. Liu, K. H. Liu, J. Y. Lin, W. Lin, and C. C. J. Kuo, “A paraboost method to image quality assessment,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 1, pp. 107–121, Jan 2017.
-  Z. Wang, A. Bovik, and B. Evan, “Blind measurement of blocking artifacts in images,” in Proc. of 2000 Int. Conf. on Image Processing, vol. 3, Sep. 2000, pp. 981–984.
-  Z. M. Parvez Sazzad, Y. Kawayoke, and Y. Horita, “No reference image quality assessment for JPEG2000 based on spatial features,” Image Commun., vol. 23, no. 4, pp. 257–268, Apr. 2008. [Online]. Available: http://dx.doi.org/10.1016/j.image.2008.03.005
-  J. Zhu and N. Wang, “Image quality assessment by visual gradient similarity,” IEEE Trans. on Image Processing, vol. 21, no. 3, pp. 919–933, Mar. 2012.
-  J. B. Martens and V. Kayargadde, “Image quality prediction in a multidimensional perceptual space,” in Proc. of the 1996 IEEE Int. Conf. on Image Processing, vol. 1, Sep. 1996, pp. 877 – 880.
-  O. I. Ieremeiev, V. V. Lukin, N. N. Ponomarenko, K. O. Egiazarian, and J. Astola, “Combined full-reference image visual quality metrics,” Electronic Imaging, vol. 2016, no. 15, pp. 1–10, 2016.
-  K. Gu, G. Zhai, X. Yang, and W. Zhang, “A new psychovisual paradigm for image quality assessment: from differentiating distortion types to discriminating quality conditions,” Signal, Image and Video Processing, vol. 7, no. 3, pp. 423–436, 2013. [Online]. Available: http://dx.doi.org/10.1007/s11760-013-0445-2
-  P. Huber, Robust Statistics. New York: John Wiley, 1981.
-  H. R. Sheikh, M. F. Sabir, and A. C. Bovik, “A statistical evaluation of recent Full Reference image quality assessment algorithms,” IEEE Trans. on Image Processing, vol. 15, no. 11, pp. 3441–3452, Nov. 2006.
-  VQEG, “Final report from the video quality experts group on the validation of objective models of video quality assessment, phase II,” Video Quality Experts Group, http://www.vqeg.org, Tech. Rep., Aug. 2003.
-  H. Akaike, “A new look at the statistical model identification,” Automatic Control, IEEE Transactions on, vol. 19, no. 6, pp. 716–723, Dec 1974.
-  K. P. Burnham and D. R. Anderson, “Multimodel inference: Understanding AIC and BIC in model selection,” Sociological Methods and Research, vol. 33, no. 2, pp. 261–304, 2004. [Online]. Available: http://smr.sagepub.com/content/33/2/261.abstract
-  M. Stone, “An asymptotic equivalence of choice of model by cross-validation and Akaike’s criterion,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 39, no. 1, pp. 44–47, 1977. [Online]. Available: http://www.jstor.org/stable/2984877
-  K. Okarma, “Combined full-reference image quality metric linearly correlated with subjective assessment,” in Artificial Intelligence and Soft Computing, ser. Lecture Notes in Computer Science, L. Rutkowski, R. Scherer, R. Tadeusiewicz, L. Zadeh, and J. Zurada, Eds. Springer Berlin Heidelberg, 2010, vol. 6113, pp. 539–546.
-  P. Skurowski, “Towards the new concept of linear image quality assessment. review of phase correlation formulas,” Studia Informatica, vol. 35, no. 4, 2014. [Online]. Available: http://studiainformatica.polsl.pl/index.php/SI/article/view/709
-  J. Lubin and D. Fibush, “Sarnoff JND vision model,” T1A1.5 Working Group Document 97-612, ANSI T1 Standards Committee, 1997.
-  L. Capodiferro, E. D. Di Claudio, G. Jacovitti, and F. Mangiatordi, “Structure oriented image quality assessment based on multiple statistics,” in Proc. of the Fifth Int. Workshop on Video Processing and Quality Metrics for Consumer Electronics, Scottsdale, AZ, USA, Jan. 2010.
-  S. Li, F. Zhang, L. Ma, and K. N. Ngan, “Image quality assessment by separately evaluating detail losses and additive impairments,” IEEE Trans. on Multimedia, vol. 13, no. 5, pp. 935–949, Oct. 2011.
-  S. Haykin, Adaptive Filter Theory (3rd Ed.). Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1996.
-  A. Neri and G. Jacovitti, “Maximum likelihood localization of 2-D patterns in the Gauss-Laguerre transform domain: theoretic framework and preliminary results,” IEEE Trans. on Image Processing, vol. 13, no. 1, pp. 72–86, Jan. 2004.
-  H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik, “LIVE image quality assessment Database Release 2,” Online. Available: http://live.ece.utexas.edu/research/quality.
-  N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, and F. Battisti, “TID2008: a database for evaluation of full-reference visual quality assessment metrics,” in Advances of Modern Radioelectronics, vol. 10, 2009, pp. 30–45.
-  E. C. Larson and D. M. Chandler. Full-reference image quality assessment and the role of strategy: The most apparent distortion. online supplement. [Online]. Available: vision.okstate.edu/mad/
-  J. Mannos and D. Sakrison, “The effects of a visual fidelity criterion of the encoding of images,” IEEE Trans. on Information Theory, vol. 20, no. 4, pp. 525–536, Jul. 1974.
-  E. D. Di Claudio, G. Jacovitti, and A. Laurenti, “Maximum likelihood orientation estimation of 1-D patterns in Laguerre-Gauss subspaces,” IEEE Trans. on Image Processing, vol. 19, no. 5, pp. 1113–1125, May 2010.
-  G. Golub and C. V. Loan, Matrix Computations, 2nd ed. Baltimore, USA: John Hopkins University Press, 1989.
-  T. Tarpey, “A note on the prediction sum of squares statistic for restricted least squares,” The American Statistician, vol. 54, no. 2, pp. 116–118, 2000. [Online]. Available: http://www.jstor.org/stable/2686028
-  E. M. Granger and K. N. Cupery, “An optical merit function (SQF), which correlates with subjective image judgments,” Photographic Science and Engineering, vol. 16, pp. 221–230, 1972.
-  C. R. Carlson and R. W. Cohen, “A simple psycho-physical model for predicting the visibility of displayed information,” Proc. Soc. Inf. Displ., vol. 21, pp. 229–246, 1980.
-  P. G. J. Barten, “Evaluation of subjective image quality with the square-root integral method,” J. Opt. Soc. Am. A, vol. 7, no. 10, pp. 2024–2031, Oct 1990. [Online]. Available: http://josaa.osa.org/abstract.cfm?URI=josaa-7-10-2024
-  S. S. Stevens, “The surprising simplicity of sensory metrics,” American Psychologist, vol. 17, no. 1, pp. 29–39, Jan. 1962.
-  R. Parisi, E. D. D. Claudio, G. Orlandi, and B. Rao, “A generalized learning paradigm exploiting the structure of feedforward neural networks,” IEEE Trans. on Neural Networks, vol. 7, no. 6, pp. 1450–1460, Nov. 1996.
-  L. Zhang, L. Zhang, X. Mou, and D. Zhang. (13, dec) FSIM: A feature similarity index for image quality assessment. [Online]. Available: http://sse.tongji.edu.cn/linzhang/IQA/FSIM/FSIM.htm
-  W. Xue, L. Zhang, X. Mou, and A. C. Bovik. (2014) Gradient magnitude similarity deviation: A highly efficient perceptual image quality index. [Online]. Available: http://www4.comp.polyu.edu.hk/~cslzhang/IQA/GMSD/GMSD.htm
-  Z. Wang. (2011, jun) IW-SSIM: Information content weighted structural similarity index for image quality assessment. [Online]. Available: https://ece.uwaterloo.ca/~z70wang/research/iwssim/
-  G. W. Snedecor and W. G. Cochran, Statistical Methods, 8th ed. Iowa State University Press, 1989.
-  R. Wajid, A. Mansoor, and M. Pedersen, “A human perception based performance evaluation of image quality metrics,” in Advances in Visual Computing, ser. Lecture Notes in Computer Science, G. Bebis, R. Boyle, B. Parvin, D. Koracin, R. McMahan, J. Jerald, H. Zhang, S. Drucker, C. Kambhamettu, M. E. Choubassi, Z. Deng, and M. Carlson, Eds. Springer International Publishing, 2014, vol. 8887, pp. 303–312.
-  S. Wolf and M. H. Pinson, “Video quality measurement techniques,” NTIA Report 02-392, 2002.