1 Introduction
The last decades have seen continuous progress in image restoration algorithms (e.g. for denoising, deblurring, super-resolution) both in visual quality and in distortion measures like peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) [45]. However, in recent years, it seems that the improvement in reconstruction accuracy is not always accompanied by an improvement in visual quality. In fact, and perhaps counter-intuitively, algorithms that are superior in terms of perceptual quality are often inferior in terms of e.g. PSNR and SSIM [22, 16, 6, 38, 51, 49]. This phenomenon is commonly interpreted as a shortcoming of the existing distortion measures [44], which fuels a constant search for alternative “more perceptual” criteria.
In this paper, we offer a complementary explanation for the apparent tradeoff between perceptual quality and distortion measures. Specifically, we prove that there exists a region in the perception-distortion plane that cannot be attained regardless of the algorithmic scheme (see Fig. 1). Furthermore, the boundary of this region is monotone. Therefore, in its proximity, it is only possible to improve either perceptual quality or distortion, one at the expense of the other. The perception-distortion tradeoff exists for all distortion measures, and is not only a problem of the mean-square error (MSE) or SSIM criteria. However, for some measures, the tradeoff is weaker than for others. For example, we find empirically that the recently proposed distance between deep-net features [16, 22] has a weaker tradeoff with perceptual quality than MSE. This aligns with the observation that this measure is “more perceptual” than MSE.
Let us clarify the difference between distortion and perceptual quality. The goal in image restoration is to estimate an image $x$ from its degraded version $y$ (e.g. noisy, blurry, etc.). Distortion refers to the dissimilarity between the reconstructed image $\hat{x}$ and the original image $x$. Perceptual quality, on the other hand, refers only to the visual quality of $\hat{x}$, regardless of its similarity to $x$. Namely, it is the extent to which $\hat{x}$ looks like a valid natural image. An increasingly popular way of measuring perceptual quality is by using real-vs.-fake user studies, which examine the ability of human observers to tell whether $\hat{x}$ is real or the output of an algorithm [15, 53, 39, 8, 6, 14, 54, 11] (similarly to the idea underlying generative adversarial nets [10]). Therefore, perceptual quality can be defined as the best possible probability of success in such discrimination experiments, which, as we show, is proportional to the distance between the distribution of $\hat{x}$ and that of natural images.
Based on these definitions of perception and distortion, we follow the logic of rate-distortion theory [4]. That is, we seek to characterize the behavior of the best attainable perceptual quality (minimal deviation from natural image statistics) as a function of the maximal allowable average distortion, for any estimator. This perception-distortion function (wide curve in Fig. 1) separates between the attainable and unattainable regions in the perception-distortion plane and thus describes the fundamental tradeoff between perception and distortion. Our analysis shows that algorithms cannot be simultaneously very accurate and produce images that fool observers to believe they are real, no matter what measure is used to quantify accuracy. This tradeoff implies that optimizing distortion measures can be not only ineffective, but also potentially damaging in terms of visual quality. This has been empirically observed e.g. in [22, 16, 38, 51, 6], but was never established theoretically.
From the standpoint of algorithm design, we show that generative adversarial nets (GANs) provide a principled way to approach the perception-distortion bound. This gives theoretical support to the growing empirical evidence of the advantages of GANs in image restoration [22, 38, 35, 51, 36, 15, 55].
The perception-distortion tradeoff has major implications for low-level vision. In certain applications, reconstruction accuracy is of key importance (e.g. medical imaging). In others, perceptual quality may be preferred. The impossibility of simultaneously achieving both goals calls for a new way of evaluating algorithms: by placing them on the perception-distortion plane. We use this new methodology to conduct an extensive comparison between recent super-resolution (SR) methods, revealing which SR methods lie closest to the perception-distortion bound.
2 Distortion and perceptual quality
Distortion and perceptual quality have been studied in many different contexts, and are sometimes referred to by different names. Let us briefly put past works in our context.
2.1 Distortion (full-reference) measures
Given a distorted image $\hat{x}$ and a ground-truth reference image $x$, full-reference distortion measures quantify the quality of $\hat{x}$ by its discrepancy to $x$. These measures are often called full-reference image quality criteria because of the reasoning that if $\hat{x}$ is similar to $x$ and $x$ is of high quality, then $\hat{x}$ is also of high quality. However, as we show in this paper, this logic is not always correct. We thus prefer to call these measures distortion or dissimilarity criteria.
The most common distortion measure is the MSE, which is quite poorly correlated with semantic similarity between images [44]. Many alternative, more perceptual, distortion measures have been proposed over the years, including SSIM [45], MS-SSIM [47], IFC [41], VIF [40], VSNR [3] and FSIM [52]. Recently, measures based on the distance between deep feature maps of a neural net have been shown to capture more semantic similarities. These measures were used as loss functions in super-resolution and style transfer applications, leading to reconstructions with high visual quality [16, 22, 38].
2.2 Perceptual quality
The perceptual quality of an image is the degree to which it looks like a natural image, and has nothing to do with its similarity to any reference image. In many image processing domains, perceptual quality has been associated with deviations from natural image statistics.
Human opinion based quality assessment
Perceptual quality is commonly evaluated empirically by the mean opinion score of human subjects [31, 29]. Recently, it has become increasingly popular to perform such studies through real vs. fake questionnaires [15, 53, 39, 8, 6, 14, 54, 11]. These test the ability of a human observer to distinguish whether an image is real or the output of some algorithm. The probability of success of the optimal decision rule in this hypothesis testing task is known to be
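$$p_{\text{success}} = \tfrac{1}{2}\, d_{\text{TV}}(p_X, p_{\hat{X}}) + \tfrac{1}{2} \tag{1}$$
where $d_{\text{TV}}(p_X, p_{\hat{X}})$ is the total-variation (TV) distance between the distribution $p_{\hat{X}}$ of images produced by the algorithm in question, and the distribution $p_X$ of natural images [32]. Note that $p_{\text{success}}$ decreases as the deviation between $p_{\hat{X}}$ and $p_X$ decreases, becoming $\tfrac{1}{2}$ (no better than a coin toss) when $p_{\hat{X}} = p_X$.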
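As a small illustration of the relation between discrimination success and the TV distance, the following sketch compares two ways of computing the success probability for discrete distributions: through the TV distance, and directly through the decision rule that picks the likelier hypothesis. The function names and the three-point toy distributions are hypothetical, and real and fake images are assumed equally likely a priori.

```python
import numpy as np


def tv_distance(p, q):
    """Total-variation distance between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()


def success_from_tv(p, q):
    """Success probability of the optimal discriminator via the TV distance."""
    return 0.5 * tv_distance(p, q) + 0.5


def success_from_decision_rule(p, q):
    """Success probability of the rule that picks the likelier hypothesis.

    Real and fake are assumed equally likely a priori, so the rule is
    correct with probability 0.5 * sum_x max(p(x), q(x)).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.maximum(p, q).sum()


# Hypothetical toy distributions over three possible "images".
p_natural = [0.5, 0.3, 0.2]
p_algorithm = [0.2, 0.3, 0.5]
print(success_from_tv(p_natural, p_algorithm))
print(success_from_decision_rule(p_natural, p_algorithm))
```

Both computations agree, and both fall to one half when the two distributions coincide.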
No-reference quality measures
Perceptual quality can also be measured by an algorithm. In particular, no-reference measures quantify the perceptual quality of an image without depending on a reference image. These measures are commonly based on estimating deviations from natural image statistics. For example, [46, 48, 23] proposed a perceptual quality index based on the Kullback-Leibler (KL) divergence between the distribution of the wavelet coefficients of $\hat{x}$ and that of natural scenes. This idea was further extended by the popular methods DIIVINE [31], BRISQUE [29], BLIINDS-II [37] and NIQE [30], which quantify perceptual quality by various measures of deviation from natural image statistics in the spatial, wavelet and DCT domains.
GAN-based image restoration
Most recently, GAN-based methods have demonstrated unprecedented perceptual quality in super-resolution [22, 38], inpainting [35, 51], compression [36] and image-to-image translation [15, 55]. This was accomplished by utilizing an adversarial loss, which minimizes some distance $d(p_X, p_{\hat{X}_{\text{GAN}}})$ between the distribution $p_{\hat{X}_{\text{GAN}}}$ of images produced by the generator and the distribution $p_X$ of images in the training dataset. A large variety of GAN schemes have been proposed, which minimize different distances between distributions. These include the Jensen-Shannon divergence [10], the Wasserstein distance [1], and any $f$-divergence [34].
3 Problem formulation
In statistical terms, a natural image $x$ can be thought of as a realization from the distribution of natural images $p_X$. In image restoration, we observe a degraded version $y$ relating to $x$ via some conditional distribution $p_{Y|X}$ (corresponding to noise, blur, down-sampling, etc.). Given $y$, we produce an estimate $\hat{x}$ according to some distribution $p_{\hat{X}|Y}$. This description is quite general in that it does not restrict the estimator $\hat{x}$ to be a deterministic function of $y$. This problem setting is illustrated in Fig. 2.
Given a full-reference dissimilarity criterion $\Delta(x, \hat{x})$, the average distortion of an estimator $\hat{X}$ is given by
$$\mathbb{E}[\Delta(X, \hat{X})] \tag{2}$$
where the expectation is over the joint distribution $p_{X,\hat{X}}$. This definition aligns with the common practice of evaluating average performance over a database of degraded natural images. Note that some distortion measures, e.g. SSIM, are actually similarity measures (higher is better), yet can always be inverted to become dissimilarity measures.
As discussed in Sec. 2.2, the perceptual quality of an estimator $\hat{X}$ (as quantified e.g. by real vs. fake human opinion studies) is directly related to the distance between the distribution of its reconstructed images, $p_{\hat{X}}$, and the distribution of natural images, $p_X$. We thus define the perceptual quality index (lower is better) of an estimator $\hat{X}$ as
$$d(p_X, p_{\hat{X}}) \tag{3}$$
where $d(p, q)$ is some divergence between distributions, e.g. the KL divergence, TV distance, Wasserstein distance, etc.
Notice that the best possible perceptual quality is obtained when the outputs of the algorithm follow the distribution of natural images (i.e. $p_{\hat{X}} = p_X$). In this situation, by looking at the reconstructed images, it is impossible to tell that they were generated by an algorithm. However, not every estimator with this property is necessarily accurate. Indeed, we could achieve perfect perceptual quality by randomly drawing natural images that have nothing to do with the original “ground-truth” images. In this case the distortion would be quite large.
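The random-draw argument can be checked numerically with a minimal sketch. Here scalar standard-normal samples stand in for natural images (an assumption made purely for illustration), and `random_draw_estimator` is a hypothetical name. Because the estimate is drawn from the same dataset, its distribution matches that of the signal, yet its mean-square error is about twice the signal variance.

```python
import numpy as np


def random_draw_estimator(dataset, n, rng):
    """Ignore the measurement and return n fresh draws from the dataset."""
    return rng.choice(dataset, size=n, replace=True)


rng = np.random.default_rng(0)
dataset = rng.standard_normal(200_000)      # stand-in for "natural images"
x = rng.choice(dataset, size=200_000)       # ground-truth signals
x_hat = random_draw_estimator(dataset, x.size, rng)

mse = np.mean((x - x_hat) ** 2)
print("MSE of the random-draw estimator:", mse)
print("Twice the signal variance:       ", 2 * dataset.var())
```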
Our goal is to characterize the tradeoff between (2) and (3). But let us first exemplify why minimizing the average distortion (2) does not necessarily lead to a low perceptual quality index (3). We illustrate this with the square-error distortion $\Delta(x, \hat{x}) = \|x - \hat{x}\|^2$ and the 0-1 distortion $\Delta(x, \hat{x}) = 1 - \delta_{x,\hat{x}}$ (where $\delta$ is Kronecker’s delta).
3.1 The square-error distortion
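The minimum mean square-error (MMSE) estimator is given by the posterior mean $\hat{x}(y) = \mathbb{E}[X \mid Y = y]$. Consider the case $Y = X + N$, where $X$ is a discrete random variable with probability mass function
$$p_X(x) = \begin{cases} p_1 & x = \pm 1, \\ p_0 & x = 0, \end{cases} \tag{4}$$
and $N \sim \mathcal{N}(0, 1)$ is independent of $X$ (see Fig. 3). In this setting, the MMSE estimate is given by
$$\hat{x}_{\text{MMSE}}(y) = \sum_{n \in \{-1, 0, 1\}} n \, p(X = n \mid y) \tag{5}$$
where
$$p(X = n \mid y) = \frac{p_X(n) \exp\{-\tfrac{1}{2}(y - n)^2\}}{\sum_{m \in \{-1, 0, 1\}} p_X(m) \exp\{-\tfrac{1}{2}(y - m)^2\}} \tag{6}$$
Notice that $\hat{x}_{\text{MMSE}}$ can take any value in the range $(-1, 1)$, whereas $x$ can only take the discrete values $\{-1, 0, 1\}$. Thus, clearly, $p_{\hat{X}_{\text{MMSE}}}$ is very different from $p_X$, as illustrated in Fig. 3. This demonstrates that minimizing the MSE distortion does not generally lead to $p_{\hat{X}} \approx p_X$.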
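A minimal numerical sketch of this example follows. It assumes a symmetric three-point prior on $\{-1, 0, 1\}$ with probability `p1` at $\pm 1$ and `1 - 2 * p1` at $0$, together with unit-variance Gaussian noise; the function names and the value `p1 = 0.25` are hypothetical choices. The posterior mean sweeps continuously through the open interval between the extreme signal values instead of staying on the three admissible values.

```python
import numpy as np

VALUES = np.array([-1.0, 0.0, 1.0])  # values the signal X can take


def posterior(y, p1):
    """Posterior p(X = n | y) for n in VALUES, assuming unit-variance noise."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    prior = np.array([p1, 1.0 - 2.0 * p1, p1])  # requires 0 < p1 < 0.5
    log_w = np.log(prior) - 0.5 * (y[:, None] - VALUES) ** 2
    log_w -= log_w.max(axis=1, keepdims=True)   # for numerical stability
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)


def mmse_estimate(y, p1):
    """Posterior mean E[X | Y = y]."""
    return posterior(y, p1) @ VALUES


y = np.linspace(-3.0, 3.0, 7)
print(np.round(mmse_estimate(y, p1=0.25), 3))
```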
The same intuition holds for images. The MMSE estimate is an average over all possible explanations to the measured data, weighted by their likelihoods. However, the average of valid images is not necessarily a valid image, so that the MMSE estimate frequently “falls off” the natural image manifold [22]. This leads to unnatural blurry reconstructions, as illustrated in Fig. 4. In this experiment, $x$ is an image comprising smaller digit images. Each digit is chosen uniformly at random from a dataset comprising K images from the MNIST dataset [21] and an additional K blank images. The degraded image $y$ is a noisy version of $x$. As can be seen, the MMSE estimator produces blurry reconstructions, which do not follow the statistics of the (binary) images in the dataset.
3.2 The 0-1 distortion
The discussion above may give the impression that unnatural estimates are mainly a problem of the square-error distortion, which causes averaging. One way to avoid averaging is to minimize the binary 0-1 loss, which restricts the estimator to choose $\hat{x}$ only from the set of values that $x$ can take. In fact, the minimum mean 0-1 distortion is attained by the maximum-a-posteriori (MAP) rule, which is very popular in image restoration. However, as we exemplify next, the distribution of the MAP estimator also deviates from $p_X$. This behavior has also been studied in [33].
Consider again the setting of (4). In this case, the MAP estimate is given by
$$\hat{x}_{\text{MAP}}(y) = \arg\max_{n \in \{-1, 0, 1\}} p(X = n \mid y) \tag{7}$$
where $p(X = n \mid y)$ is as in (6). Now, it can be easily verified that when $\log(p_1 / p_0) > \tfrac{1}{2}$, we have $\hat{x}_{\text{MAP}}(y) = \operatorname{sign}(y)$. Namely, the MAP estimator never predicts the value $0$. Therefore, in this case, the distribution of the estimate is
$$p_{\hat{X}_{\text{MAP}}}(\hat{x}) = \begin{cases} 0.5 & \hat{x} = +1, \\ 0.5 & \hat{x} = -1. \end{cases} \tag{8}$$
This effect can also be seen in the experiment of Fig. 4. Here, the MAP estimator is increasingly dominated by blank images as the noise level rises, and thus clearly deviates from the underlying prior distribution.
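The disappearance of one value from the MAP output can be reproduced in a short simulation. The sketch assumes a symmetric three-point prior on $\{-1, 0, 1\}$ with probability `p1` at $\pm 1$ and `p0 = 1 - 2 * p1` at $0$, and unit-variance Gaussian noise; the function names and the two values of `p1` are hypothetical. Under these assumptions, comparing the unnormalized posteriors shows that $0$ is never selected once $\log(p_1/p_0) > 1/2$.

```python
import numpy as np

VALUES = np.array([-1.0, 0.0, 1.0])  # values the signal X can take


def map_estimate(y, p1):
    """MAP estimate of X from Y = X + N, assuming unit-variance Gaussian N."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    prior = np.array([p1, 1.0 - 2.0 * p1, p1])  # requires 0 < p1 < 0.5
    log_post = np.log(prior) - 0.5 * (y[:, None] - VALUES) ** 2
    return VALUES[np.argmax(log_post, axis=1)]


def map_output_distribution(p1, n=100_000, seed=0):
    """Empirical distribution of the MAP estimate over simulated measurements."""
    rng = np.random.default_rng(seed)
    prior = np.array([p1, 1.0 - 2.0 * p1, p1])
    x = rng.choice(VALUES, size=n, p=prior)
    y = x + rng.standard_normal(n)
    x_hat = map_estimate(y, p1)
    return {float(v): float(np.mean(x_hat == v)) for v in VALUES}


print(map_output_distribution(p1=0.45))  # log(p1/p0) > 1/2
print(map_output_distribution(p1=0.30))  # log(p1/p0) < 1/2
```

In the first setting the estimate splits roughly evenly between $-1$ and $+1$ although the prior assigns nonzero probability to $0$; in the second setting all three values appear.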
4 The perception-distortion tradeoff
We saw that low distortion does not generally imply good perceptual quality. An interesting question, then, is: What is the best perceptual quality that can be attained by an estimator with a prescribed distortion level?
Definition 1.
The perception-distortion function of a signal restoration task is given by
$$P(D) = \min_{p_{\hat{X}|Y}} d(p_X, p_{\hat{X}}) \quad \text{s.t.} \quad \mathbb{E}[\Delta(X, \hat{X})] \le D \tag{9}$$
where $\Delta(\cdot, \cdot)$ is a distortion measure and $d(\cdot, \cdot)$ is a divergence between distributions.
In words, $P(D)$ is the minimal deviation between the distributions $p_X$ and $p_{\hat{X}}$ that can be attained by an estimator with distortion $D$. To gain intuition into the typical behavior of this function, consider the following example.
Example 1.
Suppose that $Y = X + N$, where $X \sim \mathcal{N}(0, 1)$ and $N \sim \mathcal{N}(0, \sigma_N^2)$ are independent. Take $\Delta(\cdot, \cdot)$ to be the square-error distortion and $d(\cdot, \cdot)$ to be the KL divergence. For simplicity, let us restrict attention to estimators of the form $\hat{X} = aY$. In this case, we can derive a closed form solution to Eq. (9) (see Supplementary), which is plotted for several noise levels $\sigma_N$ in Fig. 5. As can be seen, the minimal attainable $d_{\text{KL}}(p_X, p_{\hat{X}})$ drops as the maximal allowable distortion (MSE) increases. Furthermore, the tradeoff is convex and becomes more severe at higher noise levels $\sigma_N$.
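The behavior described in this example can be approximated numerically with a grid search, as in the following sketch. It assumes a standard-normal signal, independent zero-mean Gaussian noise with standard deviation `sigma`, linear estimators $\hat{X} = aY$, and the KL divergence taken from the signal distribution to the estimate distribution. These modelling choices, the grid over `a`, and all function names are assumptions made for illustration; this is not the closed-form solution from the Supplementary.

```python
import numpy as np


def mse(a, sigma):
    """E[(aY - X)^2] for X ~ N(0, 1), Y = X + N, N ~ N(0, sigma^2)."""
    return (a - 1.0) ** 2 + (a * sigma) ** 2


def kl_divergence(a, sigma):
    """KL(N(0, 1) || N(0, s2)), where s2 = a^2 (1 + sigma^2) is Var(aY)."""
    s2 = a ** 2 * (1.0 + sigma ** 2)
    return 0.5 * (np.log(s2) + 1.0 / s2 - 1.0)


def perception_distortion(D, sigma, a_grid=None):
    """Grid-search approximation of the smallest divergence with MSE <= D."""
    if a_grid is None:
        a_grid = np.linspace(1e-3, 1.5, 20001)
    feasible = mse(a_grid, sigma) <= D
    if not feasible.any():
        return np.inf  # no linear estimator on the grid reaches this distortion
    return kl_divergence(a_grid[feasible], sigma).min()


for sigma in (0.5, 1.0):
    D_values = np.linspace(0.0, 1.0, 6)
    print(sigma, [round(float(perception_distortion(D, sigma)), 4) for D in D_values])
```

Distortion levels below the minimal MSE are infeasible and are reported as infinity; above that level the approximated divergence decreases until it reaches zero.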
In general settings, it is impossible to solve (9) analytically. However, it turns out that the behavior seen in Fig. 5 is typical, as we show next (see proof in the Supplementary).
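Theorem 1 (The perception-distortion tradeoff).
If $d(p, q)$ is convex in its second argument, then the perception-distortion function $P(D)$ of (9) is monotonically non-increasing and convex.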
Note that Theorem 1 requires no assumptions on the distortion measure $\Delta(\cdot, \cdot)$. This implies that a tradeoff between perceptual quality and distortion exists for any distortion measure, including e.g. MSE, SSIM, square error between VGG features [16, 22], etc. Yet, this does not imply that all distortion measures have the same perception-distortion function. Indeed, as we demonstrate in Sec. 6, the tradeoff tends to be less severe for distortion measures that capture semantic similarities between images.
The convexity of $P(D)$ implies that the tradeoff is more severe at the low-distortion and at the high-perceptual-quality extremes. This is particularly important when considering the TV divergence, which is associated with the ability to distinguish between real vs. fake images (see Sec. 2.2). Since $P(D)$ is steeper at the low-distortion regime, any small improvement in distortion for an algorithm whose distortion is already low must be accompanied by a large degradation in the ability to fool a discriminator. Similarly, any small improvement in the perceptual quality of an algorithm whose perceptual index is already low must be accompanied by a large increase in distortion. Let us comment that the assumption that $d(p, q)$ is convex is not very limiting. For instance, any $f$-divergence (e.g. KL, TV, Hellinger, $\chi^2$) as well as the Rényi divergence satisfy this assumption [5, 43]. In any case, the function $P(D)$ is monotonically non-increasing even without this assumption.
4.1 Connection to rate-distortion theory
The perception-distortion tradeoff is closely related to the well-established rate-distortion theory [4]. This theory characterizes the tradeoff between the bit-rate required to communicate a signal, and the distortion incurred in the signal’s reconstruction at the receiver. More formally, the rate-distortion function of a signal $X$ is defined by
$$R(D) = \min_{p_{\hat{X}|X}} I(X; \hat{X}) \quad \text{s.t.} \quad \mathbb{E}[\Delta(X, \hat{X})] \le D \tag{10}$$
where $I(X; \hat{X})$ is the mutual information between $X$ and $\hat{X}$.
There are, however, several key differences between the two tradeoffs. First, in rate-distortion the optimization is over all conditional distributions $p_{\hat{X}|X}$, i.e. given the original signal. In the perception-distortion case, the estimator has access only to the degraded signal $Y$, so that the optimization is over the conditional distributions $p_{\hat{X}|Y}$, which is more restrictive. In other words, the perception-distortion tradeoff depends on the degradation $p_{Y|X}$, and not only on the signal’s distribution $p_X$ (see Example 1). Second, in rate-distortion the rate is quantified by the mutual information $I(X; \hat{X})$, which depends on the joint distribution $p_{X,\hat{X}}$. In our case, perception is quantified by the similarity between $p_X$ and $p_{\hat{X}}$, which does not depend on their joint distribution. Lastly, mutual information is inherently convex, while the convexity of the perception-distortion curve is guaranteed only when $d(\cdot, \cdot)$ is convex.
5 Traversing the tradeoff with a GAN
There exists a systematic way to design estimators that approach the perception-distortion curve: using GANs. Specifically, motivated by [22, 35, 51, 38, 36, 15], restoration problems can be approached by modifying the loss of the generator of a GAN to be
$$\ell_{\text{gen}} = \ell_{\text{distortion}} + \lambda \, \ell_{\text{adv}} \tag{11}$$
where $\ell_{\text{distortion}}$ is the distortion between the original and reconstructed images, and $\ell_{\text{adv}}$ is the standard GAN adversarial loss. It is well known that $\ell_{\text{adv}}$ is proportional to some divergence $d(p_X, p_{\hat{X}})$ between the generator and data distributions [10, 1, 34] (the type of divergence depends on the loss). Thus, (11) in fact approximates the objective
$$\ell_{\text{gen}} \approx \mathbb{E}[\Delta(x, \hat{x})] + \lambda \, d(p_X, p_{\hat{X}}) \tag{12}$$
Viewing $\lambda$ as a Lagrange multiplier, it is clear that minimizing $\ell_{\text{gen}}$ is equivalent to minimizing (9) for some $D$. Varying $\lambda$ corresponds to varying $D$, thus producing estimators along the perception-distortion function.
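The Lagrangian view can be illustrated without training a GAN, using a scalar Gaussian toy problem as a stand-in. The sketch below assumes a standard-normal signal, Gaussian noise with standard deviation `sigma`, linear estimators $\hat{X} = aY$, MSE as the distortion, and the KL divergence from the signal distribution to the estimate distribution as the perceptual index. The function name, the grid over `a`, and the weights are hypothetical. For each weight it minimizes distortion plus the weighted divergence and records the resulting point on the plane.

```python
import numpy as np


def sweep_lagrangian(sigma, lambdas):
    """Minimize distortion + lam * divergence over a grid of gains a.

    Toy setting (an assumption): X ~ N(0, 1), Y = X + N with
    N ~ N(0, sigma^2), and estimators of the form X_hat = a * Y.
    Returns a list of (lam, distortion, divergence) tuples.
    """
    a = np.linspace(1e-3, 1.5, 20001)
    distortion = (a - 1.0) ** 2 + (a * sigma) ** 2      # MSE of a * Y
    s2 = a ** 2 * (1.0 + sigma ** 2)                    # variance of a * Y
    divergence = 0.5 * (np.log(s2) + 1.0 / s2 - 1.0)    # KL(N(0,1) || N(0,s2))
    points = []
    for lam in lambdas:
        i = np.argmin(distortion + lam * divergence)
        points.append((lam, float(distortion[i]), float(divergence[i])))
    return points


for lam, dist, div in sweep_lagrangian(sigma=1.0, lambdas=[0.0, 0.1, 1.0, 10.0, 100.0]):
    print(f"lambda={lam:6.1f}  distortion={dist:.4f}  divergence={div:.4f}")
```

As the weight grows, the minimizer moves from the lowest-distortion estimator toward one whose output distribution matches that of the signal, trading distortion for perceptual quality.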
Let us use this approach to explore the perception-distortion tradeoff for the digit denoising example of Fig. 4. We train a Wasserstein GAN (WGAN) based denoiser [1, 12] with an MSE distortion loss $\ell_{\text{distortion}}$. Here, $\ell_{\text{adv}}$ is proportional to the Wasserstein distance $d_W(p_X, p_{\hat{X}})$ between the generator and data distributions. The WGAN has the valuable property that its discriminator (critic) loss is an accurate estimate (up to a constant factor) of $d_W(p_X, p_{\hat{X}})$ [1]. This allows us to easily compute the perceptual quality index of the trained denoiser. We obtain a set of estimators with several values of $\lambda$. For each denoiser, we evaluate the perceptual quality by the final discriminator loss. As seen in Fig. 6, the curve connecting the estimators on the perception-distortion plane is monotonically decreasing. Moreover, it is associated with estimates that gradually transition from blurry and accurate to sharp and inaccurate. This curve obviously does not coincide with the analytic bound (9) (illustrated by a dashed line). However, it seems to be adjacent to it. This is indicated by the fact that the left-most point of the WGAN curve is very close to the left-most point of the theoretical bound, which corresponds to the MMSE estimator. See the Supplementary for the WGAN training details and architecture.
Besides the MMSE estimator, Figure 6 also includes the MAP estimator and an estimator which randomly draws images from the dataset (denoted “random draw”). The perceptual quality of those three estimators is evaluated, as above, by the final loss of the WGAN discriminator [1], trained (without a generator) to distinguish between the estimators’ outputs and images from the dataset. Note that the denoising WGAN estimator (D) achieves the same distortion as the MAP estimator, but with far better perceptual quality. Furthermore, it achieves nearly the same perceptual quality as the random draw estimator, but with a significantly lower distortion.
6 Practical method for evaluating algorithms
Certain applications may require low distortion (e.g. in medical imaging), while others may prefer superior perceptual quality. How should image restoration algorithms be evaluated, then?
Definition 2.
We say that Algorithm A dominates Algorithm B if it has better perceptual quality and less distortion.
Note that if Algorithm A is better than B in only one of the two criteria, then neither A dominates B nor B dominates A. Therefore, among a group of algorithms, there may be a large subset which can be considered equally good.
Definition 3.
We say that an algorithm is admissible among a group of algorithms, if it is not dominated by any other algorithm in the group.
As shown in Figure 7, these definitions have very simple interpretations when plotting algorithms on the perception-distortion plane. In particular, the admissible algorithms in the group are those which lie closest to the perception-distortion bound.
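Definitions 2 and 3 translate directly into a few lines of code. In the sketch below each algorithm is summarized by a pair (distortion, perceptual index), with lower being better in both; the algorithm names and scores are hypothetical placeholders rather than measured results.

```python
def dominates(a, b):
    """True if a has strictly lower distortion and strictly lower perceptual index than b."""
    return a[0] < b[0] and a[1] < b[1]


def admissible(algorithms):
    """Names of the algorithms that are not dominated by any other in the group."""
    return sorted(
        name
        for name, score in algorithms.items()
        if not any(
            dominates(other_score, score)
            for other, other_score in algorithms.items()
            if other != name
        )
    )


# Hypothetical (distortion, perceptual index) pairs; lower is better in both.
scores = {
    "A": (1.0, 5.0),
    "B": (2.0, 3.0),
    "C": (2.5, 4.0),  # dominated by B
    "D": (3.0, 1.0),
}
print(admissible(scores))
```

Algorithms A, B and D each win on one criterion and lose on the other, so none dominates another and all three are admissible, while C is dominated by B.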
As discussed in Sec. 2, distortion is measured by full-reference (FR) metrics, e.g. [45, 47, 41, 40, 3, 52, 16]. The choice of the FR metric depends on the type of similarities we want to measure (per-pixel, semantic, etc.). Perceptual quality, on the other hand, is ideally quantified by collecting human opinion scores, which is time consuming and costly [31, 37]. Instead, the divergence $d(p_X, p_{\hat{X}})$ can be computed, for instance by training a discriminator net (see Sec. 5). However, this requires many training images and is thus also time consuming. A practical alternative is to utilize no-reference (NR) metrics, e.g. [29, 30, 37, 31, 50, 17, 26], which quantify the perceptual quality of an image without a corresponding original image. In scenarios where NR metrics are highly correlated with human mean-opinion-scores (e.g. super-resolution [26]), they can be used as a fast and simple method for approximating the perceptual quality of an algorithm. (In scenarios where NR metrics are inaccurate, e.g. blind deblurring with large blurs [20, 25], the perceptual metric should be human-opinion-scores or the loss of a discriminator trained to distinguish the algorithms’ outputs from natural images.)
We use this approach to evaluate SR algorithms in a magnification task, by plotting them on the perception-distortion plane (Fig. 8). We measure perceptual quality using the recent NR metric by Ma et al. [26] which is specifically designed for SR quality assessment (see Supplementary for experiments with the NR metrics BRISQUE [29], NIQE [30] and BLIINDS-II [37]). We measure distortion by the five common FR metrics RMSE, SSIM [45], MS-SSIM [47], IFC [41] and VIF [40], and additionally by the recent VGG$_{2,2}$ metric (the distance in the feature space of a VGG net) [22, 16]. To conform to previous evaluations, we compute all metrics on the y-channel after discarding a 4-pixel border (except for VGG, which is computed on RGB images). Comparisons on color images can be found in the Supplementary. The algorithms are evaluated on the BSD100 dataset [27]. The evaluated algorithms include: A+ [42], SRCNN [9], SelfEx [13], VDSR [18], Johnson et al. [16], LapSRN [19], Bae et al. [2] (“primary” variant), EDSR [24], SRResNet variants which optimize MSE and VGG$_{2,2}$ [22], SRGAN variants which optimize MSE, VGG$_{2,2}$, and VGG$_{5,4}$, in addition to an adversarial loss [22], ENet [38] (“PAT” variant), Deng [7] (), and Mechrez et al. [28].
Interestingly, the same pattern is observed in all plots: (i) The lower left corner is blank, revealing an unattainable region in the perception-distortion plane. (ii) In proximity of this blank region, NR and FR metrics are anti-correlated, indicating a tradeoff between perception and distortion. Notice that the tradeoff exists even for the IFC and VIF measures, which are considered to capture visual quality better than MSE and SSIM. The tradeoff is evident also for the VGG measure, but is somewhat weaker than for MSE. This may indicate that VGG is a more “perceptual” metric. It should be noted, however, that when using other NR metrics to measure perceptual quality, the tradeoff for VGG does not appear to be weaker (see Supplementary). This is due to the sensitivity of some of the NR metrics to the periodic artifacts that arise when minimizing the VGG distortion (see Fig. 9). (Minimizing VGG, as done by SRResNet-VGG, leads to sharper images compared to minimizing MSE, but with periodic artifacts [16]. Different NR metrics have different sensitivities to these artifacts.)
Figure 9 depicts the outputs of several algorithms lying closest to the perception-distortion bound in the IFC graph. While the images are ordered from low to high distortion (according to IFC), their perceptual quality clearly improves from left to right.
Both FR and NR measures are commonly validated by calculating their correlation with human opinion scores, based on the assumption that both should be correlated with perceptual quality. However, as Fig. 10 shows, while FR measures can be well-correlated with perceptual quality when distant from the unattainable region, this is clearly not the case when approaching the perception-distortion bound. In particular, all tested FR methods are inconsistent with human opinion scores which found the SRGAN to be superb in terms of perceptual quality [22], while NR methods successfully determine this. We conclude that image restoration algorithms should always be evaluated by a pair of NR and FR metrics, constituting a reliable, reproducible and simple method for comparison, which accounts for both perceptual quality and distortion.
Up until 2016, SR algorithms occupied only the upper-left section of the perception-distortion plane. Nowadays, emerging techniques are exploring new regions in this plane. The SRGAN, ENet, Deng, Johnson et al. and Mechrez et al. methods are the first (to our knowledge) to populate the high perceptual quality region. In the near future we will most likely witness continued efforts to approach the perception-distortion bound, not only in the low-distortion region, but throughout the entire plane.
7 Conclusion
We proved and demonstrated the counter-intuitive phenomenon that distortion and perceptual quality are at odds with each other. Namely, the lower the distortion of an algorithm, the more its distribution must deviate from the statistics of natural scenes. We showed empirically that this tradeoff exists for many popular distortion measures, including those considered to be well-correlated with human perception. Therefore, any distortion measure alone is unsuitable for assessing image restoration methods. Our novel methodology utilizes a pair of NR and FR metrics to place each algorithm on the perception-distortion plane, facilitating a more informative comparison of image restoration methods.
Acknowledgements This research was supported in part by an Alon Fellowship, by the Israel Science Foundation (grant no. 852/17), and by the Ollendorf Foundation.
References

[1] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning (ICML), pages 214–223, 2017.
[2] W. Bae, J. Yoo, and J. Chul Ye. Beyond deep residual learning for image restoration: Persistent homology-guided manifold simplification. In Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 145–153, 2017.
[3] D. M. Chandler and S. S. Hemami. VSNR: A wavelet-based visual signal-to-noise ratio for natural images. IEEE Transactions on Image Processing, 16(9):2284–2298, 2007.
[4] T. M. Cover and J. A. Thomas. Elements of information theory. John Wiley & Sons, 2012.
[5] I. Csiszár, P. C. Shields, et al. Information theory and statistics: A tutorial. Foundations and Trends® in Communications and Information Theory, 1(4):417–528, 2004.
[6] R. Dahl, M. Norouzi, and J. Shlens. Pixel recursive super resolution. In International Conference on Computer Vision (ICCV), pages 5439–5448, 2017.
[7] X. Deng. Enhancing image quality via style transfer for single image super-resolution. IEEE Signal Processing Letters, 2018.
[8] E. L. Denton, S. Chintala, R. Fergus, et al. Deep generative image models using a Laplacian pyramid of adversarial networks. In Advances in Neural Information Processing Systems (NIPS), pages 1486–1494, 2015.
[9] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In European Conference on Computer Vision (ECCV), pages 184–199, 2014.
[10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), pages 2672–2680, 2014.

[11] S. Guadarrama, R. Dahl, D. Bieber, M. Norouzi, J. Shlens, and K. Murphy. PixColor: Pixel recursive colorization. In British Machine Vision Conference (BMVC), 2017.
[12] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems (NIPS), pages 5769–5779, 2017.
[13] J.-B. Huang, A. Singh, and N. Ahuja. Single image super-resolution from transformed self-exemplars. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 5197–5206, 2015.
[14] S. Iizuka, E. Simo-Serra, and H. Ishikawa. Let there be color!: Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. ACM Transactions on Graphics (TOG), 35(4):110, 2016.

[15]
P. Isola, J.Y. Zhu, T. Zhou, and A. A. Efros.
Imagetoimage translation with conditional adversarial networks.
In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1125–1134, 2017.  [16] J. Johnson, A. Alahi, and L. FeiFei. Perceptual losses for realtime style transfer and superresolution. In European Conference on Computer Vision (ECCV), pages 694–711, 2016.
 [17] L. Kang, P. Ye, Y. Li, and D. Doermann. Convolutional neural networks for noreference image quality assessment. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1733–1740, 2014.
 [18] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image superresolution using very deep convolutional networks. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1646–1654, 2016.
 [19] W.S. Lai, J.B. Huang, N. Ahuja, and M.H. Yang. Deep Laplacian pyramid networks for fast and accurate superresolution. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 624–632, 2017.
 [20] W.S. Lai, J.B. Huang, Z. Hu, N. Ahuja, and M.H. Yang. A comparative study for single image blind deblurring. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1701–1709, 2016.
 [21] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradientbased learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
 [22] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photorealistic single image superresolution using a generative adversarial network. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 4681–4690, 2017.
 [23] Q. Li and Z. Wang. Reducedreference image quality assessment using divisive normalizationbased image representation. IEEE Journal of Selected Topics in Signal Processing, 3(2):202–211, 2009.
 [24] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual networks for single image superresolution. In Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2017.
 [25] Y. Liu, J. Wang, S. Cho, A. Finkelstein, and S. Rusinkiewicz. A noreference metric for evaluating the quality of motion deblurring. ACM Transactions on Graphics (TOG), 32(6):175–1, 2013.
 [26] C. Ma, C.Y. Yang, X. Yang, and M.H. Yang. Learning a noreference quality metric for singleimage superresolution. Computer Vision and Image Understanding, 158:1–16, 2017.
 [27] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In International Conference on Computer Vision (ICCV), volume 2, pages 416–423, 2001.
 [28] R. Mechrez, I. Talmi, F. Shama, and L. Zelnik-Manor. Learning to maintain natural image statistics. arXiv preprint arXiv:1803.04626, 2018.
 [29] A. Mittal, A. K. Moorthy, and A. C. Bovik. No-reference image quality assessment in the spatial domain. IEEE Transactions on Image Processing, 21(12):4695–4708, 2012.
 [30] A. Mittal, R. Soundararajan, and A. C. Bovik. Making a “completely blind” image quality analyzer. IEEE Signal Processing Letters, 20(3):209–212, 2013.
 [31] A. K. Moorthy and A. C. Bovik. Blind image quality assessment: From natural scene statistics to perceptual quality. IEEE Transactions on Image Processing, 20(12):3350–3364, 2011.
 [32] F. Nielsen. Hypothesis testing, information divergence and computational geometry. In Geometric Science of Information, pages 241–248, 2013.
 [33] M. Nikolova. Model distortions in Bayesian MAP reconstruction. Inverse Problems and Imaging, 1(2):399, 2007.
 [34] S. Nowozin, B. Cseke, and R. Tomioka. f-GAN: Training generative neural samplers using variational divergence minimization. In Advances in Neural Information Processing Systems (NIPS), pages 271–279, 2016.
 [35] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 2536–2544, 2016.
 [36] O. Rippel and L. Bourdev. Real-time adaptive image compression. In International Conference on Machine Learning (ICML), pages 2922–2930, 2017.
 [37] M. A. Saad, A. C. Bovik, and C. Charrier. Blind image quality assessment: A natural scene statistics approach in the DCT domain. IEEE Transactions on Image Processing, 21(8):3339–3352, 2012.
 [38] M. S. M. Sajjadi, B. Schölkopf, and M. Hirsch. EnhanceNet: Single image super-resolution through automated texture synthesis. In International Conference on Computer Vision (ICCV), pages 4491–4500, 2017.
 [39] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems (NIPS), pages 2234–2242, 2016.
 [40] H. R. Sheikh and A. C. Bovik. Image information and visual quality. IEEE Transactions on Image Processing, 15(2):430–444, 2006.
 [41] H. R. Sheikh, A. C. Bovik, and G. De Veciana. An information fidelity criterion for image quality assessment using natural scene statistics. IEEE Transactions on Image Processing, 14(12):2117–2128, 2005.
 [42] R. Timofte, V. De Smet, and L. Van Gool. A+: Adjusted anchored neighborhood regression for fast super-resolution. In Asian Conference on Computer Vision, pages 111–126, 2014.

 [43] T. Van Erven and P. Harremoës. Rényi divergence and Kullback-Leibler divergence. IEEE Transactions on Information Theory, 60(7):3797–3820, 2014.
 [44] Z. Wang and A. C. Bovik. Mean squared error: Love it or leave it? A new look at signal fidelity measures. IEEE Signal Processing Magazine, 26(1):98–117, 2009.
 [45] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
 [46] Z. Wang and E. P. Simoncelli. Reduced-reference image quality assessment using a wavelet-domain natural image statistic model. In Human Vision and Electronic Imaging, volume 5666, pages 149–159, 2005.
 [47] Z. Wang, E. P. Simoncelli, and A. C. Bovik. Multiscale structural similarity for image quality assessment. In Conference on Signals, Systems and Computers, volume 2, pages 1398–1402, 2003.
 [48] Z. Wang, G. Wu, H. R. Sheikh, E. P. Simoncelli, E.-H. Yang, and A. C. Bovik. Quality-aware images. IEEE Transactions on Image Processing, 15(6):1680–1689, 2006.
 [49] C.-Y. Yang, C. Ma, and M.-H. Yang. Single-image super-resolution: A benchmark. In European Conference on Computer Vision (ECCV), pages 372–386, 2014.
 [50] P. Ye, J. Kumar, L. Kang, and D. Doermann. Unsupervised feature learning framework for no-reference image quality assessment. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1098–1105, 2012.

 [51] R. A. Yeh, C. Chen, T. Y. Lim, A. G. Schwing, M. Hasegawa-Johnson, and M. N. Do. Semantic image inpainting with deep generative models. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 5485–5493, 2017.
 [52] L. Zhang, L. Zhang, X. Mou, and D. Zhang. FSIM: A feature similarity index for image quality assessment. IEEE Transactions on Image Processing, 20(8):2378–2386, 2011.
 [53] R. Zhang, P. Isola, and A. A. Efros. Colorful image colorization. In European Conference on Computer Vision (ECCV), pages 649–666, 2016.
 [54] R. Zhang, J.-Y. Zhu, P. Isola, X. Geng, A. S. Lin, T. Yu, and A. A. Efros. Real-time user-guided image colorization with learned deep priors. ACM Transactions on Graphics (TOG), 9(4), 2017.
 [55] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In International Conference on Computer Vision (ICCV), pages 2223–2232, 2017.