A Theory of the Distortion-Perception Tradeoff in Wasserstein Space

07/06/2021 · Dror Freirich et al. · Technion

The lower the distortion of an estimator, the more the distribution of its outputs generally deviates from the distribution of the signals it attempts to estimate. This phenomenon, known as the perception-distortion tradeoff, has captured significant attention in image restoration, where it implies that fidelity to ground truth images comes at the expense of perceptual quality (deviation from statistics of natural images). However, despite the increasing popularity of performing comparisons on the perception-distortion plane, there remains an important open question: what is the minimal distortion that can be achieved under a given perception constraint? In this paper, we derive a closed form expression for this distortion-perception (DP) function for the mean squared-error (MSE) distortion and the Wasserstein-2 perception index. We prove that the DP function is always quadratic, regardless of the underlying distribution. This stems from the fact that estimators on the DP curve form a geodesic in Wasserstein space. In the Gaussian setting, we further provide a closed form expression for such estimators. For general distributions, we show how these estimators can be constructed from the estimators at the two extremes of the tradeoff: The global MSE minimizer, and a minimizer of the MSE under a perfect perceptual quality constraint. The latter can be obtained as a stochastic transformation of the former.


1 Introduction

Image restoration covers some fundamental settings in image processing such as denoising, deblurring and super-resolution. Over the past few years, image restoration methods have demonstrated impressive improvements in both visual quality and distortion measures such as peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) (31). It was noticed, however, that improvement in accuracy, as measured by distortion, does not necessarily lead to improvement in visual quality, referred to as perceptual quality. Furthermore, the lower the distortion of an estimator, the more the distribution of its outputs generally deviates from the distribution of the signals it attempts to estimate. This phenomenon, known as the perception-distortion tradeoff (4), has captured significant attention, as it implies that faithfulness to ground truth images comes at the expense of perceptual quality, namely the deviation from statistics of natural images. Several works have extended the perception-distortion tradeoff to settings such as lossy compression (5) and classification (14).

Despite the increasing popularity of performing comparisons on the perception-distortion plane, the exact characterization of the minimal distortion that can be achieved under a given perception constraint remains an important open question. Although Blau and Michaeli (4) investigated the basic properties of this distortion-perception function, such as monotonicity and convexity, little is known about its precise nature. While a general answer to this question is unavailable, in this paper, we derive a closed form expression for the distortion-perception (DP) function for the mean squared-error (MSE) distortion and the Wasserstein-2 perception index.

Our main contributions are: (i) We prove that the DP function is always quadratic in the perception constraint $P$, regardless of the underlying distribution (Theorem 1). (ii) We show that it is possible to construct estimators on the DP curve from the estimators at the two extremes of the tradeoff (Theorem 3): The one that globally minimizes the MSE, and a minimizer of the MSE under a perfect perceptual quality constraint. The latter can be obtained as a stochastic transformation of the former. (iii) In the Gaussian setting, we further provide a closed form expression for optimal estimators and for the corresponding DP curve (Theorems 4 and 5). We show this Gaussian DP curve is a lower bound on the DP curve of any distribution having the same second order statistics. Finally, we illustrate our results, numerically and visually, in a super-resolution setting in Section 5. The proofs of all the theorems in the main text are provided in Appendix B.

Our theoretical results shed light on several topics that are subject to much practical activity. Particularly, in the domain of image restoration, numerous works target perceptual quality rather than distortion (e.g. (30; 13; 12)). However, it has recently been recognized that generating a single reconstructed image often does not convey to the user the inherent ambiguity in the problem. Therefore, many recent works target diverse perceptual image reconstruction, by employing randomization among possible restorations (15; 3; 20; 1). Commonly, such works perform sampling from the posterior distribution of natural images given the degraded input image. This is done e.g. using priors over image patches (7), conditional generative models (18; 21), or implicit priors induced by deep denoiser networks (10). Theoretically, posterior sampling leads to perfect perceptual quality (the restored outputs are distributed like the prior). However, a fundamental question is whether this is optimal in terms of distortion. As we show in Section 3.1, posterior sampling is often not an optimal strategy, in the sense that there often exist perfect perceptual quality estimators that achieve lower distortion.

Another topic of practical interest is the ability to traverse the distortion-perception tradeoff at test time, without having to train a different model for each working point. Recently, interpolation has been suggested for controlling several objectives at test time. Shoshan et al. (25) propose using interpolation in some latent space in order to approximate intermediate objectives. Wang et al. (30) use per-pixel interpolation for balancing perceptual quality and fidelity. Studies of network parameter interpolation are presented by Wang et al. (30; 29). Deng (6) produces a low-distortion reconstruction and a high perceptual quality one, and then uses style transfer to combine them. An important question, therefore, is which strategy is optimal. In Section 3.2 we show that for the MSE–Wasserstein-2 tradeoff, linear interpolation leads to optimal estimators. We also discuss a geometric connection between interpolation and the fact that estimators on the DP curve form a geodesic in Wasserstein space.

2 Problem setting and preliminaries

2.1 The distortion-perception tradeoff

Let $X$ and $Y$ be random vectors taking values in $\mathbb{R}^{n_x}$ and $\mathbb{R}^{n_y}$, respectively. We consider the problem of constructing an estimator $\hat{X}$ of $X$ based on $Y$. Namely, we are interested in determining a conditional distribution $p_{\hat{X}|Y}$ such that $\hat{X}$ constitutes a good estimate of $X$.

In many practical cases, the goodness of an estimator is associated with two factors: (i) the degree to which $\hat{X}$ is close to $X$ on average (low distortion), and (ii) the degree to which the distribution of $\hat{X}$ is close to that of $X$ (good perceptual quality). An important question, then, is what is the minimal distortion that can be achieved under a given level of perceptual quality? And how can we construct estimators that achieve this lower bound? In mathematical language, we are interested in analyzing the distortion-perception (DP) function (defined similarly to the perception-distortion function of (4))

$$D(P) = \min_{p_{\hat{X}|Y}} \mathbb{E}\big[\Delta(X,\hat{X})\big] \quad \text{s.t.} \quad d\big(p_X, p_{\hat{X}}\big) \le P. \qquad (1)$$

Here, $\Delta(\cdot,\cdot)$ is some distortion criterion, $d(\cdot,\cdot)$ is some divergence between probability measures, and $p_{\hat{X}}$ is the probability measure on $\mathbb{R}^{n_x}$ induced by $p_{\hat{X}|Y}$ and $p_Y$. We assume that $\hat{X}$ is independent of $X$ given $Y$.

As discussed in (4), the function $D(P)$ is monotonically non-increasing, and it is convex whenever $d(\cdot,\cdot)$ is convex in its second argument (which is the case for most popular divergences). However, without further concrete assumptions on the distortion measure $\Delta$ and the perception index $d$, little can be said about the precise nature of $D(P)$.

Here, we focus our attention on the squared-error distortion $\Delta(x,\hat{x}) = \|x - \hat{x}\|^2$ and the Wasserstein-2 distance $d = W_2$, with which (1) reads

$$D(P) = \min_{p_{\hat{X}|Y}} \mathbb{E}\big\|X - \hat{X}\big\|^2 \quad \text{s.t.} \quad W_2\big(p_X, p_{\hat{X}}\big) \le P. \qquad (2)$$

Throughout this paper we assume that all distributions have finite first and second moments. In addition, from Theorem 3 below it will follow that the minimum is indeed attained, so that (2) is well defined. It is well known that the estimator minimizing the mean squared error (MSE) without any constraints is given by $\hat{X}^* = \mathbb{E}[X|Y]$. This implies that $D(P)$ monotonically decreases until $P$ reaches $P_0 \triangleq W_2(p_X, p_{\hat{X}^*})$, beyond which point $D(P)$ takes the constant value $D_{\min} \triangleq \mathbb{E}\|X - \hat{X}^*\|^2$. This is illustrated in Fig. 1. It is also known that in this case $D(0) \le 2D_{\min}$, since the posterior sampling estimator $\hat{X} \sim p_{X|Y}$ achieves $W_2(p_X, p_{\hat{X}}) = 0$ and $\mathbb{E}\|X - \hat{X}\|^2 = 2D_{\min}$ (4). However, apart from these rather general properties, the precise shape of the DP curve has not been determined to date, and neither have the estimators that achieve the optimum in (2). This is our goal in this paper.
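As a simple illustration (ours, not taken from the paper), consider the scalar case $X \sim \mathcal{N}(0,1)$ and $Y = X + N$ with $N \sim \mathcal{N}(0,\sigma^2)$ independent of $X$. Then

$$\hat{X}^* = \frac{1}{1+\sigma^2}\,Y, \qquad D_{\min} = \frac{\sigma^2}{1+\sigma^2}, \qquad P_0 = W_2\big(p_X, p_{\hat{X}^*}\big) = 1 - \frac{1}{\sqrt{1+\sigma^2}},$$

since the $W_2$ distance between two zero-mean scalar Gaussians is the difference of their standard deviations. Posterior sampling $\hat{X} \sim p_{X|Y}$ attains $W_2 = 0$ at distortion $2D_{\min} = 2\sigma^2/(1+\sigma^2)$.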

Figure 1: Left: The distortion-perception function. When using the MSE distortion and the Wasserstein-2 perception index, the minimal possible distortion, $D_{\min}$, is achieved by the estimator $\hat{X}^* = \mathbb{E}[X|Y]$. The perception index attained by this estimator is $P_0 = W_2(p_X, p_{\hat{X}^*})$. At the other extreme of the tradeoff, we know that the distortion at $P = 0$ is bounded from above by $2D_{\min}$. Right: The minimal distortion for a given perception index $P$ can be achieved by an estimator whose distribution lies on a straight line (or geodesic) defined by the geometry of the space of probability measures. Given $P \le P_0$, the distribution $p_{\hat{X}_P}$ achieves $W_2(p_X, p_{\hat{X}_P}) = P$ and $W_2(p_{\hat{X}_P}, p_{\hat{X}^*}) = P_0 - P$, hence it lies on the $W_2$ geodesic connecting $p_X$ and $p_{\hat{X}^*}$.

2.2 The Wasserstein and Gelbrich Distances

Before we present our main results, we briefly survey a few properties of the Wasserstein distance, mostly taken from (19). The Wasserstein-$p$ ($p \ge 1$) distance between measures $\mu$ and $\nu$ on a separable Banach space $(\mathcal{X}, \|\cdot\|)$ is defined by

$$W_p(\mu,\nu) \triangleq \Big(\inf_{\pi \in \Pi(\mu,\nu)} \int_{\mathcal{X}\times\mathcal{X}} \|x - y\|^p \, d\pi(x,y)\Big)^{1/p}, \qquad (3)$$

where $\Pi(\mu,\nu)$ is the set of all probability measures on $\mathcal{X}\times\mathcal{X}$ with marginals $\mu$ and $\nu$. A joint probability achieving the optimum in (3) is often referred to as an optimal plan. The Wasserstein space of probability measures is defined as $\mathcal{W}_p(\mathcal{X}) \triangleq \{\mu : \int_{\mathcal{X}} \|x\|^p \, d\mu(x) < \infty\}$, and $W_p$ constitutes a metric on $\mathcal{W}_p(\mathcal{X})$.

For any $m_1, m_2 \in \mathbb{R}^n$ and $\Sigma_1, \Sigma_2 \in \mathcal{S}^n_+$ (where $\mathcal{S}^n_+$ is the set of symmetric positive semidefinite matrices in $\mathbb{R}^{n\times n}$), the Gelbrich distance is defined as

$$G^2\big((m_1,\Sigma_1),(m_2,\Sigma_2)\big) \triangleq \|m_1 - m_2\|^2 + \mathrm{Tr}\Big(\Sigma_1 + \Sigma_2 - 2\big(\Sigma_1^{\frac{1}{2}}\Sigma_2\,\Sigma_1^{\frac{1}{2}}\big)^{\frac{1}{2}}\Big). \qquad (4)$$

The root of a PSD matrix is always taken to be PSD. For any two probability measures $p_1, p_2$ on $\mathbb{R}^n$ with means $m_1, m_2$ and covariances $\Sigma_1, \Sigma_2$, from (8, Thm. 2.1) we have that

$$W_2(p_1, p_2) \ge G\big((m_1,\Sigma_1),(m_2,\Sigma_2)\big). \qquad (5)$$

When $p_1$ and $p_2$ are Gaussian distributions on $\mathbb{R}^n$, we have that $W_2(p_1,p_2) = G\big((m_1,\Sigma_1),(m_2,\Sigma_2)\big)$. This equality is obvious for non-singular measures but is true for any two Gaussian distributions (19, p. 18). If $p_1$ and $p_2$ are non-singular, then the distribution attaining the optimum in (3) corresponds to

$$\big(X_1,\ T_{p_1}^{p_2}(X_1)\big), \qquad X_1 \sim p_1, \qquad (6)$$

where

$$T_{p_1}^{p_2}(x) = m_2 + \Sigma_1^{-\frac{1}{2}}\big(\Sigma_1^{\frac{1}{2}}\Sigma_2\,\Sigma_1^{\frac{1}{2}}\big)^{\frac{1}{2}}\Sigma_1^{-\frac{1}{2}}\,(x - m_1) \qquad (7)$$

is the optimal transformation pushing $p_1$ forward to $p_2$ (11). This transformation satisfies $(T_{p_1}^{p_2})_{\#}\, p_1 = p_2$. For a discussion of singular distributions, please see App. A.
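As a concrete reference, the following NumPy sketch (our illustration, not code released with the paper) evaluates the Gelbrich distance (4) and the Gaussian transport map (7) for given means and covariances.

```python
import numpy as np
from scipy.linalg import sqrtm

def gelbrich_distance(m1, S1, m2, S2):
    """Gelbrich distance (4) between (m1, S1) and (m2, S2); equals W2 when
    both measures are Gaussian and lower bounds W2 in general, cf. (5)."""
    R = sqrtm(S1).real
    cross = sqrtm(R @ S2 @ R).real
    d2 = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return np.sqrt(max(d2, 0.0))        # guard against small negative round-off

def gaussian_transport_map(m1, S1, m2, S2):
    """Matrix of the optimal map (7), x -> m2 + T @ (x - m1), which pushes
    N(m1, S1) forward to N(m2, S2); assumes S1 is non-singular."""
    R = sqrtm(S1).real
    R_inv = np.linalg.inv(R)
    return R_inv @ sqrtm(R @ S2 @ R).real @ R_inv
```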

3 Main results

3.1 The MSE–Wasserstein-2 tradeoff

The DP function (2) depends, of course, on the underlying joint probability of the signal $X$ and measurements $Y$. Our first key result is that this dependence can be expressed solely in terms of $D_{\min}$ and $P_0$. In other words, knowing the distortion $D_{\min}$ and perception index $P_0$ attained by the minimum MSE estimator $\hat{X}^* = \mathbb{E}[X|Y]$ suffices for determining $D(P)$ for any $P$.

Theorem 1 (The DP function).

The DP function (2) is given by

$$D(P) = D_{\min} + \big[\max\big(P_0 - P,\ 0\big)\big]^2, \qquad (8)$$

where $P_0 = W_2(p_X, p_{\hat{X}^*})$. Furthermore, an estimator achieving perception index $P$ and distortion $D(P)$ can always be constructed by applying a (possibly stochastic) transformation to $\hat{X}^* = \mathbb{E}[X|Y]$.

Theorem 1 is of practical importance because in many cases constructing an estimator that achieves a low MSE (i.e. an approximation of $\hat{X}^*$) is a rather simple task. This is the case, for example, in image restoration with deep neural networks. There, it is common practice to train a network by minimizing its average squared error on a training set. Now, measuring the MSE of such a network on a large test set allows approximating $D_{\min}$. We can also obtain an approximation of (at least a lower bound on) $P_0$ by estimating the second order statistics of $X$ and $\hat{X}^*$. Specifically, recall that $P_0$ is lower bounded by the Gelbrich distance between $(m_X,\Sigma_X)$ and $(m_{\hat{X}^*},\Sigma_{\hat{X}^*})$, which is given by (4) (see (5)). Given approximations for $D_{\min}$ and this Gelbrich distance, we can approximate a lower bound on the DP function for any $P$,

$$D(P) \ge D_{\min} + \Big[\max\Big(G\big((m_X,\Sigma_X),(m_{\hat{X}^*},\Sigma_{\hat{X}^*})\big) - P,\ 0\Big)\Big]^2. \qquad (9)$$

The bound is attained when $X$ and $Y$ are jointly Gaussian.
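The following sketch (illustrative, with hypothetical inputs) evaluates the bound (9) over a grid of perception levels, given the measured MSE of a low-distortion model and the estimated Gelbrich distance between its outputs and the ground-truth statistics.

```python
import numpy as np

def dp_lower_bound(P, d_min, g0):
    """Lower bound (9) on the DP function: quadratic in P for P <= g0,
    constant (= d_min) for P >= g0."""
    return d_min + np.maximum(g0 - P, 0.0) ** 2

# hypothetical numbers, e.g. measured on a validation set
d_min = 0.010                 # MSE of a low-distortion model (approximates D_min)
g0 = 0.060                    # Gelbrich distance between its outputs and ground truth
P_grid = np.linspace(0.0, 0.1, 101)
curve = dp_lower_bound(P_grid, d_min, g0)
```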

Uniqueness

A remark is in place regarding the uniqueness of an estimator achieving (8). As we discuss below, what defines an optimal estimator $\hat{X}_P$ is its joint distribution with $\hat{X}^*$. This joint distribution may not be unique, in which case the optimal estimator is not unique. Moreover, even if the optimal $p_{\hat{X}_P,\hat{X}^*}$ is unique, the uniqueness of the estimator is not guaranteed, because there may be different conditional distributions $p_{\hat{X}_P|Y}$ that lead to the same optimal $p_{\hat{X}_P,\hat{X}^*}$. In other words, given the optimal $p_{\hat{X}_P,\hat{X}^*}$, one can choose any joint probability over $(\hat{X}_P,\hat{X}^*,Y)$ that has marginals $p_{\hat{X}_P,\hat{X}^*}$ and $p_{\hat{X}^*,Y}$. One option is to take the estimator to be a (possibly stochastic) transformation of $\hat{X}^*$, namely to let $Y \to \hat{X}^* \to \hat{X}_P$ form a Markov chain. But there may be other options. In cases where either $\hat{X}_P$ or $\hat{X}^*$ is a deterministic transformation of the other (e.g. when $p_{\hat{X}^*}$ has a density, or $\hat{X}_P$ is an invertible function of $\hat{X}^*$), there is a unique joint distribution with the given marginals (2, Lemma 5.3.2). In this case, if $p_{\hat{X}_P,\hat{X}^*}$ is unique then so is the estimator $\hat{X}_P$.

Randomness

Under the settings of image restoration, many methods encourage diversity in their output by adding randomness (15; 3; 20). In our setting, we may ask under what conditions there exists an optimal estimator which is a deterministic function of $Y$. For example, when $Y$ is discrete while $X$ has a non-atomic distribution, it is clear that no deterministic function of $Y$ can attain perfect perceptual quality. It turns out that a sufficient condition for the optimal $\hat{X}_P$ to be a deterministic function of $Y$ is that $\hat{X}^*$ have a density. We discuss this in App. B and explicitly illustrate it in the Gaussian case (see Sec. 3.3), where if $\hat{X}^*$ has a non-singular covariance matrix then $\hat{X}_P$ is a deterministic function of $Y$.

When is posterior sampling optimal?

Many recent image restoration methods attempt to produce diverse high perceptual quality reconstructions by sampling from the posterior distribution (7; 18; 10). As discussed in (4), the posterior sampling estimator attains a perception index of $0$ (namely $p_{\hat{X}} = p_X$) and distortion $2D_{\min}$. But an interesting question is: when is this strategy optimal? In other words, in what cases does the DP function at $P = 0$ equal precisely $2D_{\min}$ and is not strictly smaller? Note from the definition of the Wasserstein distance (3) that $P_0 = W_2(p_X, p_{\hat{X}^*}) \le \sqrt{\mathbb{E}\|X - \hat{X}^*\|^2} = \sqrt{D_{\min}}$. Using this in (8) shows that the DP function at $P = 0$ is upper bounded by

$$D(0) = D_{\min} + P_0^2 \le 2\,D_{\min}, \qquad (10)$$

and the upper bound is attained when $P_0^2 = D_{\min}$. To see when this happens, observe that

$$\mathrm{Tr}\Big(\Sigma_X + \Sigma_{\hat{X}^*} - 2\big(\Sigma_{\hat{X}^*}^{\frac{1}{2}}\Sigma_X\,\Sigma_{\hat{X}^*}^{\frac{1}{2}}\big)^{\frac{1}{2}}\Big) \;\le\; W_2^2\big(p_X, p_{\hat{X}^*}\big) \;\le\; \mathbb{E}\big\|X - \hat{X}^*\big\|^2 = \mathrm{Tr}\big(\Sigma_X - \Sigma_{\hat{X}^*}\big). \qquad (11)$$

We can see that when $\mathrm{Tr}\big((\Sigma_{\hat{X}^*}^{\frac{1}{2}}\Sigma_X\,\Sigma_{\hat{X}^*}^{\frac{1}{2}})^{\frac{1}{2}}\big) = \mathrm{Tr}(\Sigma_{\hat{X}^*})$, the leftmost and rightmost sides become equal, and thus $P_0^2 = D_{\min}$. To understand the meaning of this condition, let us focus on the case where $\Sigma_X$ and $\Sigma_{\hat{X}^*}$ are jointly diagonalizable. This is a reasonable assumption for natural images, where shift-invariance induces diagonalization by the Fourier basis (28). In this case, the condition can be written in terms of the eigenvalues of the matrices, namely $\sum_i \sqrt{\lambda_i(\Sigma_X)\,\lambda_i(\Sigma_{\hat{X}^*})} = \sum_i \lambda_i(\Sigma_{\hat{X}^*})$. This condition is satisfied when each $\lambda_i(\Sigma_{\hat{X}^*})$ equals either $0$ or $\lambda_i(\Sigma_X)$. Namely, the $i$th eigenvalue of the error covariance of $\hat{X}^*$, which is given by $\lambda_i(\Sigma_X) - \lambda_i(\Sigma_{\hat{X}^*})$, is either $\lambda_i(\Sigma_X)$ or $0$. We conclude that posterior sampling is optimal when there exists a subspace spanned by some of the eigenvectors of $\Sigma_X$, such that the projection of $X$ onto this subspace can be recovered from $Y$ with zero error, while the projection of $X$ onto its orthogonal complement cannot be recovered at all (the optimal estimator there is trivial). This is likely not the case in most practical scenarios. Therefore, it seems that posterior sampling is often not optimal. That is, posterior sampling can be improved upon in terms of MSE without any sacrifice in perceptual quality.

3.2 Optimal estimators

While Theorem 1 reveals the shape of the DP function, it does not provide a recipe for constructing optimal estimators on the DP tradeoff. We now discuss the nature of such estimators.

Our first observation is that since $\hat{X}$ is independent of $X$ given $Y$, its MSE can be decomposed as $\mathbb{E}\|X - \hat{X}\|^2 = D_{\min} + \mathbb{E}\|\hat{X}^* - \hat{X}\|^2$ (see App. B). Therefore, the DP function (2) can be equivalently written as

$$D(P) = D_{\min} + \min_{p_{\hat{X}|Y}} \mathbb{E}\big\|\hat{X}^* - \hat{X}\big\|^2 \quad \text{s.t.} \quad W_2\big(p_X, p_{\hat{X}}\big) \le P. \qquad (12)$$

Note that the objective in (12) depends on the MSE between $\hat{X}$ and $\hat{X}^*$, so that we can perform the minimization over $p_{\hat{X},\hat{X}^*}$ rather than over $p_{\hat{X}|Y}$ (once we determine the optimal $p_{\hat{X},\hat{X}^*}$ we can construct a consistent $p_{\hat{X}|Y}$ as discussed above).

Now, let us start by examining the leftmost side of the curve, $P = 0$, which corresponds to a perfect perceptual quality estimator (i.e. $p_{\hat{X}} = p_X$). In this case, the constraint $W_2(p_X, p_{\hat{X}}) \le P$ becomes $p_{\hat{X}} = p_X$. Therefore,

$$D(0) = D_{\min} + \min_{p_{\hat{X},\hat{X}^*} \in \Pi(p_X,\, p_{\hat{X}^*})} \mathbb{E}\big\|\hat{X}^* - \hat{X}\big\|^2, \qquad (13)$$

where $\Pi(p_X, p_{\hat{X}^*})$ is the set of all probabilities on $\mathbb{R}^{n_x}\times\mathbb{R}^{n_x}$ with marginals $p_X$ and $p_{\hat{X}^*}$. One may readily recognize this as the optimization problem underlying the Wasserstein-2 distance between $p_X$ and $p_{\hat{X}^*}$. This leads us to the following conclusion.

Theorem 2 (Optimal estimator for $P = 0$).

Let $\hat{X}_0$ be an estimator achieving perception index $0$ and MSE $D(0)$. Then its joint distribution with $\hat{X}^*$ attains the optimum in the definition of $W_2(p_X, p_{\hat{X}^*})$. Namely, $p_{\hat{X}_0,\hat{X}^*}$ is an optimal plan between $p_X$ and $p_{\hat{X}^*}$.

Having understood the estimator at the leftmost end of the tradeoff, we now turn to study optimal estimators for arbitrary $P$. Interestingly, we can show that Problem (12) is equivalent to (see App. B)

$$D(P) = D_{\min} + \min_{p_{\hat{X}}:\ W_2(p_X,\, p_{\hat{X}}) \le P} W_2^2\big(p_{\hat{X}}, p_{\hat{X}^*}\big). \qquad (14)$$

Namely, an optimal $p_{\hat{X}_P}$ is closest to $p_{\hat{X}^*}$ among all distributions within a ball of radius $P$ around $p_X$, as illustrated in Fig. 1. Moreover, $p_{\hat{X}_P,\hat{X}^*}$ is an optimal plan between $p_{\hat{X}_P}$ and $p_{\hat{X}^*}$. As it turns out, this somewhat abstract viewpoint leads to a rather practical construction of $\hat{X}_P$ from the estimators $\hat{X}^*$ and $\hat{X}_0$ at the two extremes of the tradeoff. Specifically, we have the following result, proved in App. B.

Theorem 3 (Optimal estimators for arbitrary $P$).

Let $\hat{X}_0$ be an estimator achieving perception index $0$ and MSE $D(0)$. Then for any $P \in [0, P_0]$, the estimator

$$\hat{X}_P = \Big(1 - \tfrac{P}{P_0}\Big)\,\hat{X}_0 + \tfrac{P}{P_0}\,\hat{X}^* \qquad (15)$$

is optimal for perception index $P$. Namely, it achieves perception index $P$ and distortion $D(P)$.

Theorem 3 has important implications for perceptual signal restoration. For example, in the task of image super-resolution, there exist many deep network based methods that achieve a low MSE (13; 27; 24). These provide an approximation for $\hat{X}^*$. Moreover, there is an abundance of methods that achieve good perceptual quality at the price of a reasonable degradation in MSE (often by incorporating a GAN-based loss) (12; 30; 23). These constitute approximations for $\hat{X}_0$. However, achieving results that strike other prescribed balances between MSE and perceptual quality commonly requires training a different model for each setting. Shoshan et al. (25) and Navarrete Michelini et al. (17) tried to address this difficulty by introducing new training techniques that allow traversing the distortion-perception tradeoff at test time. But, interestingly, Theorem 3 shows that in our setting such specialized training methods are not required. Having a model that leads to low MSE and one that leads to good perceptual quality, it is possible to construct any other estimator on the DP tradeoff by simply averaging the outputs of these two models with appropriate weights. We illustrate this in Sec. 5.
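A minimal sketch of this recipe (our illustration; the model names in the usage comment are placeholders) that traverses the tradeoff at test time by averaging the outputs of a low-distortion model and a high-perceptual-quality model:

```python
import numpy as np

def traverse_tradeoff(x_perceptual, x_mse, P, P0):
    """Estimator (15): a convex combination of a perfect-perceptual-quality
    reconstruction (approximating X_0) and a low-MSE reconstruction
    (approximating X*), for a target perception index 0 <= P <= P0."""
    alpha = P / P0                       # weight on the low-MSE estimator
    return (1.0 - alpha) * x_perceptual + alpha * x_mse

# usage with hypothetical reconstructions of the same low-resolution input y:
# x_mse        = edsr(y)      # e.g. an MSE-trained super-resolution network
# x_perceptual = esrgan(y)    # e.g. a GAN-trained, perceptually oriented network
# x_mid        = traverse_tradeoff(x_perceptual, x_mse, P=0.5 * P0, P0=P0)
```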

3.3 The Gaussian setting

When $X$ and $Y$ are jointly Gaussian, it is well known that the minimum MSE estimator $\hat{X}^*$ is a linear function of the measurements $Y$. However, it is not a-priori clear whether all estimators along the DP tradeoff are linear in this case, and what kind of randomness they possess. As we now show, equipped with Theorem 3, we can obtain closed form expressions for optimal estimators for any $P$. For simplicity, we assume here that $X$ and $Y$ have zero means and that $\Sigma_Y$ is non-singular.

It is instructive to start by considering the simple case, where $\Sigma_{\hat{X}^*}$ is non-singular (in Theorem 4 below we address the more general case of a possibly singular $\Sigma_{\hat{X}^*}$). It is well known that

$$\hat{X}^* = \Sigma_{XY}\Sigma_Y^{-1}\,Y, \qquad \Sigma_{\hat{X}^*} = \Sigma_{XY}\Sigma_Y^{-1}\Sigma_{YX}. \qquad (16)$$

Now, since we assumed that $\Sigma_{\hat{X}^*}$ is non-singular, we have from Theorem 2 and (6),(7) that

$$\hat{X}_0 = T_{p_{\hat{X}^*}}^{p_X}\big(\hat{X}^*\big) = \Sigma_{\hat{X}^*}^{-\frac{1}{2}}\big(\Sigma_{\hat{X}^*}^{\frac{1}{2}}\Sigma_X\,\Sigma_{\hat{X}^*}^{\frac{1}{2}}\big)^{\frac{1}{2}}\Sigma_{\hat{X}^*}^{-\frac{1}{2}}\,\hat{X}^*. \qquad (17)$$

Finally, we know that $P_0^2$ is given by the left-hand side of (11), which holds with equality in the Gaussian case. Substituting these expressions into (15), we obtain that an optimal estimator for perception index $P$ is given by

$$\hat{X}_P = \Big[\Big(1 - \tfrac{P}{P_0}\Big)\,\Sigma_{\hat{X}^*}^{-\frac{1}{2}}\big(\Sigma_{\hat{X}^*}^{\frac{1}{2}}\Sigma_X\,\Sigma_{\hat{X}^*}^{\frac{1}{2}}\big)^{\frac{1}{2}}\Sigma_{\hat{X}^*}^{-\frac{1}{2}} + \tfrac{P}{P_0}\,I\Big]\,\hat{X}^*, \qquad 0 \le P \le P_0. \qquad (18)$$

As can be seen, this optimal estimator is a deterministic linear transformation of $\hat{X}^*$, and hence of $Y$, for any $P \in [0, P_0]$.
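For reference, a NumPy sketch (our illustration, restricted to the non-singular case above) that assembles (16)-(18):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_dp_estimator(y, Sxx, Sxy, Syy, P):
    """Optimal estimator (18) for zero-mean jointly Gaussian (X, Y),
    assuming the covariance of the MMSE estimate X* is non-singular."""
    # (16): MMSE estimator and its covariance
    A = Sxy @ np.linalg.inv(Syy)
    x_star = A @ y
    S_star = A @ Sxy.T                      # Sigma_{X*} = Sxy Syy^{-1} Syx

    # (7)/(17): linear map pushing the law of X* onto the law of X
    R = sqrtm(S_star).real
    R_inv = np.linalg.inv(R)
    cross = sqrtm(R @ Sxx @ R).real
    T = R_inv @ cross @ R_inv

    # P0: W2 distance between N(0, Sxx) and N(0, S_star), cf. the left side of (11)
    P0 = np.sqrt(np.trace(Sxx + S_star - 2.0 * cross))

    # (15)/(18): interpolate between X0 = T X* and X*
    alpha = min(P / P0, 1.0)
    return ((1.0 - alpha) * T + alpha * np.eye(T.shape[0])) @ x_star
```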

The setting just described does not cover the case where $Y$ is of lower dimensionality than $X$, because in that case $\Sigma_{\hat{X}^*}$ is necessarily singular (it is a matrix of rank at most $n_y$; see (16)). In this case, any deterministic linear function of $Y$ would result in an estimator whose covariance has rank at most $n_y$. Obviously, the distribution of such an estimator cannot be arbitrarily close to that of $X$, whose covariance has rank $n_x$. What is the optimal estimator in this more general setting, then?

Theorem 4 (Optimal estimators in the Gaussian case).

Assume $X$ and $Y$ are zero-mean jointly Gaussian random vectors with $\Sigma_Y \succ 0$. Denote by $\hat{X}^* = \Sigma_{XY}\Sigma_Y^{-1}Y$ the MMSE estimator and $P_0 = W_2(p_X, p_{\hat{X}^*})$. Then for any $P \in [0, P_0]$, an estimator with perception index $P$ and MSE $D(P)$ can be constructed as

(19)

where $W$ is a zero-mean Gaussian noise, independent of $Y$, whose covariance is chosen so that $\hat{X}_P$ attains perception index $P$, and $(\cdot)^{\dagger}$ denotes the Moore-Penrose pseudo-inverse.

Note that in this case we indeed have a random noise component, which shapes the covariance of $\hat{X}_P$ to become closer to $\Sigma_X$ as $P$ gets closer to $0$. It can be shown (see App. B) that when $\Sigma_{\hat{X}^*}$ is invertible, the noise term vanishes and (19) reduces to (18). Also note that, as in (18), the dependence of $\hat{X}_P$ on $Y$ in (19) is only through $\hat{X}^*$.

As mentioned in Sec. 3.1, the optimal estimator is generally not unique. Interestingly, in the Gaussian setting we can explicitly characterize a set of optimal estimators.

Theorem 5 (A set of optimal estimators in the Gaussian case).

Consider the setting of Theorem 4. Let the matrix $T_P$ satisfy

(20)

and let $W$ be a zero-mean Gaussian noise with covariance

(21)

that is independent of $Y$. Then, for any $P \in [0, P_0]$, an optimal estimator with perception index $P$ can be obtained by

(22)

The estimator given in (19) is one solution to (20)-(21), but is generally not unique.

4 A geometric perspective on the distortion-perception tradeoff

In this section we provide a geometric point of view on our main results. Specifically, we show that the results of Theorems 1 and 3 are a consequence of a more general geometric property of the space $\mathcal{W}_2(\mathbb{R}^{n_x})$. In the Gaussian case, this is simplified to a geometry of covariance matrices.

Recall from (14) that the optimal $p_{\hat{X}_P}$ is the one closest to $p_{\hat{X}^*}$ (in terms of Wasserstein distance) among all measures at a distance of at most $P$ from $p_X$. This implies that to determine $p_{\hat{X}_P}$, we should traverse the geodesic between $p_{\hat{X}^*}$ and $p_X$ until reaching a distance of $P$ from $p_X$. Furthermore, $p_{\hat{X}_P,\hat{X}^*}$ should be an optimal plan between $p_{\hat{X}_P}$ and $p_{\hat{X}^*}$. Interestingly, geodesics in Wasserstein spaces take a particularly simple form, and their explicit construction also turns out to satisfy the latter requirement. Specifically, let $\mu_0, \mu_1$ be measures in $\mathcal{W}_2(\mathbb{R}^n)$, let $\pi$ be an optimal plan attaining $W_2(\mu_0,\mu_1)$, and let $\pi^i$ denote the projection such that $\pi^i(x_0, x_1) = x_i$. Then, the curve

$$\mu_t = \big((1-t)\,\pi^0 + t\,\pi^1\big)_{\#}\,\pi, \qquad t \in [0,1], \qquad (23)$$

is a constant-speed geodesic from $\mu_0$ to $\mu_1$ in $\mathcal{W}_2(\mathbb{R}^n)$ (2), where $\#$ denotes the push-forward operation (for measures on $\mathcal{X}$, we say that a measurable transform $T$ pushes $\mu$ forward to $\nu$, denoted $T_{\#}\mu = \nu$, iff $\nu(A) = \mu(T^{-1}(A))$ for any measurable set $A$). Particularly,

$$W_2(\mu_s, \mu_t) = |t - s|\, W_2(\mu_0, \mu_1), \qquad s, t \in [0,1], \qquad (24)$$

and it follows that $W_2(\mu_0, \mu_t) = t\, W_2(\mu_0, \mu_1)$ and $W_2(\mu_t, \mu_1) = (1-t)\, W_2(\mu_0, \mu_1)$. Furthermore, if $\mu_t$ is a constant-speed geodesic with endpoints $\mu_0$ and $\mu_1$, then the optimal plans between $\mu_0, \mu_t$ and between $\mu_t, \mu_1$ are given by

$$\big(\pi^0,\ (1-t)\,\pi^0 + t\,\pi^1\big)_{\#}\,\pi \quad \text{and} \quad \big((1-t)\,\pi^0 + t\,\pi^1,\ \pi^1\big)_{\#}\,\pi, \qquad (25)$$

respectively, where $\pi$ is some optimal plan between $\mu_0$ and $\mu_1$. Applying (23) to $\mu_0 = p_{\hat{X}_0}$ and $\mu_1 = p_{\hat{X}^*}$ with $t = P/P_0$, we obtain (15), where we show that the obtained estimator achieves $D(P)$. This explains the result of Theorem 3.

It is worth mentioning that this geometric interpretation is simplified under some common settings. For example, when $\mu_0$ is absolutely continuous (w.r.t. the Lebesgue measure), we have a measurable map $T_{\mu_0}^{\mu_1}$ which is the solution to the optimal transport problem with the quadratic cost (19, Thm 1.6.2, p. 16). The geodesic (23) then takes the form

$$\mu_t = \big((1-t)\,\mathrm{Id} + t\,T_{\mu_0}^{\mu_1}\big)_{\#}\,\mu_0. \qquad (26)$$

Therefore, in our setting, if $p_{\hat{X}^*}$ has a density, then we can obtain $\hat{X}_P$ by the deterministic transformation $\hat{X}_P = \big((1 - \tfrac{P}{P_0})\,T_{p_{\hat{X}^*}}^{p_X} + \tfrac{P}{P_0}\,\mathrm{Id}\big)(\hat{X}^*)$ (see the remark about randomness in Sec. 3.1).

Further simplification arises when $\mu_0, \mu_1$ are centered non-singular Gaussian measures, in which case $T_{\mu_0}^{\mu_1}$ is the linear and symmetric transformation (7). Then, $\mu_t$ is a Gaussian measure with covariance $\Sigma_t = \big((1-t)I + t\,T_{\mu_0}^{\mu_1}\big)\Sigma_0\big((1-t)I + t\,T_{\mu_0}^{\mu_1}\big)$, where $\Sigma_0$ is the covariance of $\mu_0$. Therefore, in the Gaussian case, the shortest path (23) between distributions reduces to a trajectory in the geometry of covariance matrices induced by the Gelbrich distance (26). If additionally $\Sigma_0$ and $\Sigma_1$ commute, then the Gelbrich distance is further reduced to the Frobenius distance between the matrix square roots, as we discuss in App. C.
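To illustrate the centered Gaussian case (an illustrative sketch, not code from the paper), the interpolated measure at time $t$ is Gaussian and its covariance follows directly from the map (7):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_geodesic_cov(S0, S1, t):
    """Covariance of mu_t on the W2 geodesic (26) between the centered
    Gaussians N(0, S0) and N(0, S1), using the linear map of (7)."""
    R = sqrtm(S0).real
    R_inv = np.linalg.inv(R)
    T = R_inv @ sqrtm(R @ S1 @ R).real @ R_inv   # optimal map from N(0, S0) to N(0, S1)
    M = (1.0 - t) * np.eye(S0.shape[0]) + t * T
    return M @ S0 @ M.T
```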

5 Numerical illustration

In this section we evaluate super-resolution algorithms on the BSD100 dataset (16) (all codes are freely available and provided by their authors; the BSD100 dataset is free to download for non-commercial research). The evaluated algorithms include EDSR (13), ESRGAN (30), SinGAN (23), ZSSR (24), DIP (27), SRResNet variants which optimize MSE and VGG losses, SRGAN variants which optimize MSE, VGG, and VGG in addition to an adversarial loss (12), and ENet (22) ("PAT" and "E" variants). Low-resolution images were obtained by downsampling with a bicubic kernel.

In Figure 2 we plot each method on the distortion-perception plane. Specifically, we consider natural (and reconstructed) images to be stationary random sources, and use patches extracted from the RGB images to empirically estimate the mean and covariance matrix for the ground-truth images and for the reconstructions produced by each method. We then use the estimated Gelbrich distance (4) between the patch distribution of each method and that of the ground-truth images as a perceptual quality index. Recall that this is a lower bound on the Wasserstein distance.

We consider EDSR (13) to be the best MSE estimator, since it achieves the lowest distortion among the evaluated methods. We therefore estimate the lower bound (9) as $\hat{D}(P) = \hat{D}_{\min} + \big[\max(\hat{G}_0 - P,\ 0)\big]^2$, where $\hat{D}_{\min}$ is the MSE of EDSR and $\hat{G}_0$ is the estimated Gelbrich distance between EDSR reconstructions and ground-truth images. Note the unoccupied region under the estimated curve in Figure 2, which is indeed unattainable according to the theory.

We also present 11 estimators which we construct by interpolation between EDSR and ESRGAN (30) with varying relative weights (cf. (15)). We observe (Figure 2) that estimators constructed from these two extreme points are closer to the optimal DP tradeoff than the evaluated methods. Also note that since ESRGAN does not attain a perception index of 0, we are practically able to use negative weights to extrapolate estimators with even better perceptual quality. In Figure 3 we present a visual comparison between SRGAN-VGG (12) and our interpolated estimator. Both achieve roughly the same RMSE distortion, but our estimator achieves a lower perception index. Namely, by using interpolation, we manage to achieve an improvement in perceptual quality without degradation in distortion. The improvement in visual quality is also apparent in the figure. Additional visual comparisons can be found in the Appendix.
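A sketch of the patch-statistics procedure described above (our illustration; the patch size and the image handling are placeholder choices):

```python
import numpy as np
from scipy.linalg import sqrtm

def patch_stats(images, patch=8):
    """Empirical mean and covariance of vectorized patch x patch RGB patches."""
    vecs = []
    for img in images:                                   # img: (H, W, 3) array in [0, 1]
        H, W, _ = img.shape
        for i in range(0, H - patch + 1, patch):
            for j in range(0, W - patch + 1, patch):
                vecs.append(img[i:i + patch, j:j + patch].ravel())
    V = np.stack(vecs)
    return V.mean(axis=0), np.cov(V, rowvar=False)

def gelbrich(m1, S1, m2, S2):
    """Gelbrich distance (4): a lower bound on W2 between the two patch laws."""
    R = sqrtm(S1).real
    d2 = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * sqrtm(R @ S2 @ R).real)
    return np.sqrt(max(d2, 0.0))

# perception index of a method:
# gelbrich(*patch_stats(ground_truth_images), *patch_stats(restored_images))
```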

Figure 2: Evaluation of SR algorithms. We plot 12 algorithms (blue) on the distortion-perception plane. Here we estimate perception using the Gelbrich distance between empirical means and covariances of the original data and the reconstructed data. The orange curve is the estimated lower bound (9), where we consider EDSR to be the global MSE minimizer. Note the unoccupied region under the estimated curve, which is unattainable. We also plot 11 estimators (green) created by interpolation between EDSR and ESRGAN, using different relative weights. Note that estimators constructed from these two extreme estimators are closer to the optimal DP curve than the compared methods.
Figure 3: A visual comparison between estimators with approximately the same MSE. Left: SRGAN-VGG. Right: our interpolated estimator, constructed from EDSR and ESRGAN. Observe the improvement in perceptual quality, without any significant degradation in distortion.

6 Conclusion

In this paper we provide a full characterization of the distortion-perception tradeoff for the MSE distortion and the Wasserstein-2 perception index. We show that optimal estimators are obtained by interpolation between the minimum MSE estimator and an optimal perfect perceptual quality estimator. In the Gaussian case, we explicitly formulate these estimators. To the best of our knowledge, this is the first work to derive such closed-form expressions. Our work paves the way towards fully understanding the DP tradeoff under more general distortions and perceptual criteria, and bridging between fidelity and visual quality at test time, without training different models.

References

  • [1] M. A. Abid, I. Hedhli, and C. Gagné (2021) A generative model for hallucinating diverse versions of super resolution images. arXiv preprint arXiv:2102.06624. Cited by: §1.
  • [2] L. Ambrosio, N. Gigli, and G. Savaré (2008) Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media. Cited by: §A.2, §B.2, §3.1, §4.
  • [3] Y. Bahat and T. Michaeli (2020) Explorable super resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2716–2725. Cited by: §1, §3.1.
  • [4] Y. Blau and T. Michaeli (2018) The perception-distortion tradeoff. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6228–6237. Cited by: §A.1, §1, §1, §2.1, §2.1, §2.1, §3.1.
  • [5] Y. Blau and T. Michaeli (2019) Rethinking lossy compression: the rate-distortion-perception tradeoff. In International Conference on Machine Learning, pp. 675–685. Cited by: §1.
  • [6] X. Deng (2018) Enhancing image quality via style transfer for single image super-resolution. IEEE Signal Processing Letters 25 (4), pp. 571–575. Cited by: §1.
  • [7] R. Friedman and Y. Weiss (2021) Posterior sampling for image restoration using explicit patch priors. arXiv preprint arXiv:2104.09895. Cited by: §1, §3.1.
  • [8] M. Gelbrich (1990) On a formula for the l2 wasserstein metric between measures on euclidean and hilbert spaces. Mathematische Nachrichten 147 (1), pp. 185–203. Cited by: §2.2.
  • [9] R. M. Gray (2006) Toeplitz and circulant matrices: a review. Cited by: Appendix C.
  • [10] B. Kawar, G. Vaksman, and M. Elad (2021) Stochastic image denoising by sampling from the posterior distribution. arXiv preprint arXiv:2101.09552. Cited by: §1, §3.1.
  • [11] M. Knott and C. S. Smith (1984) On the optimal mapping of distributions. Journal of Optimization Theory and Applications 43 (1), pp. 39–49. Cited by: §A.3, §2.2.
  • [12] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. (2017) Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681–4690. Cited by: §1, §3.2, §5, §5.
  • [13] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee (2017) Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 136–144. Cited by: §1, §3.2, §5, §5.
  • [14] D. Liu, H. Zhang, and Z. Xiong (2019) On the classification-distortion-perception tradeoff. arXiv preprint arXiv:1904.08816. Cited by: §1.
  • [15] A. Lugmayr, M. Danelljan, L. Van Gool, and R. Timofte (2020) Srflow: learning the super-resolution space with normalizing flow. In European Conference on Computer Vision, pp. 715–732. Cited by: §1, §3.1.
  • [16] D. Martin, C. Fowlkes, D. Tal, and J. Malik (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, Vol. 2, pp. 416–423. Cited by: §5.
  • [17] P. Navarrete Michelini, D. Zhu, and H. Liu (2018) Multi–scale recursive and perception–distortion controllable image super–resolution. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pp. 0–0. Cited by: §3.2.
  • [18] G. Ohayon, T. Adrai, G. Vaksman, M. Elad, and P. Milanfar (2021) High perceptual quality image denoising with a posterior sampling cgan. arXiv preprint arXiv:2103.04192. Cited by: §1, §3.1.
  • [19] V. M. Panaretos and Y. Zemel (2020) An invitation to statistics in wasserstein space. Springer Nature. Cited by: §A.2, §B.2, §2.2, §2.2, §4.
  • [20] M. Prakash, M. Delbracio, P. Milanfar, and F. Jug (2021) Removing pixel noises and spatial artifacts with generative diversity denoising methods. arXiv preprint arXiv:2104.01374. Cited by: §1, §3.1.
  • [21] M. Prakash, A. Krull, and F. Jug (2020) Divnoising: diversity denoising with fully convolutional variational autoencoders. arXiv preprint arXiv:2006.06072. Cited by: §1.
  • [22] M. S. Sajjadi, B. Scholkopf, and M. Hirsch (2017) Enhancenet: single image super-resolution through automated texture synthesis. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4491–4500. Cited by: §5.
  • [23] T. R. Shaham, T. Dekel, and T. Michaeli (2019) Singan: learning a generative model from a single natural image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4570–4580. Cited by: §3.2, §5.
  • [24] A. Shocher, N. Cohen, and M. Irani (2018) “Zero-shot” super-resolution using deep internal learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3118–3126. Cited by: §3.2, §5.
  • [25] A. Shoshan, R. Mechrez, and L. Zelnik-Manor (2019) Dynamic-net: tuning the objective without re-training for synthesis tasks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3215–3223. Cited by: §1, §3.2.
  • [26] A. Takatsu (2010) On wasserstein geometry of gaussian measures. In Probabilistic approach to geometry, pp. 463–472. Cited by: §4.
  • [27] D. Ulyanov, A. Vedaldi, and V. Lempitsky (2018) Deep image prior. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9446–9454. Cited by: §3.2, §5.
  • [28] M. Unser (1984) On the approximation of the discrete karhunen-loeve transform for stationary processes. Signal Processing 7 (3), pp. 231–249. Cited by: Appendix C, §3.1.
  • [29] X. Wang, K. Yu, C. Dong, X. Tang, and C. C. Loy (2019) Deep network interpolation for continuous imagery effect transition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1692–1701. Cited by: §1.
  • [30] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. Change Loy (2018) Esrgan: enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pp. 0–0. Cited by: §1, §1, §3.2, §5, §5.
  • [31] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13 (4), pp. 600–612. Cited by: §1.
  • [32] Z. Zhang, M. Wang, and A. Nehorai (2019) Optimal transport in reproducing kernel hilbert spaces: theory and applications. IEEE transactions on pattern analysis and machine intelligence 42 (7), pp. 1741–1754. Cited by: Lemma 1.

Appendix A Background and extensions

A.1 The distortion-perception function

In Sec. 2 of the main text we presented the setting of Euclidean space for simplicity. For the sake of completeness, we present here a more general setup.

Let $X, Y$ be random variables on separable metric spaces $\mathcal{X}, \mathcal{Y}$, with joint probability $p_{X,Y}$ on $\mathcal{X}\times\mathcal{Y}$. Given a distortion function $\Delta : \mathcal{X}\times\mathcal{X} \to [0,\infty)$, we aim to find an estimator $\hat{X}$, defined by a conditional distribution $p_{\hat{X}|Y}$ (which induces a marginal distribution $p_{\hat{X}}$), minimizing the expectation $\mathbb{E}[\Delta(X,\hat{X})]$ under the constraint $d(p_X, p_{\hat{X}}) \le P$. Here, $d(\cdot,\cdot)$ is some divergence between probability measures. We further assume the Markov relation $X \to Y \to \hat{X}$, i.e. $X$ and $\hat{X}$ are independent given $Y$. Similarly to Blau and Michaeli [2018] we define the distortion-perception function

$$D(P) = \min_{p_{\hat{X}|Y}} \mathbb{E}\big[\Delta(X,\hat{X})\big] \quad \text{s.t.} \quad d\big(p_X, p_{\hat{X}}\big) \le P. \qquad (27)$$

We can write (27) as

$$D(P) = \min_{q \in \mathcal{P}_P}\ \min_{p_{\hat{X}|Y}:\ p_{\hat{X}} = q} \mathbb{E}\big[\Delta(X,\hat{X})\big], \qquad (28)$$

where we defined $\mathcal{P}_P \triangleq \{q : d(p_X, q) \le P\}$. This objective can be written as

$$\mathbb{E}\big[\Delta(X,\hat{X})\big] = \mathbb{E}_{Y,\hat{X}}\Big[\mathbb{E}\big[\Delta(X,\hat{X})\,\big|\,Y,\hat{X}\big]\Big]. \qquad (29)$$

Let us define the cost function

$$C(y,\hat{x}) \triangleq \mathbb{E}\big[\Delta(X,\hat{x})\,\big|\,Y = y\big] = \mathbb{E}\big[\Delta(X,\hat{X})\,\big|\,Y = y,\ \hat{X} = \hat{x}\big], \qquad (30)$$

where we used the fact that $X$ is independent of $\hat{X}$ given $Y$. Then we have that the objective (29) boils down to $\mathbb{E}\big[C(Y,\hat{X})\big]$.

The problem of finding a perfect perceptual quality estimator can now be written as an optimal transport problem

$$D(0) = \min_{\pi \in \Pi(p_Y,\, p_X)} \mathbb{E}_{(Y,\hat{X}) \sim \pi}\big[C(Y,\hat{X})\big].$$

In the setting where $\mathcal{X}, \mathcal{Y}$ are Euclidean spaces, considering the MSE distortion $\Delta(x,\hat{x}) = \|x - \hat{x}\|^2$, we write

$$C(y,\hat{x}) = \mathbb{E}\big[\|X - \hat{X}^*\|^2\,\big|\,Y = y\big] + \big\|\hat{X}^*(y) - \hat{x}\big\|^2,$$

and we have

$$D(0) = D_{\min} + \min_{\pi \in \Pi(p_{\hat{X}^*},\, p_X)} \mathbb{E}\big\|\hat{X}^* - \hat{X}\big\|^2 = D_{\min} + W_2^2\big(p_{\hat{X}^*}, p_X\big).$$

A.2 The optimal transportation problem

Assume $\mathcal{X}$ and $\mathcal{Y}$ are Radon spaces [Ambrosio et al., 2008]. Let $c : \mathcal{X}\times\mathcal{Y} \to [0,\infty]$ be a non-negative Borel cost function, and let $\mu, \nu$ be probability measures on $\mathcal{X}, \mathcal{Y}$ respectively. The optimal transport problem is then given in the following formulations.

In the Monge formulation, we search for an optimal transformation $T : \mathcal{X} \to \mathcal{Y}$, often referred to as an optimal map, minimizing

$$\inf_{T:\ T_{\#}\mu = \nu}\ \int_{\mathcal{X}} c\big(x, T(x)\big)\, d\mu(x). \qquad (31)$$

Note that the Monge problem seeks a deterministic map, and might not have a solution.

In the Kantorovich formulation, we wish to find a probability measure $\pi$ on $\mathcal{X}\times\mathcal{Y}$, minimizing

$$\inf_{\pi \in \Pi(\mu,\nu)}\ \int_{\mathcal{X}\times\mathcal{Y}} c(x,y)\, d\pi(x,y), \qquad (32)$$

where $\Pi(\mu,\nu)$ is the set of probabilities on $\mathcal{X}\times\mathcal{Y}$ with marginals $\mu$ and $\nu$. A probability measure minimizing (32) is called an optimal plan. Note that when $\mathcal{X} = \mathcal{Y}$ and $c = d^p$ for a metric $d$, raising the optimal value of (32) to the power $1/p$ yields the Wasserstein distance $W_p$ induced by $d$.

In the case where $\mathcal{X} = \mathcal{Y} = \mathbb{R}^n$ and $c(x,y) = \|x - y\|^2$ is the quadratic cost (and we assume $\mu, \nu$ have finite first and second moments), there exists an optimal plan minimizing (32). If $\mu$ is absolutely continuous (w.r.t. the Lebesgue measure), this plan is given by an optimal map which is the unique solution to (31) [Panaretos and Zemel, 2020, pp. 5, 16].
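As a small sanity check of the quadratic-cost case (our illustration, not part of the paper): for one-dimensional measures the optimal plan is monotone, so the empirical Wasserstein-2 distance between two equal-size samples reduces to matching sorted values.

```python
import numpy as np

def empirical_w2_1d(x, y):
    """Empirical Wasserstein-2 distance between two equal-size 1-D samples.
    For the quadratic cost on the real line the optimal plan is monotone,
    i.e. it pairs sorted values."""
    x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    return np.sqrt(np.mean((x - y) ** 2))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 100_000)
b = rng.normal(0.0, 0.5, 100_000)
print(empirical_w2_1d(a, b))   # close to |1.0 - 0.5| = 0.5 for zero-mean Gaussians
```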