Denoising Diffusion Restoration Models

01/27/2022
by   Bahjat Kawar, et al.

Many interesting tasks in image restoration can be cast as linear inverse problems. A recent family of approaches for solving these problems uses stochastic algorithms that sample from the posterior distribution of natural images given the measurements. However, efficient solutions often require problem-specific supervised training to model the posterior, whereas unsupervised methods that are not problem-specific typically rely on inefficient iterative methods. This work addresses these issues by introducing Denoising Diffusion Restoration Models (DDRM), an efficient, unsupervised posterior sampling method. Motivated by variational inference, DDRM takes advantage of a pre-trained denoising diffusion generative model for solving any linear inverse problem. We demonstrate DDRM's versatility on several image datasets for super-resolution, deblurring, inpainting, and colorization under various amounts of measurement noise. DDRM outperforms the current leading unsupervised methods on the diverse ImageNet dataset in reconstruction quality, perceptual quality, and runtime, being 5x faster than the nearest competitor. DDRM also generalizes well for natural images out of the distribution of the observed ImageNet training set.


1 Introduction

Figure 1: Pairs of measurements and recovered images with a 20-step DDRM on (a) super-resolution (noiseless and noisy), (b) deblurring (noisy), (c) inpainting (noisy), and (d) colorization (noisy), with unconditional generative models. The images are not accessed during training.

Many problems in image processing, including super-resolution (Ledig et al., 2017; Haris et al., 2018), deblurring (Kupyn et al., 2019; Suin et al., 2020), inpainting (Yeh et al., 2017), colorization (Larsson et al., 2016; Zhang et al., 2016), and compressive sensing (Baraniuk, 2007), are instances of linear inverse problems, where the goal is to recover an image from potentially noisy measurements given through a known linear degradation model. For a specific degradation model, image restoration can be addressed through end-to-end supervised training of neural networks, using pairs of original and degraded images (Dong et al., 2015; Zhang et al., 2016; Saharia et al., 2021a). However, real-world applications such as medical imaging often require flexibility to cope with multiple, possibly infinite, degradation models (Song et al., 2021b). Here, unsupervised approaches, where the degradation model is only known and used during inference, may be more desirable since they can adapt to the given problem without re-training (Venkatakrishnan et al., 2013). By learning sound assumptions over the underlying structure of images (e.g., priors, proximal operators or denoisers), unsupervised approaches can achieve effective restoration without training on specific degradation models (Venkatakrishnan et al., 2013; Romano et al., 2017).

Under this unsupervised setting, priors based on deep neural networks have demonstrated impressive empirical results in various image restoration tasks (Romano et al., 2017; Ulyanov et al., 2018; Santurkar et al., 2019; Pan et al., 2020; Gu et al., 2020). To recover the signal, most existing methods obtain a prior-related term over the signal from a neural network (e.g., the distribution of natural images), and a likelihood term from the degradation model. They combine the two terms to form a posterior over the signal, and the inverse problem can be posed as solving an optimization problem (e.g., maximum a posteriori (Calvetti & Somersalo, 2008; Romano et al., 2017)) or solving a sampling problem (e.g., posterior sampling (Bardsley, 2012; Bardsley et al., 2014; Kawar et al., 2021b)). Then, these problems are often solved with iterative methods, such as gradient descent or Langevin dynamics, which may be demanding in computation and sensitive to hyperparameter tuning. An extreme example is found in Laumont et al. (2021) where a “fast” version of the algorithm uses neural function evaluations (NFEs).

Inspired by this unsupervised line of work, we introduce an efficient approach named Denoising Diffusion Restoration Models (DDRM), which can achieve competitive results in as few as 20 NFEs. DDRM is a denoising diffusion generative model (Ho et al., 2020; Song et al., 2021a) that gradually and stochastically denoises a sample to the desired output, conditioned on the measurements and the inverse problem. This way we introduce a variational inference objective for learning the posterior distribution of the inverse problem at hand. We then show its equivalence to the objective of an unconditional denoising diffusion generative model (Ho et al., 2020), which enables us to deploy such models in DDRM for various linear inverse problems (see Figure 2). To the best of our knowledge, DDRM is the first general inverse problem solver that can efficiently produce a range of high-quality, diverse, yet valid solutions for general content images.

We demonstrate the empirical effectiveness of DDRM by comparing with various competitive unsupervised methods, such as Deep Generative Prior (DGP, Pan et al. (2020)), SNIPS (Kawar et al., 2021b), and Regularization by Denoising (RED, Romano et al. (2017)). On ImageNet examples, DDRM mostly outperforms the neural network baselines under noiseless super-resolution and deblurring measured in PSNR and KID (Bińkowski et al., 2018), and is at least more efficient in terms of NFEs when it is second-best. Our advantage becomes even larger when measurement noise is involved, as noisy artifacts produced by iterative methods do not appear in our case. Over various real-world images, we further show DDRM results on super-resolution, deblurring, inpainting and colorization (see Figure 1). A DDRM trained on ImageNet also works on images that are out of its training set distribution (see Figure 8).

2 Background

Linear Inverse Problems.

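A general linear inverse problem is posed as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{z}, \tag{1}$$

where we aim to recover the signal $\mathbf{x} \in \mathbb{R}^n$ from measurements $\mathbf{y} \in \mathbb{R}^m$, where $\mathbf{H} \in \mathbb{R}^{m \times n}$ is a known linear degradation matrix, and $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \sigma_{\mathbf{y}}^2 \mathbf{I})$ is an i.i.d. additive Gaussian noise with known variance.

The underlying structure of $\mathbf{x}$ can be represented via a generative model, denoted as $p_\theta(\mathbf{x})$. Given $\mathbf{y}$ and $\mathbf{H}$, a posterior over the signal can be posed as: $p_\theta(\mathbf{x} \mid \mathbf{y}) \propto p_\theta(\mathbf{x})\, p(\mathbf{y} \mid \mathbf{x})$, where the “likelihood” term $p(\mathbf{y} \mid \mathbf{x})$ is defined via Equation 1. Recovering $\mathbf{x}$ can be done by sampling from this posterior (Bardsley, 2012), which may require many iterations to produce a good sample. Alternatively, one can also approximate this posterior by learning a model via amortized inference (i.e., supervised learning); the model learns to predict $\mathbf{x}$ given $\mathbf{y}$, generated from $\mathbf{x}$ and a specific $\mathbf{H}$. While this can be more efficient than sampling-based methods, it may generalize poorly to inverse problems that have not been trained on.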
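As a concrete illustration of the measurement model in Equation 1, the following minimal NumPy sketch builds a toy noisy inpainting problem. The signal length, the set of observed indices, and the noise level `sigma_y` are arbitrary illustrative choices, not values taken from the experiments, and the helper names are hypothetical.

```python
import numpy as np

def make_inpainting_matrix(n, observed_idx):
    """Return H (m x n) that keeps only the entries listed in observed_idx."""
    H = np.zeros((len(observed_idx), n))
    H[np.arange(len(observed_idx)), observed_idx] = 1.0
    return H

def measure(H, x, sigma_y, rng):
    """Apply y = H x + z with z ~ N(0, sigma_y^2 I)."""
    z = sigma_y * rng.standard_normal(H.shape[0])
    return H @ x + z

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=8)             # toy "image" with 8 pixels
H = make_inpainting_matrix(8, [0, 2, 3, 6])   # only 4 pixels are observed
y = measure(H, x, sigma_y=0.05, rng=rng)      # noisy measurements, shape (4,)
```

Other degradations (super-resolution, deblurring, colorization) only change how `H` is built; the measurement step stays the same.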

Denoising Diffusion Probabilistic Models.

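Structures learned by generative models have been applied to various inverse problems and often outperform data-independent structural constraints such as sparsity (Bora et al., 2017). These generative models learn a model distribution $p_\theta(\mathbf{x})$ that approximates a data distribution $q(\mathbf{x})$ from samples. In particular, diffusion models have demonstrated impressive unconditional generative modeling performance on images (Dhariwal & Nichol, 2021). Diffusion models are generative models with a Markov chain structure $\mathbf{x}_T \to \mathbf{x}_{T-1} \to \ldots \to \mathbf{x}_1 \to \mathbf{x}_0$ (where $\mathbf{x}_t \in \mathbb{R}^n$), which has the following joint distribution:

$$p_\theta(\mathbf{x}_{0:T}) = p_\theta^{(T)}(\mathbf{x}_T) \prod_{t=0}^{T-1} p_\theta^{(t)}(\mathbf{x}_t \mid \mathbf{x}_{t+1}).$$

After drawing $\mathbf{x}_{0:T}$, only $\mathbf{x}_0$ is kept as the sample of the generative model. To train a diffusion model, a fixed, factorized variational inference distribution is introduced:

$$q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) = q^{(T)}(\mathbf{x}_T \mid \mathbf{x}_0) \prod_{t=1}^{T-1} q^{(t)}(\mathbf{x}_t \mid \mathbf{x}_{t+1}, \mathbf{x}_0),$$

which leads to an evidence lower bound (ELBO) on the maximum likelihood objective (Song et al., 2021a).

A special property of some diffusion models is that both $p_\theta^{(t)}$ and $q^{(t)}$ are chosen as conditional Gaussian distributions for all $t < T$, and that $q(\mathbf{x}_t \mid \mathbf{x}_0)$ is also a Gaussian with known mean and covariance, i.e., $\mathbf{x}_t$ can be treated as $\mathbf{x}_0$ directly corrupted with Gaussian noise. Thus, the ELBO objective can be reduced into the following denoising autoencoder objective (please refer to Song et al. (2021a) for derivations):

$$\sum_{t=1}^{T} \gamma_t \, \mathbb{E}_{(\mathbf{x}_0, \mathbf{x}_t) \sim q(\mathbf{x}_0)\, q(\mathbf{x}_t \mid \mathbf{x}_0)} \left[ \left\| \mathbf{x}_0 - f_\theta^{(t)}(\mathbf{x}_t) \right\|_2^2 \right], \tag{2}$$

where $f_\theta^{(t)}$ is a $\theta$-parametrized neural network that aims to recover a noiseless observation from a noisy $\mathbf{x}_t$, and $\gamma_{1:T}$ are a set of positive coefficients that depend on $q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)$.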
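To make the denoising autoencoder objective concrete, here is a minimal Monte Carlo sketch of a weighted sum of squared denoising errors over noise levels. It assumes the simple corruption $\mathbf{x}_t = \mathbf{x}_0 + \sigma_t \epsilon$ with standard Gaussian $\epsilon$; the function name, the `denoiser(x_t, t)` callable, and the weights `gammas` are hypothetical placeholders rather than the training code of any particular diffusion model.

```python
import numpy as np

def denoising_loss(x0_batch, sigmas, gammas, denoiser, rng):
    """Estimate sum_t gamma_t * E||x0 - f(x_t, t)||^2 on one batch.

    Assumes x_t = x0 + sigma_t * eps, i.e. x0 directly corrupted by Gaussian noise.
    x0_batch has shape (batch, n); denoiser maps (x_t, t) to a prediction of x0.
    """
    total = 0.0
    for t, (sigma_t, gamma_t) in enumerate(zip(sigmas, gammas), start=1):
        eps = rng.standard_normal(x0_batch.shape)
        x_t = x0_batch + sigma_t * eps
        pred = denoiser(x_t, t)
        # Squared error per sample, averaged over the batch.
        total += gamma_t * np.mean(np.sum((x0_batch - pred) ** 2, axis=1))
    return total
```

A denoiser that simply returns its noisy input incurs a loss that grows with the noise levels, while a perfect denoiser drives the loss to zero.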

3 Denoising Diffusion Restoration Models

Figure 2: Illustration of our DDRM method for a specific inverse problem (super-resolution + denoising). We can use unsupervised DDPM models as a good solution to the DDRM objective.

Inverse problem solvers based on posterior sampling often face a dilemma: unsupervised approaches apply to general problems but are inefficient, whereas supervised ones are efficient but can only address specific problems.

To solve this dilemma, we introduce Denoising Diffusion Restoration Models (DDRM), an unsupervised solver for general linear inverse problems, capable of handling such tasks with or without noise in the measurements. DDRM is efficient and exhibits competitive performance compared to popular unsupervised solvers (Romano et al., 2017; Pan et al., 2020; Kawar et al., 2021b).

The key idea behind DDRM is to find an unsupervised solution that also suits supervised learning objectives. First, we describe the variational objective for DDRM over a specific inverse problem (Section 3.1). Next, we introduce specific forms of DDRM that are suitable for linear inverse problems and allow pre-trained unconditional and class-conditional diffusion models to be used directly (Sections 3.2 and 3.3). Finally, we discuss practical algorithms that are compute and memory efficient (Sections 3.4 and 3.5).

3.1 Variational Objective for DDRM

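For any linear inverse problem, we define DDRM as a Markov chain $\mathbf{x}_T \to \mathbf{x}_{T-1} \to \ldots \to \mathbf{x}_1 \to \mathbf{x}_0$ conditioned on $\mathbf{y}$, where

$$p_\theta(\mathbf{x}_{0:T} \mid \mathbf{y}) = p_\theta^{(T)}(\mathbf{x}_T \mid \mathbf{y}) \prod_{t=0}^{T-1} p_\theta^{(t)}(\mathbf{x}_t \mid \mathbf{x}_{t+1}, \mathbf{y})$$

and $\mathbf{x}_0$ is the final diffusion output. In order to perform inference, we consider the following factorized variational distribution conditioned on $\mathbf{y}$:

$$q(\mathbf{x}_{1:T} \mid \mathbf{x}_0, \mathbf{y}) = q^{(T)}(\mathbf{x}_T \mid \mathbf{x}_0, \mathbf{y}) \prod_{t=1}^{T-1} q^{(t)}(\mathbf{x}_t \mid \mathbf{x}_{t+1}, \mathbf{x}_0, \mathbf{y}),$$

leading to an ELBO objective for diffusion models conditioned on $\mathbf{y}$ (details in Appendix A).

In the remainder of the section, we construct suitable variational problems given $\mathbf{H}$ and $\sigma_{\mathbf{y}}$ and connect them to unconditional diffusion generative models. To simplify notations, we will construct the variational distribution $q$ such that $q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}(\mathbf{x}_0, \sigma_t^2 \mathbf{I})$ for noise levels $0 = \sigma_0 < \sigma_1 < \sigma_2 < \ldots < \sigma_T$ (this is called “Variance Exploding” in Song et al., 2021c). In Appendix B, we will show that this is equivalent to the distribution introduced in DDPM (Ho et al., 2020) and DDIM (Song et al., 2021a) (called “Variance Preserving” in Song et al., 2021c), up to fixed linear transformations over $\mathbf{x}_t$.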

3.2 A Diffusion Process for Image Restoration

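Similar to SNIPS (Kawar et al., 2021b), we consider the singular value decomposition (SVD) of $\mathbf{H}$, and perform the diffusion in its spectral space. The idea behind this is to tie the noise present in the measurements $\mathbf{y}$ with the diffusion noise in $\mathbf{x}_{1:T}$, ensuring that the diffusion result $\mathbf{x}_0$ is faithful to the measurements. By using the SVD, we identify the data from $\mathbf{x}$ that is missing in $\mathbf{y}$, and synthesize it using a diffusion process. In conjunction, the noisy data in $\mathbf{y}$ undergoes a denoising process. For example, in inpainting with noise (e.g., $\mathbf{H} = \mathrm{diag}([1, \ldots, 1, 0, \ldots, 0])$, $\sigma_{\mathbf{y}} \geq 0$), the spectral space is simply the pixel space, so the model should generate the missing pixels and denoise the observed ones in $\mathbf{y}$. For a general linear $\mathbf{H}$, its SVD is given as

$$\mathbf{H} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^\top, \tag{3}$$

where $\mathbf{U} \in \mathbb{R}^{m \times m}$, $\mathbf{V} \in \mathbb{R}^{n \times n}$ are orthogonal matrices, and $\mathbf{\Sigma} \in \mathbb{R}^{m \times n}$ is a rectangular diagonal matrix containing the singular values of $\mathbf{H}$, ordered descendingly. As this is the case in most useful degradation models, we assume $m \leq n$, but our method would work for $m > n$ as well. We denote the singular values as $s_1 \geq s_2 \geq \ldots \geq s_m$, and define $s_i = 0$ for $i \in [m+1, n]$.

We use the shorthand notations for values in the spectral space: $\bar{\mathbf{x}}_t^{(i)}$ is the $i$-th index of the vector $\bar{\mathbf{x}}_t = \mathbf{V}^\top \mathbf{x}_t$, and $\bar{\mathbf{y}}^{(i)}$ is the $i$-th index of the vector $\bar{\mathbf{y}} = \mathbf{\Sigma}^\dagger \mathbf{U}^\top \mathbf{y}$ (where $\dagger$ denotes the Moore–Penrose pseudo-inverse). Because $\mathbf{V}$ is an orthogonal matrix, we can recover $\mathbf{x}_t$ from $\bar{\mathbf{x}}_t$ exactly by left multiplying $\mathbf{V}$. For each index $i$ in $\bar{\mathbf{x}}_t$, we define the variational distribution as:

$$q^{(T)}(\bar{\mathbf{x}}_T^{(i)} \mid \mathbf{x}_0, \mathbf{y}) = \begin{cases} \mathcal{N}\left(\bar{\mathbf{y}}^{(i)},\ \sigma_T^2 - \frac{\sigma_{\mathbf{y}}^2}{s_i^2}\right) & \text{if } s_i > 0 \\ \mathcal{N}\left(\bar{\mathbf{x}}_0^{(i)},\ \sigma_T^2\right) & \text{if } s_i = 0 \end{cases} \tag{4}$$

$$q^{(t)}(\bar{\mathbf{x}}_t^{(i)} \mid \mathbf{x}_{t+1}, \mathbf{x}_0, \mathbf{y}) = \begin{cases} \mathcal{N}\left(\bar{\mathbf{x}}_0^{(i)} + \sqrt{1 - \eta^2}\, \sigma_t \frac{\bar{\mathbf{x}}_{t+1}^{(i)} - \bar{\mathbf{x}}_0^{(i)}}{\sigma_{t+1}},\ \eta^2 \sigma_t^2\right) & \text{if } s_i = 0 \\ \mathcal{N}\left(\bar{\mathbf{x}}_0^{(i)} + \sqrt{1 - \eta^2}\, \sigma_t \frac{\bar{\mathbf{y}}^{(i)} - \bar{\mathbf{x}}_0^{(i)}}{\sigma_{\mathbf{y}} / s_i},\ \eta^2 \sigma_t^2\right) & \text{if } \sigma_t < \frac{\sigma_{\mathbf{y}}}{s_i} \\ \mathcal{N}\left((1 - \eta_b)\, \bar{\mathbf{x}}_0^{(i)} + \eta_b\, \bar{\mathbf{y}}^{(i)},\ \sigma_t^2 - \frac{\sigma_{\mathbf{y}}^2}{s_i^2} \eta_b^2\right) & \text{if } \sigma_t \geq \frac{\sigma_{\mathbf{y}}}{s_i} \end{cases} \tag{5}$$

where $\eta \in (0, 1]$ is a hyperparameter controlling the variance of the transitions, and $\eta$ and $\eta_b$ may depend on $\sigma_t$, $s_i$, $\sigma_{\mathbf{y}}$. We further assume that $\sigma_T \geq \sigma_{\mathbf{y}} / s_i$ for all positive $s_i$. (This assumption is fair, as we can set a sufficiently large $\sigma_T$.)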
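The spectral-space quantities can be sketched with a dense SVD for a small problem. This is a minimal illustration that assumes the notation $\bar{\mathbf{x}} = \mathbf{V}^\top \mathbf{x}$ and $\bar{\mathbf{y}} = \mathbf{\Sigma}^\dagger \mathbf{U}^\top \mathbf{y}$ for $\mathbf{H} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^\top$ with $m \leq n$; the function name and the tolerance `tol` are hypothetical, and a dense SVD like this is only feasible for small signals.

```python
import numpy as np

def spectral_quantities(H, x, y, tol=1e-12):
    """Return (s_full, Vt, x_bar, y_bar) for H = U diag(s) V^T, assuming m <= n.

    s_full pads the m singular values with zeros up to length n.
    """
    m, n = H.shape
    U, s, Vt = np.linalg.svd(H, full_matrices=True)  # s is sorted in descending order
    s_full = np.zeros(n)
    s_full[: len(s)] = s
    x_bar = Vt @ x                   # V^T x
    proj = U.T @ y                   # U^T y, length m
    y_bar = np.zeros(n)
    idx = np.flatnonzero(s > tol)    # indices with a usable singular value
    y_bar[idx] = proj[idx] / s[idx]  # Sigma^dagger U^T y
    return s_full, Vt, x_bar, y_bar
```

In the noiseless case, `y_bar` matches `x_bar` on every index with a non-zero singular value, which is what ties the measurements to the diffusion state in that space.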

In the following statement, we show that this construction has the “Gaussian marginals” property similar to the inference distribution used in unconditional diffusion models (Ho et al., 2020).

Proposition 3.1.

The conditional distributions $q^{(t)}$ defined in Equations 4 and 5 satisfy the following:

$$q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}(\mathbf{x}_0, \sigma_t^2 \mathbf{I}), \tag{6}$$

defined by marginalizing over $\mathbf{x}_{t'}$ (for all $t' > t$) and $\mathbf{y}$, where $q(\mathbf{y} \mid \mathbf{x}_0)$ is defined as in Equation 1 with $\mathbf{x} = \mathbf{x}_0$.

We place the proof in Appendix C. Intuitively, our construction considers different cases for each index of the spectral space. (i) If the corresponding singular value is zero, then $\mathbf{y}$ does not directly provide any information to that index, and the update is similar to regular unconditional generation. (ii) If the singular value is non-zero, then the updates consider the information provided by $\mathbf{y}$, which further depends on whether the measurements’ noise level in the spectral space ($\sigma_{\mathbf{y}} / s_i$) is larger than the noise level in the diffusion model ($\sigma_t$) or not; the measurements in the spectral space are then scaled differently for these two cases in order to ensure Proposition 3.1 holds.
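As a short worked check of case (i), consider a single spectral index with zero singular value, and assume the transition has the DDIM-like form written in the first line below with $\eta \in (0, 1]$ (our reading of the construction; the full argument is in Appendix C). If $\bar{\mathbf{x}}_{t+1}^{(i)} - \bar{\mathbf{x}}_0^{(i)}$ is Gaussian with variance $\sigma_{t+1}^2$ and independent of the fresh noise $\epsilon$, the two noise contributions add up to exactly $\sigma_t^2$:

```latex
\begin{aligned}
\bar{\mathbf{x}}_t^{(i)} &= \bar{\mathbf{x}}_0^{(i)}
  + \sqrt{1-\eta^2}\,\sigma_t\,
    \frac{\bar{\mathbf{x}}_{t+1}^{(i)} - \bar{\mathbf{x}}_0^{(i)}}{\sigma_{t+1}}
  + \eta\,\sigma_t\,\epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, 1), \\
\operatorname{Var}\!\left[\bar{\mathbf{x}}_t^{(i)} \,\middle|\, \bar{\mathbf{x}}_0^{(i)}\right]
  &= (1-\eta^2)\,\sigma_t^2\,\frac{\sigma_{t+1}^2}{\sigma_{t+1}^2}
   + \eta^2\,\sigma_t^2
   = \sigma_t^2 .
\end{aligned}
```

The same bookkeeping applies to the non-zero singular value cases, with the measurement noise of variance $\sigma_{\mathbf{y}}^2 / s_i^2$ playing the role of $\sigma_{t+1}^2$.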

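Now that we have defined $q^{(t)}$ as a series of Gaussian conditionals, we define our model distribution $p_\theta$ as a series of Gaussian conditionals as well. Similar to DDPM, we aim to obtain predictions of $\mathbf{x}_0$ at every step $t$; and to simplify notations, we use the symbol $\mathbf{x}_{\theta,t}$ to represent this prediction made by a model $f_\theta(\mathbf{x}_{t+1}, t+1): \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}^n$ that takes in the sample $\mathbf{x}_{t+1}$ and the conditioned time step $(t+1)$. (Equivalently, Ho et al. (2020) predict the noise values to subtract in order to recover $\mathbf{x}_0$.) We also define $\bar{\mathbf{x}}_{\theta,t}^{(i)}$ as the $i$-th index of $\bar{\mathbf{x}}_{\theta,t} = \mathbf{V}^\top \mathbf{x}_{\theta,t}$.

We define DDRM with trainable parameters $\theta$ as follows:

$$p_\theta^{(T)}(\bar{\mathbf{x}}_T^{(i)} \mid \mathbf{y}) = \begin{cases} \mathcal{N}\left(\bar{\mathbf{y}}^{(i)},\ \sigma_T^2 - \frac{\sigma_{\mathbf{y}}^2}{s_i^2}\right) & \text{if } s_i > 0 \\ \mathcal{N}\left(0,\ \sigma_T^2\right) & \text{if } s_i = 0 \end{cases} \tag{7}$$

$$p_\theta^{(t)}(\bar{\mathbf{x}}_t^{(i)} \mid \mathbf{x}_{t+1}, \mathbf{y}) = \begin{cases} \mathcal{N}\left(\bar{\mathbf{x}}_{\theta,t}^{(i)} + \sqrt{1 - \eta^2}\, \sigma_t \frac{\bar{\mathbf{x}}_{t+1}^{(i)} - \bar{\mathbf{x}}_{\theta,t}^{(i)}}{\sigma_{t+1}},\ \eta^2 \sigma_t^2\right) & \text{if } s_i = 0 \\ \mathcal{N}\left(\bar{\mathbf{x}}_{\theta,t}^{(i)} + \sqrt{1 - \eta^2}\, \sigma_t \frac{\bar{\mathbf{y}}^{(i)} - \bar{\mathbf{x}}_{\theta,t}^{(i)}}{\sigma_{\mathbf{y}} / s_i},\ \eta^2 \sigma_t^2\right) & \text{if } \sigma_t < \frac{\sigma_{\mathbf{y}}}{s_i} \\ \mathcal{N}\left((1 - \eta_b)\, \bar{\mathbf{x}}_{\theta,t}^{(i)} + \eta_b\, \bar{\mathbf{y}}^{(i)},\ \sigma_t^2 - \frac{\sigma_{\mathbf{y}}^2}{s_i^2} \eta_b^2\right) & \text{if } \sigma_t \geq \frac{\sigma_{\mathbf{y}}}{s_i} \end{cases} \tag{8}$$

Compared to $q$ in Equations 4 and 5, our definition of $p_\theta$ merely replaces $\bar{\mathbf{x}}_0^{(i)}$ (which we do not know at sampling) with $\bar{\mathbf{x}}_{\theta,t}^{(i)}$ (which depends on our predicted $\mathbf{x}_{\theta,t}$) when $t < T$, and replaces $\bar{\mathbf{x}}_0^{(i)}$ with $0$ when $t = T$. It is possible to learn the variances (Nichol & Dhariwal, 2021) or consider alternative constructions where Proposition 3.1 holds; we leave these options as future work.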
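A single sampling step in spectral space can be sketched as follows. This is a minimal sketch under our reading of the three cases in the construction (no measurement for the index, measurement noisier than the current diffusion level, measurement at least as clean as the current diffusion level); the function name `ddrm_step` and all argument names are hypothetical, and `x_bar_pred` stands in for the spectral-space prediction that a pre-trained denoiser would supply.

```python
import numpy as np

def ddrm_step(x_bar_next, x_bar_pred, y_bar, s, sigma_t, sigma_next, sigma_y,
              eta, eta_b, rng):
    """Sample x_bar_t from x_bar_{t+1}, a prediction of x_bar_0, and y_bar.

    All vectors live in spectral space and have length n; s holds the
    singular values padded with zeros.
    """
    n = x_bar_pred.shape[0]
    # Noise level of each measurement in spectral space (infinite where s_i = 0).
    meas_noise = np.full(n, np.inf)
    pos = s > 0
    meas_noise[pos] = sigma_y / s[pos]

    unobserved = ~pos
    noisy = pos & (sigma_t < meas_noise)    # measurement noisier than diffusion level
    clean = pos & (sigma_t >= meas_noise)   # measurement at least as clean

    c = np.sqrt(1.0 - eta ** 2) * sigma_t
    mean = np.empty(n)
    var = np.empty(n)

    # Case s_i = 0: unconditional, DDIM-like update.
    mean[unobserved] = x_bar_pred[unobserved] + c * (
        x_bar_next[unobserved] - x_bar_pred[unobserved]) / sigma_next
    var[unobserved] = (eta * sigma_t) ** 2

    # Case sigma_t < sigma_y / s_i: use the measurement residual as the "noise".
    mean[noisy] = x_bar_pred[noisy] + c * (
        y_bar[noisy] - x_bar_pred[noisy]) / meas_noise[noisy]
    var[noisy] = (eta * sigma_t) ** 2

    # Case sigma_t >= sigma_y / s_i: blend prediction and measurement.
    mean[clean] = (1.0 - eta_b) * x_bar_pred[clean] + eta_b * y_bar[clean]
    var[clean] = sigma_t ** 2 - (meas_noise[clean] * eta_b) ** 2

    # Clip tiny negative variances caused by floating-point round-off.
    return mean + np.sqrt(np.maximum(var, 0.0)) * rng.standard_normal(n)
```

At the final step ($\sigma_t = 0$) with noiseless measurements and $\eta_b = 1$, the observed indices are set to the measurements and the remaining indices to the model prediction, so the output agrees with the measurements in spectral space.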

3.3 “Learning” Image Restoration Models

Once we have defined $p_\theta$ and $q$ by choosing $\sigma_{1:T}$, $\eta$ and $\eta_b$, we can learn model parameters $\theta$ by maximizing the resulting ELBO objective (in Appendix, Equation 10). However, this approach is not desirable since we have to learn a different model for each inverse problem (given $\mathbf{H}$ and $\sigma_{\mathbf{y}}$), which is not flexible enough for arbitrary inverse problems. Fortunately, this does not have to be the case. In the following statement, we show that an optimal solution to DDPM / DDIM can also be an optimal solution to a DDRM problem, under reasonable assumptions used in prior work (Ho et al., 2020; Song et al., 2021a).

Theorem 3.2.

Assume that the models $f_\theta^{(t)}$ and $f_\theta^{(t')}$ are independent whenever $t \neq t'$, then when $\eta = 1$ and $\eta_b = \frac{2\sigma_t^2}{\sigma_t^2 + \sigma_{\mathbf{y}}^2 / s_i^2}$, the ELBO objective of DDRM (details in Equation 10) can be rewritten in the form of the DDPM / DDIM objective in Equation 2.

We place the proof in Appendix C.

Even for different choices of $\eta$ and $\eta_b$, the proof shows that the DDRM objective is a weighted sum-of-squares error in the spectral space, and thus pre-trained DDPM models are good approximations to the optimal solution. Therefore, we can apply the same diffusion model (unconditioned on the inverse problem) using the updates in Equation 7 and Equation 8 and only modify $\mathbf{H}$ and its SVD ($\mathbf{U}$, $\mathbf{\Sigma}$, $\mathbf{V}$) for various linear inverse problems.

3.4 Accelerated Algorithms for DDRM

Typical diffusion models are trained with many timesteps (e.g., 1000) to achieve optimal unconditional image synthesis quality, but sampling speed is slow as many NFEs are required. Previous works (Song et al., 2021a; Dhariwal & Nichol, 2021) have accelerated this process by “skipping” steps with appropriate update rules. This is also true for DDRM, since we can obtain the denoising autoencoder objective in Equation 2 for any choice of increasing $\sigma_{1:T}$. For a pre-trained diffusion model with $T'$ timesteps, we can choose $\sigma_{1:T}$ to be a subset of the $T'$ steps used in training.
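Step skipping can be sketched as selecting a uniformly spaced subset of the training timesteps and reading off the corresponding noise levels. The sketch below assumes a variance-preserving model with a linear $\beta$ schedule from $10^{-4}$ to $0.02$ over 1000 steps (the schedule commonly used with DDPM) and the conversion $\sigma_t = \sqrt{(1 - \bar{\alpha}_t) / \bar{\alpha}_t}$ to the variance-exploding parametrization; both are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

def uniform_timestep_subset(num_train_steps, num_sampling_steps):
    """Pick num_sampling_steps uniformly spaced indices from range(num_train_steps)."""
    if not 1 <= num_sampling_steps <= num_train_steps:
        raise ValueError("need 1 <= num_sampling_steps <= num_train_steps")
    stride = num_train_steps // num_sampling_steps
    return np.arange(0, num_train_steps, stride)[:num_sampling_steps]

def sigmas_from_betas(betas):
    """Convert variance-preserving betas to variance-exploding noise levels."""
    alphas_cumprod = np.cumprod(1.0 - np.asarray(betas))
    return np.sqrt((1.0 - alphas_cumprod) / alphas_cumprod)

betas = np.linspace(1e-4, 0.02, 1000)       # assumed linear training schedule
steps = uniform_timestep_subset(1000, 20)   # 0, 50, 100, ..., 950
sigmas = sigmas_from_betas(betas)[steps]    # increasing noise levels for sampling
```

Because the cumulative product of $1 - \beta_t$ decreases with $t$, any increasing subset of timesteps yields an increasing sequence of noise levels.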

3.5 Memory Efficient SVD

Our method, similar to SNIPS (Kawar et al., 2021b), utilizes the SVD of the degradation operator $\mathbf{H}$. This constitutes a memory consumption bottleneck in both algorithms, as storing the matrix $\mathbf{V}$ has a space complexity of $\Theta(n^2)$ for signals of size $n$. By leveraging special properties of the matrices used, we can reduce this complexity to $\Theta(n)$ for denoising, inpainting, super resolution, deblurring, and colorization. The detailed analyses are shown in Appendix D.
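Colorization gives a simple illustration of how structure in the degradation operator avoids storing a dense singular-vector matrix. The sketch below assumes the grayscale operator averages the three channels of each pixel independently, so the full operator is block diagonal with one identical 1 x 3 block per pixel and only a 3 x 3 factor has to be kept. This is our own minimal example of the idea, not the construction from Appendix D, and the function names are hypothetical. Note that the resulting spectral coordinates are grouped per pixel rather than sorted by singular value, which differs from the sorted SVD only by a permutation of indices.

```python
import numpy as np

# Per-pixel grayscale operator: gray = (r + g + b) / 3.
h = np.full((1, 3), 1.0 / 3.0)
_, s_small, Vt_small = np.linalg.svd(h, full_matrices=True)  # Vt_small is 3 x 3

def to_spectral(x_rgb):
    """Apply V^T to an image stored as (num_pixels, 3) without forming V."""
    return x_rgb @ Vt_small.T

def from_spectral(x_bar):
    """Apply V, the inverse of to_spectral."""
    return x_bar @ Vt_small
```

For an image with $p$ pixels, the dense matrix would need $9p^2$ entries, whereas this version stores 9 numbers and applies the transform in time linear in $p$.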

4 Related Work

Various deep learning solutions have been suggested for solving inverse problems under different settings (see a detailed survey by Ongie et al. (2020)). We focus on the unsupervised setting, where we have access to a dataset of clean images at training time, but the degradation model is known only at inference time. This setup is inherently general to all linear inverse problems, a property desired in many real-world applications such as medical imaging (Song et al., 2021b; Jalal et al., 2021a).

Almost all unsupervised inverse problem solvers utilize a trained neural network in an iterative scheme. PnP, RED, and their successors (Venkatakrishnan et al., 2013; Romano et al., 2017; Mataev et al., 2019; Sun et al., 2019) apply a denoiser as part of an iterative optimization algorithm such as steepest descent, fixed-point, or alternating direction method of multipliers (ADMM). OneNet (Rick Chang et al., 2017) trained a network to directly learn the proximal operator of ADMM. A similar use of denoisers in different iterative algorithms is proposed in (Metzler et al., 2017; Guo et al., 2019; Laumont et al., 2021). Santurkar et al. (2019) leverage robust classifiers learned with additional class labels.

Another approach is to search the latent space of a generative model for a generated image that, when degraded, is as close as possible to the given measurements. Multiple such methods were suggested, mainly focusing on generative adversarial networks (GANs) (Bora et al., 2017; Daras et al., 2021; Menon et al., 2020). While they exhibit impressive results on images of a specific class, most notably face images, these methods are not shown to be largely successful when considering a more diverse dataset such as ImageNet (Deng et al., 2009). Deep Generative Prior (DGP) mitigates this issue by optimizing the latent input as well as the weights of the GAN’s generator (Pan et al., 2020).

More recently, denoising diffusion models were used to solve inverse problems in both supervised (i.e., degradation model is known during training) (Saharia et al., 2021b, a; Dhariwal & Nichol, 2021; Chung et al., 2021; Whang et al., 2021) and unsupervised settings (Kadkhodaie & Simoncelli, 2021; Kawar et al., 2021a, b; Jalal et al., 2021b; Song et al., 2021b, c; Choi et al., 2021). Unlike previous approaches, diffusion-based methods can successfully recover images from measurements with significant noise. However, these methods are very slow, often requiring hundreds or thousands of iterations, and are yet to be proven on diverse datasets. Our method, motivated by variational inference, obtains problem-specific, non-equilibrium update rules that lead to high-quality solutions in far fewer iterations.

ILVR (Choi et al., 2021) suggests a diffusion-based method that handles noiseless super-resolution, and can run in steps. In Appendix G, we prove that when applied on the same underlying generative diffusion model, ILVR is a special case of DDRM. Therefore, ILVR can be further accelerated to run in steps, but unlike DDRM, it provides no clear way of handling noise in the measurements.

5 Experiments


Method PSNR KID NFEs
Bicubic
DGP
RED
SNIPS
DDRM
DDRM-CC
Table 1: Noiseless super-resolution results on ImageNet 1K ().

Method PSNR KID NFEs
Blurry
DGP
RED
SNIPS
DDRM
DDRM-CC
Table 2: Noiseless deblurring results on ImageNet 1K ().

5.1 Experimental Setup

We demonstrate our algorithm’s capabilities using the diffusion models from (Ho et al., 2020), which are trained on CelebA-HQ (Karras et al., 2018), LSUN bedrooms, and LSUN cats (Yu et al., 2015) (all pixels). We test these models on images from FFHQ (Karras et al., 2019), and pictures from the internet of the considered LSUN category, respectively. In addition, we use the models from (Dhariwal & Nichol, 2021), trained on the training set of ImageNet and , and tested on the corresponding validation set. Some of the ImageNet models require class information. For these models, we use the ground truth labels as input, and denote our algorithm as DDRM class conditional (DDRM-CC). In all experiments, we use , , and a uniformly-spaced timestep schedule based on the 1000-step pre-trained models (more details in Appendix E). The number of NFEs (timesteps) is reported in each experiment.

In each of the inverse problems we show, pixel values are in the range , and the degraded measurements are obtained as follows: (i) for super-resolution, we use a block averaging filter to downscale the images by a factor of , , or in each axis; (ii) for deblurring, the images are blurred by a uniform kernel; (iii) for colorization, the grayscale image is an average of the red, green, and blue channels of the original image; (iv) and for inpainting, we mask parts of the original image with text overlay or randomly drop of the pixels. Additive white Gaussian noise can optionally be added to the measurements in all inverse problems.
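The degradations described above can be sketched directly on image arrays. The code below is a minimal illustration that assumes images are stored as floating-point arrays of shape (height, width, channels); the function names, the downscaling factor, the drop fraction, and the noise level are hypothetical parameters, not the values used in the experiments.

```python
import numpy as np

def block_average(img, factor):
    """Downscale by averaging non-overlapping factor x factor blocks."""
    h, w, c = img.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    blocks = img.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))

def to_grayscale(img):
    """Average the color channels, keeping a singleton channel axis."""
    return img.mean(axis=2, keepdims=True)

def random_mask(img, drop_fraction, rng):
    """Zero out a random subset of pixels; also return the boolean keep-mask."""
    keep = rng.random(img.shape[:2]) >= drop_fraction
    return img * keep[..., None], keep

def add_noise(y, sigma_y, rng):
    """Add white Gaussian noise with standard deviation sigma_y."""
    return y + sigma_y * rng.standard_normal(y.shape)
```

Each of these functions is linear in the image (apart from the added noise), so each corresponds to some matrix $\mathbf{H}$ acting on the flattened image.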


Method PSNR KID NFEs
Bicubic
DGP
RED
SNIPS
DDRM
DDRM-CC
Table 3: super resolution results on ImageNet 1K (). Low-res images have an additive noise of .

Method PSNR KID NFEs
Blurry
DGP
RED
SNIPS
DDRM
DDRM-CC
Table 4: Deblurring results on ImageNet 1K (). Blurred images have an additive noise of .
Figure 3: noisy super resolution comparison with . Panels: Original, Low-res, DDRM, SNIPS, RED, DGP.
Figure 4: Inpainting results on cat images. First two images have of their pixels removed, last two are occluded by text. Panels: Original, Occluded, DDRM.
Figure 5: Deblurring results on bedroom images. Blurred images contain noise of standard deviation . Panels: Original, Blurred, DDRM.
Figure 6: Denoising () face images. DDRM restores more fine details (e.g. hair) than an MMSE denoiser. Panels: Original, Noisy, DDRM, Denoised.
Figure 7: ImageNet colorization for the classes ‘teapot’ and ‘brown bear’. DDRM-CC produces various samples for multiple runs on the same input. Panels: Original, Grayscale, Samples from DDRM-CC.
Figure 8: Results on USC-SIPI images using an ImageNet model. Blurred images have a noise of . Panels: Original, Blurred, DDRM (20).

5.2 Quantitative Experiments

In order to quantify DDRM’s performance, we focus on the ImageNet dataset () for its diversity. For each experiment, we report the average peak signal-to-noise ratio (PSNR) to measure faithfulness to the original image, and the kernel Inception distance (KID) (Bińkowski et al., 2018), multiplied by , to measure the resulting image quality.
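For reference, PSNR can be computed from the mean squared error between the original and restored images. The sketch below uses the standard definition and assumes pixel values scaled so that the maximum possible value is `max_value` (set to 1.0 here as an assumption); the function name is hypothetical.

```python
import numpy as np

def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    reference = np.asarray(reference, dtype=float)
    restored = np.asarray(restored, dtype=float)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

PSNR rewards pixel-wise agreement with the original, which is why it is paired with a distribution-level metric such as KID to assess perceptual quality.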

We compare DDRM (with and steps) with other unsupervised methods that work in reasonable time (requiring NFEs or fewer) and can operate on ImageNet. Namely, we compare with RED (Romano et al., 2017), DGP (Pan et al., 2020), and SNIPS (Kawar et al., 2021b). The exact setup of each method is detailed in Appendix F. We used the same hyperparameters for noisy and noiseless versions of the same problem for DGP, RED, and SNIPS, as tuning them for each version would compromise their unsupervised nature. In addition, we show upscaling by bicubic interpolation as a baseline for super-resolution, and the blurry image itself as a baseline for deblurring. OneNet (Rick Chang et al., 2017) is not included in the comparisons as it is limited to images of size , and generalization to higher dimensions requires an improved network architecture.

We evaluate all methods on the problems of super-resolution and deblurring, on one validation set image from each of the ImageNet classes, following (Pan et al., 2020). Tables 1 and 2 show that DDRM outperforms all baseline methods, in all metrics, and on both problems with only steps. The only exception to this is that SNIPS achieves better KID than DDRM in noiseless deblurring, but it requires more NFEs to do so. DGP and DDRM-CC use ground-truth class labels for the test images to aid in the restoration process, and thus have an unfair advantage.

DDRM’s appeal compared to previous methods becomes more substantial when significant noise is added to the measurements. Under this setting, DGP, RED, and SNIPS all fail to produce viable results, as evident in Tables 3 and 4 and Figure 3. Since DDRM is fast, we also evaluate it on the entire ImageNet validation set in Appendix F.

5.3 Qualitative Experiments

DDRM produces high quality reconstructions across all the tested datasets and problems, as can be seen in Figures 1, 4, 5, 6, and in Appendix H. The denoiser used in Figure 6 is the denoising diffusion function used by DDRM, where minimizes . As it is a posterior sampling algorithm, DDRM can produce multiple outputs for the same input, as demonstrated in Figure 7. Moreover, the unconditional ImageNet diffusion models can be used to solve inverse problems on out-of-distribution images with general content. In Figure 8, we show DDRM successfully restoring images from USC-SIPI (Weber, 1997) that do not necessarily belong to any ImageNet class (additional results in Figure 14, Appendix H).

6 Conclusions

We have introduced DDRM, a general linear inverse problem solver based on unconditional/class-conditional diffusion generative models. Motivated by variational inference, DDRM requires only a small number of NFEs (e.g., 20) compared to other baselines (e.g., 1000 for SNIPS) and achieves scalability in multiple useful scenarios, including denoising, super-resolution, deblurring, inpainting, and colorization. We demonstrate the empirical successes of DDRM on various problems and datasets, including general natural images outside the distribution of the observed training set. To the best of our knowledge, DDRM is the first unsupervised method that effectively and efficiently samples from the posterior distribution of inverse problems with significant noise, and can work on natural images with general content.

In terms of future work, apart from further optimizing the timestep and variance schedules, it would be interesting to investigate the following: (i) applying DDRM to non-linear inverse problems, (ii) addressing scenarios where the degradation operator is unknown, and (iii) self-supervised training techniques inspired by DDRM as well as ones used in supervised techniques (Saharia et al., 2021a) that further improve performance of unsupervised models for image restoration.

References

  • Baraniuk (2007) Baraniuk, R. G. Compressive sensing [lecture notes]. IEEE signal processing magazine, 24(4):118–121, 2007.
  • Bardsley (2012) Bardsley, J. M. MCMC-based image reconstruction with uncertainty quantification. SIAM Journal on Scientific Computing, 34(3):A1316–A1332, 2012.
  • Bardsley et al. (2014) Bardsley, J. M., Solonen, A., Haario, H., and Laine, M. Randomize-then-optimize: A method for sampling from posterior distributions in nonlinear inverse problems. SIAM Journal on Scientific Computing, 36(4):A1895–A1910, 2014.
  • Bishop (2006) Bishop, C. M. Pattern recognition. Machine learning, 128(9), 2006.
  • Bińkowski et al. (2018) Bińkowski, M., Sutherland, D. J., Arbel, M., and Gretton, A. Demystifying MMD GANs. In International Conference on Learning Representations, 2018.
  • Blau & Michaeli (2018) Blau, Y. and Michaeli, T. The perception-distortion tradeoff. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6228–6237, 2018.
  • Bora et al. (2017) Bora, A., Jalal, A., Price, E., and Dimakis, A. G. Compressed sensing using generative models. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pp. 537–546, 2017.
  • Calvetti & Somersalo (2008) Calvetti, D. and Somersalo, E. Hypermodels in the Bayesian imaging framework. Inverse Problems, 24(3):034013, 2008.
  • Choi et al. (2021) Choi, J., Kim, S., Jeong, Y., Gwon, Y., and Yoon, S. ILVR: Conditioning method for denoising diffusion probabilistic models. arXiv preprint arXiv:2108.02938, 2021.
  • Chung et al. (2021) Chung, H., Sim, B., and Ye, J. C. Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction. arXiv preprint arXiv:2112.05146, 2021.
  • Daras et al. (2021) Daras, G., Dean, J., Jalal, A., and Dimakis, A. Intermediate layer optimization for inverse problems using deep generative models. In Proceedings of the 38th International Conference on Machine Learning, volume 139, pp. 2421–2432, 2021.
  • Deng et al. (2009) Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009.
  • Dhariwal & Nichol (2021) Dhariwal, P. and Nichol, A. Q. Diffusion models beat GANs on image synthesis. In Thirty-Fifth Conference on Neural Information Processing Systems, 2021.
  • Dong et al. (2015) Dong, C., Loy, C. C., He, K., and Tang, X. Image super-resolution using deep convolutional networks. IEEE transactions on pattern analysis and machine intelligence, 38(2):295–307, 2015.
  • Gu et al. (2020) Gu, J., Shen, Y., and Zhou, B. Image processing using multi-code gan prior. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 3012–3021, 2020.
  • Guo et al. (2019) Guo, B., Han, Y., and Wen, J. Agem: Solving linear inverse problems via deep priors and sampling. Advances in Neural Information Processing Systems, 32, 2019.
  • Haris et al. (2018) Haris, M., Shakhnarovich, G., and Ukita, N. Deep back-projection networks for super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1664–1673, 2018.
  • Heusel et al. (2017) Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems, volume 30, 2017.
  • Ho et al. (2020) Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pp. 6840–6851, 2020.
  • Jalal et al. (2021a) Jalal, A., Arvinte, M., Daras, G., Price, E., Dimakis, A., and Tamir, J. Robust compressed sensing mri with deep generative priors. In Thirty-Fifth Conference on Neural Information Processing Systems, 2021a.
  • Jalal et al. (2021b) Jalal, A., Karmalkar, S., Dimakis, A., and Price, E. Instance-optimal compressed sensing via posterior sampling. In Proceedings of the 38th International Conference on Machine Learning, volume 139, pp. 4709–4720, 2021b.
  • Kadkhodaie & Simoncelli (2021) Kadkhodaie, Z. and Simoncelli, E. Stochastic solutions for linear inverse problems using the prior implicit in a denoiser. Advances in Neural Information Processing Systems, 34, 2021.
  • Karras et al. (2018) Karras, T., Aila, T., Laine, S., and Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations, 2018.
  • Karras et al. (2019) Karras, T., Laine, S., and Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410, 2019.
  • Kawar et al. (2021a) Kawar, B., Vaksman, G., and Elad, M. Stochastic image denoising by sampling from the posterior distribution. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, pp. 1866–1875, October 2021a.
  • Kawar et al. (2021b) Kawar, B., Vaksman, G., and Elad, M. SNIPS: Solving noisy inverse problems stochastically. In Thirty-Fifth Conference on Neural Information Processing Systems, 2021b.
  • Kingma & Welling (2013) Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114v10, December 2013.
  • Kupyn et al. (2019) Kupyn, O., Martyniuk, T., Wu, J., and Wang, Z. DeblurGAN-v2: Deblurring (orders-of-magnitude) faster and better. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8878–8887, 2019.
  • Larsson et al. (2016) Larsson, G., Maire, M., and Shakhnarovich, G. Learning representations for automatic colorization. In European Conference on Computer Vision, pp. 577–593. Springer, 2016.
  • Laumont et al. (2021) Laumont, R., De Bortoli, V., Almansa, A., Delon, J., Durmus, A., and Pereyra, M. Bayesian imaging using plug & play priors: When Langevin meets Tweedie. arXiv preprint arXiv:2103.04715, 2021.
  • Ledig et al. (2017) Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4681–4690, 2017.
  • Mataev et al. (2019) Mataev, G., Milanfar, P., and Elad, M. DeepRED: deep image prior powered by RED. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019.
  • Menon et al. (2020) Menon, S., Damian, A., Hu, M., Ravi, N., and Rudin, C. PULSE: Self-supervised photo upsampling via latent space exploration of generative models. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
  • Metzler et al. (2017) Metzler, C., Mousavi, A., and Baraniuk, R. Learned D-AMP: Principled neural network based compressive image recovery. In Advances in Neural Information Processing Systems, volume 30, 2017.
  • Nichol & Dhariwal (2021) Nichol, A. and Dhariwal, P. Improved denoising diffusion probabilistic models. arXiv preprint arXiv:2102.09672, 2021.
  • Ongie et al. (2020) Ongie, G., Jalal, A., Metzler, C. A., Baraniuk, R. G., Dimakis, A. G., and Willett, R. Deep learning techniques for inverse problems in imaging. IEEE Journal on Selected Areas in Information Theory, 1(1):39–56, 2020.
  • Pan et al. (2020) Pan, X., Zhan, X., Dai, B., Lin, D., Loy, C. C., and Luo, P. Exploiting deep generative prior for versatile image restoration and manipulation. In European Conference on Computer Vision (ECCV), 2020.
  • Rick Chang et al. (2017) Rick Chang, J., Li, C.-L., Poczos, B., Vijaya Kumar, B., and Sankaranarayanan, A. C. One network to solve them all–solving linear inverse problems using deep projection models. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5888–5897, 2017.
  • Romano et al. (2017) Romano, Y., Elad, M., and Milanfar, P. The little engine that could: Regularization by denoising (RED). SIAM Journal on Imaging Sciences, 10(4):1804–1844, 2017.
  • Saharia et al. (2021a) Saharia, C., Chan, W., Chang, H., Lee, C. A., Ho, J., Salimans, T., Fleet, D. J., and Norouzi, M. Palette: Image-to-image diffusion models. arXiv preprint arXiv:2111.05826, 2021a.
  • Saharia et al. (2021b) Saharia, C., Ho, J., Chan, W., Salimans, T., Fleet, D. J., and Norouzi, M. Image super-resolution via iterative refinement. arXiv preprint arXiv:2104.07636, 2021b.
  • Santurkar et al. (2019) Santurkar, S., Tsipras, D., Tran, B., Ilyas, A., Engstrom, L., and Madry, A. Image synthesis with a single (robust) classifier. arXiv preprint arXiv:1906.09453, 2019.
  • Song et al. (2021a) Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In International Conference on Learning Representations, April 2021a.
  • Song et al. (2021b) Song, Y., Shen, L., Xing, L., and Ermon, S. Solving inverse problems in medical imaging with score-based generative models. arXiv preprint arXiv:2111.08005, 2021b.
  • Song et al. (2021c) Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021c.
  • Suin et al. (2020) Suin, M., Purohit, K., and Rajagopalan, A. Spatially-attentive patch-hierarchical network for adaptive motion deblurring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3606–3615, 2020.
  • Sun et al. (2019) Sun, Y., Wohlberg, B., and Kamilov, U. S. An online plug-and-play algorithm for regularized image reconstruction. IEEE Transactions on Computational Imaging, 5(3):395–408, 2019.
  • Ulyanov et al. (2018) Ulyanov, D., Vedaldi, A., and Lempitsky, V. Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9446–9454, 2018.
  • Venkatakrishnan et al. (2013) Venkatakrishnan, S. V., Bouman, C. A., and Wohlberg, B. Plug-and-play priors for model based reconstruction. In 2013 IEEE Global Conference on Signal and Information Processing, pp. 945–948. IEEE, 2013.
  • Weber (1997) Weber, A. G. The USC-SIPI image database version 5. USC-SIPI Report, 315(1), 1997.
  • Whang et al. (2021) Whang, J., Delbracio, M., Talebi, H., Saharia, C., Dimakis, A. G., and Milanfar, P. Deblurring via stochastic refinement. arXiv preprint arXiv:2112.02475, 2021.
  • Yeh et al. (2017) Yeh, R. A., Chen, C., Yian Lim, T., Schwing, A. G., Hasegawa-Johnson, M., and Do, M. N. Semantic image inpainting with deep generative models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5485–5493, 2017.
  • Yu et al. (2015) Yu, F., Seff, A., Zhang, Y., Song, S., Funkhouser, T., and Xiao, J. LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
  • Zhang et al. (2016) Zhang, R., Isola, P., and Efros, A. A. Colorful image colorization. In European Conference on Computer Vision, pp. 649–666. Springer, 2016.

Appendix A Details of the DDRM ELBO objective

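DDRM is a Markov chain conditioned on $\mathbf{y}$, which would lead to the following ELBO objective (Song et al., 2021a):

$$\mathbb{E}_{\mathbf{x}_0 \sim q(\mathbf{x}_0),\, \mathbf{y} \sim q(\mathbf{y} \mid \mathbf{x}_0)}\left[\log p_\theta(\mathbf{x}_0 \mid \mathbf{y})\right] \tag{9}$$
$$\geq -\mathbb{E}\left[\sum_{t=1}^{T-1} D_{\mathrm{KL}}\left(q^{(t)}(\mathbf{x}_t \mid \mathbf{x}_{t+1}, \mathbf{x}_0, \mathbf{y}) \,\|\, p_\theta^{(t)}(\mathbf{x}_t \mid \mathbf{x}_{t+1}, \mathbf{y})\right)\right] + \mathbb{E}\left[\log p_\theta^{(0)}(\mathbf{x}_0 \mid \mathbf{x}_1, \mathbf{y})\right] - \mathbb{E}\left[D_{\mathrm{KL}}\left(q^{(T)}(\mathbf{x}_T \mid \mathbf{x}_0, \mathbf{y}) \,\|\, p_\theta^{(T)}(\mathbf{x}_T \mid \mathbf{y})\right)\right] \tag{10}$$

where $q(\mathbf{x}_0)$ is the data distribution, $q(\mathbf{y} \mid \mathbf{x}_0)$ follows Equation 1, and the expectation on the right-hand side is taken by sampling $\mathbf{x}_0 \sim q(\mathbf{x}_0)$, $\mathbf{y} \sim q(\mathbf{y} \mid \mathbf{x}_0)$, $\mathbf{x}_T \sim q^{(T)}(\mathbf{x}_T \mid \mathbf{x}_0, \mathbf{y})$, and $\mathbf{x}_t \sim q^{(t)}(\mathbf{x}_t \mid \mathbf{x}_{t+1}, \mathbf{x}_0, \mathbf{y})$ for $t \in [1, T-1]$.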

Appendix B Equivalence between “Variance Preserving” and “Variance Exploding” Diffusion Models

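In our main paper, we describe our methods based on the “Variance Exploding” hyperparameters $\sigma_{1:T}$, where $\sigma_{1:T} \in [0, \infty)$ and

$$q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}(\mathbf{x}_0, \sigma_t^2 \mathbf{I}). \tag{11}$$

In DDIM (Song et al., 2021a), the hyperparameters are “Variance Preserving” ones $\alpha_{1:T}$, where $\alpha_{1:T} \in (0, 1]$ and

$$q(\bar{\mathbf{x}}_t \mid \mathbf{x}_0) = \mathcal{N}(\sqrt{\alpha_t}\,\mathbf{x}_0, (1 - \alpha_t)\mathbf{I}). \tag{12}$$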

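We use the barred notation $\bar{\mathbf{x}}_t$ to emphasize that this is different from $\mathbf{x}_t$ (an exception is $\bar{\mathbf{x}}_0 = \mathbf{x}_0$). Using the reparametrization trick, we have that:

$$\mathbf{x}_t = \mathbf{x}_0 + \sigma_t \boldsymbol{\epsilon} \tag{13}$$
$$\bar{\mathbf{x}}_t = \sqrt{\alpha_t}\,\mathbf{x}_0 + \sqrt{1 - \alpha_t}\,\boldsymbol{\epsilon} \tag{14}$$

where $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. We can divide both sides of Equation 13 by $\sqrt{1 + \sigma_t^2}$:

$$\frac{\mathbf{x}_t}{\sqrt{1 + \sigma_t^2}} = \frac{\mathbf{x}_0}{\sqrt{1 + \sigma_t^2}} + \frac{\sigma_t}{\sqrt{1 + \sigma_t^2}}\,\boldsymbol{\epsilon} \tag{15}$$

Let $\alpha_t = 1/(1 + \sigma_t^2)$, and let $\bar{\mathbf{x}}_t = \mathbf{x}_t / \sqrt{1 + \sigma_t^2}$; then $\sqrt{1 - \alpha_t} = \sigma_t / \sqrt{1 + \sigma_t^2}$, and from Equation 15 we have that

$$\bar{\mathbf{x}}_t = \sqrt{\alpha_t}\,\mathbf{x}_0 + \sqrt{1 - \alpha_t}\,\boldsymbol{\epsilon} \tag{16}$$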

which is equivalent to the “Variance Preserving” case. Therefore, we can use “Variance Preserving” models, such as DDPM, directly in our DDRM updates, even though the latter uses the “Variance Exploding” parametrization:

  1. From , obtain predictions and .

  2. From and , apply DDRM updates to get .

  3. From , get .

Note that although the inference algorithms are shown to be equivalent, the choice between “Variance Preserving” and “Variance Exploding” may affect the training of diffusion networks.
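The rescaling argument of this appendix can be checked numerically. The following is a minimal Python sketch, not the authors' implementation. It assumes that Equation 13 has the form $\mathbf{x}_t = \mathbf{x}_0 + \sigma_t \boldsymbol{\epsilon}$, that Equation 14 has the form $\bar{\mathbf{x}}_t = \sqrt{\alpha_t}\,\mathbf{x}_0 + \sqrt{1 - \alpha_t}\,\boldsymbol{\epsilon}$, and that the two schedules are linked by $\alpha_t = 1/(1 + \sigma_t^2)$. The helper names (`sigma_to_alpha`, `alpha_to_sigma`, `ve_to_vp`, `vp_to_ve`) are hypothetical and introduced only for illustration.

```python
import math
import random


def sigma_to_alpha(sigma):
    """Map a VE noise level sigma_t to the assumed VP coefficient 1 / (1 + sigma_t^2)."""
    return 1.0 / (1.0 + sigma * sigma)


def alpha_to_sigma(alpha):
    """Inverse map: sigma_t = sqrt((1 - alpha_t) / alpha_t), valid for alpha_t in (0, 1]."""
    return math.sqrt((1.0 - alpha) / alpha)


def ve_to_vp(x_ve, sigma):
    """Rescale a VE iterate x_t to the VP iterate x_t / sqrt(1 + sigma_t^2)."""
    scale = math.sqrt(1.0 + sigma * sigma)
    return [v / scale for v in x_ve]


def vp_to_ve(x_vp, sigma):
    """Undo the rescaling: multiply a VP iterate by sqrt(1 + sigma_t^2)."""
    scale = math.sqrt(1.0 + sigma * sigma)
    return [v * scale for v in x_vp]


# Toy check: draw one clean vector and one noise vector, then build both
# parametrizations from the SAME noise and compare them after rescaling.
rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(8)]
eps = [rng.gauss(0.0, 1.0) for _ in range(8)]

sigma_t = 2.5
alpha_t = sigma_to_alpha(sigma_t)

# Assumed form of Equation 13 (Variance Exploding).
x_ve = [a + sigma_t * e for a, e in zip(x0, eps)]
# Assumed form of Equation 14 (Variance Preserving).
x_vp = [math.sqrt(alpha_t) * a + math.sqrt(1.0 - alpha_t) * e
        for a, e in zip(x0, eps)]

# Largest coordinate-wise difference between the rescaled VE sample and the
# VP sample; it should vanish up to floating-point rounding.
max_gap = max(abs(u - v) for u, v in zip(ve_to_vp(x_ve, sigma_t), x_vp))
print("max |x_t / sqrt(1 + sigma_t^2) - x_bar_t| =", max_gap)
```

Under these assumptions, a sampler that keeps its state in the “Variance Exploding” scale could call `ve_to_vp` before querying a “Variance Preserving” denoiser and `vp_to_ve` on the way back, which is the role of the rescaling steps listed above.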

Appendix C Proofs

See Theorem 3.1.

Proof.

The proof uses two basic properties of Gaussian marginals (see Bishop (2006) for the complete version).

  1. If , , then .

  2. If and , then .

First, we note that is defined from Equation 1, and thus for all :

(17)

Case I For , it is obvious when . When , we have Equation 17 and that:

(18)

and thus

Case II For any and such that and , we have Equation 17 and that:

(19)

and thus we can safely remove the dependence on via marginalization. is a Gaussian with the mean being and variance being