1 Introduction
Many modern image processing algorithms recover or denoise an image through the optimization problem
$$\underset{x\in\mathbb{R}^d}{\text{minimize}}\quad f(x)+\gamma g(x),$$
where the optimization variable $x\in\mathbb{R}^d$ represents the image, $f$ measures data fidelity, $g$ measures noisiness or complexity of the image, and $\gamma\ge 0$ is a parameter representing the relative importance between $f$ and $g$. Total variation denoising, inpainting, and compressed sensing fall under this setup. A priori knowledge of the image, such as that the image should have small noise, is encoded in $g$: $g(x)$ is small if $x$ has small noise or complexity. A posteriori knowledge of the image, such as noisy or partial measurements of the image, is encoded in $f$: $f(x)$ is small if $x$ agrees with the measurements.
First-order iterative methods are often used to solve such optimization problems, and ADMM is one such method:
$$\begin{aligned}
x^{k+1}&=\underset{x\in\mathbb{R}^d}{\operatorname{argmin}}\,\Big\{f(x)+\tfrac{1}{2\alpha}\|x-(y^k-u^k)\|_2^2\Big\}\\
y^{k+1}&=\underset{y\in\mathbb{R}^d}{\operatorname{argmin}}\,\Big\{\gamma g(y)+\tfrac{1}{2\alpha}\|(x^{k+1}+u^k)-y\|_2^2\Big\}\\
u^{k+1}&=u^k+x^{k+1}-y^{k+1}
\end{aligned}$$
with $\alpha>0$. Given a function $h$ on $\mathbb{R}^d$ and $\alpha>0$, define the proximal operator of $h$ as
$$\operatorname{Prox}_{\alpha h}(z)=\underset{x\in\mathbb{R}^d}{\operatorname{argmin}}\,\Big\{\alpha h(x)+\tfrac12\|x-z\|_2^2\Big\},$$
which is well-defined if $h$ is proper, closed, and convex. Now we can equivalently write ADMM as
$$\begin{aligned}
x^{k+1}&=\operatorname{Prox}_{\alpha f}(y^k-u^k)\\
y^{k+1}&=\operatorname{Prox}_{\alpha\gamma g}(x^{k+1}+u^k)\\
u^{k+1}&=u^k+x^{k+1}-y^{k+1}.
\end{aligned}$$
We can interpret the subroutine $\operatorname{Prox}_{\alpha\gamma g}$ as a denoiser, i.e.,
$$\operatorname{Prox}_{\alpha\gamma g}:\ \text{noisy image}\ \mapsto\ \text{less noisy image}.$$
(For example, if $g$ is the total variation (TV) norm and the parameter $\alpha\gamma$ is matched to the noise level, then $\operatorname{Prox}_{\alpha\gamma g}$ is the standard Rudin–Osher–Fatemi (ROF) model (Rudin et al., 1992).) We can think of $\operatorname{Prox}_{\alpha f}$ as a mapping enforcing consistency with measured data, i.e.,
$$\operatorname{Prox}_{\alpha f}:\ \text{less consistent}\ \mapsto\ \text{more consistent with data}.$$
More precisely speaking, for any $x\in\mathbb{R}^d$ we have $f(\operatorname{Prox}_{\alpha f}(x))\le f(x)$.
However, some state-of-the-art image denoisers with great empirical performance do not originate from optimization problems. Such examples include non-local means (NLM) (Buades et al., 2005), block-matching and 3D filtering (BM3D) (Dabov et al., 2007), and convolutional neural networks (CNN) (Zhang et al., 2017a). Nevertheless, such a denoiser $H_\sigma:\mathbb{R}^d\to\mathbb{R}^d$ still has the interpretation
$$H_\sigma:\ \text{noisy image}\ \mapsto\ \text{less noisy image},$$
where $\sigma\ge 0$ is a noise parameter. Larger values of $\sigma$ correspond to more aggressive denoising.
Is it possible to use such denoisers for a broader range of imaging problems, even though we cannot directly set up an optimization problem? To address this question, (Venkatakrishnan et al., 2013) proposed Plug-and-Play ADMM (PnP-ADMM), which simply replaces the proximal operator $\operatorname{Prox}_{\alpha\gamma g}$ with the denoiser $H_\sigma$:
$$\begin{aligned}
x^{k+1}&=\operatorname{Prox}_{\alpha f}(y^k-u^k)\\
y^{k+1}&=H_\sigma(x^{k+1}+u^k)\\
u^{k+1}&=u^k+x^{k+1}-y^{k+1}.
\end{aligned}$$
Surprisingly and remarkably, this ad hoc method exhibited great empirical success and spurred much follow-up work.
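In code, the PnP-ADMM iteration is a three-line loop. The following is a minimal sketch in Python/NumPy; `prox_f` (the data-fidelity proximal step) and `denoise` (the plugged-in denoiser $H_\sigma$) are hypothetical placeholders to be supplied per application, not functions from the original text.

```python
import numpy as np

def pnp_admm(prox_f, denoise, y0, num_iters=100):
    """Minimal PnP-ADMM sketch: prox_f enforces data fidelity,
    denoise plays the role of Prox_{alpha*gamma*g}."""
    y = y0.copy()              # primal estimate (denoiser output)
    u = np.zeros_like(y0)      # scaled dual variable
    for _ in range(num_iters):
        x = prox_f(y - u)      # x^{k+1} = Prox_{alpha f}(y^k - u^k)
        y = denoise(x + u)     # y^{k+1} = H_sigma(x^{k+1} + u^k)
        u = u + x - y          # u^{k+1} = u^k + x^{k+1} - y^{k+1}
    return x
```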
Contribution of this paper.
The empirical success of Plug-and-Play (PnP) naturally leads us to ask theoretical questions: When does PnP converge, and which denoisers can we use? Past theoretical analysis has been insufficient.
The main contribution of this work is the convergence analysis of PnP methods. We study two PnP methods, plug-and-play forward-backward splitting (PNP-FBS) and PNP-ADMM. For the analysis, we assume the denoiser $H_\sigma$ satisfies a certain Lipschitz condition, formally stated as Assumption (A). Roughly speaking, the condition corresponds to the denoiser being close to the identity map, which is reasonable when the denoising parameter $\sigma$ is small. In particular, we do not assume that $H_\sigma$ is nonexpansive or differentiable, since most denoisers do not have such properties. Under this assumption, we show that the PnP methods are contractive iterations.
We then propose real spectral normalization (realSN), a technique based on (Miyato et al., 2018) for more accurately constraining deep learning-based denoisers during training so that they satisfy the proposed Lipschitz condition. Finally, we present experimental results validating our theory. Code used for the experiments is available at: https://github.com/uclaopt/Provable_Plug_and_Play/
1.1 Prior work
Plug-and-play: Practice.
The first PnP method was the plug-and-play ADMM proposed in (Venkatakrishnan et al., 2013). Since then, other schemes such as the primal-dual method (Heide et al., 2014; Meinhardt et al., 2017; Ono, 2017), ADMM with increasing penalty parameter (Brifman et al., 2016; Chan et al., 2017), generalized approximate message passing (Metzler et al., 2016), Newton iteration (Buzzard et al., 2018), the fast iterative shrinkage-thresholding algorithm (Kamilov et al., 2017; Sun et al., 2018b), (stochastic) forward-backward splitting (Sun et al., 2019, 2018a, 2018b), and alternating minimization (Dong et al., 2018) have been combined with the PnP technique.
PnP methods have reported empirical success on a large variety of imaging applications: bright-field electron tomography (Sreehari et al., 2016), camera image processing (Heide et al., 2014), compression-artifact reduction (Dar et al., 2016), compressive imaging (Teodoro et al., 2016), deblurring (Teodoro et al., 2016; Rond et al., 2016; Wang & Chan, 2017), electron microscopy (Sreehari et al., 2017), Gaussian denoising (Buzzard et al., 2018; Dong et al., 2018), nonlinear inverse scattering (Kamilov et al., 2017), Poisson denoising (Rond et al., 2016), single-photon imaging (Chan et al., 2017), super-resolution (Brifman et al., 2016; Sreehari et al., 2016; Chan et al., 2017; Dong et al., 2018), diffraction tomography (Sun et al., 2019), Fourier ptychographic microscopy (Sun et al., 2018b), low-dose CT imaging (Venkatakrishnan et al., 2013; He et al., 2018; Ye et al., 2018; Lyu et al., 2019), hyperspectral sharpening (Teodoro et al., 2017, 2019), and inpainting (Chan, 2019; Tirer & Giryes, 2019).
A wide range of denoisers have been used within the PnP framework. BM3D has been used the most (Heide et al., 2014; Dar et al., 2016; Rond et al., 2016; Sreehari et al., 2016; Chan et al., 2017; Kamilov et al., 2017; Ono, 2017; Wang & Chan, 2017), but other denoisers such as sparse representation (Brifman et al., 2016), non-local means (Venkatakrishnan et al., 2013; Heide et al., 2014; Sreehari et al., 2016, 2017; Chan, 2019), Gaussian mixture models (Teodoro et al., 2016, 2017; Shi & Feng, 2018; Teodoro et al., 2019), patch-based Wiener filtering (Venkatakrishnan et al., 2013), nuclear norm minimization (Kamilov et al., 2017), deep learning-based denoisers (Meinhardt et al., 2017; He et al., 2018; Ye et al., 2018; Tirer & Giryes, 2019), and deep projection models based on generative adversarial networks (Chang et al., 2017) have also been considered.
Plug-and-play: Theory.
Compared to the empirical success, much less progress has been made on the theoretical aspects of PnP optimization. (Chan et al., 2017) analyzed convergence under a bounded-denoiser assumption, establishing convergence using an increasing penalty parameter. (Buzzard et al., 2018) provided an interpretation of fixed points via "consensus equilibrium". (Sreehari et al., 2016; Sun et al., 2019; Teodoro et al., 2017; Chan, 2019; Teodoro et al., 2019) proved convergence of PNP-ADMM and PNP-FBS under the assumption that the denoiser is (averaged) nonexpansive by viewing the methods as fixed-point iterations. The nonexpansiveness assumption is not met by most denoisers as is, but (Chan, 2019) proposed modifications to the non-local means and Gaussian mixture model denoisers, which make them into linear filters, to enforce nonexpansiveness. (Dong et al., 2018) presented a proof that relies on the existence of a certain Lyapunov function that is monotonic under $H_\sigma$, which holds only for simple $H_\sigma$. (Tirer & Giryes, 2019) analyzed a variant of PnP, but did not establish local convergence since their key assumption is only expected to be satisfied "in early iterations".
Other PnP-type methods.
There are other lines of work that incorporate modern denoisers into model-based optimization methods. The plug-in idea with half-quadratic splitting, as opposed to ADMM, was discussed in (Zoran & Weiss, 2011), and this approach was carried out with deep learning-based denoisers in (Zhang et al., 2017b). (Danielyan et al., 2012; Egiazarian & Katkovnik, 2015) use the notion of Nash equilibrium to propose a scheme similar to PnP. (Danielyan et al., 2010) proposed an augmented Lagrangian method similar to PnP. (Romano et al., 2017; Reehorst & Schniter, 2019) presented Regularization by Denoising (RED), which uses the (nonconvex) regularizer $g(x)=\tfrac12 x^\top(x-H_\sigma(x))$ given a denoiser $H_\sigma$, and uses denoiser evaluations in its iterations. (Fletcher et al., 2018) applies the plug-in approach to vector approximate message passing.
(Yang et al., 2016; Fan et al., 2019) replaced both the proximal operator enforcing data fidelity and the denoiser with two neural networks and performed end-to-end training. More broadly, there are further works that combine model-based optimization with deep learning (Chen et al., 2018; Liu et al., 2019).
Image denoising using deep learning.
Deep learning-based denoising methods have become state-of-the-art. (Zhang et al., 2017a) proposed an effective denoising network called DnCNN, which adopted batch normalization (Ioffe & Szegedy, 2015) and ReLU (Krizhevsky et al., 2012) into residual learning (He et al., 2016). Other representative deep denoising models include the deep convolutional encoder-decoder with symmetric skip connections (Mao et al., 2016), N$^3$Net (Plötz & Roth, 2018), and MWCNN (Liu et al., 2018). The recent FFDNet (Zhang et al., 2018) handles spatially varying Gaussian noise.
Regularizing Lipschitz continuity.
Lipschitz continuity and its variants have started to receive attention as a means for regularizing deep classifiers (Bartlett et al., 2017; Bansal et al., 2018; Oberman & Calder, 2018) and GANs (Miyato et al., 2018; Brock et al., 2019). Regularizing Lipschitz continuity stabilizes training, improves the final performance, and enhances robustness to adversarial attacks (Weng et al., 2018; Qian & Wegman, 2019). Specifically, (Miyato et al., 2018) proposed to normalize all weights to have unit spectral norm and thereby constrain the Lipschitz constant of the overall network to be no more than one.
2 PNP-FBS/ADMM and their fixed points
We now present the PnP methods we investigate in this work. We quickly note that although PNP-FBS and PNP-ADMM are distinct methods, they share the same fixed points by Remark 3.1 of (Meinhardt et al., 2017) and Proposition 3 of (Sun et al., 2019).
We call the method
$$x^{k+1}=H_\sigma\big(x^k-\alpha\nabla f(x^k)\big)\qquad\text{(PNP-FBS)}$$
for any $\alpha>0$ plug-and-play forward-backward splitting (PNP-FBS) or the plug-and-play proximal gradient method.
We interpret PNP-FBS as a fixed-point iteration, and we say $x^\star$ is a fixed point of PNP-FBS if
$$x^\star=H_\sigma\big(x^\star-\alpha\nabla f(x^\star)\big).$$
Fixed points of PNP-FBS have a simple, albeit nonrigorous, interpretation. An image denoising algorithm must trade off the two goals of making the image agree with measurements and making the image less noisy. PNP-FBS applies $I-\alpha\nabla f$ and $H_\sigma$, each promoting one of these objectives, repeatedly in an alternating fashion. If PNP-FBS converges to a fixed point, we can expect the limit to represent a compromise.
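As a sketch under the same placeholder assumptions as before (`grad_f` for $\nabla f$ and `denoise` for $H_\sigma$ are ours, not from the original text), PNP-FBS is simply a gradient step followed by a denoising step:

```python
def pnp_fbs(grad_f, denoise, x0, alpha, num_iters=100):
    """Minimal PNP-FBS sketch: x^{k+1} = H_sigma(x^k - alpha * grad f(x^k))."""
    x = x0.copy()
    for _ in range(num_iters):
        x = denoise(x - alpha * grad_f(x))   # gradient step on f, then denoise
    return x
```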
We call the method
$$\begin{aligned}
x^{k+1}&=\operatorname{Prox}_{\alpha f}(y^k-u^k)\\
y^{k+1}&=H_\sigma(x^{k+1}+u^k)\\
u^{k+1}&=u^k+x^{k+1}-y^{k+1}
\end{aligned}\qquad\text{(PNP-ADMM)}$$
for any $\alpha>0$ plug-and-play alternating direction method of multipliers (PNP-ADMM). We interpret PNP-ADMM as a fixed-point iteration, and we say $(x^\star,y^\star,u^\star)$ is a fixed point of PNP-ADMM if
$$x^\star=\operatorname{Prox}_{\alpha f}(y^\star-u^\star),\qquad y^\star=H_\sigma(x^\star+u^\star),\qquad x^\star=y^\star.$$
If we let $z^k=x^{k+1}+u^k$ in (PNP-ADMM), then we get $y^{k+1}=H_\sigma(z^k)$ and $u^{k+1}=z^k-y^{k+1}$. We call the method
$$\begin{aligned}
y^{k+1}&=H_\sigma(z^k)\\
x^{k+1}&=\operatorname{Prox}_{\alpha f}(2y^{k+1}-z^k)\\
z^{k+1}&=z^k+x^{k+1}-y^{k+1}
\end{aligned}\qquad\text{(PNP-DRS)}$$
plug-and-play Douglas–Rachford splitting (PNP-DRS). We interpret PNP-DRS as a fixed-point iteration, and we say $z^\star$ is a fixed point of PNP-DRS if
$$H_\sigma(z^\star)=\operatorname{Prox}_{\alpha f}\big(2H_\sigma(z^\star)-z^\star\big).$$
PNP-ADMM and PNP-DRS are equivalent. Although this is not surprising, as the equivalence between convex ADMM and DRS is well known, we show the steps establishing the equivalence in the supplementary document.
We introduce PNP-DRS as an analytical tool for analyzing PNP-ADMM. It is straightforward to verify that PNP-DRS can be written as $z^{k+1}=T(z^k)$, where
$$T=\tfrac12 I+\tfrac12\big(2\operatorname{Prox}_{\alpha f}-I\big)\big(2H_\sigma-I\big).$$
We use this form to analyze the convergence of PNP-DRS and translate the result to PNP-ADMM.
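The operator form is also convenient in practice, e.g., for monitoring the contraction factor of Section 3 via ratios $\|T(z)-T(z')\|_2/\|z-z'\|_2$. A minimal sketch with the same hypothetical `prox_f`/`denoise` placeholders:

```python
def pnp_drs_operator(prox_f, denoise, z):
    """One application of T(z) = z + Prox_{alpha f}(2 H_sigma(z) - z) - H_sigma(z)."""
    y = denoise(z)            # y^{k+1} = H_sigma(z^k)
    x = prox_f(2 * y - z)     # x^{k+1} = Prox_{alpha f}(2 y^{k+1} - z^k)
    return z + x - y          # z^{k+1} = z^k + x^{k+1} - y^{k+1}
```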
3 Convergence via contraction
We now present conditions that ensure the PnP methods are contractive and thereby convergent.
If we assume $H_\sigma$ is nonexpansive, standard tools of monotone operator theory tell us that PnP-ADMM converges. However, this assumption is too strong: (Chan et al., 2017) presented a counterexample demonstrating that the NLM denoiser is not nonexpansive.
Rather, we assume $H_\sigma$ satisfies
$$\|(H_\sigma-I)(x)-(H_\sigma-I)(y)\|_2\le\varepsilon\|x-y\|_2\qquad\text{(A)}$$
for all $x,y\in\mathbb{R}^d$ for some $\varepsilon\ge 0$. Since $\sigma$ controls the strength of the denoising, we can expect $H_\sigma$ to be close to the identity for small $\sigma$. If so, Assumption (A) is reasonable.
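Assumption (A) can be probed numerically: the ratio $\|(H_\sigma-I)(x)-(H_\sigma-I)(y)\|_2/\|x-y\|_2$ evaluated on any pair of images lower-bounds $\varepsilon$ (this is the quantity histogrammed in Section 5). A minimal sketch, where `denoise` is again a placeholder for the denoiser under test:

```python
import numpy as np

def lower_bound_eps(denoise, image_pairs):
    """Return the largest observed ratio ||(H-I)x - (H-I)y|| / ||x - y||,
    which lower-bounds the epsilon of Assumption (A)."""
    eps = 0.0
    for x, y in image_pairs:
        rx = denoise(x) - x                 # (H_sigma - I)(x)
        ry = denoise(y) - y                 # (H_sigma - I)(y)
        denom = np.linalg.norm(x - y)
        if denom > 0:
            eps = max(eps, np.linalg.norm(rx - ry) / denom)
    return eps
```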
Under this assumption, we show that the PNP-FBS and PNP-DRS iterations are contractive in the sense that we can express the iterations as $x^{k+1}=T(x^k)$, where $T$ satisfies
$$\|T(x)-T(y)\|_2\le\delta\|x-y\|_2$$
for all $x,y\in\mathbb{R}^d$ for some $\delta<1$. We call $\delta$ the contraction factor. If $x^\star$ satisfies $T(x^\star)=x^\star$, i.e., $x^\star$ is a fixed point, then $x^k\to x^\star$ geometrically by the classical Banach contraction principle.
Theorem 1 (Convergence of PNP-FBS).
Assume $H_\sigma$ satisfies assumption (A) for some $\varepsilon\ge 0$. Assume $f$ is $\mu$-strongly convex, $f$ is differentiable, and $\nabla f$ is $L$-Lipschitz. Then
$$T=H_\sigma(I-\alpha\nabla f)$$
satisfies
$$\|T(x)-T(y)\|_2\le\delta\|x-y\|_2,\qquad \delta=(1+\varepsilon)\max\{|1-\alpha\mu|,\,|1-\alpha L|\},$$
for all $x,y\in\mathbb{R}^d$. The coefficient $\delta$ is less than $1$ if
$$\frac{\varepsilon}{(1+\varepsilon)\mu}<\alpha<\frac{2+\varepsilon}{(1+\varepsilon)L}.$$
Such an $\alpha$ exists if $\varepsilon<2\mu/(L-\mu)$.
Theorem 2 (Convergence of PNP-DRS).
Assume $H_\sigma$ satisfies assumption (A) for some $\varepsilon\ge 0$. Assume $f$ is $\mu$-strongly convex and differentiable. Then
$$T=\tfrac12 I+\tfrac12\big(2\operatorname{Prox}_{\alpha f}-I\big)\big(2H_\sigma-I\big)$$
satisfies
$$\|T(x)-T(y)\|_2\le\delta\|x-y\|_2,\qquad \delta=\varepsilon+\frac{1}{1+\alpha\mu},$$
for all $x,y\in\mathbb{R}^d$. The coefficient $\delta$ is less than $1$ if $\varepsilon<1$ and
$$\alpha>\frac{\varepsilon}{(1-\varepsilon)\mu}.$$
Corollary 3 (Convergence of PNP-ADMM).
Assume $H_\sigma$ satisfies assumption (A) for some $\varepsilon<1$. Assume $f$ is $\mu$-strongly convex. Then PNP-ADMM converges for
$$\alpha>\frac{\varepsilon}{(1-\varepsilon)\mu}.$$
Proof.
This follows from Theorem 2 and the equivalence of PNP-DRS and PNP-ADMM. ∎
For PNP-FBS, we assume $f$ is strongly convex and $\nabla f$ is Lipschitz. For PNP-DRS and PNP-ADMM, we assume $f$ is
strongly convex. These are standard assumptions that are satisfied in applications such as image denoising/deblurring and single-photon imaging. Strong convexity, however, does exclude a few applications such as compressed sensing, sparse interpolation, and super-resolution.
PNP-FBS and PNP-ADMM are distinct methods for finding the same set of fixed points. Sometimes, PNP-FBS is easier to implement since it only requires the computation of $\nabla f$ rather than $\operatorname{Prox}_{\alpha f}$. On the other hand, PNP-ADMM has better convergence properties, as demonstrated theoretically by Theorems 1 and 2 and empirically by our experiments.
The proof of Theorem 2 relies on the notion of "negatively averaged" operators of (Giselsson, 2017). It is straightforward to modify Theorems 1 and 2 to establish local convergence when Assumption (A) holds locally. Theorem 2 can be generalized to the case where $f$ is strongly convex but nondifferentiable using the notion of subgradients.
Recently, (Fletcher et al., 2018) proved convergence of "plug-and-play" vector approximate message passing, a method similar to ADMM, assuming Lipschitz continuity of the denoiser. Although the method, the proof technique, and the notion of convergence are different from ours, the similarities are noteworthy.
4 Real spectral normalization: enforcing Assumption (A)
We now present real spectral normalization, a technique for training denoisers to satisfy Assumption (A), and connect the practical implementations to the theory of Section 3.
4.1 Deep learning denoisers: SimpleCNN and DnCNN
We use a deep denoising model called DnCNN (Zhang et al., 2017a), which learns the residual mapping with a 17-layer CNN and reports state-of-the-art results on natural image denoising. Given a noisy observation $y=x+v$, where $x$ is the clean image and $v$ is noise, the residual mapping $\mathcal{R}$ outputs the noise, i.e., $\mathcal{R}(y)=v$, so that $y-\mathcal{R}(y)$ is the clean recovery. Learning the residual mapping is a popular approach in deep learning-based image restoration.
We also construct a simple convolutional encoder-decoder model for denoising and call it SimpleCNN. SimpleCNN consists of 4 convolutional layers with ReLU activations, is trained with the mean-squared-error (MSE) loss, and does not use any pooling or (batch) normalization.
We remark that realSN and the theory of this work are applicable to other deep denoisers. We use SimpleCNN to show that realSN is applicable to any CNN denoiser.
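For concreteness, a SimpleCNN-style residual denoiser can be sketched in PyTorch as follows; the channel widths and the single-channel input are illustrative assumptions, not the exact architecture used in the experiments.

```python
import torch.nn as nn

class SimpleCNN(nn.Module):
    """A 4-layer convolutional residual denoiser with ReLU, no pooling or normalization."""
    def __init__(self, channels=1, features=64):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, channels, 3, padding=1),
        )

    def forward(self, y):
        # Residual learning: subtract the predicted noise from the noisy input.
        return y - self.residual(y)
```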
4.2 Lipschitz-constrained deep denoising
Denote the denoiser (SimpleCNN or DnCNN) as $H_\sigma(y)=y-\mathcal{R}(y)$, where $y$ is the noisy input and $\mathcal{R}$ is the residual mapping, i.e., $\mathcal{R}=I-H_\sigma$. Enforcing Assumption (A) is then equivalent to constraining the Lipschitz constant of $\mathcal{R}$. We propose a variant of spectral normalization (SN) (Miyato et al., 2018) for this.
Spectral normalization.
(Miyato et al., 2018) proposed to normalize the spectral norm of each layerwise weight (with ReLU nonlinearity) to one. Provided that we use 1-Lipschitz nonlinearities (such as ReLU), the Lipschitz constant of a layer is upper-bounded by the spectral norm of its weight, and the Lipschitz constant of the full network is bounded by the product of the spectral norms of all layers (Gouk et al., 2018). To avoid the prohibitive cost of a singular value decomposition (SVD) at every iteration, SN approximately computes the largest singular value of each weight using a small number of power iterations.
Given the weight matrix $W_i$ of the $i$th layer, vectors $u_i$ and $v_i$ are initialized randomly and maintained in memory to estimate the leading left and right singular vectors of $W_i$, respectively. During each forward pass of the network, SN is applied to all layers following the two-step routine:
1. Apply one step of the power method to update $u_i$ and $v_i$:
$$v_i\leftarrow\frac{W_i^\top u_i}{\|W_i^\top u_i\|_2},\qquad u_i\leftarrow\frac{W_i v_i}{\|W_i v_i\|_2}.$$
2. Normalize $W_i$ with the estimated spectral norm $\sigma(W_i)\approx u_i^\top W_i v_i$:
$$\bar W_i=W_i/\sigma(W_i).$$
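A minimal sketch of this routine for a single 2-D weight matrix, following (Miyato et al., 2018) (the function and variable names are ours):

```python
import torch
import torch.nn.functional as F

def spectral_normalize(W, u, eps=1e-12):
    """One SN step: a single power iteration plus normalization.
    W is a 2-D weight matrix; u is the persistent left-singular-vector estimate."""
    v = F.normalize(W.t() @ u, dim=0, eps=eps)   # v ~ leading right singular vector
    u = F.normalize(W @ v, dim=0, eps=eps)       # u ~ leading left singular vector
    sigma = torch.dot(u, W @ v)                  # estimated spectral norm of W
    return W / sigma, u
```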
While the basic methodology of SN suits our goal, the SN in (Miyato et al., 2018) uses a convenient but inexact implementation for convolutional layers. A convolutional layer is represented by a four-dimensional kernel $K$ of shape $c_{\text{out}}\times c_{\text{in}}\times h_k\times w_k$, where $h_k$ and $w_k$ are the kernel's height and width. SN reshapes $K$ into a two-dimensional matrix of shape $c_{\text{out}}\times(c_{\text{in}}h_k w_k)$ and regards this matrix as the $W_i$ above. This relaxation underestimates the true spectral norm of the convolutional operator (Corollary 1 of (Tsuzuku et al., 2018)), given by
$$\sigma(K)=\sup_{x\ne 0}\frac{\|K*x\|_2}{\|x\|_2},$$
where $x$ is the input to the convolutional layer and $K*x$ denotes the convolutional operator applied to $x$. This issue is not hypothetical: when we trained SimpleCNN with SN, the true spectral norms of the four layers all exceeded one, i.e., SN failed to control the Lipschitz constant below $1$.
Real spectral normalization.
We propose an improvement to SN for convolutional layers (we use stride 1 and zero-padding of width 1 for convolutions), called real spectral normalization (realSN), to more accurately constrain the network's Lipschitz constant and thereby enforce Assumption (A). In realSN, we directly consider the convolutional linear operator $K:\mathbb{R}^{c_{\text{in}}\times h\times w}\to\mathbb{R}^{c_{\text{out}}\times h\times w}$, where $h$ and $w$ are the input's height and width, instead of reshaping the convolution kernel into a matrix. The power iteration also requires the conjugate (transpose) operator $K^*$. It can be shown that $K^*$ is another convolutional operator whose kernel is a rotated version of the forward convolutional kernel: the first two dimensions are permuted and the last two dimensions are rotated by 180 degrees (Liu et al., 2019). Instead of two vectors as in SN, realSN maintains $u\in\mathbb{R}^{c_{\text{out}}\times h\times w}$ and $v\in\mathbb{R}^{c_{\text{in}}\times h\times w}$ to estimate the leading left and right singular vectors, respectively. During each forward pass of the neural network, realSN conducts:

1. Apply one step of the power method with the operators $K$ and $K^*$:
$$v\leftarrow\frac{K^*(u)}{\|K^*(u)\|_2},\qquad u\leftarrow\frac{K(v)}{\|K(v)\|_2}.$$
2. Normalize the convolutional kernel with the estimated spectral norm $\sigma(K)\approx\langle u,K(v)\rangle$:
$$\bar K=K/\sigma(K).$$
By scaling the normalized kernels by a constant, realSN can constrain the Lipschitz constant of the network to any prescribed upper bound. Using the highly efficient convolution computation in modern deep learning frameworks, realSN can be implemented simply and efficiently. Specifically, realSN introduces three additional one-sample convolution operations for each layer in each training step. When training with minibatches (we use a minibatch size of 128), the extra computational cost of these additional operations is mild.
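A sketch of one realSN step for a single convolutional layer in PyTorch, assuming stride 1 and zero-padding 1 as above; the conjugate operator $K^*$ is applied with `conv_transpose2d`, which convolves with the permuted, 180-degree-rotated kernel, and the function names are ours:

```python
import torch
import torch.nn.functional as F

def real_spectral_normalize(kernel, u, eps=1e-12):
    """One realSN step for a conv kernel of shape (c_out, c_in, 3, 3).
    u has shape (1, c_out, h, w) and persists across training steps."""
    # v <- K*(u) / ||K*(u)||  (conjugate operator via transposed convolution)
    v = F.conv_transpose2d(u, kernel, padding=1)
    v = v / (v.norm() + eps)
    # u <- K(v) / ||K(v)||
    Kv = F.conv2d(v, kernel, padding=1)
    u = Kv / (Kv.norm() + eps)
    # sigma(K) ~ <u, K(v)>; three one-sample convolutions per layer in total
    sigma = torch.sum(u * F.conv2d(v, kernel, padding=1))
    return kernel / sigma, u
```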
4.3 Implementation details
We refer to SimpleCNN and DnCNN regularized by realSN as RealSN-SimpleCNN and RealSN-DnCNN, respectively. We train them in the setting of Gaussian denoising with known, fixed noise levels $\sigma$; one noise level is used for CS-MRI and single-photon imaging, and another for Poisson denoising. The regularized denoisers are trained to have Lipschitz constant (no more than) 1. The training data consists of images from the BSD500 dataset, divided into patches. The CNN weights were initialized in the same way as (Zhang et al., 2017a). We train all networks using the ADAM optimizer for 50 epochs with a minibatch size of 128. The learning rate was held constant for the first 25 epochs and then decreased. On an Nvidia GTX 1080 Ti, DnCNN took 4.08 hours and RealSN-DnCNN took 5.17 hours to train, so the added cost of realSN is mild.
5 Poisson denoising: validating the theory
Consider the Poisson denoising problem: given a true image $x_{\text{true}}\in\mathbb{R}^d$, we observe independent Poisson random variables
$$y_i\sim\operatorname{Poisson}\big((x_{\text{true}})_i\big),$$
so $\mathbb{E}\,y_i=(x_{\text{true}})_i$, for $i=1,\dots,d$. For details and motivation for this problem setup, see (Rond et al., 2016). For the objective function $f$, we use the negative log-likelihood given by $f(x)=\sum_{i=1}^d\ell(x_i;y_i)$, where
$$\ell(x;y)=\begin{cases}x-y\log(x)&\text{for }x>0\\ 0&\text{for }x=0,\ y=0\\ \infty&\text{otherwise.}\end{cases}$$
We can compute $\operatorname{Prox}_{\alpha f}$ elementwise with
$$\big(\operatorname{Prox}_{\alpha f}(z)\big)_i=\tfrac12\Big(z_i-\alpha+\sqrt{(z_i-\alpha)^2+4\alpha y_i}\Big).$$
The gradient of $f$ is given by $(\nabla f(x))_i=1-y_i/x_i$ for $x_i>0$, for $i=1,\dots,d$. We set $(\nabla f(x))_i=1$ when $y_i=0$, although, strictly speaking, the derivative is undefined when $x_i=0$ and $y_i=0$. This does not seem to cause any problems in the experiments. Since we force the denoisers to output nonnegative pixel values, PNP-FBS never needs to evaluate $\nabla f$ at negative arguments.
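Both steps have direct elementwise implementations; a minimal sketch (the proximal step is the positive root of the quadratic $x^2+(\alpha-z)x-\alpha y=0$, which reproduces the formula above):

```python
import numpy as np

def prox_poisson_nll(z, y, alpha):
    """Elementwise Prox_{alpha f} for f(x) = sum_i (x_i - y_i log x_i)."""
    return 0.5 * (z - alpha + np.sqrt((z - alpha) ** 2 + 4.0 * alpha * y))

def grad_poisson_nll(x, y):
    """Elementwise gradient 1 - y/x, with the convention that it equals 1 where y = 0."""
    g = np.ones_like(x)
    mask = y > 0
    g[mask] = 1.0 - y[mask] / x[mask]
    return g
```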
For $H_\sigma$, we choose BM3D, SimpleCNN with and without realSN, and DnCNN with and without realSN. Note that these denoisers are designed or trained for Gaussian denoising, and here we integrate them into the PnP frameworks for Poisson denoising. We scale the image so that the peak value of the image, the maximum mean of the Poisson random variables, is fixed. The iterate was initialized to the noisy image for PNP-FBS and PNP-ADMM, and the dual variable $u^0$ was initialized to $0$ for PNP-ADMM. We use the test set of 13 images in (Chan et al., 2017).
Convergence.
We first examine which denoisers satisfy Assumption (A) with small $\varepsilon$. In Figure 1, we run PnP iterations of Poisson denoising on a single image (the flag image of (Rond et al., 2016)) with different models, calculate the ratios $\|(H_\sigma-I)(x^k)-(H_\sigma-I)(x^\star)\|_2/\|x^k-x^\star\|_2$ between the iterates and the limit, and plot the histogram. The maximum value of a histogram, marked by a vertical red bar, lower-bounds the $\varepsilon$ of Assumption (A). Remember that Corollary 3 requires $\varepsilon<1$ to ensure convergence of PnP-ADMM. Figure 1(a) shows that BM3D violates this assumption. Figures 1(b) and 1(c) and Figures 1(d) and 1(e), respectively, illustrate that realSN indeed reduces $\varepsilon$ for SimpleCNN and DnCNN.
Figure 2 experimentally validates Theorems 1 and 2 by examining the average (geometric mean) contraction factor (defined in Section 3) of PNP-FBS and PNP-ADMM iterations over a range of step sizes. (For PNP-ADMM, we compute the contraction factor of the equivalent PNP-DRS.) Figure 2 qualitatively shows that PNP-ADMM exhibits more stable convergence than PNP-FBS. Theorem 1 ensures PNP-FBS is a contraction when $\alpha$ is within an interval, and Theorem 2 ensures PNP-ADMM is a contraction when $\alpha$ is large enough. We roughly observe this behavior for the denoisers trained with realSN.

Table 1: Average PSNR (dB) of the PnP methods on Poisson denoising.

             BM3D     RealSN-DnCNN  RealSN-SimpleCNN
PNP-ADMM     23.4617  23.5873       18.7890
PNP-FBS      18.5835  22.2154       22.7280
Empirical performance.
Our theory only concerns convergence and says nothing about the recovery performance of the point the methods converge to. We empirically verify that the PnP methods with realSN, for which we analyzed convergence, yield competitive denoising results.
We fix one step size $\alpha$ for all denoisers in PNP-ADMM and another in PNP-FBS. For the deep learning-based denoisers, we fix the noise level $\sigma$; for BM3D, we choose $\sigma$ as suggested in (Rond et al., 2016).
Table 1 compares the PnP methods with BM3D, RealSN-DnCNN, and RealSN-SimpleCNN plugged in. In both PnP methods, one of the two denoisers using realSN, for which we have theory, outperforms BM3D. It is interesting to observe that the PnP performance does not necessarily hinge on the strength of the plugged-in denoiser and that different PnP methods favor different denoisers. For example, RealSN-SimpleCNN surpasses the much more sophisticated RealSN-DnCNN under PNP-FBS. However, RealSN-DnCNN leads to better, and overall the best, denoising performance when plugged into PNP-ADMM.
6 More applications
We now apply PnP to two imaging problems and show that realSN improves the reconstruction of PnP. (Code for our experiments in Sections 5 and 6 is available at https://github.com/uclaopt/Provable_Plug_and_Play/.)
Single-photon imaging.
Consider single-photon imaging with quanta image sensors (QIS) (Fossum, 2011; Chan & Lu, 2014; Elgendy & Chan, 2016) with the model
$$z\sim\operatorname{Poisson}(\alpha_s G x_{\text{true}}),\qquad y=\mathbf{1}(z\ge 1),$$
where $x_{\text{true}}\in\mathbb{R}^d$ is the underlying image, $G$ duplicates each pixel to $K$ pixels, $\alpha_s$ is the sensor gain, $K$ is the oversampling rate, and $y\in\{0,1\}^{Kd}$ is the vector of observed binary photons. We want to recover $x_{\text{true}}$ from $y$. The negative log-likelihood function is
$$f(x)=-\sum_{j=1}^d\Big(K_j^0\log\big(e^{-\alpha_s x_j/K}\big)+K_j^1\log\big(1-e^{-\alpha_s x_j/K}\big)\Big),$$
where $K_j^1$ is the number of ones in the $j$th unit pixel and $K_j^0$ is the number of zeros in the $j$th unit pixel. The gradient of $f$ is given by
$$(\nabla f(x))_j=\frac{\alpha_s}{K}\Big(K_j^0-K_j^1\,\frac{e^{-\alpha_s x_j/K}}{1-e^{-\alpha_s x_j/K}}\Big),$$
and the proximal operator of $f$ is given in (Chan & Lu, 2014).
We compare PNP-ADMM and PNP-FBS, respectively, with the denoisers BM3D, RealSN-DnCNN, and RealSN-SimpleCNN. The iterate was initialized to a fixed initial estimate for PNP-FBS and PNP-ADMM, and the dual variable $u^0$ was initialized to $0$ for PNP-ADMM. All deep denoisers used in this experiment were trained with one fixed noise level. We report the PSNRs achieved at the 50th iteration, the 100th iteration, and the best PSNR value achieved within the first 100 iterations.
Table 2 reports the average PSNR results on the 13 images used in (Chan et al., 2017). The experiments indicate that PNP-ADMM consistently yields higher PSNR than the PNP-FBS counterpart using the same denoiser. The best overall PSNR is achieved using PNP-ADMM with RealSN-DnCNN, which shows nearly 1 dB of improvement over the result obtained with BM3D. We also observe that deep denoisers with realSN make PnP converge more stably.
Table 2: Average PSNR (dB) on single-photon imaging.

PNP-ADMM
  Average PSNR   BM3D     RealSN-DnCNN  RealSN-SimpleCNN
  Iteration 50   30.0034  31.0032       29.2154
  Iteration 100  30.0014  31.0032       29.2151
  Best Overall   30.0474  31.0431       29.2155

PNP-FBS
  Average PSNR   BM3D     RealSN-DnCNN  RealSN-SimpleCNN
  Iteration 50   28.7933  27.9617       29.0062
  Iteration 100  29.0510  27.9887       29.0517
  Best Overall   29.5327  28.4065       29.3563
Table 3: CS-MRI reconstruction results (PSNR in dB).

Sampling approach            Random          Radial          Cartesian
Image                        Brain   Bust    Brain   Bust    Brain   Bust
Zero-filling                  9.58    7.00    9.29    6.19    8.65    6.01
TV (Lustig et al., 2005)     16.92   15.31   15.61   14.22   12.77   11.72
RecRF (Yang et al., 2010)    16.98   15.37   16.04   14.65   12.78   11.75
BM3D-MRI (Eksioglu, 2016)    17.31   13.90   16.95   13.72   14.43   12.35
PNP-FBS
  BM3D                       19.09   16.36   18.10   15.67   14.37   12.99
  DnCNN                      19.59   16.49   18.92   15.99   14.76   14.09
  RealSN-DnCNN               19.82   16.60   18.96   16.09   14.82   14.25
  SimpleCNN                  15.58   12.19   15.06   12.02   12.78   10.80
  RealSN-SimpleCNN           17.65   14.98   16.52   14.26   13.02   11.49
PNP-ADMM
  BM3D                       19.61   17.23   18.94   16.70   14.91   13.98
  DnCNN                      19.86   17.05   19.00   16.64   14.86   14.14
  RealSN-DnCNN               19.91   17.09   19.08   16.68   15.11   14.16
  SimpleCNN                  16.68   12.56   16.83   13.47   13.03   11.17
  RealSN-SimpleCNN           17.77   14.89   17.00   14.47   12.73   11.88
Compressed sensing MRI.
Magnetic resonance imaging (MRI) is a widely used imaging technique with slow data acquisition. Compressed sensing MRI (CS-MRI) accelerates MRI by acquiring less data through downsampling. PnP is particularly useful in medical imaging, where we do not have a large amount of data for end-to-end training: we train the denoiser on natural images and then "plug" it into the PnP framework to be applied to medical images. CS-MRI is described mathematically as
$$y=\mathcal{F}_\Omega x_{\text{true}}+e,$$
where $x_{\text{true}}$ is the underlying image, $\mathcal{F}_\Omega$ is the linear measurement model, $y$ is the measured data, and $e$ is measurement noise. We want to recover $x_{\text{true}}$ from $y$. The objective function is
$$f(x)=\tfrac12\|\mathcal{F}_\Omega x-y\|_2^2.$$
The gradient of $f$ is given in (Liu et al., 2016), and the proximal operator of $f$ is given in (Eksioglu, 2016). We use BM3D, SimpleCNN, and DnCNN, together with their realSN variants, as the PnP denoiser $H_\sigma$.
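For the simplest case in which $\mathcal{F}_\Omega$ is a masked (subsampled) orthonormal 2-D FFT, both the gradient and the proximal operator admit direct implementations. A sketch under that assumption, where `y` is the zero-filled $k$-space data, `mask` is the 0/1 sampling pattern, and the names are ours:

```python
import numpy as np

def grad_f(x, mask, y):
    """Gradient of f(x) = 0.5 ||F_Omega x - y||^2 for a masked orthonormal FFT."""
    Fx = np.fft.fft2(x, norm="ortho")
    # Real part taken assuming a real-valued image; a simplification.
    return np.real(np.fft.ifft2(mask * (mask * Fx - y), norm="ortho"))

def prox_f(z, mask, y, alpha):
    """Prox_{alpha f}: the normal equations diagonalize in the Fourier domain,
    giving x_hat = (z_hat + alpha * mask * y) / (1 + alpha * mask)."""
    Fz = np.fft.fft2(z, norm="ortho")
    Fx = (Fz + alpha * mask * y) / (1.0 + alpha * mask)
    return np.real(np.fft.ifft2(Fx, norm="ortho"))
```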
We take $\mathcal{F}_\Omega$ to be Fourier $k$-domain subsampling (a partial Fourier operator). We tested random, radial, and Cartesian sampling patterns (Eksioglu, 2016) at a fixed sampling rate and a fixed measurement noise level.
We compare the PnP frameworks with zero-filling, total variation (TV) (Lustig et al., 2005), RecRF (Yang et al., 2010), and BM3D-MRI (Eksioglu, 2016). (Some recent deep learning-based methods (Yang et al., 2016; Kulkarni et al., 2016; Metzler et al., 2017; Zhang & Ghanem, 2018) are not compared here because we assume we do not have enough medical images for training.) The parameters are taken as follows. For TV, the regularization parameter is taken as the best one from a grid of candidate values. For RecRF, the two parameters are both taken from the same grid, and the best results are reported. For BM3D-MRI, we set the "final noise level (the noise level in the last iteration)" as suggested in their MATLAB library. For PnP methods with BM3D as $H_\sigma$, we tune $\sigma$ and $\alpha$ and report the best results. For PNP-ADMM with deep denoisers as $H_\sigma$, we take one $\alpha$ and $\sigma$ uniformly for all the cases; likewise for PNP-FBS with deep denoisers. All deep denoisers are trained on BSD500 (Martin et al., 2001), a natural image dataset; no medical image is used in training. The iterate was initialized to the zero-filled solution for PNP-FBS and PNP-ADMM, and the dual variable $u^0$ was initialized to $0$ for PNP-ADMM. Table 3 reports our results on CS-MRI, which confirm the effectiveness of the PnP frameworks. Moreover, RealSN-DnCNN seems to be the clear winner overall. We also observe that PNP-ADMM generally outperforms PNP-FBS when using the same denoiser, which supports Theorems 1 and 2.
7 Conclusion
In this work, we analyzed the convergence of PNP-FBS and PNP-ADMM under a Lipschitz assumption on the denoiser. We then presented real spectral normalization, a technique to enforce the proposed Lipschitz condition when training deep learning-based denoisers. Finally, we validated the theory with experiments.
Acknowledgements
We thank Pontus Giselsson for the discussion on negatively averaged operators and Stanley Chan for the discussion on the difficulties in establishing convergence of PnP methods. This work was partially supported by National Key R&D Program of China 2017YFB02029, AFOSR MURI FA95501810502, NSF DMS1720237, ONR N0001417121, and NSF RI1755701.
References
 Baillon et al. (1978) Baillon, J. B., Bruck, R. E., and Reich, S. On the asymptotic behavior of nonexpansive mappings and semigroups in Banach spaces. Houston Journal of Mathematics, 4(1), 1978.
 Bansal et al. (2018) Bansal, N., Chen, X., and Wang, Z. Can we gain more from orthogonality regularizations in training deep networks? In Advances in Neural Information Processing Systems, pp. 4266–4276, 2018.
 Bartlett et al. (2017) Bartlett, P. L., Foster, D. J., and Telgarsky, M. J. Spectrallynormalized margin bounds for neural networks. In Advances in Neural Information Processing Systems, pp. 6240–6249, 2017.
 Bauschke & Combettes (2017) Bauschke, H. H. and Combettes, P. L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer New York, 2nd edition, 2017.
 Brifman et al. (2016) Brifman, A., Romano, Y., and Elad, M. Turning a denoiser into a super-resolver using plug and play priors. 2016 IEEE International Conference on Image Processing, pp. 1404–1408, 2016.
 Brock et al. (2019) Brock, A., Donahue, J., and Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations, 2019.

 Buades et al. (2005) Buades, A., Coll, B., and Morel, J.-M. A non-local algorithm for image denoising. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005.
 Buzzard et al. (2018) Buzzard, G. T., Chan, S. H., Sreehari, S., and Bouman, C. A. Plug-and-play unplugged: Optimization-free reconstruction using consensus equilibrium. SIAM Journal on Imaging Sciences, 11(3):2001–2020, 2018.
 Chan (2019) Chan, S. H. Performance analysis of Plug-and-Play ADMM: A graph signal processing perspective. IEEE Transactions on Computational Imaging, 2019.
 Chan & Lu (2014) Chan, S. H. and Lu, Y. M. Efficient image reconstruction for gigapixel quantum image sensors. In Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on, pp. 312–316. IEEE, 2014.
 Chan et al. (2017) Chan, S. H., Wang, X., and Elgendy, O. A. Plugandplay ADMM for image restoration: Fixedpoint convergence and applications. IEEE Transactions on Computational Imaging, 3(1):84–98, 2017.
 Chang et al. (2017) Chang, J. R., Li, C.L., Poczos, B., and Kumar, B. V. One network to solve them all—solving linear inverse problems using deep projection models. In 2017 IEEE International Conference on Computer Vision, pp. 5889–5898. IEEE, 2017.
 Chen et al. (2018) Chen, X., Liu, J., Wang, Z., and Yin, W. Theoretical linear convergence of unfolded ista and its practical weights and thresholds. In Advances in Neural Information Processing Systems, pp. 9061–9071, 2018.
 Combettes & Yamada (2015) Combettes, P. L. and Yamada, I. Compositions and convex combinations of averaged nonexpansive operators. Journal of Mathematical Analysis and Applications, 425(1):55–70, 2015.
 Dabov et al. (2007) Dabov, K., Foi, A., Katkovnik, V., and Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095, 2007.
 Danielyan et al. (2010) Danielyan, A., Katkovnik, V., and Egiazarian, K. Image deblurring by augmented Lagrangian with BM3D frame prior. In Workshop on Information Theoretic Methods in Science and Engineering, pp. 16–18, 2010.
 Danielyan et al. (2012) Danielyan, A., Katkovnik, V., and Egiazarian, K. BM3D frames and variational image deblurring. IEEE Transactions on Image Processing, 21(4):1715–1728, 2012.
 Dar et al. (2016) Dar, Y., Bruckstein, A. M., Elad, M., and Giryes, R. Postprocessing of compressed images via sequential denoising. IEEE Transactions on Image Processing, 25(7):3044–3058, 2016.
 Dong et al. (2018) Dong, W., Wang, P., Yin, W., and Shi, G. Denoising prior driven deep neural network for image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
 Egiazarian & Katkovnik (2015) Egiazarian, K. and Katkovnik, V. Single image superresolution via BM3D sparse coding. In 2015 23rd European Signal Processing Conference, pp. 2849–2853, 2015.
 Eksioglu (2016) Eksioglu, E. M. Decoupled algorithm for MRI reconstruction using nonlocal block matching model: BM3D-MRI. Journal of Mathematical Imaging and Vision, 56(3):430–440, 2016.
 Elgendy & Chan (2016) Elgendy, O. A. and Chan, S. H. Image reconstruction and threshold design for quanta image sensors. In 2016 IEEE International Conference on Image Processing, pp. 978–982. IEEE, 2016.
 Fan et al. (2019) Fan, K., Wei, Q., Wang, W., Chakraborty, A., and Heller, K. InverseNet: Solving inverse problems with splitting networks. IEEE International Conference on Multimedia and Expo, 2019.
 Fletcher et al. (2018) Fletcher, A. K., Pandit, P., Rangan, S., Sarkar, S., and Schniter, P. Plug-in estimation in high-dimensional linear inverse problems: A rigorous analysis. In Advances in Neural Information Processing Systems 31, pp. 7451–7460, 2018.
 Fossum (2011) Fossum, E. The quanta image sensor (QIS): concepts and challenges. In Imaging Systems and Applications. Optical Society of America, 2011.
 Giselsson (2017) Giselsson, P. Tight global linear convergence rate bounds for Douglas–Rachford splitting. Journal of Fixed Point Theory and Applications, 19(4):2241–2270, 2017.
 Gouk et al. (2018) Gouk, H., Frank, E., Pfahringer, B., and Cree, M. Regularisation of neural networks by enforcing Lipschitz continuity. arXiv preprint arXiv:1804.04368, 2018.
 He et al. (2018) He, J., Yang, Y., Wang, Y., Zeng, D., Bian, Z., Zhang, H., Sun, J., Xu, Z., and Ma, J. Optimizing a parameterized plugandplay ADMM for iterative lowdose CT reconstruction. IEEE Transactions on Medical Imaging, pp. 1–13, 2018.
 He et al. (2016) He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
 Heide et al. (2014) Heide, F., Steinberger, M., Tsai, Y.T., Rouf, M., Pajak, D., Reddy, D., Gallo, O., Liu, J., Heidrich, W., Egiazarian, K., Kautz, J., and Pulli, K. FlexISP: A flexible camera image processing framework. ACM Transactions on Graphics (Proceedings SIGGRAPH Asia 2014), 33(6), 2014.

 Ioffe & Szegedy (2015) Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, volume 37, pp. 448–456, 2015.
 Kamilov et al. (2017) Kamilov, U. S., Mansour, H., and Wohlberg, B. A plug-and-play priors approach for solving nonlinear imaging inverse problems. IEEE Signal Processing Letters, 24(12):1872–1876, 2017.
 Krizhevsky et al. (2012) Krizhevsky, A., Sutskever, I., and Hinton, G. E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
 Kulkarni et al. (2016) Kulkarni, K., Lohit, S., Turaga, P., Kerviche, R., and Ashok, A. ReconNet: Non-iterative reconstruction of images from compressively sensed measurements. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 449–458, 2016.
 Liu et al. (2019) Liu, J., Chen, X., Wang, Z., and Yin, W. ALISTA: Analytic weights are as good as learned weights in LISTA. In International Conference on Learning Representations, 2019.
 Liu et al. (2018) Liu, P., Zhang, H., Zhang, K., Lin, L., and Zuo, W. Multi-level wavelet-CNN for image restoration. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 886–895, 2018.
 Liu et al. (2016) Liu, Y., Zhan, Z., Cai, J.-F., Guo, D., Chen, Z., and Qu, X. Projected iterative soft-thresholding algorithm for tight frames in compressed sensing magnetic resonance imaging. IEEE Transactions on Medical Imaging, 35(9):2130–2140, 2016.
 Lustig et al. (2005) Lustig, M., Santos, J. M., Lee, J.H., Donoho, D. L., and Pauly, J. M. Application of compressed sensing for rapid MR imaging. SPARS,(Rennes, France), 2005.
 Lyu et al. (2019) Lyu, Q., Ruan, D., Hoffman, J., Neph, R., McNitt-Gray, M., and Sheng, K. Iterative reconstruction for low-dose CT using plug-and-play alternating direction method of multipliers (ADMM) framework. Proceedings of SPIE, 10949, 2019.
 Mao et al. (2016) Mao, X., Shen, C., and Yang, Y.B. Image restoration using very deep convolutional encoderdecoder networks with symmetric skip connections. In Advances in Neural Information Processing Systems, pp. 2802–2810, 2016.
 Martin et al. (2001) Martin, D., Fowlkes, C., Tal, D., and Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the 8th International Conference on Computer Vision, volume 2, pp. 416–423, July 2001.
 Meinhardt et al. (2017) Meinhardt, T., Moeller, M., Hazirbas, C., and Cremers, D. Learning proximal operators: Using denoising networks for regularizing inverse imaging problems. In 2017 International Conference on Computer Vision, pp. 1799–1808, 2017.
 Metzler et al. (2017) Metzler, C., Mousavi, A., and Baraniuk, R. Learned damp: Principled neural network based compressive image recovery. In Advances in Neural Information Processing Systems, pp. 1772–1783, 2017.
 Metzler et al. (2016) Metzler, C. A., Maleki, A., and Baraniuk, R. G. From denoising to compressed sensing. IEEE Transactions on Information Theory, 62(9):5117–5144, 2016.
 Miyato et al. (2018) Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations, 2018.
 Moreau (1965) Moreau, J. J. Proximité et dualité dans un espace Hilbertien. Bulletin de la Société Mathématique de France, 93:273–299, 1965.
 Oberman & Calder (2018) Oberman, A. M. and Calder, J. Lipschitz regularized deep neural networks converge and generalize. arXiv preprint arXiv:1808.09540, 2018.
 Ogura & Yamada (2002) Ogura, N. and Yamada, I. Nonstrictly convex minimization over the fixed point set of an asymptotically shrinking nonexpansive mapping. Numerical Functional Analysis and Optimization, 23(12):113–137, 2002.
 Ono (2017) Ono, S. Primaldual plugandplay image restoration. IEEE Signal Processing Letters, 24(8):1108–1112, 2017.
 Plötz & Roth (2018) Plötz, T. and Roth, S. Neural nearest neighbors networks. In Advances in Neural Information Processing Systems, pp. 1095–1106, 2018.
 Polyak (1987) Polyak, B. T. Introduction to Optimization. Optimization Software Inc., New York, 1987.
 Qian & Wegman (2019) Qian, H. and Wegman, M. N. L2nonexpansive neural networks. In International Conference on Learning Representations, 2019.
 Reehorst & Schniter (2019) Reehorst, E. T. and Schniter, P. Regularization by denoising: Clarifications and new interpretations. IEEE Transactions on Computational Imaging, 5(1):52–67, 2019.
 Romano et al. (2017) Romano, Y., Elad, M., and Milanfar, P. The little engine that could: Regularization by denoising (RED). SIAM Journal on Imaging Sciences, 10(4):1804–1844, 2017.
 Rond et al. (2016) Rond, A., Giryes, R., and Elad, M. Poisson inverse problems by the plugandplay scheme. Journal of Visual Communication and Image Representation, 41:96–108, 2016.
 Rudin et al. (1992) Rudin, L. I., Osher, S., and Fatemi, E. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1–4):259–268, 1992.
 Ryu & Boyd (2016) Ryu, E. K. and Boyd, S. Primer on monotone operator methods. Applied and Computational Mathematics, 15:3–43, 2016.
 Shi & Feng (2018) Shi, M. and Feng, L. Plug-and-play prior based on Gaussian mixture model learning for image restoration in sensor network. IEEE Access, 6:78113–78122, 2018.
 Sreehari et al. (2016) Sreehari, S., Venkatakrishnan, S. V., Wohlberg, B., Buzzard, G. T., Drummy, L. F., Simmons, J. P., and Bouman, C. A. Plugandplay priors for bright field electron tomography and sparse interpolation. IEEE Transactions on Computational Imaging, 2(4):408–423, 2016.
 Sreehari et al. (2017) Sreehari, S., Venkatakrishnan, S. V., Bouman, K. L., Simmons, J. P., Drummy, L. F., and Bouman, C. A. Multiresolution data fusion for superresolution electron microscopy. IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017.
 Sun et al. (2018a) Sun, Y., Wohlberg, B., and Kamilov, U. S. Plug-in stochastic gradient method. arXiv preprint arXiv:1811.03659, 2018a.
 Sun et al. (2018b) Sun, Y., Xu, S., Li, Y., Tian, L., Wohlberg, B., and Kamilov, U. S. Regularized Fourier ptychography using an online plug-and-play algorithm. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2018b.
 Sun et al. (2019) Sun, Y., Wohlberg, B., and Kamilov, U. S. An online plug-and-play algorithm for regularized image reconstruction. IEEE Transactions on Computational Imaging, 2019.
 Taylor et al. (2018) Taylor, A. B., Hendrickx, J. M., and Glineur, F. Exact worstcase convergence rates of the proximal gradient method for composite convex minimization. Journal of Optimization Theory and Applications, 2018.
 Teodoro et al. (2016) Teodoro, A. M., BioucasDias, J. M., and Figueiredo, M. A. T. Image restoration and reconstruction using variable splitting and classadapted image priors. IEEE International Conference on Image Processing, 2016.
 Teodoro et al. (2017) Teodoro, A. M., BioucasDias, J. M., and Figueiredo, M. A. T. Sceneadapted plugandplay algorithm with convergence guarantees. In 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing, pp. 1–6, 2017.
 Teodoro et al. (2019) Teodoro, A. M., BioucasDias, J. M., and Figueiredo, M. A. T. A convergent image fusion algorithm using sceneadapted Gaussianmixturebased denoising. IEEE Transactions on Image Processing, 28(1):451–463, 2019.
 Tirer & Giryes (2019) Tirer, T. and Giryes, R. Image restoration by iterative denoising and backward projections. IEEE Transactions on Image Processing, 28(3):1220–1234, 2019.
 Tsuzuku et al. (2018) Tsuzuku, Y., Sato, I., and Sugiyama, M. Lipschitzmargin training: Scalable certification of perturbation invariance for deep neural networks. In Advances in Neural Information Processing Systems, pp. 6541–6550, 2018.
 Venkatakrishnan et al. (2013) Venkatakrishnan, S. V., Bouman, C. A., and Wohlberg, B. Plug-and-play priors for model based reconstruction. 2013 IEEE Global Conference on Signal and Information Processing, pp. 945–948, 2013.
 Wang & Chan (2017) Wang, X. and Chan, S. H. Parameter-free Plug-and-Play ADMM for image restoration. In IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1323–1327, 2017.
 Weng et al. (2018) Weng, T.W., Zhang, H., Chen, P.Y., Yi, J., Su, D., Gao, Y., Hsieh, C.J., and Daniel, L. Evaluating the robustness of neural networks: An extreme value theory approach. In International Conference on Learning Representations, 2018.
 Yang et al. (2010) Yang, J., Zhang, Y., and Yin, W. A fast alternating direction method for TVL1L2 signal reconstruction from partial Fourier data. IEEE Journal of Selected Topics in Signal Processing, 4(2):288–297, 2010.
 Yang et al. (2016) Yang, Y., Sun, J., Li, H., and Xu, Z. Deep ADMMNet for compressive sensing MRI. In Advances in Neural Information Processing Systems, pp. 10–18, 2016.
 Ye et al. (2018) Ye, D. H., Srivastava, S., Thibault, J., Sauer, K., and Bouman, C. Deep residual learning for modelbased iterative CT reconstruction using plugandplay framework. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6668–6672, 2018.
 Zhang & Ghanem (2018) Zhang, J. and Ghanem, B. ISTANet: Interpretable optimizationinspired deep network for image compressive sensing. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
 Zhang et al. (2017a) Zhang, K., Zuo, W., Chen, Y., Meng, D., and Zhang, L. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017a.
 Zhang et al. (2017b) Zhang, K., Zuo, W., Gu, S., and Zhang, L. Learning deep CNN denoiser prior for image restoration. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2808–2817, 2017b.
 Zhang et al. (2018) Zhang, K., Zuo, W., and Zhang, L. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Transactions on Image Processing, 27(9):4608–4622, 2018.
 Zoran & Weiss (2011) Zoran, D. and Weiss, Y. From learning models of natural image patches to whole image restoration. In 2011 International Conference on Computer Vision, pp. 479–486, 2011.
8 Preliminaries
For any $x,y\in\mathbb{R}^d$, write $\langle x,y\rangle=x^\top y$ for the inner product. We say a function $f$ is convex if
$$f(\theta x+(1-\theta)y)\le\theta f(x)+(1-\theta)f(y)$$
for any $x,y\in\mathbb{R}^d$ and $\theta\in[0,1]$. A convex function is closed if it is lower semicontinuous and proper if it is finite somewhere. We say $f$ is $\mu$-strongly convex for $\mu>0$ if $f(x)-(\mu/2)\|x\|_2^2$ is a convex function of $x$. Given a convex function $f$ and $\alpha>0$, define its proximal operator as
$$\operatorname{Prox}_{\alpha f}(x)=\underset{z\in\mathbb{R}^d}{\operatorname{argmin}}\,\Big\{\alpha f(z)+\tfrac12\|z-x\|_2^2\Big\}.$$
When $f$ is convex, closed, and proper, the minimizer uniquely exists, and therefore $\operatorname{Prox}_{\alpha f}$ is well-defined. A mapping $T:\mathbb{R}^d\to\mathbb{R}^d$ is $L$-Lipschitz if
$$\|T(x)-T(y)\|_2\le L\|x-y\|_2$$
for all $x,y\in\mathbb{R}^d$. If $T$ is Lipschitz with $L=1$, we say $T$ is nonexpansive. If $T$ is Lipschitz with $L<1$, we say $T$ is a contraction. A mapping $T$ is $\theta$-averaged for $\theta\in(0,1)$ if it is nonexpansive and if
$$T=(1-\theta)I+\theta S,$$
where $S$ is another nonexpansive mapping.
Lemma 4 (Proposition 4.35 of (Bauschke & Combettes, 2017)).
$T$ is $\theta$-averaged if and only if
$$\|T(x)-T(y)\|_2^2\le\|x-y\|_2^2-\frac{1-\theta}{\theta}\|(I-T)(x)-(I-T)(y)\|_2^2$$
for all $x,y\in\mathbb{R}^d$.
Lemma 5 ((Ogura & Yamada, 2002; Combettes & Yamada, 2015)).
Assume $T_1$ and $T_2$ are $\theta_1$- and $\theta_2$-averaged, respectively. Then $T_1T_2$ is $\theta$-averaged with $\theta=\dfrac{\theta_1+\theta_2-2\theta_1\theta_2}{1-\theta_1\theta_2}$.
Lemma 6.
Let $\theta\in(0,1)$. $T$ is $\theta$-averaged if and only if $(-I)T(-I)$ is $\theta$-averaged.
Proof.
The lemma follows from the fact that
$$(-I)T(-I)=(1-\theta)I+\theta(-I)S(-I)$$
for some nonexpansive $S$ and that nonexpansiveness of $-I$ and $S$ implies nonexpansiveness of $(-I)S(-I)$. ∎
Lemma 7 ((Taylor et al., 2018)).
Assume $f$ is $\mu$-strongly convex and $\nabla f$ is $L$-Lipschitz. Then for any $\alpha>0$, we have
$$\|(I-\alpha\nabla f)(x)-(I-\alpha\nabla f)(y)\|_2\le\max\{|1-\alpha\mu|,\,|1-\alpha L|\}\,\|x-y\|_2$$
for all $x,y\in\mathbb{R}^d$.
Lemma 8 (Proposition 5.4 of (Giselsson, 2017)).
Assume $f$ is $\mu$-strongly convex, closed, and proper. Then
$$-(2\operatorname{Prox}_{\alpha f}-I)$$
is $\frac{1}{1+\alpha\mu}$-averaged.
References.
The notion of the proximal operator and its well-definedness were first presented in (Moreau, 1965). The notion of averaged mappings was first introduced in (Baillon et al., 1978). The idea of Lemma 6 relates to the "negatively averaged" operators of (Giselsson, 2017). Lemma 7 is proved in a weaker form as Theorem 3 of (Polyak, 1987) and in Section 5.1 of (Ryu & Boyd, 2016). Lemma 7 as stated is proved as Theorem 2.1 in (Taylor et al., 2018).
9 Proofs of main results
9.1 Equivalence of PNP-DRS and PNP-ADMM
We show the standard steps that establish the equivalence of PNP-DRS and PNP-ADMM. Starting from PNP-DRS, we substitute $u^{k+1}=z^k-y^{k+1}$ (so that $z^k=y^{k+1}+u^{k+1}$ and, by the $z$-update, $z^{k+1}=x^{k+1}+u^{k+1}$) to get
$$\begin{aligned}
y^{k+1}&=H_\sigma(x^k+u^k)\\
u^{k+1}&=u^k+x^k-y^{k+1}\\
x^{k+1}&=\operatorname{Prox}_{\alpha f}(y^{k+1}-u^{k+1}).
\end{aligned}$$
We reorder the iterations to get the correct dependency
$$\begin{aligned}
x^{k+1}&=\operatorname{Prox}_{\alpha f}(y^k-u^k)\\
y^{k+1}&=H_\sigma(x^{k+1}+u^k)\\
u^{k+1}&=u^k+x^{k+1}-y^{k+1}.
\end{aligned}$$
We label $y^{k+1}$ as $y^k$ and $u^{k+1}$ as $u^k$,
and we get PNP-ADMM.