Plug-and-Play Methods Provably Converge with Properly Trained Denoisers

by   Ernest K. Ryu, et al.

Plug-and-play (PnP) is a non-convex framework that integrates modern denoising priors, such as BM3D or deep learning-based denoisers, into ADMM or other proximal algorithms. An advantage of PnP is that one can use pre-trained denoisers when there is not sufficient data for end-to-end training. Although PnP has been recently studied extensively with great empirical success, theoretical analysis addressing even the most basic question of convergence has been insufficient. In this paper, we theoretically establish convergence of PnP-FBS and PnP-ADMM, without using diminishing stepsizes, under a certain Lipschitz condition on the denoisers. We then propose real spectral normalization, a technique for training deep learning-based denoisers to satisfy the proposed Lipschitz condition. Finally, we present experimental results validating the theory.



1 Introduction

Many modern image processing algorithms recover or denoise an image through the optimization problem

    minimize_{x ∈ ℝ^d}  f(x) + γ g(x),

where the optimization variable x ∈ ℝ^d represents the image, f measures data fidelity, g measures noisiness or complexity of the image, and γ ≥ 0 is a parameter representing the relative importance between f and g. Total variation denoising, inpainting, and compressed sensing fall under this setup. A priori knowledge of the image, such as that the image should have small noise, is encoded in g: g(x) is small if x has small noise or complexity. A posteriori knowledge of the image, such as noisy or partial measurements of the image, is encoded in f: f(x) is small if x agrees with the measurements.

First-order iterative methods are often used to solve such optimization problems, and ADMM is one such method:

    x^{k+1} = argmin_{x ∈ ℝ^d} { αf(x) + (1/2)‖x − (y^k − u^k)‖² }
    y^{k+1} = argmin_{y ∈ ℝ^d} { αγg(y) + (1/2)‖y − (x^{k+1} + u^k)‖² }
    u^{k+1} = u^k + x^{k+1} − y^{k+1}

with α > 0. Given a function h on ℝ^d and α > 0, define the proximal operator of h as

    Prox_{αh}(z) = argmin_{x ∈ ℝ^d} { αh(x) + (1/2)‖x − z‖² },

which is well-defined if h is proper, closed, and convex. Now we can equivalently write ADMM as

    x^{k+1} = Prox_{αf}(y^k − u^k)
    y^{k+1} = Prox_{αγg}(x^{k+1} + u^k)
    u^{k+1} = u^k + x^{k+1} − y^{k+1}.

We can interpret the subroutine Prox_{αγg} as a denoiser, i.e.,

    Prox_{αγg} : noisy image ↦ less noisy image.

(For example, if √(αγ) is the noise level and g is the total variation (TV) norm, then Prox_{αγg} is the standard Rudin–Osher–Fatemi (ROF) model (Rudin et al., 1992).) We can think of Prox_{αf} as a mapping enforcing consistency with measured data, i.e.,

    Prox_{αf} : less consistent ↦ more consistent with measured data.

More precisely speaking, for any x ∈ ℝ^d and α > 0 we have f(Prox_{αf}(x)) ≤ f(x).
However, some state-of-the-art image denoisers with great empirical performance do not originate from optimization problems. Such examples include non-local means (NLM) (Buades et al., 2005), block-matching and 3D filtering (BM3D) (Dabov et al., 2007), and convolutional neural networks (CNNs) (Zhang et al., 2017a). Nevertheless, such a denoiser H_σ still has the interpretation

    H_σ : noisy image ↦ less noisy image,

where σ ≥ 0 is a noise parameter. Larger values of σ correspond to more aggressive denoising.

Is it possible to use such denoisers for a broader range of imaging problems, even though we cannot directly set up an optimization problem? To address this question, (Venkatakrishnan et al., 2013) proposed Plug-and-Play ADMM (PnP-ADMM), which simply replaces the proximal operator Prox_{αγg} with the denoiser H_σ:

    x^{k+1} = Prox_{αf}(y^k − u^k)
    y^{k+1} = H_σ(x^{k+1} + u^k)
    u^{k+1} = u^k + x^{k+1} − y^{k+1}.
Surprisingly and remarkably, this ad-hoc method exhibited great empirical success, and spurred much follow-up work.
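The plug-in substitution can be sketched in a few lines. The following toy implementation (not the paper's code) uses a quadratic data-fidelity term with a closed-form prox and a hypothetical 3-tap moving-average smoother standing in for the denoiser H_σ:

```python
# Sketch of PnP-ADMM: Prox_{alpha f} for f(x) = 0.5*||x - b||^2 has a closed
# form; the denoiser H is a placeholder (a simple 3-tap moving average).

def prox_f(z, b, alpha):
    return [(zi + alpha * bi) / (1 + alpha) for zi, bi in zip(z, b)]

def denoise(v):
    # stand-in for H_sigma: 3-point moving average with edge replication
    n = len(v)
    return [(v[max(i - 1, 0)] + v[i] + v[min(i + 1, n - 1)]) / 3
            for i in range(n)]

def pnp_admm(b, alpha=1.0, iters=50):
    n = len(b)
    x, y, u = list(b), list(b), [0.0] * n
    for _ in range(iters):
        x = prox_f([yi - ui for yi, ui in zip(y, u)], b, alpha)
        y = denoise([xi + ui for xi, ui in zip(x, u)])
        u = [ui + xi - yi for ui, xi, yi in zip(u, x, y)]
    return x

b = [0.0, 1.0, 0.0, 1.0, 0.0]     # noisy "measurements"
x = pnp_admm(b)
assert len(x) == len(b)
```

Only the y-update changes relative to ordinary ADMM; any black-box denoiser can be dropped in there, which is the whole appeal of the framework.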

Contribution of this paper.

The empirical success of Plug-and-Play (PnP) naturally leads us to ask theoretical questions: When does PnP converge and what denoisers can we use? Past theoretical analysis has been insufficient.

The main contribution of this work is the convergence analysis of PnP methods. We study two Plug-and-Play methods, Plug-and-Play forward-backward splitting (PNP-FBS) and PNP-ADMM. For the analysis, we assume the denoiser H_σ satisfies a certain Lipschitz condition, formally defined as Assumption (A). Roughly speaking, the condition corresponds to the denoiser being close to the identity map, which is reasonable when the denoising parameter σ is small. In particular, we do not assume that H_σ is nonexpansive or differentiable, since most denoisers do not have such properties. Under this assumption, we show that the PnP methods are contractive iterations.

We then propose real spectral normalization (realSN), a technique based on (Miyato et al., 2018) for training deep learning-based denoisers so that they satisfy the proposed Lipschitz condition. Finally, we present experimental results validating our theory. Code used for the experiments is available online.

1.1 Prior work

Plug-and-play: Practice.

The first PnP method was the Plug-and-play ADMM proposed in (Venkatakrishnan et al., 2013). Since then, other schemes such as the primal-dual method (Heide et al., 2014; Meinhardt et al., 2017; Ono, 2017), ADMM with increasing penalty parameter (Brifman et al., 2016; Chan et al., 2017), generalized approximate message passing (Metzler et al., 2016), Newton iteration (Buzzard et al., 2018), Fast Iterative Shrinkage-Thresholding Algorithm (Kamilov et al., 2017; Sun et al., 2018b), (stochastic) forward-backward splitting (Sun et al., 2019, 2018a, 2018b), and alternating minimization (Dong et al., 2018) have been combined with the PnP technique.

PnP methods have reported empirical success on a large variety of imaging applications: bright-field electron tomography (Sreehari et al., 2016), camera image processing (Heide et al., 2014), compression-artifact reduction (Dar et al., 2016), compressive imaging (Teodoro et al., 2016), deblurring (Teodoro et al., 2016; Rond et al., 2016; Wang & Chan, 2017), electron microscopy (Sreehari et al., 2017), Gaussian denoising (Buzzard et al., 2018; Dong et al., 2018), nonlinear inverse scattering (Kamilov et al., 2017), Poisson denoising (Rond et al., 2016), single-photon imaging (Chan et al., 2017), super-resolution (Brifman et al., 2016; Sreehari et al., 2016; Chan et al., 2017; Dong et al., 2018), diffraction tomography (Sun et al., 2019), Fourier ptychographic microscopy (Sun et al., 2018b), low-dose CT imaging (Venkatakrishnan et al., 2013; He et al., 2018; Ye et al., 2018; Lyu et al., 2019), hyperspectral sharpening (Teodoro et al., 2017, 2019), and inpainting (Chan, 2019; Tirer & Giryes, 2019).

A wide range of denoisers have been used within the PnP framework. BM3D has been used the most (Heide et al., 2014; Dar et al., 2016; Rond et al., 2016; Sreehari et al., 2016; Chan et al., 2017; Kamilov et al., 2017; Ono, 2017; Wang & Chan, 2017), but other denoisers such as sparse representations (Brifman et al., 2016), non-local means (Venkatakrishnan et al., 2013; Heide et al., 2014; Sreehari et al., 2016, 2017; Chan, 2019), Gaussian mixture models (Teodoro et al., 2016, 2017; Shi & Feng, 2018; Teodoro et al., 2019), patch-based Wiener filtering (Venkatakrishnan et al., 2013), nuclear norm minimization (Kamilov et al., 2017), deep learning-based denoisers (Meinhardt et al., 2017; He et al., 2018; Ye et al., 2018; Tirer & Giryes, 2019), and deep projection models based on generative adversarial networks (Chang et al., 2017) have also been considered.

Plug-and-play: Theory.

Compared to the empirical success, much less progress has been made on the theoretical aspects of PnP optimization. (Chan et al., 2017) analyzed convergence under a bounded-denoiser assumption, establishing convergence using an increasing penalty parameter. (Buzzard et al., 2018) provided an interpretation of fixed points via “consensus equilibrium”. (Sreehari et al., 2016; Sun et al., 2019; Teodoro et al., 2017; Chan, 2019; Teodoro et al., 2019) proved convergence of PNP-ADMM and PNP-FBS under the assumption that the denoiser is (averaged) nonexpansive, by viewing the methods as fixed-point iterations. The nonexpansiveness assumption is not met by most denoisers as-is, but (Chan, 2019) proposed modifications to the non-local means and Gaussian mixture model denoisers, which make them into linear filters, to enforce nonexpansiveness. (Dong et al., 2018) presented a proof that relies on the existence of a certain Lyapunov function that is monotonic under the iterations, which holds only for simple denoisers. (Tirer & Giryes, 2019) analyzed a variant of PnP, but did not establish local convergence, since their key assumption is only expected to be satisfied “in early iterations”.

Other PnP-type methods.

There are other lines of work that incorporate modern denoisers into model-based optimization methods. The plug-in idea with half-quadratic splitting, as opposed to ADMM, was discussed in (Zoran & Weiss, 2011), and this approach was carried out with deep learning-based denoisers in (Zhang et al., 2017b). (Danielyan et al., 2012; Egiazarian & Katkovnik, 2015) use the notion of Nash equilibrium to propose a scheme similar to PnP. (Danielyan et al., 2010) proposed an augmented Lagrangian method similar to PnP. (Romano et al., 2017; Reehorst & Schniter, 2019) presented Regularization by Denoising (RED), which, given a denoiser H_σ, uses the (nonconvex) regularizer g(x) = (1/2) xᵀ(x − H_σ(x)) and uses denoiser evaluations in its iterations. (Fletcher et al., 2018) applies the plug-in approach to vector approximate message passing. (Yang et al., 2016; Fan et al., 2019) replaced both the proximal operator enforcing data fidelity and the denoiser with two neural networks and performed end-to-end training. More broadly, there are further works that combine model-based optimization with deep learning (Chen et al., 2018; Liu et al., 2019).

Image denoising using deep learning.

Deep learning-based denoising methods have become state-of-the-art. (Zhang et al., 2017a) proposed an effective denoising network called DnCNN, which brought batch normalization (Ioffe & Szegedy, 2015) and ReLU (Krizhevsky et al., 2012) into residual learning (He et al., 2016). Other representative deep denoising models include the deep convolutional encoder-decoder with symmetric skip connections (Mao et al., 2016), N3Net (Plötz & Roth, 2018), and MWCNN (Liu et al., 2018). The recent FFDNet (Zhang et al., 2018) handles spatially varying Gaussian noise.

Regularizing Lipschitz continuity.

Lipschitz continuity and its variants have begun to receive attention as a means of regularizing deep classifiers (Bartlett et al., 2017; Bansal et al., 2018; Oberman & Calder, 2018) and GANs (Miyato et al., 2018; Brock et al., 2019). Regularizing Lipschitz continuity stabilizes training, improves the final performance, and enhances robustness to adversarial attacks (Weng et al., 2018; Qian & Wegman, 2019). Specifically, (Miyato et al., 2018) proposed to normalize all weights to have unit spectral norm and thereby constrain the Lipschitz constant of the overall network to be no more than one.

2 PNP-FBS/ADMM and their fixed points

We now present the PnP methods we investigate in this work. We quickly note that although PNP-FBS and PNP-ADMM are distinct methods, they share the same fixed points by Remark 3.1 of (Meinhardt et al., 2017) and Proposition 3 of (Sun et al., 2019).

We call the method

    x^{k+1} = H_σ(x^k − α∇f(x^k))    (PNP-FBS)

for any α > 0, plug-and-play forward-backward splitting (PNP-FBS) or the plug-and-play proximal gradient method.

We interpret PNP-FBS as a fixed-point iteration, and we say x★ is a fixed point of PNP-FBS if

    x★ = H_σ(x★ − α∇f(x★)).

Fixed points of PNP-FBS have a simple, albeit non-rigorous, interpretation. An image denoising algorithm must trade off the two goals of making the image agree with the measurements and making the image less noisy. PNP-FBS applies I − α∇f and H_σ, each promoting one of these objectives, repeatedly in an alternating fashion. If PNP-FBS converges to a fixed point, we can expect the limit to represent a compromise.
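The alternation between the gradient step and the denoiser can be sketched directly. The toy example below (not the paper's setup) uses f(x) = (1/2)‖x − b‖² and a hypothetical linear shrinkage H(v) = 0.9v, which satisfies the Lipschitz condition of Assumption (A) with ε = 0.1, so the fixed point can be computed in closed form:

```python
# Sketch of the PNP-FBS iteration x <- H(x - alpha*grad_f(x)) on a toy
# problem: f(x) = 0.5*||x - b||^2 and a hypothetical denoiser H(v) = 0.9*v.

def pnp_fbs(b, alpha=0.5, iters=200):
    x = [0.0] * len(b)
    for _ in range(iters):
        grad = [xi - bi for xi, bi in zip(x, b)]         # grad of 0.5||x-b||^2
        step = [xi - alpha * gi for xi, gi in zip(x, grad)]
        x = [0.9 * si for si in step]                    # toy denoiser H
    return x

b = [2.0, -1.0]
x_star = pnp_fbs(b)
# the fixed point of x = 0.9*(x - 0.5*(x - b)) is x = (0.45/0.55)*b
assert abs(x_star[0] - 0.45 * 2.0 / 0.55) < 1e-6
```

The limit sits between the fully data-consistent answer b and the denoiser's preferred answer 0, illustrating the "compromise" interpretation.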

We call the method

    x^{k+1} = Prox_{αf}(y^k − u^k)
    y^{k+1} = H_σ(x^{k+1} + u^k)    (PNP-ADMM)
    u^{k+1} = u^k + x^{k+1} − y^{k+1}

for any α > 0, plug-and-play alternating directions method of multipliers (PNP-ADMM). We interpret PNP-ADMM as a fixed-point iteration, and we say (x★, y★, u★) is a fixed point of PNP-ADMM if

    x★ = Prox_{αf}(y★ − u★),    y★ = H_σ(x★ + u★),    x★ = y★.

Eliminating variables in (PNP-ADMM) yields an equivalent one-variable recursion. We call the method

    x^{k+1} = Prox_{αf}(z^k)
    y^{k+1} = H_σ(2x^{k+1} − z^k)    (PNP-DRS)
    z^{k+1} = z^k + y^{k+1} − x^{k+1}

plug-and-play Douglas–Rachford splitting (PNP-DRS). We interpret PNP-DRS as a fixed-point iteration, and we say z★ is a fixed point of PNP-DRS if

    H_σ(2 Prox_{αf}(z★) − z★) = Prox_{αf}(z★).

PNP-ADMM and PNP-DRS are equivalent. Although this is not surprising, as the equivalence between convex ADMM and DRS is well known, we show the steps establishing the equivalence in the supplementary document.

We introduce PNP-DRS as an analytical tool for analyzing PNP-ADMM. It is straightforward to verify that PNP-DRS can be written as z^{k+1} = T(z^k), where

    T = (1/2) I + (1/2) (2H_σ − I)(2 Prox_{αf} − I).

We use this form to analyze the convergence of PNP-DRS and translate the result to PNP-ADMM.
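A scalar sketch of an operator of this form can be iterated directly. Here f(x) = (1/2)(x − b)² gives a closed-form prox, and H(v) = 0.9v is a hypothetical toy denoiser (not the paper's); the loop verifies numerically that the iteration reaches a fixed point of T:

```python
# Sketch: the PNP-DRS operator T = 0.5*I + 0.5*(2H - I)(2 Prox_{alpha f} - I)
# on a scalar toy problem: f(x) = 0.5*(x - b)^2, toy denoiser H(v) = 0.9*v.

def T(z, b, alpha):
    prox = (z + alpha * b) / (1 + alpha)      # Prox_{alpha f}(z)
    r = 2 * prox - z                          # reflected prox
    h = 0.9 * r                               # H applied to the reflection
    return 0.5 * z + 0.5 * (2 * h - r)

b, alpha, z = 2.0, 1.0, 0.0
for _ in range(200):
    z = T(z, b, alpha)
assert abs(T(z, b, alpha) - z) < 1e-8         # z is (numerically) a fixed point
```

For this quadratic f with α = 1 the iteration contracts with factor 1/2, so 200 iterations reach the fixed point to machine precision.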

3 Convergence via contraction

We now present conditions that ensure the PnP methods are contractive and thereby convergent.

If we assume H_σ is nonexpansive, standard tools of monotone operator theory tell us that PnP-ADMM converges. However, this assumption is too strong: Chan et al. presented a counterexample demonstrating that H_σ is not nonexpansive for the NLM denoiser (Chan et al., 2017).

Rather, we assume H_σ satisfies

    ‖(H_σ − I)(x) − (H_σ − I)(y)‖ ≤ ε ‖x − y‖    (A)

for all x, y ∈ ℝ^d, for some ε ≥ 0. Since σ controls the strength of the denoising, we can expect H_σ to be close to the identity map for small σ. If so, Assumption (A) is reasonable.

Under this assumption, we show that the PNP-FBS and PNP-DRS iterations are contractive, in the sense that we can express the iterations as x^{k+1} = T(x^k), where T satisfies

    ‖T(x) − T(y)‖ ≤ δ ‖x − y‖

for all x, y ∈ ℝ^d, for some δ < 1. We call δ the contraction factor. If x★ satisfies T(x★) = x★, i.e., x★ is a fixed point, then x^k → x★ geometrically by the classical Banach contraction principle.
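A contraction factor can be lower-bounded empirically by sampling input pairs and taking the worst ratio ‖T(x) − T(y)‖ / ‖x − y‖ (this is also how the experiments of Section 5 probe ε and δ). A minimal 1-D sketch with a hypothetical affine map of known factor 0.45:

```python
# Sketch: empirically lower-bound the contraction factor delta of a map T by
# sampling random pairs and taking max |T(x)-T(y)| / |x-y| (1-D toy map).
import random

def T(x):                       # toy contractive map with delta = 0.45
    return 0.45 * x + 1.0

random.seed(0)
ratios = []
for _ in range(1000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    if x != y:
        ratios.append(abs(T(x) - T(y)) / abs(x - y))
assert max(ratios) <= 0.45 + 1e-6   # never exceeds the true factor
```

For a nonlinear T the sampled maximum only lower-bounds δ, which is why such plots certify violations of a bound but not compliance.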

Theorem 1 (Convergence of PNP-FBS).

Assume H_σ satisfies Assumption (A) for some ε ≥ 0. Assume f is μ-strongly convex, f is differentiable, and ∇f is L-Lipschitz. Then T = H_σ(I − α∇f) satisfies

    ‖T(x) − T(y)‖ ≤ (1 + ε) max{ |1 − αμ|, |1 − αL| } ‖x − y‖

for all x, y ∈ ℝ^d. The coefficient is less than 1 if

    ε / (μ(1 + ε)) < α < (2 + ε) / (L(1 + ε)).

Such an α exists if ε < 2μ/(L − μ).
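A bound of this form can be sanity-checked numerically. The sketch below (a toy setup, not the paper's experiments) takes a separable quadratic f with curvatures μ and L and the hypothetical denoiser H(x) = x + ε·sin(x), whose residual H − I is ε-Lipschitz as in Assumption (A), and confirms that sampled contraction ratios stay below (1 + ε)·max{|1 − αμ|, |1 − αL|}:

```python
# Sketch: numerically check a Theorem-1-style bound on a toy problem.
# f(x) = 0.5*(mu*x1^2 + L*x2^2)  (mu-strongly convex, grad L-Lipschitz),
# H(x) = x + eps*sin(x) elementwise, so H - I is eps-Lipschitz.
import math, random

mu, L, eps, alpha = 1.0, 3.0, 0.1, 0.4

def T(x):
    step = [x[0] - alpha * mu * x[0], x[1] - alpha * L * x[1]]
    return [s + eps * math.sin(s) for s in step]

bound = (1 + eps) * max(abs(1 - alpha * mu), abs(1 - alpha * L))
random.seed(1)
worst = 0.0
for _ in range(2000):
    x = [random.uniform(-3, 3), random.uniform(-3, 3)]
    y = [random.uniform(-3, 3), random.uniform(-3, 3)]
    den = math.dist(x, y)
    if den > 1e-3:
        worst = max(worst, math.dist(T(x), T(y)) / den)
assert worst <= bound + 1e-9
assert bound < 1                  # these mu, L, eps, alpha make T a contraction
```

With these parameters the bound evaluates to 1.1 × 0.6 = 0.66 < 1, so α = 0.4 lies inside the admissible step-size interval.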

Theorem 2 (Convergence of PNP-DRS).

Assume H_σ satisfies Assumption (A) for some ε ≥ 0. Assume f is μ-strongly convex and differentiable. Then T = (1/2)I + (1/2)(2H_σ − I)(2 Prox_{αf} − I) satisfies

    ‖T(x) − T(y)‖ ≤ ( ε + 1/(1 + αμ) ) ‖x − y‖

for all x, y ∈ ℝ^d. The coefficient is less than 1 if ε < 1 and

    α > ε / (μ(1 − ε)).

Corollary 3 (Convergence of PNP-ADMM).

Assume H_σ satisfies Assumption (A) for some ε < 1. Assume f is μ-strongly convex. Then PNP-ADMM converges for

    α > ε / (μ(1 − ε)).

This follows from Theorem 2 and the equivalence of PNP-DRS and PNP-ADMM. ∎

For PNP-FBS, we assume f is μ-strongly convex and ∇f is L-Lipschitz. For PNP-DRS and PNP-ADMM, we assume f is μ-strongly convex. These are standard assumptions that are satisfied in applications such as image denoising/deblurring and single-photon imaging. Strong convexity, however, does exclude a few applications such as compressed sensing, sparse interpolation, and super-resolution.

PNP-FBS and PNP-ADMM are distinct methods for finding the same set of fixed points. Sometimes PNP-FBS is easier to implement, since it only requires computing ∇f rather than Prox_{αf}. On the other hand, PNP-ADMM has better convergence properties, as demonstrated theoretically by Theorems 1 and 2 and empirically by our experiments.

The proof of Theorem 2 relies on the notion of “negatively averaged” operators of (Giselsson, 2017). It is straightforward to modify Theorems 1 and 2 to establish local convergence when Assumption (A) holds locally. Theorem 2 can be generalized to the case where f is strongly convex but non-differentiable using the notion of subgradients.

Recently, (Fletcher et al., 2018) proved convergence of “plug-and-play” vector approximate message passing, a method similar to ADMM, assuming Lipschitz continuity of the denoiser. Although the method, the proof technique, and the notion of convergence are different from ours, the similarities are noteworthy.

4 Real spectral normalization: enforcing Assumption (A)

We now present real spectral normalization, a technique for training denoisers to satisfy Assumption (A) and connect the practical implementations to the theory of Section 3.

4.1 Deep learning denoisers: SimpleCNN and DnCNN

We use the deep denoising model DnCNN (Zhang et al., 2017a), which learns the residual mapping with a 17-layer CNN and reports state-of-the-art results on natural image denoising. Given a noisy observation y = x + e, where x is the clean image and e is noise, the residual mapping R outputs the noise, i.e., R(y) = e, so that y − R(y) is the clean recovery. Learning the residual mapping is a popular approach in deep learning-based image restoration.

We also construct a simple convolutional encoder-decoder model for denoising and call it SimpleCNN. SimpleCNN consists of four convolutional layers with ReLU activations, is trained with the mean-squared-error (MSE) loss, and does not use any pooling or (batch) normalization.

We remark that realSN and the theory of this work are applicable to other deep denoisers. We use SimpleCNN to show that realSN is applicable to any CNN denoiser.

4.2 Lipschitz constrained deep denoising

Denote the denoiser (SimpleCNN or DnCNN) as H_σ(y) = y − R(y), where y is the noisy input and R is the residual mapping. Since H_σ − I = −R, enforcing Assumption (A) is equivalent to constraining the Lipschitz constant of R. We propose a variant of spectral normalization (SN) (Miyato et al., 2018) for this purpose.

Spectral normalization.

(Miyato et al., 2018) proposed to normalize the spectral norm of each layer-wise weight (with ReLU nonlinearity) to one. Provided that we use 1-Lipschitz nonlinearities (such as ReLU), the Lipschitz constant of a layer is upper-bounded by the spectral norm of its weight, and the Lipschitz constant of the full network is bounded by the product of the spectral norms of all layers (Gouk et al., 2018). To avoid the prohibitive cost of a singular value decomposition (SVD) at every iteration, SN approximately computes the largest singular value of each weight using a small number of power iterations.

Given the weight matrix W^l of the l-th layer, vectors u^l and v^l are initialized randomly and maintained in memory to estimate the leading left and right singular vectors of W^l, respectively. During each forward pass of the network, SN is applied to all layers following the two-step routine:

  1. Apply one step of the power method to update u^l and v^l:
     v^l ← (W^l)ᵀ u^l / ‖(W^l)ᵀ u^l‖,   u^l ← W^l v^l / ‖W^l v^l‖.

  2. Normalize W^l with the estimated spectral norm σ(W^l) ≈ (u^l)ᵀ W^l v^l:
     W^l_SN = W^l / σ(W^l).
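The two-step routine above can be sketched in pure Python on a small dense matrix (SN runs one power step per forward pass; we run many here so the estimate converges and can be checked against the known answer):

```python
# Sketch of the SN power-iteration routine for a small dense weight matrix W:
# repeated (u, v) updates, then normalization W / sigma_hat.
import math, random

def matvec(W, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in W]

def matTvec(W, u):
    return [sum(W[i][j] * u[i] for i in range(len(W)))
            for j in range(len(W[0]))]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

W = [[3.0, 0.0], [0.0, 1.0]]          # true spectral norm is 3
random.seed(0)
u = [random.random(), random.random()]
for _ in range(100):                   # SN uses 1 step/pass; more for accuracy
    v = matTvec(W, u); v = [vi / norm(v) for vi in v]
    u = matvec(W, v);  u = [ui / norm(u) for ui in u]
sigma = sum(ui * wi for ui, wi in zip(u, matvec(W, v)))  # u^T W v
assert abs(sigma - 3.0) < 1e-6
W_sn = [[wij / sigma for wij in row] for row in W]       # normalized weight
```

After normalization the largest singular value of W_sn is 1, which is exactly the per-layer constraint SN enforces.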

While the basic methodology of SN suits our goal, the SN of (Miyato et al., 2018) uses a convenient but inexact implementation for convolutional layers. A convolutional layer is represented by a four-dimensional kernel K of shape (c_out, c_in, k_h, k_w), where k_h and k_w are the kernel's height and width. SN reshapes K into a two-dimensional matrix of shape (c_out, c_in·k_h·k_w) and regards this matrix as the W^l above. This relaxation underestimates the true spectral norm of the convolutional operator (Corollary 1 of (Tsuzuku et al., 2018)), given by

    ‖K‖ = sup_{x ≠ 0} ‖K(x)‖ / ‖x‖,

where x is the input to the convolutional layer and K(·) denotes the convolutional operator. This issue is not hypothetical: when we trained SimpleCNN with SN, the measured spectral norms of the convolutional layers exceeded one, i.e., SN failed to control the Lipschitz constant below 1.
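The underestimation is easy to exhibit in a 1-D analogue (our illustration, not the paper's): for circular convolution with kernel k = [1, 1], the reshaped-kernel norm is ‖k‖₂ = √2, while the operator itself maps the all-ones input with gain max_ω |1 + e^{−iω}| = 2:

```python
# Sketch (1-D analogue): the reshaped-kernel norm ||k||_2 underestimates the
# spectral norm of the convolution operator itself.
import math

k = [1.0, 1.0]
n = 8

def circ_conv(k, x):
    n = len(x)
    return [sum(k[t] * x[(i - t) % n] for t in range(len(k)))
            for i in range(n)]

reshaped_norm = math.sqrt(sum(ki * ki for ki in k))   # sqrt(2) ~ 1.414

# operator-norm lower bound via the worst input: the all-ones vector
x = [1.0] * n
y = circ_conv(k, x)
op_ratio = math.sqrt(sum(yi * yi for yi in y)) / math.sqrt(n)
assert abs(op_ratio - 2.0) < 1e-12
assert op_ratio > reshaped_norm        # plain SN would under-bound the norm
```

So normalizing by the reshaped-kernel norm still leaves the layer with Lipschitz constant up to 2/√2 ≈ 1.41 in this example.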

Real spectral normalization.

We propose an improvement of SN for convolutional layers (we use stride 1 and zero-padding of width 1 for all convolutions), called real spectral normalization (realSN), to more accurately constrain the network's Lipschitz constant and thereby enforce Assumption (A).

In realSN, we directly consider the convolutional linear operator K : ℝ^{c_in × h × w} → ℝ^{c_out × h × w}, where h and w are the input's height and width, instead of reshaping the convolution kernel into a matrix. The power iteration also requires the conjugate (transpose) operator K*. It can be shown that K* is another convolutional operator whose kernel is a rotated version of the forward convolutional kernel: the first two dimensions are permuted and the last two dimensions are rotated by 180 degrees (Liu et al., 2019). Instead of two vectors as in SN, realSN maintains u^l ∈ ℝ^{c_out × h × w} and v^l ∈ ℝ^{c_in × h × w} to estimate the leading left and right singular vectors of K, respectively. During each forward pass of the neural network, realSN conducts:

  1. Apply one step of the power method with the operators K and K*:
     v^l ← K*(u^l) / ‖K*(u^l)‖,   u^l ← K(v^l) / ‖K(v^l)‖.

  2. Normalize the convolutional kernel with the estimated spectral norm σ(K) ≈ ⟨u^l, K(v^l)⟩:
     K_SN = K / σ(K).

By replacing K_SN = K / σ(K) with K_SN = c · K / σ(K), realSN can constrain the Lipschitz constant to any upper bound c. Using the highly efficient convolution routines in modern deep learning frameworks, realSN can be implemented simply and efficiently. Specifically, realSN introduces three additional one-sample convolution operations for each layer in each training step. With a mini-batch size of 128, this extra computational cost is mild.
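The realSN power iteration can be sketched in a 1-D analogue (our illustration): the forward operator is circular convolution with k, its adjoint is correlation with k (the "flipped kernel"), and iterating with the operator pair recovers the true operator norm that the reshaped-kernel SN missed:

```python
# Sketch (1-D analogue of realSN): power iteration directly on the circular
# convolution operator K and its adjoint K* (correlation with the kernel).
import math, random

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def conv(k, x):            # y_i = sum_t k_t * x_{(i-t) mod n}
    n = len(x)
    return [sum(k[t] * x[(i - t) % n] for t in range(len(k)))
            for i in range(n)]

def conv_adjoint(k, y):    # adjoint: (K* y)_j = sum_t k_t * y_{(j+t) mod n}
    n = len(y)
    return [sum(k[t] * y[(j + t) % n] for t in range(len(k)))
            for j in range(n)]

k, n = [1.0, 1.0], 8
random.seed(0)
u = [random.random() for _ in range(n)]
for _ in range(200):
    v = conv_adjoint(k, u); v = [vi / norm(v) for vi in v]
    u = conv(k, v);         u = [ui / norm(u) for ui in u]
sigma = sum(ui * yi for ui, yi in zip(u, conv(k, v)))   # <u, K v>
assert abs(sigma - 2.0) < 1e-6   # the true operator norm, not sqrt(2)
k_sn = [ki / sigma for ki in k]  # normalized kernel
```

In a real network the same loop would use the framework's conv and transposed-conv routines on one-sample tensors, which is what makes realSN cheap.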

Figure 1: Histograms for experimentally verifying Assumption (A), for (a) BM3D, (b) SimpleCNN, (c) RealSN-SimpleCNN, (d) DnCNN, and (e) RealSN-DnCNN. The x-axis represents values of ‖(H_σ − I)(x^k) − (H_σ − I)(x★)‖ / ‖x^k − x★‖ and the y-axis represents the frequency. The vertical red bar corresponds to the maximum value.

4.3 Implementation details

We refer to SimpleCNN and DnCNN regularized by realSN as RealSN-SimpleCNN and RealSN-DnCNN, respectively. We train them in the setting of Gaussian denoising with known fixed noise levels . We used for CS-MRI and single photon imaging, and for Poisson denoising. The regularized denoisers are trained to have Lipschitz constant (no more than) 1. The training data consists of images from the BSD500 dataset, divided into patches. The CNN weights were initialized in the same way as (Zhang et al., 2017a)

. We train all networks using the ADAM optimizer for 50 epochs, with a mini-batch size of 128. The learning rate was

in the first 25 epochs, then decreased to . On an Nvidia GTX 1080 Ti, DnCNN took 4.08 hours and realSN-DnCNN took 5.17 hours to train, so the added cost of realSN is mild.

5 Poisson denoising: validating the theory

Consider the Poisson denoising problem: given a true image x_true ∈ ℝ^d with nonnegative pixel values, we observe independent Poisson random variables

    y_i ∼ Poisson((x_true)_i)   for i = 1, …, d.

For details and motivation for this problem setup, see (Rond et al., 2016).

For the objective function, we use the negative log-likelihood f(x) = Σ_{i=1}^d f_i(x_i), where, up to additive constants,

    f_i(x) = x − y_i log x   if y_i > 0 and x > 0,
    f_i(x) = x               if y_i = 0 and x ≥ 0,
    f_i(x) = ∞               otherwise.

We can compute Prox_{αf} elementwise with

    (Prox_{αf}(z))_i = (1/2) ( z_i − α + √( (z_i − α)² + 4αy_i ) ).

The gradient of f is given by (∇f(x))_i = 1 − y_i/x_i for x_i > 0, for i = 1, …, d. We set (∇f(x))_i = 1 when y_i = 0, although, strictly speaking, (∇f(x))_i is undefined when y_i = 0 and x_i = 0. This does not seem to cause any problems in the experiments. Since we force the denoisers to output nonnegative pixel values, PNP-FBS never needs to evaluate ∇f at negative x_i.
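The elementwise prox has the quadratic-formula closed form above, which can be verified against its optimality condition α f′(p) + (p − z) = 0; a minimal sketch (test values are our own):

```python
# Sketch: elementwise prox for the Poisson negative log-likelihood
# f_i(x) = x - y_i*log(x), via the quadratic formula, plus a check of the
# optimality condition alpha*f'(p) + (p - z) = 0.
import math

def prox_poisson(z, y, alpha):
    # minimize alpha*(x - y*log x) + 0.5*(x - z)^2 over feasible x
    return [0.5 * ((zi - alpha) + math.sqrt((zi - alpha) ** 2 + 4 * alpha * yi))
            for zi, yi in zip(z, y)]

z = [0.3, 1.5, 2.0]
y = [2.0, 0.0, 5.0]
alpha = 0.8
p = prox_poisson(z, y, alpha)
for pi, zi, yi in zip(p, z, y):
    grad = 1.0 - (yi / pi if yi > 0 else 0.0)   # f'(x) = 1 - y/x (1 if y = 0)
    assert abs(alpha * grad + (pi - zi)) < 1e-9
    assert pi >= 0.0
```

Note that for y_i = 0 the formula reduces to max(z_i − α, 0), consistent with the linear branch of f_i.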

For the denoiser H_σ, we choose BM3D, SimpleCNN with and without realSN, and DnCNN with and without realSN. Note that these denoisers are designed or trained for Gaussian denoising, and here we integrate them into the PnP frameworks for Poisson denoising. We scale the image so that the peak value of the image, the maximum mean of the Poisson random variables, is 1. The iterates were initialized to the noisy image for PnP-FBS and PnP-ADMM, and the u-variable was initialized to 0 for PnP-ADMM. We use the test set of 13 images from (Chan et al., 2017).


We first examine which denoisers satisfy Assumption (A) with small ε. In Figure 1, we run PnP iterations of Poisson denoising on a single image (the flag image of (Rond et al., 2016)) with different models, calculate ‖(H_σ − I)(x^k) − (H_σ − I)(x★)‖ / ‖x^k − x★‖ between the iterates x^k and the limit x★, and plot the histogram. The maximum value of a histogram, marked by a vertical red bar, lower-bounds the ε of Assumption (A). Recall that Corollary 3 requires ε < 1 to ensure convergence of PnP-ADMM. Figure 1(a) shows that BM3D violates this assumption. Figures 1(b)–(c) and Figures 1(d)–(e) respectively illustrate that RealSN indeed reduces ε for SimpleCNN and DnCNN.

Figure 2 experimentally validates Theorems 1 and 2 by examining the average (geometric mean) contraction factor (defined in Section 3) of PnP-FBS and PnP-ADMM iterations over a range of step sizes α. (For PnP-ADMM we compute the contraction factor of the equivalent PnP-DRS.) Figure 2 qualitatively shows that PnP-ADMM exhibits more stable convergence than PnP-FBS. Theorem 1 ensures PnP-FBS is a contraction when α lies within an interval, and Theorem 2 ensures PnP-ADMM is a contraction when α is large enough. We roughly observe this behavior for the denoisers trained with RealSN.

            BM3D      RealSN-DnCNN   RealSN-SimpleCNN
PNP-ADMM    23.4617   23.5873        18.7890
PNP-FBS     18.5835   22.2154        22.7280
Table 1: Average PSNR performance (in dB) on Poisson denoising on the test set of (Chan et al., 2017).

Empirical performance.

Our theory only concerns convergence and says nothing about the recovery performance of the limits the methods converge to. We empirically verify that the PnP methods with RealSN, for which we analyzed convergence, yield competitive denoising results.

We fix one step size α for all denoisers in PNP-ADMM and another in PNP-FBS. For the deep learning-based denoisers, we fix the trained noise level σ. For BM3D, we choose σ as suggested in (Rond et al., 2016).

Table 1 compares the PnP methods with BM3D, RealSN-DnCNN, and RealSN-SimpleCNN plugged in. In both PnP methods, one of the two denoisers using RealSN, for which we have theory, outperforms BM3D. It is interesting to observe that PnP performance does not necessarily hinge on the strength of the plugged-in denoiser and that different PnP methods favor different denoisers. For example, RealSN-SimpleCNN surpasses the much more sophisticated RealSN-DnCNN under PnP-FBS. However, RealSN-DnCNN leads to better, and overall best, denoising performance when plugged into PnP-ADMM.

Figure 2: Average contraction factor over 500 iterations for the Poisson denoising experiment, for (a) PnP-FBS and (b) PnP-ADMM. The x-axis represents the value of α and the y-axis represents the contraction factor. Although a lower value means faster convergence, a smoother curve means the method is easier to tune and has more stable convergence.

6 More applications

We now apply PnP to two imaging problems and show that RealSN improves the reconstruction of PnP. (Code for our experiments in Sections 5 and 6 is available online.)

Single photon imaging.

Consider single photon imaging with quanta image sensors (QIS) (Fossum, 2011; Chan & Lu, 2014; Elgendy & Chan, 2016), in which each pixel of the underlying image x ∈ ℝ^d is duplicated into K sensor pixels (K is the oversampling rate), scaled by a sensor gain, and observed as binary photon counts y: each binary measurement records whether at least one photon arrived. We want to recover x from y. The objective function f is the negative log-likelihood of the binary observations, written in terms of K_i^1, the number of ones in the i-th unit pixel, and K_i^0, the number of zeros in the i-th unit pixel. The gradient of f follows by direct differentiation, and the proximal operator of f is given in (Chan & Lu, 2014).

We compare PnP-ADMM and PnP-FBS, each with the denoisers BM3D, RealSN-DnCNN, and RealSN-SimpleCNN. The x-variable was initialized to the same starting point for PnP-FBS and PnP-ADMM, and the u-variable was initialized to 0 for PnP-ADMM. All deep denoisers used in this experiment were trained with a fixed noise level. We report the PSNR achieved at the 50th iteration, at the 100th iteration, and the best PSNR achieved within the first 100 iterations.

Table 2 reports the average PSNR on the 13 images used in (Chan et al., 2017). The experiments indicate that PnP-ADMM consistently yields higher PSNR than the PnP-FBS counterpart using the same denoiser. The best overall PSNR is achieved by PnP-ADMM with RealSN-DnCNN, which shows nearly 1 dB improvement over the result obtained with BM3D. We also observe that deep denoisers with RealSN make PnP converge more stably.

PnP-ADMM           BM3D      RealSN-DnCNN   RealSN-SimpleCNN
  Iteration 50     30.0034   31.0032        29.2154
  Iteration 100    30.0014   31.0032        29.2151
  Best overall     30.0474   31.0431        29.2155

PnP-FBS            BM3D      RealSN-DnCNN   RealSN-SimpleCNN
  Iteration 50     28.7933   27.9617        29.0062
  Iteration 100    29.0510   27.9887        29.0517
  Best overall     29.5327   28.4065        29.3563
Table 2: Average PSNR (in dB) on the single photon imaging task on the test set of (Chan et al., 2017).
Sampling approach           Random          Radial          Cartesian
Image                       Brain   Bust    Brain   Bust    Brain   Bust
Zero-filling                 9.58    7.00    9.29    6.19    8.65    6.01
TV (Lustig et al., 2005)    16.92   15.31   15.61   14.22   12.77   11.72
RecRF (Yang et al., 2010)   16.98   15.37   16.04   14.65   12.78   11.75
BM3D-MRI (Eksioglu, 2016)   17.31   13.90   16.95   13.72   14.43   12.35
PnP-FBS
  BM3D                      19.09   16.36   18.10   15.67   14.37   12.99
  DnCNN                     19.59   16.49   18.92   15.99   14.76   14.09
  RealSN-DnCNN              19.82   16.60   18.96   16.09   14.82   14.25
  SimpleCNN                 15.58   12.19   15.06   12.02   12.78   10.80
  RealSN-SimpleCNN          17.65   14.98   16.52   14.26   13.02   11.49
PnP-ADMM
  BM3D                      19.61   17.23   18.94   16.70   14.91   13.98
  DnCNN                     19.86   17.05   19.00   16.64   14.86   14.14
  RealSN-DnCNN              19.91   17.09   19.08   16.68   15.11   14.16
  SimpleCNN                 16.68   12.56   16.83   13.47   13.03   11.17
  RealSN-SimpleCNN          17.77   14.89   17.00   14.47   12.73   11.88
Table 3: CS-MRI results (30% sampling with additive Gaussian noise) in PSNR (dB).

Compressed sensing MRI.

Magnetic resonance imaging (MRI) is a widely used imaging technique with slow data acquisition. Compressed sensing MRI (CS-MRI) accelerates MRI by acquiring less data through downsampling. PnP is particularly useful in medical imaging, where we typically do not have a large amount of data for end-to-end training: we train the denoiser on natural images and then “plug” it into the PnP framework to be applied to medical images. CS-MRI is described mathematically as

    y = F_p(x) + ε,

where x is the underlying image, F_p is the linear measurement model, y is the measured data, and ε is measurement noise. We want to recover x from y. The objective function is the data-fidelity term

    f(x) = (1/2) ‖F_p(x) − y‖².

The gradient of f is given in (Liu et al., 2016) and the proximal operator of f is given in (Eksioglu, 2016). We use BM3D, SimpleCNN, and DnCNN, and their RealSN variants, as the PnP denoiser H_σ.
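For a least-squares data-fidelity term of this form, the gradient has the generic expression ∇f(x) = F_pᵀ(F_p(x) − y). The sketch below (our illustration; the real F_p is a partial Fourier operator applied via FFTs) uses a small dense stand-in matrix and checks the gradient against a central finite difference:

```python
# Sketch: gradient of f(x) = 0.5*||A x - y||^2 for a small dense stand-in A,
# verified against a central finite difference.
def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def residual(A, x, y):
    return [ri - yi for ri, yi in zip(matvec(A, x), y)]   # A x - y

def f(A, x, y):
    return 0.5 * sum(ri * ri for ri in residual(A, x, y))

def grad_f(A, x, y):                                       # A^T (A x - y)
    r = residual(A, x, y)
    return [sum(A[i][j] * r[i] for i in range(len(A)))
            for j in range(len(x))]

A = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
y = [1.0, 0.0, 2.0]
x = [0.5, -0.25]
g = grad_f(A, x, y)
h = 1e-6
for j in range(2):
    xp, xm = list(x), list(x)
    xp[j] += h; xm[j] -= h
    fd = (f(A, xp, y) - f(A, xm, y)) / (2 * h)
    assert abs(fd - g[j]) < 1e-5       # gradient matches finite difference
```

The same check applies unchanged when A is replaced by any linear operator, including a masked Fourier transform.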

We take F_p as Fourier k-domain subsampling (a partial Fourier operator). We tested random, radial, and Cartesian sampling patterns (Eksioglu, 2016) with a sampling rate of 30%. Additive Gaussian noise is added to the measurements.

We compare the PnP frameworks with zero-filling, total variation (TV) (Lustig et al., 2005), RecRF (Yang et al., 2010), and BM3D-MRI (Eksioglu, 2016). (Some recent deep learning-based methods (Yang et al., 2016; Kulkarni et al., 2016; Metzler et al., 2017; Zhang & Ghanem, 2018) are not compared here because we assume we do not have enough medical images for training.) The parameters are taken as follows. For TV, the regularization parameter is taken as the best-performing value from a candidate grid. For RecRF, both parameters are taken from the same grid and the best results are reported. For BM3D-MRI, we set the “final noise level” (the noise level in the last iteration) as suggested in their MATLAB library. For the PnP methods with BM3D as H_σ, we tune σ and α and report the best results. For PNP-ADMM with deep denoisers, we take σ and α uniformly for all cases, and likewise for PNP-FBS. All deep denoisers are trained on BSD500 (Martin et al., 2001), a natural image dataset; no medical image is used in training. The x-variable was initialized to the zero-filled solution for PnP-FBS and PnP-ADMM, and the u-variable was initialized to 0 for PnP-ADMM. Table 3 reports our results on CS-MRI, which confirm the effectiveness of the PnP frameworks. Moreover, RealSN-DnCNN appears to be the clear winner overall. We also observe that PnP-ADMM generally outperforms PnP-FBS when using the same denoiser, which supports Theorems 1 and 2.

7 Conclusion

In this work, we analyzed the convergence of PnP-FBS and PnP-ADMM under a Lipschitz assumption on the denoiser. We then presented real spectral normalization, a technique for enforcing the proposed Lipschitz condition when training deep learning-based denoisers. Finally, we validated the theory with experiments.


We thank Pontus Giselsson for the discussion on negatively averaged operators and Stanley Chan for the discussion on the difficulties in establishing convergence of PnP methods. This work was partially supported by National Key R&D Program of China 2017YFB02029, AFOSR MURI FA9550-18-1-0502, NSF DMS-1720237, ONR N0001417121, and NSF RI-1755701.


  • Baillon et al. (1978) Baillon, J. B., Bruck, R. E., and Reich, S. On the asymptotic behavior of nonexpansive mappings and semigroups in Banach spaces. Houston Journal of Mathematics, 4(1), 1978.
  • Bansal et al. (2018) Bansal, N., Chen, X., and Wang, Z. Can we gain more from orthogonality regularizations in training deep networks? In Advances in Neural Information Processing Systems, pp. 4266–4276, 2018.
  • Bartlett et al. (2017) Bartlett, P. L., Foster, D. J., and Telgarsky, M. J. Spectrally-normalized margin bounds for neural networks. In Advances in Neural Information Processing Systems, pp. 6240–6249, 2017.
  • Bauschke & Combettes (2017) Bauschke, H. H. and Combettes, P. L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer New York, 2nd edition, 2017.
  • Brifman et al. (2016) Brifman, A., Romano, Y., and Elad, M. Turning a denoiser into a super-resolver using plug and play priors. 2016 IEEE International Conference on Image Processing, pp. 1404–1408, 2016.
  • Brock et al. (2019) Brock, A., Donahue, J., and Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations, 2019.
  • Buades et al. (2005) Buades, A., Coll, B., and Morel, J.-M. A non-local algorithm for image denoising. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005.
  • Buzzard et al. (2018) Buzzard, G. T., Chan, S. H., Sreehari, S., and Bouman, C. A. Plug-and-play unplugged: Optimization-free reconstruction using consensus equilibrium. SIAM Journal on Imaging Sciences, 11(3):2001–2020, 2018.
  • Chan (2019) Chan, S. H. Performance analysis of Plug-and-Play ADMM: A graph signal processing perspective. IEEE Transactions on Computational Imaging, 2019.
  • Chan & Lu (2014) Chan, S. H. and Lu, Y. M. Efficient image reconstruction for gigapixel quantum image sensors. In Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on, pp. 312–316. IEEE, 2014.
  • Chan et al. (2017) Chan, S. H., Wang, X., and Elgendy, O. A. Plug-and-play ADMM for image restoration: Fixed-point convergence and applications. IEEE Transactions on Computational Imaging, 3(1):84–98, 2017.
  • Chang et al. (2017) Chang, J. R., Li, C.-L., Poczos, B., and Kumar, B. V. One network to solve them all—solving linear inverse problems using deep projection models. In 2017 IEEE International Conference on Computer Vision, pp. 5889–5898. IEEE, 2017.
  • Chen et al. (2018) Chen, X., Liu, J., Wang, Z., and Yin, W. Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds. In Advances in Neural Information Processing Systems, pp. 9061–9071, 2018.
  • Combettes & Yamada (2015) Combettes, P. L. and Yamada, I. Compositions and convex combinations of averaged nonexpansive operators. Journal of Mathematical Analysis and Applications, 425(1):55–70, 2015.
  • Dabov et al. (2007) Dabov, K., Foi, A., Katkovnik, V., and Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095, 2007.
  • Danielyan et al. (2010) Danielyan, A., Katkovnik, V., and Egiazarian, K. Image deblurring by augmented Lagrangian with BM3D frame prior. In Workshop on Information Theoretic Methods in Science and Engineering, pp. 16–18, 2010.
  • Danielyan et al. (2012) Danielyan, A., Katkovnik, V., and Egiazarian, K. BM3D frames and variational image deblurring. IEEE Transactions on Image Processing, 21(4):1715–1728, 2012.
  • Dar et al. (2016) Dar, Y., Bruckstein, A. M., Elad, M., and Giryes, R. Postprocessing of compressed images via sequential denoising. IEEE Transactions on Image Processing, 25(7):3044–3058, 2016.
  • Dong et al. (2018) Dong, W., Wang, P., Yin, W., and Shi, G. Denoising prior driven deep neural network for image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
  • Egiazarian & Katkovnik (2015) Egiazarian, K. and Katkovnik, V. Single image super-resolution via BM3D sparse coding. In 2015 23rd European Signal Processing Conference, pp. 2849–2853, 2015.
  • Eksioglu (2016) Eksioglu, E. M. Decoupled algorithm for MRI reconstruction using nonlocal block matching model: BM3D-MRI. Journal of Mathematical Imaging and Vision, 56(3):430–440, 2016.
  • Elgendy & Chan (2016) Elgendy, O. A. and Chan, S. H. Image reconstruction and threshold design for quanta image sensors. In 2016 IEEE International Conference on Image Processing, pp. 978–982. IEEE, 2016.
  • Fan et al. (2019) Fan, K., Wei, Q., Wang, W., Chakraborty, A., and Heller, K. InverseNet: Solving inverse problems with splitting networks. IEEE International Conference on Multimedia and Expo, 2019.
  • Fletcher et al. (2018) Fletcher, A. K., Pandit, P., Rangan, S., Sarkar, S., and Schniter, P. Plug-in estimation in high-dimensional linear inverse problems: A rigorous analysis. In Advances in Neural Information Processing Systems 31, pp. 7451–7460. 2018.
  • Fossum (2011) Fossum, E. The quanta image sensor (QIS): concepts and challenges. In Imaging Systems and Applications. Optical Society of America, 2011.
  • Giselsson (2017) Giselsson, P. Tight global linear convergence rate bounds for Douglas–Rachford splitting. Journal of Fixed Point Theory and Applications, 19(4):2241–2270, 2017.
  • Gouk et al. (2018) Gouk, H., Frank, E., Pfahringer, B., and Cree, M. Regularisation of neural networks by enforcing Lipschitz continuity. arXiv preprint arXiv:1804.04368, 2018.
  • He et al. (2018) He, J., Yang, Y., Wang, Y., Zeng, D., Bian, Z., Zhang, H., Sun, J., Xu, Z., and Ma, J. Optimizing a parameterized plug-and-play ADMM for iterative low-dose CT reconstruction. IEEE Transactions on Medical Imaging, pp. 1–13, 2018.
  • He et al. (2016) He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
  • Heide et al. (2014) Heide, F., Steinberger, M., Tsai, Y.-T., Rouf, M., Pajak, D., Reddy, D., Gallo, O., Liu, J., Heidrich, W., Egiazarian, K., Kautz, J., and Pulli, K. FlexISP: A flexible camera image processing framework. ACM Transactions on Graphics (Proceedings SIGGRAPH Asia 2014), 33(6), 2014.
  • Ioffe & Szegedy (2015) Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, volume 37, pp. 448–456, 2015.
  • Kamilov et al. (2017) Kamilov, U. S., Mansour, H., and Wohlberg, B. A plug-and-play priors approach for solving nonlinear imaging inverse problems. IEEE Signal Processing Letters, 24(12):1872–1876, 2017.
  • Krizhevsky et al. (2012) Krizhevsky, A., Sutskever, I., and Hinton, G. E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
  • Kulkarni et al. (2016) Kulkarni, K., Lohit, S., Turaga, P., Kerviche, R., and Ashok, A. Reconnet: Non-iterative reconstruction of images from compressively sensed measurements. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 449–458, 2016.
  • Liu et al. (2019) Liu, J., Chen, X., Wang, Z., and Yin, W. ALISTA: Analytic weights are as good as learned weights in LISTA. In International Conference on Learning Representations, 2019.
  • Liu et al. (2018) Liu, P., Zhang, H., Zhang, K., Lin, L., and Zuo, W. Multi-level Wavelet-CNN for image restoration. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 886–895, 2018.
  • Liu et al. (2016) Liu, Y., Zhan, Z., Cai, J.-F., Guo, D., Chen, Z., and Qu, X. Projected iterative soft-thresholding algorithm for tight frames in compressed sensing magnetic resonance imaging. IEEE transactions on medical imaging, 35(9):2130–2140, 2016.
  • Lustig et al. (2005) Lustig, M., Santos, J. M., Lee, J.-H., Donoho, D. L., and Pauly, J. M. Application of compressed sensing for rapid MR imaging. SPARS,(Rennes, France), 2005.
  • Lyu et al. (2019) Lyu, Q., Ruan, D., Hoffman, J., Neph, R., McNitt-Gray, M., and Sheng, K. Iterative reconstruction for low-dose CT using Plug-and-Play alternating direction method of multipliers (ADMM) framework. Proceedings of SPIE, 10949, 2019.
  • Mao et al. (2016) Mao, X., Shen, C., and Yang, Y.-B. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Advances in Neural Information Processing Systems, pp. 2802–2810, 2016.
  • Martin et al. (2001) Martin, D., Fowlkes, C., Tal, D., and Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the 8th International Conference on Computer Vision, volume 2, pp. 416–423, July 2001.
  • Meinhardt et al. (2017) Meinhardt, T., Moeller, M., Hazirbas, C., and Cremers, D. Learning proximal operators: Using denoising networks for regularizing inverse imaging problems. In 2017 International Conference on Computer Vision, pp. 1799–1808, 2017.
  • Metzler et al. (2017) Metzler, C., Mousavi, A., and Baraniuk, R. Learned d-amp: Principled neural network based compressive image recovery. In Advances in Neural Information Processing Systems, pp. 1772–1783, 2017.
  • Metzler et al. (2016) Metzler, C. A., Maleki, A., and Baraniuk, R. G. From denoising to compressed sensing. IEEE Transactions on Information Theory, 62(9):5117–5144, 2016.
  • Miyato et al. (2018) Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations, 2018.
  • Moreau (1965) Moreau, J. J. Proximité et dualité dans un espace Hilbertien. Bulletin de la Société Mathématique de France, 93:273–299, 1965.
  • Oberman & Calder (2018) Oberman, A. M. and Calder, J. Lipschitz regularized deep neural networks converge and generalize. arXiv preprint arXiv:1808.09540, 2018.
  • Ogura & Yamada (2002) Ogura, N. and Yamada, I. Non-strictly convex minimization over the fixed point set of an asymptotically shrinking nonexpansive mapping. Numerical Functional Analysis and Optimization, 23(1-2):113–137, 2002.
  • Ono (2017) Ono, S. Primal-dual plug-and-play image restoration. IEEE Signal Processing Letters, 24(8):1108–1112, 2017.
  • Plötz & Roth (2018) Plötz, T. and Roth, S. Neural nearest neighbors networks. In Advances in Neural Information Processing Systems, pp. 1095–1106, 2018.
  • Polyak (1987) Polyak, B. T. Introduction to Optimization. Optimization Software Inc., New York, 1987.
  • Qian & Wegman (2019) Qian, H. and Wegman, M. N. L2-nonexpansive neural networks. In International Conference on Learning Representations, 2019.
  • Reehorst & Schniter (2019) Reehorst, E. T. and Schniter, P. Regularization by denoising: Clarifications and new interpretations. IEEE Transactions on Computational Imaging, 5(1):52–67, 2019.
  • Romano et al. (2017) Romano, Y., Elad, M., and Milanfar, P. The little engine that could: Regularization by denoising (RED). SIAM Journal on Imaging Sciences, 10(4):1804–1844, 2017.
  • Rond et al. (2016) Rond, A., Giryes, R., and Elad, M. Poisson inverse problems by the plug-and-play scheme. Journal of Visual Communication and Image Representation, 41:96–108, 2016.
  • Rudin et al. (1992) Rudin, L. I., Osher, S., and Fatemi, E. Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena, 60(1-4):259–268, 1992.
  • Ryu & Boyd (2016) Ryu, E. K. and Boyd, S. Primer on monotone operator methods. Applied and Computational Mathematics, 15:3–43, 2016.
  • Shi & Feng (2018) Shi, M. and Feng, L. Plug-and-play prior based on Gaussian mixture model learning for image restoration in sensor network. IEEE Access, 6:78113–78122, 2018.
  • Sreehari et al. (2016) Sreehari, S., Venkatakrishnan, S. V., Wohlberg, B., Buzzard, G. T., Drummy, L. F., Simmons, J. P., and Bouman, C. A. Plug-and-play priors for bright field electron tomography and sparse interpolation. IEEE Transactions on Computational Imaging, 2(4):408–423, 2016.
  • Sreehari et al. (2017) Sreehari, S., Venkatakrishnan, S. V., Bouman, K. L., Simmons, J. P., Drummy, L. F., and Bouman, C. A. Multi-resolution data fusion for super-resolution electron microscopy. IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017.
  • Sun et al. (2018a) Sun, Y., Wohlberg, B., and Kamilov, U. S. Plug-in stochastic gradient method. arXiv preprint arXiv:1811.03659, 2018a.
  • Sun et al. (2018b) Sun, Y., Xu, S., Li, Y., Tian, L., Wohlberg, B., and Kamilov, U. S. Regularized Fourier ptychography using an online plug-and-play algorithm. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2018b.
  • Sun et al. (2019) Sun, Y., Wohlberg, B., and Kamilov, U. S. An online plug-and-play algorithm for regularized image reconstruction. IEEE Transactions on Computational Imaging, 2019.
  • Taylor et al. (2018) Taylor, A. B., Hendrickx, J. M., and Glineur, F. Exact worst-case convergence rates of the proximal gradient method for composite convex minimization. Journal of Optimization Theory and Applications, 2018.
  • Teodoro et al. (2016) Teodoro, A. M., Bioucas-Dias, J. M., and Figueiredo, M. A. T. Image restoration and reconstruction using variable splitting and class-adapted image priors. IEEE International Conference on Image Processing, 2016.
  • Teodoro et al. (2017) Teodoro, A. M., Bioucas-Dias, J. M., and Figueiredo, M. A. T. Scene-adapted plug-and-play algorithm with convergence guarantees. In 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing, pp. 1–6, 2017.
  • Teodoro et al. (2019) Teodoro, A. M., Bioucas-Dias, J. M., and Figueiredo, M. A. T. A convergent image fusion algorithm using scene-adapted Gaussian-mixture-based denoising. IEEE Transactions on Image Processing, 28(1):451–463, 2019.
  • Tirer & Giryes (2019) Tirer, T. and Giryes, R. Image restoration by iterative denoising and backward projections. IEEE Transactions on Image Processing, 28(3):1220–1234, 2019.
  • Tsuzuku et al. (2018) Tsuzuku, Y., Sato, I., and Sugiyama, M. Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. In Advances in Neural Information Processing Systems, pp. 6541–6550, 2018.
  • Venkatakrishnan et al. (2013) Venkatakrishnan, S. V., Bouman, C. A., and Wohlberg, B. Plug-and-play priors for model based reconstruction. 2013 IEEE Global Conference on Signal and Information Processing, pp. 945–948, 2013.
  • Wang & Chan (2017) Wang, X. and Chan, S. H. Parameter-free Plug-and-Play ADMM for image restoration. In IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1323–1327. 2017.
  • Weng et al. (2018) Weng, T.-W., Zhang, H., Chen, P.-Y., Yi, J., Su, D., Gao, Y., Hsieh, C.-J., and Daniel, L. Evaluating the robustness of neural networks: An extreme value theory approach. In International Conference on Learning Representations, 2018.
  • Yang et al. (2010) Yang, J., Zhang, Y., and Yin, W. A fast alternating direction method for TVL1-L2 signal reconstruction from partial Fourier data. IEEE Journal of Selected Topics in Signal Processing, 4(2):288–297, 2010.
  • Yang et al. (2016) Yang, Y., Sun, J., Li, H., and Xu, Z. Deep ADMM-Net for compressive sensing MRI. In Advances in Neural Information Processing Systems, pp. 10–18, 2016.
  • Ye et al. (2018) Ye, D. H., Srivastava, S., Thibault, J., Sauer, K., and Bouman, C. Deep residual learning for model-based iterative CT reconstruction using plug-and-play framework. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6668–6672, 2018.
  • Zhang & Ghanem (2018) Zhang, J. and Ghanem, B. ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • Zhang et al. (2017a) Zhang, K., Zuo, W., Chen, Y., Meng, D., and Zhang, L. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017a.
  • Zhang et al. (2017b) Zhang, K., Zuo, W., Gu, S., and Zhang, L. Learning deep CNN denoiser prior for image restoration. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2808–2817, 2017b.
  • Zhang et al. (2018) Zhang, K., Zuo, W., and Zhang, L. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Transactions on Image Processing, 27(9):4608–4622, 2018.
  • Zoran & Weiss (2011) Zoran, D. and Weiss, Y. From learning models of natural image patches to whole image restoration. In 2011 International Conference on Computer Vision, pp. 479–486, 2011.

8 Preliminaries

For any $x, y \in \mathbb{R}^n$, write $\langle x, y \rangle$ for the inner product. We say a function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ is convex if
$$f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$$
for any $x, y \in \mathbb{R}^n$ and $\theta \in [0,1]$. A convex function is closed if it is lower semi-continuous and proper if it is finite somewhere. We say $f$ is $\mu$-strongly convex for $\mu > 0$ if $f(x) - (\mu/2)\|x\|^2$ is a convex function. Given a convex function $f$ and $\alpha > 0$, define its proximal operator as
$$\mathrm{Prox}_{\alpha f}(x) = \operatorname*{argmin}_{z \in \mathbb{R}^n} \left\{ \alpha f(z) + \tfrac{1}{2}\|x - z\|^2 \right\}.$$

When $f$ is convex, closed, and proper, the argmin uniquely exists, and therefore $\mathrm{Prox}_{\alpha f}$ is well-defined. A mapping $T : \mathbb{R}^n \to \mathbb{R}^n$ is $L$-Lipschitz if
$$\|T(x) - T(y)\| \le L\|x - y\|$$
for all $x, y \in \mathbb{R}^n$. If $T$ is $L$-Lipschitz with $L \le 1$, we say $T$ is nonexpansive. If $T$ is $L$-Lipschitz with $L < 1$, we say $T$ is a contraction. A mapping $T$ is $\theta$-averaged for $\theta \in (0,1)$ if it is nonexpansive and if
$$T = (1-\theta)I + \theta R,$$
where $R$ is another nonexpansive mapping.
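As a concrete instance of the proximal operator, the prox of the $\ell_1$ norm has the well-known closed form of soft-thresholding, and, like every proximal operator of a convex function, it is nonexpansive. A minimal numerical sketch:

```python
import numpy as np

def prox_l1(x, alpha):
    # Prox_{alpha f} for f = ||.||_1: argmin_z  alpha*||z||_1 + 0.5*||z - x||^2.
    # The minimization separates per coordinate, giving soft-thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

p = prox_l1(np.array([2.0, -0.2, 0.7]), 0.5)   # -> [1.5, 0.0, 0.2]

# Proximal operators are 1-Lipschitz (nonexpansive):
x, y = np.array([1.0, -3.0, 0.4]), np.array([-2.0, 0.5, 2.0])
nonexpansive = (np.linalg.norm(prox_l1(x, 0.5) - prox_l1(y, 0.5))
                <= np.linalg.norm(x - y))
```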

Lemma 4 (Proposition 4.35 of (Bauschke & Combettes, 2017)).

$T$ is $\theta$-averaged if and only if
$$\|T(x) - T(y)\|^2 + \frac{1-\theta}{\theta}\,\|(I-T)(x) - (I-T)(y)\|^2 \le \|x - y\|^2$$
for all $x, y \in \mathbb{R}^n$.
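Lemma 4's characterization can be checked numerically. Proximal operators of convex, closed, proper functions are firmly nonexpansive, i.e., $1/2$-averaged, so the inequality with $\theta = 1/2$ (hence $(1-\theta)/\theta = 1$) should hold for the $\ell_1$ prox; a sketch:

```python
import numpy as np

def prox_l1(x, alpha):
    # soft-thresholding, the proximal operator of alpha * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

theta = 0.5  # proximal operators are 1/2-averaged (firmly nonexpansive)
rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    Tx, Ty = prox_l1(x, 0.7), prox_l1(y, 0.7)
    # Lemma 4 inequality: ||Tx-Ty||^2 + (1-th)/th ||(I-T)x-(I-T)y||^2 <= ||x-y||^2
    lhs = (np.linalg.norm(Tx - Ty) ** 2
           + (1 - theta) / theta * np.linalg.norm((x - Tx) - (y - Ty)) ** 2)
    ok = ok and lhs <= np.linalg.norm(x - y) ** 2 + 1e-10
```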

Lemma 5 ((Ogura & Yamada, 2002; Combettes & Yamada, 2015)).

Assume $T_1$ and $T_2$ are $\theta_1$- and $\theta_2$-averaged, respectively. Then $T_1 T_2$ is $\frac{\theta_1 + \theta_2 - 2\theta_1\theta_2}{1 - \theta_1\theta_2}$-averaged.
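The composition rule of Lemma 5 can likewise be sanity-checked on synthetic averaged maps built from orthogonal (hence nonexpansive) matrices, using Lemma 4's inequality as the averagedness test:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n, rng):
    # QR of a Gaussian matrix yields an orthogonal (norm-1, nonexpansive) Q
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q

n, th1, th2 = 4, 0.3, 0.6
R1, R2 = random_orthogonal(n, rng), random_orthogonal(n, rng)
T1 = lambda v: (1 - th1) * v + th1 * (R1 @ v)   # th1-averaged by construction
T2 = lambda v: (1 - th2) * v + th2 * (R2 @ v)   # th2-averaged by construction
th = (th1 + th2 - 2 * th1 * th2) / (1 - th1 * th2)  # Lemma 5's parameter

T = lambda v: T1(T2(v))
ok = True
for _ in range(200):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    lhs = (np.linalg.norm(T(x) - T(y)) ** 2
           + (1 - th) / th * np.linalg.norm((x - T(x)) - (y - T(y))) ** 2)
    ok = ok and lhs <= np.linalg.norm(x - y) ** 2 + 1e-10
```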

Lemma 6.

Let $\theta \in (0,1)$. $T$ is $\theta$-averaged if and only if the map $x \mapsto -T(-x)$ is $\theta$-averaged.


The lemma follows from the fact that
$$-T(-x) = (1-\theta)x + \theta(-R(-x))$$
for some nonexpansive $R$ and that nonexpansiveness of $R$ implies nonexpansiveness of $x \mapsto -R(-x)$. ∎

Lemma 7 ((Taylor et al., 2018)).

Assume $f$ is $\mu$-strongly convex and $\nabla f$ is $L$-Lipschitz. Then for any $\alpha > 0$, we have
$$\|(I - \alpha\nabla f)(x) - (I - \alpha\nabla f)(y)\| \le \max\{|1 - \alpha\mu|,\, |1 - \alpha L|\}\,\|x - y\|$$
for all $x, y$.

Lemma 8 (Proposition 5.4 of (Giselsson, 2017)).

Assume $f$ is $\mu$-strongly convex, closed, and proper. Then
$$-(2\,\mathrm{Prox}_{\alpha f} - I)$$
is $\tfrac{1}{1+\alpha\mu}$-averaged.


The notion of the proximal operator and its well-definedness were first presented in (Moreau, 1965). The notion of averaged mappings was first introduced in (Baillon et al., 1978). The idea of Lemma 6 relates to the “negatively averaged” operators of (Giselsson, 2017). Lemma 7 is proved in a weaker form as Theorem 3 of (Polyak, 1987) and in Section 5.1 of (Ryu & Boyd, 2016). Lemma 7 as stated is proved as Theorem 2.1 in (Taylor et al., 2018).

9 Proofs of main results

9.1 Equivalence of PnP-DRS and PnP-ADMM

We show the standard steps that establish the equivalence of PnP-DRS and PnP-ADMM. Starting from PnP-DRS,
$$x^{k+1/2} = \mathrm{Prox}_{\alpha f}(z^k), \qquad x^{k+1} = H_\sigma(2x^{k+1/2} - z^k), \qquad z^{k+1} = z^k + x^{k+1} - x^{k+1/2},$$
we substitute $u^k = z^k - x^k$ to get
$$x^{k+1/2} = \mathrm{Prox}_{\alpha f}(x^k + u^k), \qquad x^{k+1} = H_\sigma(2x^{k+1/2} - x^k - u^k), \qquad u^{k+1} = u^k + x^k - x^{k+1/2}.$$
We reorder the iterations to get the correct dependency:
$$x^{k+1/2} = \mathrm{Prox}_{\alpha f}(x^k + u^k), \qquad u^{k+1} = u^k + x^k - x^{k+1/2}, \qquad x^{k+1} = H_\sigma(x^{k+1/2} - u^{k+1}).$$
We label $y^{k+1} = x^{k+1/2}$ and flip the sign of the dual variable, $\tilde{u}^k = -u^k$, so that
$$y^{k+1} = \mathrm{Prox}_{\alpha f}(x^k - \tilde{u}^k), \qquad \tilde{u}^{k+1} = \tilde{u}^k + y^{k+1} - x^k, \qquad x^{k+1} = H_\sigma(y^{k+1} + \tilde{u}^{k+1}),$$
and we get PnP-ADMM.

9.2 Convergence analysis

Lemma 9.

$H_\sigma$ satisfies Assumption (A) if and only if
$$T = \tfrac{1}{1+\varepsilon} H_\sigma$$
is nonexpansive and $\tfrac{\varepsilon}{1+\varepsilon}$-averaged.


Define $R = \tfrac{1}{\varepsilon}(H_\sigma - I)$, which means $H_\sigma = I + \varepsilon R$. Clearly, Assumption (A) holds if and only if $R$ is nonexpansive. Define $T = \tfrac{1}{1+\varepsilon}H_\sigma$, which means
$$T = \tfrac{1}{1+\varepsilon} I + \tfrac{\varepsilon}{1+\varepsilon} R.$$
Remember that Assumption (A) corresponds to $\|R(x) - R(y)\| \le \|x - y\|$ for all $x, y$. This is equivalent to the inequality of Lemma 4 with $\theta = \tfrac{\varepsilon}{1+\varepsilon}$ holding for all $x, y$, which corresponds to $T$ being $\tfrac{\varepsilon}{1+\varepsilon}$-averaged by Lemma 4. ∎

Lemma 10.

$H_\sigma$ satisfies Assumption (A) if and only if
$$T = \tfrac{1}{1+2\varepsilon}(2H_\sigma - I)$$
is nonexpansive and $\tfrac{2\varepsilon}{1+2\varepsilon}$-averaged.


Define $R = \tfrac{1}{\varepsilon}(H_\sigma - I)$, which means $H_\sigma = I + \varepsilon R$. Clearly, Assumption (A) holds if and only if $R$ is nonexpansive. Define $T = \tfrac{1}{1+2\varepsilon}(2H_\sigma - I)$, which means
$$T = \tfrac{1}{1+2\varepsilon} I + \tfrac{2\varepsilon}{1+2\varepsilon} R.$$
Remember that Assumption (A) corresponds to $\|R(x) - R(y)\| \le \|x - y\|$ for all $x, y$. This is equivalent to the inequality of Lemma 4 with $\theta = \tfrac{2\varepsilon}{1+2\varepsilon}$ holding for all $x, y$, which corresponds to $T$ being $\tfrac{2\varepsilon}{1+2\varepsilon}$-averaged by Lemma 4. ∎

Proof of Theorem 1.

In general, if operators $T_1$ and $T_2$ are $L_1$- and $L_2$-Lipschitz, then the composition $T_1 T_2$ is $L_1 L_2$-Lipschitz. By Lemma 7, $I - \alpha\nabla f$ is $\max\{|1-\alpha\mu|, |1-\alpha L|\}$-Lipschitz. By Lemma 9, $H_\sigma$ is $(1+\varepsilon)$-Lipschitz. The first part of the theorem follows from composing the Lipschitz constants. The restrictions on $\alpha$ and $\varepsilon$ follow from basic algebra. ∎
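The two Lipschitz estimates used in this proof can be checked numerically for a quadratic $f$, where $I - \alpha\nabla f$ is a linear map whose Lipschitz constant is exactly the bound of Lemma 7, and composing with a $(1+\varepsilon)$-Lipschitz map multiplies the constants; a sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# f(x) = 0.5 x^T Q x is mu-strongly convex with L-Lipschitz gradient,
# where mu and L are the extreme eigenvalues of the symmetric matrix Q.
evals = np.sort(rng.uniform(0.5, 4.0, n)); mu, L = evals[0], evals[-1]
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q = U @ np.diag(evals) @ U.T

alpha = 0.3
step = np.eye(n) - alpha * Q                # the map I - alpha * grad f
lip = np.linalg.norm(step, 2)               # its exact Lipschitz constant
bound = max(abs(1 - alpha * mu), abs(1 - alpha * L))   # Lemma 7's bound

# A (1+eps)-Lipschitz linear "denoiser" (orthogonal matrix scaled by 1+eps);
# the composition's Lipschitz constant is at most (1+eps) * bound.
eps = 0.2
R, _ = np.linalg.qr(rng.standard_normal((n, n)))
comp_lip = np.linalg.norm(((1 + eps) * R) @ step, 2)
```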

Proof of Theorem 2.

By Lemma 8,
$$-(2\,\mathrm{Prox}_{\alpha f} - I)$$
is $\theta_1$-averaged with $\theta_1 = \tfrac{1}{1+\alpha\mu}$, and this implies
$$x \mapsto (2\,\mathrm{Prox}_{\alpha f} - I)(-x)$$
is also $\theta_1$-averaged, by Lemma 6. By Lemma 10,
$$\tfrac{1}{1+2\varepsilon}(2H_\sigma - I)$$
is $\theta_2$-averaged with $\theta_2 = \tfrac{2\varepsilon}{1+2\varepsilon}$. Therefore,
$$x \mapsto \tfrac{1}{1+2\varepsilon}(2H_\sigma - I)(2\,\mathrm{Prox}_{\alpha f} - I)(-x)$$
is $\theta$-averaged with $\theta = \tfrac{\theta_1 + \theta_2 - 2\theta_1\theta_2}{1 - \theta_1\theta_2}$ by Lemma 5, and this implies
$$-\tfrac{1}{1+2\varepsilon}(2H_\sigma - I)(2\,\mathrm{Prox}_{\alpha f} - I)$$
is also $\theta$-averaged, by Lemma 6.

Using the definition of averagedness, we can write