 # Plug-and-play ISTA converges with kernel denoisers

Plug-and-play (PnP) method is a recent paradigm for image regularization, where the proximal operator (associated with some given regularizer) in an iterative algorithm is replaced with a powerful denoiser. Algorithmically, this involves repeated inversion (of the forward model) and denoising until convergence. Remarkably, PnP regularization produces promising results for several restoration applications. However, a fundamental question in this regard is the theoretical convergence of the PnP iterations, since the algorithm is not strictly derived from an optimization framework. This question has been investigated in recent works, but there are still many unresolved problems. For example, it is not known if convergence can be guaranteed if we use generic kernel denoisers (e.g. nonlocal means) within the ISTA framework (PnP-ISTA). We prove that, under reasonable assumptions, fixed-point convergence of PnP-ISTA is indeed guaranteed for linear inverse problems such as deblurring, inpainting and superresolution (the assumptions are verifiable for inpainting). We compare our theoretical findings with existing results, validate them numerically, and explain their practical relevance.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## I Introduction

Restoration problems such as denoising, deblurring, inpainting, and superresolution  are often modeled as optimization problems of the form

 minx∈Rn  f(x)+λg(x), (1)

where is the image variable, is the data term involving the forward model, and is the regularizer (possibly induced by some ground-truth prior). For Gaussian denoising, , where is the noisy image, is the clean image, and is white Gaussian noise. The minimizer of (1) in this case (i.e. the denoised image) is technically referred to as the proximal map of at :

 proxλg(y)=argminx∈Rn  12∥x−y∥22+λg(x). (2)

Different choices of result in different denoising schemes, such as total-variation and wavelet denoising . However, not all denoising methods are originally conceived as proximal maps. For example, powerful denoisers such as NLM  and BM3D  are based on patch-based filtering, while DnCNN 

uses deep neural networks. These denoisers are typically more powerful than total-variation and wavelet denoisers that are derived from a regularization framework. A natural question thus is can we exploit the excellent denoising capability of NLM and BM3D for regularization purpose? One such approach leads to the so-called Plug-and-Play (PnP) method

[7, 8]. This is based on the observation that most standard algorithms [9, 10, 11, 12] for solving (1) such as ISTA and ADMM involve the iterative inversion of the forward model followed by the application of (2). Based on the formal equivalence of the proximal map with Gaussian denoising, the authors in [7, 8] choose to formally replace (2) with NLM or BM3D denoiser. The resulting algorithm essentially involves repeated inversion and denoising. Empirically, PnP methods have been shown to yield excellent results for tomography [8, 13], superresolution , single-photon imaging , fusion , inpainting , and several other applications [17, 18, 19, 20, 21, 22, 23, 24].

One straightforward way of establishing theoretical convergence of a PnP method is to express the associated denoiser as the proximal map of some regularizer. Indeed, this the case with symmetrized NLM and Gaussian mixture denoisers [8, 15], which can be expressed as the proximal map of a convex regularizer . The PnP iterations amount to minimizing (1) in this case and convergence follows from standard optimization theory as a result. However, establishing convergence for NLM, BM3D or neural networks is difficult, since it is not known if these denoisers can be expressed as proximal maps. Several recent works have addressed the question of theoretical convergence of PnP methods [8, 14, 13, 25, 15, 16, 26]. Fixed-point convergence has been established in [14, 26], but the underlying assumptions are often strong or not verifiable. For example, the denoiser is assumed to be bounded in a certain sense  which is difficult to check even for NLM, and is assumed to be strongly convex  which excludes inpainting and superresolution. We also note that convergence properties of the PnP iterates have been investigated in [13, 16, 25].

In a different direction, theoretical convergence was established in , although using a modified form of NLM called DSG-NLM (whose cost is about three times of NLM ). More precisely, DSG-NLM can be expressed as a linear map , where is doubly stochastic, symmetric, and positive semidefinite. Using these properties, it was shown that can be expressed as a proximal map, which settled the convergence question for DSG-NLM. However, this and the other papers do not address the question if convergence can be an issue with original NLM. The present work was motivated by the above and the following related questions:

(i) Can fixed-point convergence be established when is standard NLM or a generic kernel denoiser  whose cost is about a third of that than DSG-NLM? That is, can the symmetry and doubly-stochastic assumptions be removed?
(ii) Can the strong convexity assumption  on be removed (which excludes inpainting and superresolution)?
(iii) Can convergence be established under assumptions that are practically verifiable?

Contributions. We address the above questions when ISTA [10, 29] is the base algorithm, i.e., when the proximal map in ISTA is replaced by a denoiser. The resulting PnP method (PnP-ISTA) has been investigated in [19, 13, 26]. In this paper, we establish the following results which are novel to the best of our knowledge:
(i) We prove that under reasonable assumptions, PnP-ISTA exhibits fixed-point convergence for linear inverse problems using standard NLM as the denoiser (Theorem 4).

(ii) We prove that these assumptions are met for inpainting (Proposition 3), and we explicitly compute a bound on the step size in PnP-ISTA that can guarantee convergence (Theorem 5).

Organization. The problem statement is described in §II. We state and explain our convergence results, along with the required assumptions, in §III and defer the technical proofs to §VI. The results are further discussed and experimentally validated using inpainting experiments in §IV. Finally, we summarize our findings in §V.

## Ii Problem statement

Our objective is to investigate the convergence of PnP-ISTA with generic kernel denoisers, and NLM in particular. An advantage of ISTA over other iterative algorithms such as ADMM is that the ISTA update rule [10, 29] is simple and involves just one variable. Applied to (1), the rule is given by

 xk+1=proxγg(xk−γ∇f(xk)),(k⩾0) (3)

where is the initial guess and is the step size. We now describe the general settings of the problem that we study in terms of the denoiser used and the data term .

We assume that the denoiser used in PnP-ISTA is a kernel denoiser  such as NLM . Such a denoiser can be expressed as linear map , where is row-stochastic (i.e. each row of sums to ), but not symmetric. Recall that NLM denoises an image by performing weighted averaging of pixels in a square window around pixel . We will refer to as the denoising window around . The weight matrix

is derived from a symmetric kernel (affinity) matrix

as , where is the diagonal matrix of the row-sums of . In NLM, is computed from image patches and a large window is allowed .

In general, may be computed from an image which is different from the input image. A common practice in PnP methods is to use the image from the previous iteration to compute for the current iteration. This is done in the first few iterations, and then is frozen for the rest of the iterations [8, 27] . Hence, after a finite number of iterations, the denoiser acts as a linear map . We adopt this practice in PnP-ISTA.

Moreover, we focus on linear inverse problems, where and is the observed image. This class subsumes a wide variety of problems; in particular inpainting, deblurring, and superresolution [14, 16]. Substituting in (3) and replacing by , we can express the PnP-ISTA update as:

 xk+1=Pxk+q, (4)

where , and is the identity matrix. In other words, the evolution of the iterates is governed by a linear dynamical system. Note that and depend on .

Our aim is to provide sufficient conditions under which the sequence generated via (4) converges, irrespective of the initialization . This is referred to as fixed-point convergence to distinguish it from other forms of convergence  such as objective convergence [8, 15]. In particular, we wish to guarantee that the PnP iterates do not diverge or oscillate. Note that if , then it follows from (4) that (from the proof in Section VI, it follows that is invertible since

is not an eigenvalue of

). Since and depend on , the limit point will change with the step size . We next state and discuss our results.

## Iii Convergence Results

A possible line of attack is to use the contraction mapping principle . We write (4) as where , and note that for , where is the Euclidean norm on and is the spectral norm of

(largest singular value). Convergence of

would follow if we can show that , since would be a contraction map in this case. However, note that Hence, to ensure we must have . However, in applications such as inpainting where has a non-trivial null space, is an eigenvalue of , whereby is an eigenvalue of . As a result, . To ensure , we must therefore make . However, since is row-stochastic, one of its eigenvalues is , so that . As a result, we cannot use the contraction argument.

Fortunately, there exists a weaker condition on which implies convergence of . In particular, motivated by the resemblance of (4) to a linear dynamical system, we use a standard condition that guarantees the stability of such a system. This is in terms of the spectral radius of :

 ϱ(P)=max{|μ|:μ is an eigenvalue of P}.
###### Theorem 1.

Let . Then generated by (4) converges from any arbitrary initialization .

A proof of this standard result can be found in [31, Chapter 7]. In other words, if we can ensure that , then PnP-ISTA is guaranteed to converge. Note that since and are fixed, the step size is the only parameter we can use to manipulate . Therefore, the hope is that can be made less than by controlling . While bounding the spectral radius of seems more difficult than bounding its norm, we will prove that this is possible provided we make a set of assumptions as given below. We define , so that we can write .

###### Assumption 2.

The following conditions hold:

1. [label=(),ref=2()]

2. For , and iff .

3. For , .

4. For ,

 ∑j∉ΩiQij<∑j∈ΩiQij. (5)

We have used to denote the th element of . Assumption 1 holds for NLM, where the weights are strictly positive within the denoising window. Assumption 2 holds in the case of inpainting (where the rows of are a subset of the rows of with ), deblurring (where is the circulant matrix corresponding to a nonnegative point spread function ), and superresolution (where is the product of nonnegative subsampling and blurring matrices ). Assumption 3 states that the sum of within the denoising neighborhood is strictly larger that those outside . Although this assumption is a little difficult to interpret, it can in principle be tested based on and . For the special case of inpainting, we can in fact prove that the condition always holds under a mild assumption (see Section VI for the proof).

###### Proposition 3.

Suppose Assumption 1 holds and is the subsampling (decimation) matrix. Moreover, suppose that for every pixel , the denoising window contains at least one observed pixel. Then Assumption 3 holds.

The requirement in Proposition 3 is not very strict, at least for uniform sampling. Indeed, we can simply increase until we hit an observed pixel (recall that is usually large in NLM). In other words, Assumption 3 is verifiable for the inpainting problem, in the sense that it can be proved to hold. We are now ready to state our main convergence result.

###### Theorem 4.

Suppose Assumption 2 holds. Then there exists such that PnP-ISTA converges for any initialization provided .

The proof is given in Section VI, and is based on Theorem 1. An immediate implication of Theorem 4 combined with Proposition 3 is that for inpainting, PnP-ISTA converges if the step size is sufficiently small. However, Theorem 4 only affirms the existence of an upper bound; it does not provide a way to compute the bound. Hence in practice, we can choose a sufficiently small . However, as stated below, the bound can be explicitly computed for inpainting (see Section VI for the proof). We note that in [13, Proposition 2], it is shown that the mean of the squares of the first residuals in PnP-ISTA decays at rate. This is different from Theorem 4 in which the convergence of is established.

###### Theorem 5.

For the inpainting problem, PnP-ISTA (4) converges for any arbitrary initialization if .

## Iv Discussion

Theorem 5 rigorously establishes the fixed-point convergence of PnP-ISTA for inpainting using the NLM denoiser if we choose a step size . To validate this finding, we perform an inpainting experiment in Figure 1 in which just pixels have been observed. The initialization is computed by applying a median filter on the input image . We have used NLM in which the weights are computed from , and we set . The reconstruction in Figure 1(c) is found to be reasonably accurate. As expected, for , we observe that the PSNR between the ground-truth and the iterate stabilizes as increases (cf. Figure 2(a)). Moreover, the residuals go to as the algorithm progresses (Figure 2(b)). This corroborates with the theoretical result in Theorem 5. In fact, the algorithm is empirically found to stabilize even for and . (Note that Theorem 5 does not tell us what happens if .) But it fails to stabilize for . The corresponding residuals also tends to diverge (see inset in Figure 2(b)).

An interesting exercise is to connect the result in Theorem 5 with the convergence of standard ISTA for the inpainting problem. For convex problems, convergence of ISTA is guaranteed under a couple of assumptions : (i) is Lipschitz continuous with a Lipschitz constant , and (ii) . For inpainting, the former holds with , since the largest eigenvalue of is . Thus, the objective in (1) converges to its minimum if ; but proving fixed-point convergence of the iterates is technically challenging. Note that Theorem 5 proves convergence of the PnP-ISTA iterates using kernel denoisers under the same condition on . On the other hand, the upper bound on the step size in ISTA can be relaxed from to while still ensuring convergence  (although this no longer ensures a monotonic reduction of the objective). Based on the observation in Figure 2 that PnP-ISTA stabilizes even for , we conjecture that a similar relaxation of the bound might be possible for PnP-ISTA.

Finally, we note that although Assumption 3 is difficult to verify for inverse problems other than inpainting, it is empirically observed that PnP-ISTA converges for deblurring with . We refer the reader to the supplement for a couple of related experiments.

## V Conclusion

We proved that PnP-ISTA converges for a class of linear problems using kernel denoisers. The central observation is that the problem of convergence can be reduced to that of the stability of a dynamical system. Moreover, the assumptions used in this regard can be verified for inpainting and we can obtain a bound on the step size in this case. To the best of our knowledge, this is the first plug-and-play convergence result for superresolution and inpainting using standard NLM denoiser. We conclude the paper with the following questions:
(i) Similar to Theorem 5, can an upper bound be computed for other linear inverse problems?
(ii) Is it possible to extend the convergence analysis for FISTA  and ADMM ?

## Vi Proofs

In this section, we give the detailed proofs of Proposition 3 and Theorems 4 and 5.

###### Proof of Proposition 3.

Let be the set of observed pixels (). Then is a diagonal matrix whose -th diagonal element is if , and is if . Note that

denotes the linear index of a pixel w.r.t. to its vectorized representation, whereas

are the spatial locations. Therefore the elements of are given by if , and if . Thus, for , the left side of (5) becomes (since for by Assumption 1), whereas the right side becomes

 ∑j∈Ωi∩ΓQij  +∑j∈Ωi∩ΓcQij  =∑j∈Ωi∩ΓWij.

The sum on the right is positive since for (by Assumption 1), and is non-empty by the hypothesis of Proposition 3. Hence, (5) holds. ∎

The proof of Theorem 4 uses Gershgorin’s disk theorem [31, Chapter 7]. We state this result as applied to matrix . For , the Gershgorin disk is defined to be the disk in the complex plane with center and radius , i.e. is the sum of the unsigned non-diagonal entries in the th row. That is, . Gershgorin’s theorem states that every eigenvalue of is contained in at least one of the Gershgorin disks .

###### Proof of Theorem 4.

We will show that there exists such that all the Gershgorin disks of are contained in the interior of the unit disk in (disk centered at 0 with radius ), provided that . This will imply that by Gershgorin’s theorem, and hence convergence of by Theorem 1.

Since is real, the center of each disk is on the real axis. Hence, it is enough to show that and , i.e. ; see Figure 3. In this regard, we will prove the existence of such that, for all , the following hold: (1) if ; (2) if . Recall that is a function of . The theorem follows immediately if we set .

To prove the first result, we note that

 Ri−Pii=∑j:j≠i|Wij−γQij|−(Wii−γQii).

At , this has the value , which is strictly less than since (by Assumption 1 and the row-stochasticity of ). Moreover, is a continuous function of (note that denotes absolute value of real numbers in the above equation). As a result, by continuity, there exists such that for . The result now follows by letting . Fig. 3: Illustration for the proof of Theorem 4 which relies on Gershgorin’s theorem . Every Gershgorin disk of P (shaded) is required to be in the interior of the unit circle in the complex plane. Since Pii is real, this is equivalent to the conditions Pii+Ri<1 and Pii−Ri>−1.

To prove the second result, we note that

 Pii +Ri=Wii−γQii+∑j:j≠i|Wij−γQij| =Wii−γQii+∑j≠i,Wij=0γQij+∑j≠i,Wij>0|Wij−γQij|, (6)

where we used the fact that (by Assumptions 1 and 2) in the first summation to drop the absolute value symbols. Moreover, we will show that we can also drop the absolute value symbols in the last summation in (6). This is because the last summation is over the set . Since the map is continuous, for every in this set there exists such that if . Let . Since is the minimum of a finite set of positive numbers, it is positive. As a result, by setting , we can drop the absolute values in the last summation in (6). Thus, for , we obtain that

 Pii+Ri =(Wii+∑j∈Ωi,j≠iWij)+γ∑j∉Ωi,j≠iQij−γ(Qii+∑j∈Ωi,j≠iQij) =1+γ(∑j∉ΩiQij−∑j∈ΩiQij), (7)

where we have used Assumption 1 and the fact that that the row-sum is , to simplify the terms. From (7), we see that is a linear function of for , with slope

 ∑j∉ΩiQij−∑j∈ΩiQij.

By Assumption 3, this slope is negative. Hence, is strictly decreasing in for . Moreover, the value of at is . Hence, for . The result now follows by letting . ∎

###### Proof of Theorem 5.

This is similar to the proof of Theorem 4. However, as stated in the proof of Proposition 3, is either or for inpainting. Using this, it is not difficult to verify that we can take and in the proof of Theorem 4. As a result, we get , so that . ∎

## References

•  B. K. Gunturk and X. Li, Image Restoration: Fundamentals and Advances.   CRC Press, 2012.
•  R. T. Rockafellar, Moreau’s proximal mappings and convexity in Hamilton-Jacobi theory.   Springer, 2006.
•  M. V. Afonso, J. M. Bioucas-Dias, and M. A. T. Figueiredo, “Fast image recovery using variable splitting and constrained optimization,” IEEE Transactions on Image Processing, vol. 19, no. 9, pp. 2345–2356, 2010.
•  A. Buades, B. Coll, and J. M. Morel, “A non-local algorithm for image denoising,”

Proc. IEEE Computer Vision and Pattern Recognition

, vol. 2, pp. 60–65, 2005.
•  K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
•  K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2017.
•  S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” Proc. IEEE Global Conference on Signal and Information Processing, pp. 945–948, 2013.
• 

S. Sreehari, S. V. Venkatakrishnan, B. Wohlberg, G. T. Buzzard, L. F. Drummy, J. P. Simmons, and C. A. Bouman, “Plug-and-play priors for bright field electron tomography and sparse interpolation,”

IEEE Transactions on Computational Imaging, vol. 2, no. 4, pp. 408–423, 2016.
•  P.-L. Lions and B. Mercier, “Splitting algorithms for the sum of two nonlinear operators,” SIAM Journal on Numerical Analysis, vol. 16, no. 6, pp. 964–979, 1979.
•  I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Communications on Pure and Applied Mathematics, vol. 57, no. 11, pp. 1413–1457, 2004.
•  A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120–145, 2011.
•  S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,”

Foundations and Trends in Machine Learning

, vol. 3, no. 1, pp. 1–122, 2011.
•  Y. Sun, B. Wohlberg, and U. S. Kamilov, “An online plug-and-play algorithm for regularized image reconstruction,” IEEE Transactions on Computational Imaging, vol. 5, no. 3, pp. 395–408, 2019.
•  S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-play ADMM for image restoration: Fixed-point convergence and applications,” IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 84–98, 2017.
•  A. M. Teodoro, J. M. Bioucas-Dias, and M. A. T. Figueiredo, “A convergent image fusion algorithm using scene-adapted Gaussian-mixture-based denoising,” IEEE Transactions on Image Processing, vol. 28, no. 1, pp. 451–463, 2019.
•  T. Tirer and R. Giryes, “Image restoration by iterative denoising and backward projections,” IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1220–1234, 2019.
•  S. Ono, “Primal-dual plug-and-play image restoration,” IEEE Signal Processing Letters, vol. 24, no. 8, pp. 1108–1112, 2017.
•  K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep CNN denoiser prior for image restoration,” Proc. IEEE Computer Vision and Pattern Recognition, pp. 3929–3938, 2017.
•  T. Meinhardt, M. Moller, C. Hazirbas, and D. Cremers, “Learning proximal operators: Using denoising networks for regularizing inverse imaging problems,” Proc. IEEE International Conference on Computer Vision, pp. 1781–1790, 2017.
•  G. T. Buzzard, S. H. Chan, S. Sreehari, and C. A. Bouman, “Plug-and-play unplugged: Optimization-free reconstruction using consensus equilibrium,” SIAM Journal on Imaging Sciences, vol. 11, no. 3, pp. 2001–2020, 2018.
•  S. H. Chan, “Performance analysis of plug-and-play ADMM: A graph signal processing perspective,” IEEE Transactions on Computational Imaging, vol. 5, no. 2, pp. 274–286, 2019.
•  A. M. Teodoro, J. M. Bioucas-Dias, and M. A. T. Figueiredo, “Image restoration and reconstruction using targeted plug-and-play priors,” IEEE Transactions on Computational Imaging, vol. 5, no. 4, pp. 675–686, 2019.
•  U. S. Kamilov, H. Mansour, and B. Wohlberg, “A plug-and-play priors approach for solving nonlinear imaging inverse problems,” IEEE Signal Processing Letters, vol. 24, no. 12, pp. 1872–1876, 2017.
•  P. Nair, V. S. Unni, and K. N. Chaudhury, “Hyperspectral image fusion using fast high-dimensional denoising,” Proc. IEEE International Conference on Image Processing, pp. 3123–3127, 2019.
•  W. Dong, P. Wang, W. Yin, G. Shi, F. Wu, and X. Lu, “Denoising prior driven deep neural network for image restoration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 10, pp. 2305–2318, 2018.
•  E. Ryu, J. Liu, S. Wang, X. Chen, Z. Wang, and W. Yin, “Plug-and-play methods provably converge with properly trained denoisers,” Proc. International Conference on Machine Learning, vol. 97, pp. 5546–5557, 2019.
•  V. S. Unni, S. Ghosh, and K. N. Chaudhury, “Linearized ADMM and fast nonlocal denoising for efficient plug-and-play restoration,” Proc. IEEE Global Conference on Signal and Information Processing, pp. 11–15, 2018.
•  P. Milanfar, “A tour of modern image filtering: New insights and methods, both practical and theoretical,” IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 106–128, 2013.
•  M. A. Figueiredo and R. D. Nowak, “An EM algorithm for wavelet-based image restoration,” IEEE Transactions on Image Processing, vol. 12, no. 8, pp. 906–916, 2003.
•  A. Granas and J. Dugundji, Fixed Point Theory.   Springer Science & Business Media, 2013.
•  C. D. Meyer, Matrix Analysis and Applied Linear Algebra.   SIAM, 2000, vol. 71.
•  A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
•  I. W. Selesnick and I. Bayram, “Total variation filtering,” Tech. Rep., 2010.