On the Proof of Fixed-Point Convergence for Plug-and-Play ADMM

October 31, 2019 · Ruturaj G. Gavaskar et al.

In most state-of-the-art image restoration methods, the sum of a data-fidelity and a regularization term is optimized using an iterative algorithm such as ADMM (alternating direction method of multipliers). In recent years, the possibility of using denoisers for regularization has been explored in several works. A popular approach is to formally replace the proximal operator within the ADMM framework with some powerful denoiser. However, since most state-of-the-art denoisers cannot be posed as a proximal operator, one cannot guarantee the convergence of these so-called plug-and-play (PnP) algorithms. In fact, the theoretical convergence of PnP algorithms is an active research topic. In this letter, we consider the result of Chan et al. (IEEE TCI, 2017), where fixed-point convergence of an ADMM-based PnP algorithm was established for a class of denoisers. We argue that the original proof is incomplete, since convergence is not analyzed for one of the three possible cases outlined in the paper. Moreover, we explain why the argument for the other cases does not apply in this case. We give a different analysis to fill this gap, which firmly establishes the original convergence theorem.

I Introduction

A variety of image restoration problems, such as superresolution, deblurring, compressed sensing, tomography, etc., are modeled as optimization problems of the form

$$\min_{x \in \mathbb{R}^n} \; f(x) + \lambda g(x), \tag{1}$$

where the data-fidelity term $f$ is derived from the degradation and noise models, the regularizer $g$ is derived from some prior on the ground-truth image, and $\lambda > 0$ controls their relative weight [1]. Traditionally, the regularizer is a sparsity-promoting function in some transform domain [2]. In recent years, researchers have explored the possibility of using powerful Gaussian denoisers such as NLM [3] and BM3D [4] for regularization purposes. In [5, 6], the regularizer is explicitly constructed from a denoiser. On the other hand, for plug-and-play (PnP) methods [7, 8, 9, 10], the denoiser is formally substituted in place of the proximal operator in iterative algorithms such as FISTA [11], primal-dual splitting [12], and ADMM [13].

The focus of this work is on an ADMM-based PnP method [14]. We recall that the ADMM-based solution of (1) involves the following steps [13]:

$$x_{k+1} = \operatorname*{argmin}_{x \in \mathbb{R}^n} \; f(x) + \frac{\rho}{2}\,\lVert x - (y_k - u_k) \rVert^2, \tag{2}$$

$$y_{k+1} = \operatorname*{argmin}_{y \in \mathbb{R}^n} \; \lambda g(y) + \frac{\rho}{2}\,\lVert y - (x_{k+1} + u_k) \rVert^2, \tag{3}$$

$$u_{k+1} = u_k + x_{k+1} - y_{k+1}, \tag{4}$$

where $\rho > 0$ is a penalty parameter and $\lVert \cdot \rVert$ is the Euclidean norm (this is the rescaled form of ADMM). If $f$ and $g$ are convex, then under some technical conditions, the iterates are guaranteed to converge to a fixed point, which is the global minimizer of (1). Now, (3) corresponds to regularized Gaussian denoising, where $g$ assumes the role of the regularizer [15]. Based on this observation, the original proposal in [7] was to replace the proximal operation (3) with an off-the-shelf denoiser, i.e., the $y$-update is replaced by $y_{k+1} = D(x_{k+1} + u_k)$, where $D$ is the denoiser in question. The idea is simply to exploit the excellent denoising capability of state-of-the-art denoisers for restoration, even though we might not be able to conceive them as proximal operators (of some regularizer). We refer the readers to [7, 8] for a detailed account. The technical challenge, however, is that the resulting sequence of operations, referred to as PnP-ADMM, need not correspond to an optimization problem. As a result, the convergence of the iterates is at stake. In particular, we can no longer relate the iterates to the optimization problem (1) and use existing results [13] to ensure convergence. Nevertheless, PnP-ADMM is often found to converge empirically and yields high-quality reconstructions in several applications [7, 8, 14, 16]. Among other things, questions relating to the convergence and optimality of PnP-type methods have been studied in recent works. In [8], convergence guarantees were derived for a kernel-based denoiser within PnP-ADMM. Later, it was shown in [14] that convergence can be ensured for a broad class of denoisers. Apart from ADMM, PnP algorithms based on various iterative methods have been explored in [9, 10, 17, 18, 19, 20]. We note that denoisers have also been used for regularization purposes in [21, 22, 23, 24, 25, 26, 27]. The relation of PnP-ADMM with graph Laplacian-based regularization was investigated in [28], whereas in [29] a framework motivated by PnP, called Consensus Equilibrium, was proposed.
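To make the formal substitution concrete, the following is a minimal Python/NumPy sketch of a PnP-ADMM iteration with fixed $\rho$. It is only a sketch: the quadratic data term and the moving-average "denoiser" in the instantiation are toy assumptions for illustration, not the setup of [7] or [14].

```python
import numpy as np

def pnp_admm(prox_f, denoiser, x0, num_iters=50):
    """Minimal PnP-ADMM sketch: the proximal step (3) for the
    regularizer is formally replaced by a plugged-in denoiser."""
    x, y, u = x0.copy(), x0.copy(), np.zeros_like(x0)
    for _ in range(num_iters):
        x = prox_f(y - u)       # (2): proximal step for the data term
        y = denoiser(x + u)     # (3) replaced by the denoiser
        u = u + x - y           # (4): scaled dual update
    return x

# Illustrative instantiation: f(x) = 0.5||x - b||^2 (so the x-update has a
# closed form) and a toy moving-average "denoiser" standing in for NLM/BM3D.
rng = np.random.default_rng(0)
rho = 1.0
b = np.linspace(0.0, 1.0, 64) + 0.1 * rng.standard_normal(64)
prox_f = lambda v: (b + rho * v) / (1.0 + rho)
denoiser = lambda v: 0.25 * np.roll(v, 1) + 0.5 * v + 0.25 * np.roll(v, -1)
x_hat = pnp_admm(prox_f, denoiser, x0=b)
```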

In this letter, we revisit the proof of convergence of the PnP-ADMM algorithm in [14] and address an inadequacy therein. It was proved that, under suitable assumptions, the sequence of iterates generated by this algorithm converges to a fixed point for any initialization $(x_0, y_0, u_0)$. Instead of a fixed penalty $\rho$, an adaptive $\rho_k$ is used in [14], which plays an important role in the proof. However, this necessitates a case-by-case analysis conditioned on the adaptation rule (see Section II for details). Of the three cases considered in the paper, convergence was proved for the first two. It was claimed that convergence for the third case automatically follows from that of the first two. However, we argue that this is generally not true and hence a separate proof is needed for the third case. We give such a proof, which differs from the proof for the first two cases in [14]. In particular, we show that the difference between successive iterates is bounded by a piecewise geometric sequence, as opposed to a geometric sequence for the first two cases. We prove that this sequence is summable, which is used to show that the iterates form a Cauchy sequence (and hence converge).

The remainder of this letter is organized as follows. In Section II, we review the algorithm in [14]. In particular, we discuss the convergence result and explain why the proof is incomplete. The outline of the proof is provided in Section III, while the technical details are deferred to Section IV.

II Background

As mentioned earlier, the updates in PnP-ADMM [14] are modeled on the ADMM updates (2)–(4), with the following changes: a denoiser is used in the $y$-update, and $\rho$ is updated in each iteration. In particular, the updates are given by

$$x_{k+1} = \operatorname*{argmin}_{x \in \mathbb{R}^n} \; f(x) + \frac{\rho_k}{2}\,\lVert x - (y_k - u_k) \rVert^2, \tag{5}$$

$$y_{k+1} = D_{\sigma_k}(x_{k+1} + u_k), \tag{6}$$

$$u_{k+1} = u_k + x_{k+1} - y_{k+1}, \tag{7}$$

where $\sigma_k = \sqrt{\lambda/\rho_k}$. Here $D_\sigma : \mathbb{R}^n \to \mathbb{R}^n$ is a denoising operator, where the parameter $\sigma$ controls its denoising action. It was proposed to update $\rho_k$ based on the residual

$$\Delta_{k+1} = d(v_{k+1}, v_k), \tag{8}$$

where $v_k = (x_k, y_k, u_k)$, the metric $d$ on $\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n$ is defined as

$$d(v, \bar{v}) = \frac{1}{\sqrt{n}} \big( \lVert x - \bar{x} \rVert + \lVert y - \bar{y} \rVert + \lVert u - \bar{u} \rVert \big),$$

and the three components of $v = (x, y, u)$ and $\bar{v} = (\bar{x}, \bar{y}, \bar{u})$ are vectors in $\mathbb{R}^n$. Thus, (8) is simply the distance between the $k$-th and $(k+1)$-th iterates, which measures the progress made by the algorithm. The exact rule proposed in [14] is as follows:

$$\rho_{k+1} = \begin{cases} \gamma\,\rho_k, & \text{if } \Delta_{k+1} \ge \eta\,\Delta_k \quad (\text{condition } C_1),\\[2pt] \rho_k, & \text{if } \Delta_{k+1} < \eta\,\Delta_k \quad (\text{condition } C_2), \end{cases} \tag{9}$$

where $\eta \in (0, 1)$ and $\gamma > 1$ are predefined parameters. The above rule, in effect, decreases the denoising strength ($\rho_k$ is increased, and hence $\sigma_{k+1}$ is decreased) if the ratio of the current and previous residuals is at least $\eta$; else, $\rho_k$ is kept unchanged (see [14] for details). We will refer to the two branches of (9) as conditions $C_1$ and $C_2$, respectively.
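In code, the residual (8) and rule (9) amount to a few lines. The following is a hedged sketch: the values of $\eta$ and $\gamma$ are placeholders, and $\Delta$ is computed exactly as the scaled sum of component-wise distances defined above.

```python
import numpy as np

def residual(v_new, v_old):
    """Delta in (8): scaled sum of Euclidean distances between the
    (x, y, u) components of successive iterates v = (x, y, u)."""
    n = v_new[0].size
    return sum(np.linalg.norm(a - b) for a, b in zip(v_new, v_old)) / np.sqrt(n)

def update_rho(rho, delta_new, delta_old, eta=0.999, gamma=1.05):
    """Rule (9): weaken the denoising (increase rho) when the residual
    has not shrunk by at least the factor eta; otherwise keep rho."""
    return gamma * rho if delta_new >= eta * delta_old else rho
```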

It was claimed in [14] that the iterates generated by (5)–(7) converge to a fixed point if a couple of assumptions are met. The first concerns the data-fidelity term.

Assumption 1.

The function $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable and there exists $L > 0$ such that $\lVert \nabla f(x) \rVert \le \sqrt{n}\,L$ for all $x \in \mathbb{R}^n$.

The second assumption concerns the denoiser.

Assumption 2.

There exists a constant $C > 0$ such that, for all $x \in \mathbb{R}^n$ and all $\sigma > 0$,

$$\frac{1}{n}\,\lVert D_\sigma(x) - x \rVert^2 \le C\,\sigma^2. \tag{10}$$

While discussions on the above assumptions can be found in [14], here we reiterate a couple of remarks about Assumption 2. It is difficult to mathematically verify (10) even for simple denoisers, let alone sophisticated ones such as BM3D. However, an implication of (10) is that the denoiser acts like an identity map (idle filter) when $\sigma$ is close to zero. It is reasonable to expect that any practical denoiser obeys this weaker condition. Moreover, while the denoiser might not perfectly behave as an identity operator when $\sigma$ is close to zero, it is possible to artificially force this behavior.
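As an illustration of (10), here is a hedged numerical check on a toy denoiser $D_\sigma(x) = (1-w)\,x + w\,\bar{x}$, where $\bar{x}$ is the mean of $x$ and the weight $w = \sigma^2/(1+\sigma^2)$ is an assumption made up for this sketch. For inputs with entries in $[0,1]$, this toy operator satisfies (10) and visibly approaches the identity as $\sigma \to 0$; no such claim is made for BM3D or NLM.

```python
import numpy as np

def toy_denoiser(x, sigma):
    """Blend x with its mean; the blend weight vanishes as sigma -> 0,
    so D_sigma approaches the identity map, as discussed below (10)."""
    w = sigma**2 / (1.0 + sigma**2)
    return (1.0 - w) * x + w * x.mean()

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 10_000)    # image-like input with entries in [0, 1]
for sigma in (0.5, 0.1, 0.01):
    lhs = np.sum((toy_denoiser(x, sigma) - x) ** 2) / x.size
    print(f"sigma={sigma:<5}  ||D(x)-x||^2/n = {lhs:.2e}  <=  sigma^2 = {sigma**2:.2e}")
```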

We are now ready to state the convergence result in [14].

Theorem 3.

Under Assumptions 1 and 2, the iterates $(v_k)$ generated by (5)–(7) converge to a fixed point, i.e., there exists $v^\ast$ such that $d(v_k, v^\ast) \to 0$ as $k \to \infty$.

In particular, the iterates do not diverge or oscillate. We note that convergence of $(v_k)$ implies that $\Delta_k \to 0$ as $k \to \infty$. However, the converse is generally not true, i.e., it is possible that $(\Delta_k)$ converges to $0$ but $(v_k)$ does not converge; for instance, the partial sums of the harmonic series have vanishing successive differences, yet they diverge. The technical point is that $\Delta_k$ must vanish sufficiently fast to guarantee the convergence of $(v_k)$. This is used in [14] as well as in the present analysis.

To set up the technical context, we briefly recall the arguments provided in [14] in support of Theorem 3. First, Assumptions 1 and 2 were used to obtain the following result; see [14, Appendix B, Lemma 1].

Lemma 4.

If condition $C_1$ in (9) holds at iteration $k$, then $\Delta_{k+1} \le D/\sqrt{\rho_k}$ for some constant $D > 0$ that does not depend on $k$.

Now, note that exactly one of the following cases must hold:

Case 1: Condition $C_1$ holds for finitely many $k$.

Case 2: Condition $C_2$ holds for finitely many $k$.

Case 3: Both $C_1$ and $C_2$ hold for infinitely many $k$.

In [14], convergence was established for Cases 1 and 2 as follows. Suppose Case 2 is true, and let $K$ be the largest $k$ for which $C_2$ holds, i.e., $C_1$ is true for all $k > K$. Then it follows from (9) that $\rho_k$ increases monotonically: $\rho_k = \gamma^{\,k-K-1}\rho_{K+1}$ for $k > K$. Using Lemma 4, we can thus conclude that, for $k > K$,

$$\Delta_{k+1} \le \frac{D}{\sqrt{\rho_k}} = \frac{D}{\sqrt{\rho_{K+1}}} \left(\frac{1}{\sqrt{\gamma}}\right)^{k-K-1}. \tag{11}$$

Similarly, for Case 1, let $K$ be the largest $k$ for which $C_1$ holds, so that $C_2$ holds for all $k > K$. By recursively applying the condition in (9), we then obtain, for $k > K$,

$$\Delta_{k+1} < \eta^{\,k-K}\,\Delta_{K+1} \tag{12}$$

$$\le \eta^{\,k-K}\,\frac{D}{\sqrt{\rho_K}}, \tag{13}$$

where the second inequality follows from Lemma 4 (applied at iteration $K$, where $C_1$ holds). In summary, for both Case 1 and Case 2, we can find a sufficiently large $K$ and constants $A > 0$ and $\theta \in (0,1)$ such that, for $k \ge K$,

$$\Delta_{k+1} \le A\,\theta^{\,k-K}, \tag{14}$$

where $\theta = 1/\sqrt{\gamma}$ for Case 2 and $\theta = \eta$ for Case 1. Namely, the error between successive iterates is eventually upper-bounded by a decaying geometric sequence. Using the triangle inequality, $d(v_p, v_q) \le \sum_{k=p+1}^{q} \Delta_k \le A\,\theta^{\,p-K}/(1-\theta)$ for $q > p \ge K$; the fast (geometric) decay of $(\Delta_k)$ thus shows that the original sequence $(v_k)$ is Cauchy, and hence convergent (since the ambient space is complete). This establishes the convergence of $(v_k)$ for the first two cases.

It was stated in [14] that Case 3 is a "union of Case 1 and Case 2", and that convergence under Cases 1 and 2 implies convergence for Case 3. However, this is not true, simply because the proof sketched above is valid only if one of $C_1$ or $C_2$ occurs finitely many times; this naturally excludes the case where both $C_1$ and $C_2$ occur infinitely often. For example, consider the hypothetical situation in which $C_1$ occurs for every even $k$ and $C_2$ occurs for every odd $k$. Clearly, the proof does not work in this case.

For further clarity, let us carefully examine the technique in [14] used to establish convergence for Cases 1 and 2. For Case 2, the eventual bound on $\Delta_{k+1}$ was established using the fact that $\rho_k$ is monotonically increasing for $k > K$. A similar bound for Case 1 was derived using the second inequality in (9), namely $\Delta_{k+1} < \eta\,\Delta_k$, which holds for $k > K$. Thus, in both (11) and (12), the existence of a finite $K$ is vital because it allows us to ignore the first few terms of the sequence $(\Delta_k)$ and understand its behavior over the tail. In turn, this is possible because condition $C_1$ (or $C_2$) occurs only a finite number of times. If both $C_1$ and $C_2$ occur infinitely often, we cannot find a finite $K$ beyond which a single inequality holds for $\Delta_k$. This is precisely why the technique in [14] is not applicable for Case 3.

Before proceeding further, we note why it is important to prove convergence in Case 3. Theorem 3 assures us that the algorithm converges regardless of which of the three cases holds. Therefore, its proof remains incomplete unless convergence is proved for all three cases (and in particular Case 3). Moreover, experiments suggest that Case 3 is indeed likely to arise in certain practical scenarios. We have reported some empirical observations for deblurring and superresolution experiments in the supplementary material to back this. In these experiments, we found that when $\eta$ is close to $1$, it is likely that Case 3 holds, i.e., the algorithm keeps switching between conditions $C_1$ and $C_2$.
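As a toy illustration of this switching behavior (a synthetic simulation, not the deblurring/superresolution experiments reported in the supplementary material), the following sketch feeds a slowly and noisily decaying residual sequence into rule (9) with $\eta$ close to $1$; both conditions keep firing, so neither occurs only finitely often.

```python
import numpy as np

rng = np.random.default_rng(1)
eta, gamma = 0.99, 1.05          # eta close to 1, as in the scenario above
rho, delta = 1.0, 1.0
hits = {"C1": 0, "C2": 0}
# Synthetic residuals: decaying on average, but with fluctuations, so the
# ratio delta_{k+1}/delta_k straddles eta and rule (9) keeps switching.
for _ in range(10_000):
    delta_new = delta * rng.uniform(0.97, 1.0)
    if delta_new >= eta * delta:         # condition C1 in (9)
        rho *= gamma
        hits["C1"] += 1
    else:                                # condition C2 in (9)
        hits["C2"] += 1
    delta = delta_new
print(hits)  # both counters are large: the Case 3 behavior
```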

III Main Result

We will now establish the convergence of $(v_k)$ for Case 3. In particular, we will show that $\Delta_k$ can be bounded by a sequence which vanishes sufficiently fast to ensure that $(v_k)$ is Cauchy. Such a sequence is defined next.

Definition 5.

A positive sequence $(e_k)_{k \ge 1}$ is said to be a piecewise geometric sequence (PGS) if there exist $\theta \in (0, 1)$ and indices $n_1 < n_2 < n_3 < \cdots$ such that

  • for each $j \ge 1$, the terms $e_{n_j}, \ldots, e_{n_{j+1}-1}$ are in geometric progression with rate $\theta$, i.e., $e_{n_j + t} = \theta^{\,t}\,e_{n_j}$ for $0 \le t \le n_{j+1} - n_j - 1$;

  • the subsequence $(e_{n_j})_{j \ge 1}$ is in geometric progression with rate $\theta$, i.e., $e_{n_{j+1}} = \theta\,e_{n_j}$ for $j \ge 1$.

In other words, a PGS can be divided into chunks that are in geometric progression (with identical rates). Moreover, the subsequence consisting of the peaks (i.e., the first term of each chunk) is itself in geometric progression. A PGS has a sawtooth-like appearance (see Figure 1) and is slower to decay to zero than a geometric sequence with the same rate. It turns out that the sequence of residuals $(\Delta_k)$ can be bounded by a PGS in Case 3.
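To see Definition 5 in action, the following sketch (the rate, chunk lengths, and leading term are arbitrary illustrative choices) constructs a PGS and numerically compares its sum with the closed-form bound $e_{n_1}/(1-\theta)^2$, anticipating the summability argument used for Lemma 7 below.

```python
import numpy as np

def make_pgs(theta, chunk_lengths, peak0=1.0):
    """Build a PGS per Definition 5: the j-th chunk (j = 0, 1, ...) starts
    at the peak peak0 * theta**j and decays geometrically at rate theta."""
    chunks = [peak0 * theta**j * theta**np.arange(length)
              for j, length in enumerate(chunk_lengths)]
    return np.concatenate(chunks)

theta = 0.9
e = make_pgs(theta, chunk_lengths=[5, 20, 3, 40, 10, 60])
print(e.sum())                 # numerical sum of the (truncated) PGS
print(1.0 / (1.0 - theta)**2)  # closed-form bound e_{n_1}/(1 - theta)^2 = 100
```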

Lemma 6.

Let $(\Delta_k)$ be the residuals for Case 3. Then there exists a PGS $(e_k)$ such that $\Delta_k \le e_k$ for all $k \ge 1$.

This may be considered an analogue of (14) for Case 3. To deduce that $(v_k)$ is Cauchy, it then suffices to show that a PGS is summable.

Fig. 1: A piecewise geometric sequence $(e_k)$. The red points are the peaks, which themselves form a geometric sequence (cf. Definition 5).
Lemma 7.

If $(e_k)$ is a PGS, then $\sum_{k=1}^{\infty} e_k$ converges.

The proofs of Lemmas 6 and 7 are somewhat technical and are deferred to Section IV. Importantly, using the above lemmas, we can establish the convergence of $(v_k)$ for Case 3.

Proposition 8.

The iterates $(v_k)$ for Case 3 converge to a fixed point.

Proof.

As noted earlier, all we need to show is that $(v_k)$ is a Cauchy sequence. That is, for any given $\varepsilon > 0$, we can find an integer $N$ such that $d(v_p, v_q) < \varepsilon$ whenever $p, q \ge N$. Now, from the triangle inequality for the metric $d$ and (8), we have, for $q > p$,

$$d(v_p, v_q) \le \sum_{k=p+1}^{q} d(v_k, v_{k-1}) = \sum_{k=p+1}^{q} \Delta_k.$$

From Lemmas 6 and 7, we can conclude that $\sum_k \Delta_k$ converges. This is because $(\Delta_k)$ is bounded by the PGS $(e_k)$, whose series itself converges. In particular, the partial sums of $(\Delta_k)$ form a Cauchy sequence. As a result, for any $\varepsilon > 0$, we can find a sufficiently large $N$ such that $\sum_{k=p+1}^{q} \Delta_k < \varepsilon$ when $q > p \ge N$. ∎

Along with the proofs for Cases 1 and 2 already provided in [14], Proposition 8 completes the proof of Theorem 3.

IV Proofs

We now give the proofs of Lemmas 6 and 7. For integers $m \le n$, we will use $[m, n]$ to denote the set $\{m, m+1, \ldots, n\}$.

IV-A Proof of Lemma 6

Let $\theta = \max\{\eta, 1/\sqrt{\gamma}\}$ and recall that $0 < \eta < 1$ and $\gamma > 1$. Note that $\theta < 1$. We will show that $(\Delta_k)$ is bounded by a PGS with rate $\theta$.

Let $k_1$ be the iteration at which condition $C_1$ holds for the first time. Further, let $k_2$ be the iteration at which condition $C_1$ occurs for the first time after $k_1$ (i.e., $C_2$ holds at iterations $[k_1+1, k_2-1]$). Let $k_3$ be the iteration at which $C_1$ holds for the first time after $k_2$, and so on. Since Case 3 holds, both $C_1$ and $C_2$ are true infinitely often. This gives us an infinite sequence of indices $k_1 < k_2 < k_3 < \cdots$. Now, by construction, for each $j \ge 1$, $C_2$ holds at iterations $[k_j+1, k_{j+1}-1]$. Hence, from (9), $\rho_k = \rho_{k_j+1}$ for $k \in [k_j+2, k_{j+1}]$. Since this trivially also holds for $k = k_j+1$,

$$\rho_k = \rho_{k_j+1} \qquad \text{for } k \in [k_j+1, k_{j+1}].$$

By Lemma 4, for every $j \ge 1$ ($C_1$ holds at iteration $k_j$), we have

$$\Delta_{k_j+1} \le \frac{D}{\sqrt{\rho_{k_j}}}.$$

Letting $\bar{\rho}_j = \rho_{k_j}$, this becomes

$$\Delta_{k_j+1} \le \frac{D}{\sqrt{\bar{\rho}_j}}. \tag{15}$$

We now derive a relation between the $\bar{\rho}_j$'s for different $j$. We know that $\bar{\rho}_{j+1} = \rho_{k_{j+1}}$. However, from (9) we get $\rho_{k_{j+1}} = \rho_{k_j+1}$, since $C_2$ occurs at iterations $[k_j+1, k_{j+1}-1]$. This gives

$$\bar{\rho}_{j+1} = \rho_{k_j+1} = \gamma\,\rho_{k_j} = \gamma\,\bar{\rho}_j,$$

since $C_1$ occurs at iteration $k_j$ and hence $\rho_{k_j+1} = \gamma\,\rho_{k_j}$. Applying the above relation recursively and using the fact that $\bar{\rho}_1 = \rho_{k_1} > 0$, we get

$$\bar{\rho}_j = \gamma^{\,j-1}\,\bar{\rho}_1.$$

Let $A = D/\sqrt{\bar{\rho}_1}$. Hence, from (15), for $j \ge 1$,

$$\Delta_{k_j+1} \le A \left(\frac{1}{\sqrt{\gamma}}\right)^{j-1} \le A\,\theta^{\,j-1}. \tag{16}$$

Now, condition $C_2$ holds for $k \in [k_j+1, k_{j+1}-1]$. Hence, by recursively applying (9), we obtain $\Delta_{k+1} \le \eta^{\,k-k_j}\,\Delta_{k_j+1}$ for $k \in [k_j+1, k_{j+1}-1]$. Note that this trivially also holds for $k = k_j$. Hence, we have, for $k \in [k_j, k_{j+1}-1]$,

$$\Delta_{k+1} \le \theta^{\,k-k_j}\,\Delta_{k_j+1}, \tag{17}$$

where we have used the fact that $\eta \le \theta$.

Combining (16) and (17), we get

$$\Delta_{k+1} \le A\,\theta^{\,j-1}\,\theta^{\,k-k_j} \qquad \text{for } k \in [k_j, k_{j+1}-1].$$

In summary, we conclude that $(\Delta_k)$ is upper-bounded by the sequence $(e_k)$ defined by

$$e_{k+1} = A\,\theta^{\,j-1}\,\theta^{\,k-k_j} \qquad \text{for } k \in [k_j, k_{j+1}-1] \text{ and } j \ge 1.$$

This does not specify the first $k_1$ terms of $(e_k)$; we may arbitrarily choose them to be equal to the corresponding terms of $(\Delta_k)$. It follows from Definition 5 that $(e_k)$ is indeed a PGS with rate $\theta$, whose peaks occur at the indices $n_j = k_j + 1$.

IV-B Proof of Lemma 7

Let the parameters $\theta$ and $n_1 < n_2 < \cdots$ be as in Definition 5. We will prove the convergence of $\sum_k e_k$ using the Cauchy criterion. Let $\varepsilon > 0$ be given. We need to find an index $N$ such that, for all $q \ge p \ge N$,

$$\sum_{k=p}^{q} e_k < \varepsilon. \tag{18}$$

Fix an integer $J \ge 1$ such that

$$\theta^{\,J-1} < \frac{\varepsilon\,(1-\theta)^2}{e_{n_1}}. \tag{19}$$

This is possible since $\theta < 1$ and the right side of (19) is positive. We will prove that (18) is satisfied by $N = n_J$.

First, for fixed $j$, we derive a bound on the sum of the terms from $n_j$ to $n_{j+1}-1$. From Definition 5, we have

$$\sum_{k=n_j}^{n_{j+1}-1} e_k = \sum_{t=0}^{n_{j+1}-n_j-1} \theta^{\,t}\,e_{n_j} = e_{n_j}\,\frac{1 - \theta^{\,n_{j+1}-n_j}}{1-\theta} \le \frac{e_{n_j}}{1-\theta} = \frac{\theta^{\,j-1}\,e_{n_1}}{1-\theta}. \tag{20}$$

The inequality in the third step holds since $0 < \theta < 1$, while the last equality follows from Definition 5 (the peaks satisfy $e_{n_j} = \theta^{\,j-1}\,e_{n_1}$).

We are now ready to establish (18). Let $p$ and $q$ be such that $q \ge p \ge n_J$. Suppose $p$ lies in the chunk $[n_j, n_{j+1}-1]$ for some $j$. Then $j \ge J$, since $n_{j+1} > p \ge n_J$. As a result,

$$\sum_{k=p}^{q} e_k \le \sum_{i=j}^{\infty}\,\sum_{k=n_i}^{n_{i+1}-1} e_k \le \sum_{i=j}^{\infty} \frac{\theta^{\,i-1}\,e_{n_1}}{1-\theta} = \frac{\theta^{\,j-1}\,e_{n_1}}{(1-\theta)^2} \le \frac{\theta^{\,J-1}\,e_{n_1}}{(1-\theta)^2} < \varepsilon,$$

where the inequality in the second step follows from (20) and the last inequality follows from (19). Therefore, the partial sums of $(e_k)$ satisfy the Cauchy criterion with $N = n_J$, where $J$ is defined by (19). This completes the proof.
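As a numerical companion to the above argument (all parameters are illustrative choices), the sketch below selects $J$ via (19) for a prescribed $\varepsilon$ and verifies that the tail of a PGS beyond $n_J$ sums to less than $\varepsilon$.

```python
import numpy as np

theta, peak0, eps = 0.9, 1.0, 1e-3

# Smallest J satisfying (19): theta**(J-1) < eps * (1 - theta)**2 / e_{n_1}.
J = 1
while theta**(J - 1) >= eps * (1.0 - theta)**2 / peak0:
    J += 1

# A PGS with (arbitrary) chunks of length 7, enough chunks to reach chunk J.
chunk_lengths = [7] * (J + 50)
e = np.concatenate([peak0 * theta**j * theta**np.arange(length)
                    for j, length in enumerate(chunk_lengths)])
n_J = 7 * (J - 1)          # 0-based index where the J-th chunk begins
print(e[n_J:].sum(), eps)  # the tail sum from n_J onward is below eps
```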

V Conclusion

We pointed out that the proof of convergence of the PnP-ADMM algorithm in [14] does not address a certain case. We reasoned that this case needs to be handled differently from the cases addressed in [14]. This is because the approach in [14] fundamentally assumes that a certain condition holds only a finite number of times, which is not true for the case in question. In particular, we showed that, unlike the geometric sequences used for the other cases, we need to work with a piecewise geometric sequence. Our proof of convergence follows from the observation that the residual between successive iterates is upper-bounded by this summable sequence. Our analysis rigorously establishes the convergence theorem in [14].

We note that, in practice, optimization algorithms, including PnP-ADMM, are terminated after a finite number of iterations. In particular, since the cases in the convergence analysis involve an infinite number of iterations, which of them holds in practice cannot be ascertained empirically. Therefore, a theoretical convergence guarantee has practical importance: it provides a mathematical justification for terminating the algorithm after sufficiently many iterations. This is precisely what was accomplished in this letter.

References

  • [1] B. K. Gunturk and X. Li, Image Restoration: Fundamentals and Advances.   CRC Press, 2012.
  • [2] M. Elad, P. Milanfar, and R. Rubinstein, “Analysis versus synthesis in signal priors,” Inverse Problems, vol. 23, no. 3, p. 947, 2007.
  • [3] A. Buades, B. Coll, and J. M. Morel, “A non-local algorithm for image denoising,” Proc. IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 60–65, 2005.
  • [4] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
  • [5] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (RED),” SIAM Journal on Imaging Sciences, vol. 10, no. 4, pp. 1804–1844, 2017.
  • [6] E. T. Reehorst and P. Schniter, “Regularization by denoising: Clarifications and new interpretations,” IEEE Transactions on Computational Imaging, vol. 5, no. 1, pp. 52–67, 2019.
  • [7] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” Proc. IEEE Global Conference on Signal and Information Processing, pp. 945–948, 2013.
  • [8] S. Sreehari, S. V. Venkatakrishnan, B. Wohlberg, G. T. Buzzard, L. F. Drummy, J. P. Simmons, and C. A. Bouman, “Plug-and-play priors for bright field electron tomography and sparse interpolation,” IEEE Transactions on Computational Imaging, vol. 2, no. 4, pp. 408–423, 2016.
  • [9] S. Ono, “Primal-dual plug-and-play image restoration,” IEEE Signal Processing Letters, vol. 24, no. 8, pp. 1108–1112, 2017.
  • [10] U. S. Kamilov, H. Mansour, and B. Wohlberg, “A plug-and-play priors approach for solving nonlinear imaging inverse problems,” IEEE Signal Processing Letters, vol. 24, no. 12, pp. 1872–1876, 2017.
  • [11] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
  • [12] A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120–145, 2011.
  • [13] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
  • [14] S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-play ADMM for image restoration: Fixed-point convergence and applications,” IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 84–98, 2017.
  • [15] B. R. Hunt, “Bayesian methods in nonlinear digital image restoration,” IEEE Transactions on Computers, vol. C-26, no. 3, pp. 219–229, 1977.
  • [16] A. M. Teodoro, J. M. Bioucas-Dias, and M. A. T. Figueiredo, “A convergent image fusion algorithm using scene-adapted Gaussian-mixture-based denoising,” IEEE Transactions on Image Processing, vol. 28, no. 1, pp. 451–463, 2019.
  • [17] Y. Sun, B. Wohlberg, and U. S. Kamilov, “An online plug-and-play algorithm for regularized image reconstruction,” IEEE Transactions on Computational Imaging, vol. 5, no. 3, pp. 395–408, 2019.
  • [18] T. Meinhardt, M. Moller, C. Hazirbas, and D. Cremers, “Learning proximal operators: Using denoising networks for regularizing inverse imaging problems,” Proc. IEEE International Conference on Computer Vision, pp. 1781–1790, 2017.
  • [19] E. Ryu, J. Liu, S. Wang, X. Chen, Z. Wang, and W. Yin, “Plug-and-play methods provably converge with properly trained denoisers,” Proceedings of the 36th International Conference on Machine Learning, vol. 97, pp. 5546–5557, 2019.
  • [20] W. Dong, P. Wang, W. Yin, G. Shi, F. Wu, and X. Lu, “Denoising prior driven deep neural network for image restoration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 10, pp. 2305–2318, 2018.
  • [21] A. Brifman, Y. Romano, and M. Elad, “Turning a denoiser into a super-resolver using plug and play priors,” Proc. IEEE International Conference on Image Processing, pp. 1404–1408, 2016.
  • [22] A. Teodoro, J. M. Bioucas-Dias, and M. A. T. Figueiredo, “Image restoration and reconstruction using targeted plug-and-play priors,” IEEE Transactions on Computational Imaging, 2019.
  • [23] T. Tirer and R. Giryes, “Image restoration by iterative denoising and backward projections,” IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1220–1234, 2019.
  • [24] T. Tirer and R. Giryes, “Super-resolution via image-adapted denoising CNNs: Incorporating external and internal learning,” IEEE Signal Processing Letters, vol. 26, no. 7, pp. 1080–1084, 2019.
  • [25] A. K. Fletcher, P. Pandit, S. Rangan, S. Sarkar, and P. Schniter, “Plug-in estimation in high-dimensional linear inverse problems: A rigorous analysis,” Advances in Neural Information Processing Systems, pp. 7440–7449, 2018.
  • [26] Y. Yazaki, Y. Tanaka, and S. H. Chan, “Interpolation and denoising of graph signals using plug-and-play ADMM,” Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5431–5435, 2019.
  • [27] R. Chen, D. Zhai, X. Liu, and D. Zhao, “Noise-aware super-resolution of depth maps via graph-based plug-and-play framework,” Proc. IEEE International Conference on Image Processing, pp. 2536–2540, 2018.
  • [28] S. H. Chan, “Performance analysis of plug-and-play ADMM: A graph signal processing perspective,” IEEE Transactions on Computational Imaging, vol. 5, no. 2, pp. 274–286, 2019.
  • [29] G. T. Buzzard, S. H. Chan, S. Sreehari, and C. A. Bouman, “Plug-and-play unplugged: Optimization-free reconstruction using consensus equilibrium,” SIAM Journal on Imaging Sciences, vol. 11, no. 3, pp. 2001–2020, 2018.
