# A Fast Stochastic Plug-and-Play ADMM for Imaging Inverse Problems

In this work we propose an efficient stochastic plug-and-play (PnP) algorithm for imaging inverse problems. The PnP stochastic gradient descent methods have been recently proposed and shown improved performance in some imaging applications over standard deterministic PnP methods. However, current stochastic PnP methods need to frequently compute the image denoisers which can be computationally expensive. To overcome this limitation, we propose a new stochastic PnP-ADMM method which is based on introducing stochastic gradient descent inner-loops within an inexact ADMM framework. We provide the theoretical guarantee on the fixed-point convergence for our algorithm under standard assumptions. Our numerical results demonstrate the effectiveness of our approach compared with state-of-the-art PnP methods.

• 10 publications
• 19 publications
10/22/2019

### The Practicality of Stochastic Optimization in Imaging Inverse Problems

In this work we investigate the practicality of stochastic gradient desc...
07/06/2019

### On the Convergence of Stochastic Gradient Descent for Nonlinear Ill-Posed Problems

In this work, we analyze the regularizing property of the stochastic gra...
10/07/2021

### Gradient Step Denoiser for convergent Plug-and-Play

Plug-and-Play methods constitute a class of iterative algorithms for ima...
01/16/2022

### On Maximum-a-Posteriori estimation with Plug Play priors and stochastic gradient descent

Bayesian methods to solve imaging inverse problems usually combine an ex...
01/02/2018

### Scene-Adapted Plug-and-Play Algorithm with Guaranteed Convergence: Applications to Data Fusion in Imaging

The recently proposed plug-and-play (PnP) framework allows leveraging re...
06/05/2020

### Scalable Plug-and-Play ADMM with Convergence Guarantees

Plug-and-play priors (PnP) is a broadly applicable methodology for solvi...
10/21/2020

### A statistical framework for model-based inverse problems in ultrasound elastography

Model-based computational elasticity imaging of tissues can be posed as ...

## I Introduction

Recent trends in the research of computational imaging have been focusing on developing algorithms which are able to jointly utilize the power of classical physical models and advanced image priors [venkatakrishnan2013plug, egiazarian2007compressed, kamilov2017plug, romano2017little, reehorst2018regularization]

. These methods typically take the form of well-known optimization algorithms, and plug in a pretrained deep neural network

[zhang2017beyond] or a patch-based denoiser with non-local denoising properties [buades2005non, dabov2008image, chen2017trainable, talebi2013global]. In this work we propose a novel stochastic plug-and-play method for imaging inverse problems. Consider the following observation model for a linear inverse problem:

 b=Ax†+w,   A∈Rn×d (1)

where

denotes the vectorized (raster) ground truth image,

represents the forward measurement model, denotes the observation, while

represents the random additive noise. Traditionally, in order to get a good estimate of the ground truth

, we typically seek to find the minimizer of a composite objective function:

 x⋆∈argminx∈Rd{f(x)+g(x)}, (2)

where is the data fidelity term which is assumed to be convex and smooth, such as the least-square loss , and here we denote as the i-th row of and as the i-th element of . Meanwhile in (2) denotes a regularization term which encodes image priors, with classical examples including the sparsity-inducing regularization in wavelet domain and the total-variation regularization [chambolle2016introduction]

, etc. The composite loss function (

2) can be effectively minimized via a class of iterative algorithms which are known as the proximal splitting methods [combettes2011proximal], including the forward-backward splitting [lions1979splitting, 2009_Beck_Fast, beck2009fast], primal-dual splitting[chambolle2011first, chambolle2015convergence, pesquet2014class] and the Douglas-Rachfort splitting/alternating direction method of multipliers (ADMM)[douglas1956numerical, boct2015inertial, boyd2011distributed], etc.

Considering the link between proximal operators and denoising, researchers [venkatakrishnan2013plug, egiazarian2007compressed, kamilov2017plug] have discovered that if they simply replace the proximal operator on with a direct call to an off-the-shelf denoising algorithm, such as NLM [buades2005non], TNRD [chen2017trainable], BM3D [dabov2008image], or the DnCNN [zhang2017beyond], excellent image recovery/reconstruction results can often be attained. Although the imaging community has very limited theoretical understanding and convergence analysis for such algorithms so far, these ad-hoc approaches have shown state-of-the-art performances in various imaging applications.

Recently, inspired by the success of stochastic gradient descent (SGD) methods in solving large-scale optimization tasks in machine learning

[bottou2010large, shalev2011pegasos, kingma2014adam] and some imaging applications [chambolle2018stochastic, chouzenoux2017stochastic, ehrhardt2017faster], Sun et al [sun2019online] have extended the deterministic plug-and-play ISTA/FISTA method [kamilov2017plug]

and proposed PnP-SGD in order to improve the computational efficiency. In each iteration of PnP-SGD, a minibatch stochastic gradient is computed as an unbiased estimator of the full gradient, which yields a computational benefit in each iteration. However, as discussed in

[tang2019practicality], current stochastic gradient methods in general need to compute the proximal operator/denoisers more frequently than the deterministic gradient methods within the same amount of gradient evaluations. When the denoiser is computationally expensive, the actual performance benefit of using stochastic gradient techniques may be compromised due to this computational overhead. In this work, we seek to address this issue for stochastic PnP methods and propose a more practical approach for utilizing the power of stochastic gradient techniques to accelerate the deterministic plug-and-play algorithms.

### I-a Main contributions

This paper’s contribution is two-fold:

• We propose an efficient stochastic PnP method which empirically improves upon previous stochastic approach in [sun2019online] by reducing the number of calls on the modern denoisers which are usually a bottleneck for computation. We demonstrate the effectiveness of our approach in X-ray computed tomography (CT) imaging problems.

• We provide a theoretical fixed-point convergence analysis for our stochastic PnP algorithm under standard assumptions.

In a recent work of Sun et al [sun2019online], a stochastic PnP algorithm is proposed for image restoration and reconstruction, which can be written as the following:

 PnP-SGD−Initialize x0,z0=x0 For   k=1,2,...,t ⌊xk=D[zk−1−η⋅▽fSk(zk−1)]zk=xk+αk(xk−xk−1)

where denotes a minibatch stochastic gradient with a randomly subsampled index , chosen uniformly at random from a partitioned index , where and , . The PnP-SGD algorithm is essentially a plug-and-play variant of the stochastic proximal gradient descent [rosasco2014convergence] which is based on the forward-backward splitting [lions1979splitting]. In each iteration of PnP-SGD, a minibatch stochastic gradient estimate is computed:

 ▽fSk(zk)=1m∑i∈Sk▽fi(zk), (3)

where . It first performs a stochastic gradient descent step with a step-size , then a denoising step is computed using an off-the-shelf denoiser [buades2005non, dabov2008image, chen2017trainable, zhang2017beyond] denoted as . Finally a momentum step is performed for empirical convergence acceleration with a momentum parameter as in the FISTA algorithm [2009_Beck_Fast]. The computational benefit of PnP-SGD over its deterministic counterparts (PnP-ISTA/PnP-FISTA [kamilov2017plug]) comes from using an approximation of the full gradient by the minibatch gradient (3) which can be efficiently computed. However, the PnP-SGD does not have the capability to reduce the cost of computing the denoising step – the denoiser has to be called at each iteration. To overcome this computational bottleneck, one plausible approach is to decouple the gradient step and the denoising step via Douglas-Rachford splitting/ADMM instead of the forward-backward splitting. In this work we study and propose a stochastic gradient extension of the PnP-ADMM algorithm111We write the update rule of PnP-ADMM in this paper using the equivalent Douglas-Rachford splitting reformulation [ryu2019plug, Section 9.1] for the simplicity of notation in analysis. [venkatakrishnan2013plug]:

 PnP-ADMM−Initialize x0=z0∈Rd; For   k=0,2,...,t ⎢⎢ ⎢ ⎢ ⎢⎣yk=proxτf[zk]xk+1=D[2yk−zk]zk+1=zk+xk+1−yk.

where in each iteration an exact proximal step on the data-fidelity term is computed with a constant step-size :

 proxτf(⋅)=argminx12∥x−⋅∥22+τf(x). (4)

In classical ADMM the step-size can be any positive constant to ensure convergence. The update rule of the PnP-ADMM can be written as , where is an operator defined as [ryu2019plug]:

 T=12I+12(2D−I)(2proxτf−I). (5)

Our proposed solution, presented in algorithm 1, is to use SGD with momentum to approximately solve the prox step (4) within PnP-ADMM framework. We denote the number of inner-iterations at the -th outerloop as . In each inner-iteration a stochastic gradient descent step is performed with a step-size , and then followed by a momentum step for empirical acceleration. Unlike the PnP-SGD which needs to call the denoiser in every stochastic gradient descent iteration, the proposed method only needs to compute the denoiser once every iterations. Our theoretical analysis is restricted to the case where we set and the parameters and are chosen adaptively in each outer-iteration. However, in practice, a constant step size which is inversely proportional to the Lipschitz constant , , a constant number of inner-iterations , and a FISTA-like momentum parameter [chambolle2015convergence] are suggested for good empirical performance.

## Iii Convergence Analysis

In this section we provide theoretical analysis for our stochastic PnP-ADMM. When we run a stochastic gradient-based innerloop, we are effectively making an approximation of the proximal step , hence we can write our stochastic PnP-ADMM algorithm as the inexact recursion:

 zk+1=T(zk)+εk. (6)

where denotes the approximation error. Now the desired fixed-point convergence can be established for (6), if we make the following standard assumptions as in [ryu2019plug] on the denoiser and the data-fidelity term.

### Iii-a Generic Assumptions

###### A. 1

The denoiser satisfies:

 ∥(D−I)(x)−(D−I)(y)∥2≤β∥x−y∥2, ∀x,y∈Rd, (7)

with .

It is easy to show that A.1 implies a relaxed non-expansiveness condition on which reads and is satisfied for a wide class of modern denoisers such as NLM and properly trained DnCNNs [ryu2019plug].

###### A. 2

is -strongly-convex:

 f(x)−f(y)−⟨▽f(y),x−y⟩≥μ∥x−y∥22, ∀x,y∈Rd, (8)

with . Meanwhile for a given minibatch partition index such that , each is -smooth, such that :

 (9)

The strong-convexity assumption is necessary for our analysis. It seems pessimistic since a number of imaging inverse problems do not have strong-convexity. We believe that the assumption on strong-convexity could be relaxed, e.g. following the ideas from [oymak2017sharp, bolte2007lojasiewicz, liang2017local, pmlr-v70-tang17a, tang2018rest]. On the other hand, one can instead run Algorithm 1 on a regularized objective to manually enforce strong-convexity, which is a classical trick in convex optimization [nesterov2013introductory]. Nevertheless, we believe that relaxing this assumption is an important future direction for the analysis.

### Iii-B Analysis

We first apply an existing convergence result for SGD for establishing the approximation accuracy of the inner-loop:

###### Lemma III.1

Under Assumption A.2, denote that for -th outer-loop of Stochastic PnP-ADMM , and define the following quantities for each outer-iteration :

 σ2k:=Eq[∥τ▽fIq(yk⋆)+yk⋆−zk∥22], (10) ξk:=∥yk⋆−xk∥22,

then if the step size , for all , with , , then we have the approximation error of the proximal step bounded as:

 E∥ykNk−proxτf[zk]∥22≤ε, (11)

where the expectation is taken over the random sampling of the indices within the inner-loop.

Proof. We first observe that the proximal step can be written precisely as a finite-sum optimization problem of the follow form:

 proxτf(zk) =argminx12∥x−zk∥22+τf(x) (12) =argminx1KK∑q=1[τfIq(x)+12∥x−zk∥22],

which is a -strongly-convex objective and each of the element in the sum is -smooth.

According to [needell2014stochastic, Theorem 2.1], if we run SGD (starting at ) with uniform random sampling and a step size , then after iterations, we have .

Now, we are able to prove the fixed-point convergence for the inexact recursion (6), and hence for our proposed method.

###### Theorem III.2

Assume A.1 and A.2 with , denote positive values and such that , and , and the quantities , , , are defined as in Lemma III.1. If we choose the step-size parameters as , , , , we have the following fixed-point convergence for Algorithm 1:

 E∥zk+1−zk∥2→0, (13)

when .

Our main theorem suggests that for the basic form of Algorithm 1 where we choose the momentum , the outerloop step-size , the inner-loop step-size decreasing in and the number of inner-iterations increasing in , Algorithm 1 is guaranteed to converge to a fix point. However our numerical results in section IV suggest that we may set the number of inner-loop and step size to be constant and use FISTA-type of momentum [2009_Beck_Fast, beck2009fast, chambolle2015convergence] for good empirical performance in practice.

### Iii-C Proof for Theorem iii.2

Firstly, due to assumption A.1 we have:

 β∥x−y∥2 ≥∥(D−I)(x)−(D−I)(y)∥2 (14) ≥∥D(x)−D(y)∥2−∥x−y∥2, ∀x,y∈Rd,

hence . Denote , we have:

 ∥εk∥2 =∥zk+1−T(zk)∥2 (15) ≤∥uk∥2+∥D(2ykNk−zk)−D(2ykNk−zk+2uk)∥2 ≤(3+2β)∥uk∥2.

Applying Lemma III.1 gives . Now according to [ryu2019plug, Theorem 2], under assumption A.1 and A.2, we can ensure that:

 ∥T(x)−T(y)∥2≤δ∥x−y∥2, ∀x,y∈Rd, (16)

where . Moreover, if and , then . Hence we have:

 E∥zk+1−zk∥2 =E∥T(zk)−T(zk−1)+εk−εk−1∥2 (17) ≤E∥T(zk)−T(zk−1)∥2 +E∥εk∥2+E∥εk−1∥2 ≤δE∥zk−zk−1∥2+3+2βk+3+2βk−1 ≤δE∥zk−zk−1∥2+15k

If we recursively apply the same argument we will get:

 E∥zk+1−zk∥2≤δkE∥z1−z0∥2+15kk−1∑i=0kδik−i. (18)

Then we use a classic criterion to show the boundedness of series where . For any finite , we have:

 limi→+∞ipvi=limi→+∞δiipkk−i=0, (19)

where we take and , and then:

 k−1∑i=0kδik−i<+∞,    15kk−1∑i=0kδik−i→0. (20)

Hence by taking , we have . Thus finishes the proof for Theorem III.2.

## Iv Numerical Experiments

For our numerical experiments, we choose the X-ray CT imaging as an example since it is know to favor stochastic gradient methods [tang2019practicality]. We compare our algorithm with the state-of-the-art stochastic PnP method with momentum acceleration proposed by Sun et al [sun2019online], as well as the PnP FISTA algorithm [kamilov2017plug]. We use MATLAB R2018a in a machine with 1.6 GB RAM, 1.80 GHz Intel Core i7-8550 CPU.

We first test the compared methods on low-dose CT imaging problems, where low-energy noisy X-ray measurements with are used, which demands strong image priors are used in order to achieve good-quality reconstructions. Meanwhile we also compare these algorithms in sparse-view CT imaging with , where fewer X-ray measurements are taken compared to the number of pixels to be inferenced222The numerical result in this example suggests that empirically the strong-convexity is not needed for the stochastic PnP-ADMM to converge.. For low-dose CT example, we choose the penalized weighted least-squares objective as the data-fidelity term, which is tailored for low-dose CT [wang2006penalized]. For sparse-view CT example, we choose the standard least-squares loss as the data fidelity term. For the randomized methods we partition the data into 10 minibatches. The noisy CT observations are obtained via where the forward operator is implemented using the AIRtools package [hansen2012air]. For our algorithm, we set for all such that in each inner-loop we make exactly one pass of the data, outer-loop step-size , inner-loop step-size , and the momentum parameter as suggested in [chambolle2015convergence].

We choose the BM3D [dabov2008image] with the denoiser-scaling [xu2020boosting] as the denoiser:

 Dγ(x)=1γBM3D(γx), (21)

and maximize the reconstruction performance for each of the compared algorithms via grid-searching the parameter .

We present the numerical results of the algorithms in Figure 1 for low-dose CT inverse problem of size , and in Figure 2 for sparse-view CT imaging task of size . We plot the estimation error to the ground-truth image , against the actual run time as well as the number of datapasses. We can observe that both PnP-SGD and our method are much faster than PnP-FISTA in terms of datapass. The PnP-SGD appears to be faster than our method in terms of number of datapasses. However, in terms of actual run time, the PnP-SGD is slower than our method, due to the need to compute the costly BM3D at each stochastic gradient iteration.

## V Conclusion

In this work we propose a stochastic PnP-ADMM algorithm which is able to provide practical acceleration with stochastic gradient techniques, for efficiently solving imaging inverse problems. This is an effective approach to make the stochastic PnP schemes truly practical, by reducing the computational overhead of the modern denoisers. We provide a fixed-point convergence analysis, and demonstrate the effectiveness of our method in numerical experiments.

## Acknowledgment

This work is supported by ERC Advanced grant 694888, C-SENSE.