I Introduction
Recent trends in the research of computational imaging have been focusing on developing algorithms which are able to jointly utilize the power of classical physical models and advanced image priors [venkatakrishnan2013plug, egiazarian2007compressed, kamilov2017plug, romano2017little, reehorst2018regularization]
. These methods typically take the form of wellknown optimization algorithms, and plug in a pretrained deep neural network
[zhang2017beyond] or a patchbased denoiser with nonlocal denoising properties [buades2005non, dabov2008image, chen2017trainable, talebi2013global]. In this work we propose a novel stochastic plugandplay method for imaging inverse problems. Consider the following observation model for a linear inverse problem:(1) 
where
denotes the vectorized (raster) ground truth image,
represents the forward measurement model, denotes the observation, whilerepresents the random additive noise. Traditionally, in order to get a good estimate of the ground truth
, we typically seek to find the minimizer of a composite objective function:(2) 
where is the data fidelity term which is assumed to be convex and smooth, such as the leastsquare loss , and here we denote as the ith row of and as the ith element of . Meanwhile in (2) denotes a regularization term which encodes image priors, with classical examples including the sparsityinducing regularization in wavelet domain and the totalvariation regularization [chambolle2016introduction]
, etc. The composite loss function (
2) can be effectively minimized via a class of iterative algorithms which are known as the proximal splitting methods [combettes2011proximal], including the forwardbackward splitting [lions1979splitting, 2009_Beck_Fast, beck2009fast], primaldual splitting[chambolle2011first, chambolle2015convergence, pesquet2014class] and the DouglasRachfort splitting/alternating direction method of multipliers (ADMM)[douglas1956numerical, boct2015inertial, boyd2011distributed], etc.Considering the link between proximal operators and denoising, researchers [venkatakrishnan2013plug, egiazarian2007compressed, kamilov2017plug] have discovered that if they simply replace the proximal operator on with a direct call to an offtheshelf denoising algorithm, such as NLM [buades2005non], TNRD [chen2017trainable], BM3D [dabov2008image], or the DnCNN [zhang2017beyond], excellent image recovery/reconstruction results can often be attained. Although the imaging community has very limited theoretical understanding and convergence analysis for such algorithms so far, these adhoc approaches have shown stateoftheart performances in various imaging applications.
Recently, inspired by the success of stochastic gradient descent (SGD) methods in solving largescale optimization tasks in machine learning
[bottou2010large, shalev2011pegasos, kingma2014adam] and some imaging applications [chambolle2018stochastic, chouzenoux2017stochastic, ehrhardt2017faster], Sun et al [sun2019online] have extended the deterministic plugandplay ISTA/FISTA method [kamilov2017plug]and proposed PnPSGD in order to improve the computational efficiency. In each iteration of PnPSGD, a minibatch stochastic gradient is computed as an unbiased estimator of the full gradient, which yields a computational benefit in each iteration. However, as discussed in
[tang2019practicality], current stochastic gradient methods in general need to compute the proximal operator/denoisers more frequently than the deterministic gradient methods within the same amount of gradient evaluations. When the denoiser is computationally expensive, the actual performance benefit of using stochastic gradient techniques may be compromised due to this computational overhead. In this work, we seek to address this issue for stochastic PnP methods and propose a more practical approach for utilizing the power of stochastic gradient techniques to accelerate the deterministic plugandplay algorithms.Ia Main contributions
This paper’s contribution is twofold:

We propose an efficient stochastic PnP method which empirically improves upon previous stochastic approach in [sun2019online] by reducing the number of calls on the modern denoisers which are usually a bottleneck for computation. We demonstrate the effectiveness of our approach in Xray computed tomography (CT) imaging problems.

We provide a theoretical fixedpoint convergence analysis for our stochastic PnP algorithm under standard assumptions.
Ii Stochastic PnPADMM
In a recent work of Sun et al [sun2019online], a stochastic PnP algorithm is proposed for image restoration and reconstruction, which can be written as the following:
where denotes a minibatch stochastic gradient with a randomly subsampled index , chosen uniformly at random from a partitioned index , where and , . The PnPSGD algorithm is essentially a plugandplay variant of the stochastic proximal gradient descent [rosasco2014convergence] which is based on the forwardbackward splitting [lions1979splitting]. In each iteration of PnPSGD, a minibatch stochastic gradient estimate is computed:
(3) 
where . It first performs a stochastic gradient descent step with a stepsize , then a denoising step is computed using an offtheshelf denoiser [buades2005non, dabov2008image, chen2017trainable, zhang2017beyond] denoted as . Finally a momentum step is performed for empirical convergence acceleration with a momentum parameter as in the FISTA algorithm [2009_Beck_Fast]. The computational benefit of PnPSGD over its deterministic counterparts (PnPISTA/PnPFISTA [kamilov2017plug]) comes from using an approximation of the full gradient by the minibatch gradient (3) which can be efficiently computed. However, the PnPSGD does not have the capability to reduce the cost of computing the denoising step – the denoiser has to be called at each iteration. To overcome this computational bottleneck, one plausible approach is to decouple the gradient step and the denoising step via DouglasRachford splitting/ADMM instead of the forwardbackward splitting. In this work we study and propose a stochastic gradient extension of the PnPADMM algorithm^{1}^{1}1We write the update rule of PnPADMM in this paper using the equivalent DouglasRachford splitting reformulation [ryu2019plug, Section 9.1] for the simplicity of notation in analysis. [venkatakrishnan2013plug]:
where in each iteration an exact proximal step on the datafidelity term is computed with a constant stepsize :
(4) 
In classical ADMM the stepsize can be any positive constant to ensure convergence. The update rule of the PnPADMM can be written as , where is an operator defined as [ryu2019plug]:
(5) 
Our proposed solution, presented in algorithm 1, is to use SGD with momentum to approximately solve the prox step (4) within PnPADMM framework. We denote the number of inneriterations at the th outerloop as . In each inneriteration a stochastic gradient descent step is performed with a stepsize , and then followed by a momentum step for empirical acceleration. Unlike the PnPSGD which needs to call the denoiser in every stochastic gradient descent iteration, the proposed method only needs to compute the denoiser once every iterations. Our theoretical analysis is restricted to the case where we set and the parameters and are chosen adaptively in each outeriteration. However, in practice, a constant step size which is inversely proportional to the Lipschitz constant , , a constant number of inneriterations , and a FISTAlike momentum parameter [chambolle2015convergence] are suggested for good empirical performance.
Iii Convergence Analysis
In this section we provide theoretical analysis for our stochastic PnPADMM. When we run a stochastic gradientbased innerloop, we are effectively making an approximation of the proximal step , hence we can write our stochastic PnPADMM algorithm as the inexact recursion:
(6) 
where denotes the approximation error. Now the desired fixedpoint convergence can be established for (6), if we make the following standard assumptions as in [ryu2019plug] on the denoiser and the datafidelity term.
Iiia Generic Assumptions
A. 1
The denoiser satisfies:
(7) 
with .
It is easy to show that A.1 implies a relaxed nonexpansiveness condition on which reads and is satisfied for a wide class of modern denoisers such as NLM and properly trained DnCNNs [ryu2019plug].
A. 2
is stronglyconvex:
(8) 
with . Meanwhile for a given minibatch partition index such that , each is smooth, such that :
(9) 
The strongconvexity assumption is necessary for our analysis. It seems pessimistic since a number of imaging inverse problems do not have strongconvexity. We believe that the assumption on strongconvexity could be relaxed, e.g. following the ideas from [oymak2017sharp, bolte2007lojasiewicz, liang2017local, pmlrv70tang17a, tang2018rest]. On the other hand, one can instead run Algorithm 1 on a regularized objective to manually enforce strongconvexity, which is a classical trick in convex optimization [nesterov2013introductory]. Nevertheless, we believe that relaxing this assumption is an important future direction for the analysis.
IiiB Analysis
We first apply an existing convergence result for SGD for establishing the approximation accuracy of the innerloop:
Lemma III.1
Under Assumption A.2, denote that for th outerloop of Stochastic PnPADMM , and define the following quantities for each outeriteration :
(10)  
then if the step size , for all , with , , then we have the approximation error of the proximal step bounded as:
(11) 
where the expectation is taken over the random sampling of the indices within the innerloop.
Proof. We first observe that the proximal step can be written precisely as a finitesum optimization problem of the follow form:
(12)  
which is a stronglyconvex objective and each of the element in the sum is smooth.
According to [needell2014stochastic, Theorem 2.1], if we run SGD (starting at ) with uniform random sampling and a step size , then after iterations, we have .
Now, we are able to prove the fixedpoint convergence for the inexact recursion (6), and hence for our proposed method.
Theorem III.2
Assume A.1 and A.2 with , denote positive values and such that , and , and the quantities , , , are defined as in Lemma III.1. If we choose the stepsize parameters as , , , , we have the following fixedpoint convergence for Algorithm 1:
(13) 
when .
Our main theorem suggests that for the basic form of Algorithm 1 where we choose the momentum , the outerloop stepsize , the innerloop stepsize decreasing in and the number of inneriterations increasing in , Algorithm 1 is guaranteed to converge to a fix point. However our numerical results in section IV suggest that we may set the number of innerloop and step size to be constant and use FISTAtype of momentum [2009_Beck_Fast, beck2009fast, chambolle2015convergence] for good empirical performance in practice.
IiiC Proof for Theorem iii.2
Firstly, due to assumption A.1 we have:
(14)  
hence . Denote , we have:
(15)  
Applying Lemma III.1 gives . Now according to [ryu2019plug, Theorem 2], under assumption A.1 and A.2, we can ensure that:
(16) 
where . Moreover, if and , then . Hence we have:
(17)  
If we recursively apply the same argument we will get:
(18) 
Then we use a classic criterion to show the boundedness of series where . For any finite , we have:
(19) 
where we take and , and then:
(20) 
Hence by taking , we have . Thus finishes the proof for Theorem III.2.
Iv Numerical Experiments
For our numerical experiments, we choose the Xray CT imaging as an example since it is know to favor stochastic gradient methods [tang2019practicality]. We compare our algorithm with the stateoftheart stochastic PnP method with momentum acceleration proposed by Sun et al [sun2019online], as well as the PnP FISTA algorithm [kamilov2017plug]. We use MATLAB R2018a in a machine with 1.6 GB RAM, 1.80 GHz Intel Core i78550 CPU.
We first test the compared methods on lowdose CT imaging problems, where lowenergy noisy Xray measurements with are used, which demands strong image priors are used in order to achieve goodquality reconstructions. Meanwhile we also compare these algorithms in sparseview CT imaging with , where fewer Xray measurements are taken compared to the number of pixels to be inferenced^{2}^{2}2The numerical result in this example suggests that empirically the strongconvexity is not needed for the stochastic PnPADMM to converge.. For lowdose CT example, we choose the penalized weighted leastsquares objective as the datafidelity term, which is tailored for lowdose CT [wang2006penalized]. For sparseview CT example, we choose the standard leastsquares loss as the data fidelity term. For the randomized methods we partition the data into 10 minibatches. The noisy CT observations are obtained via where the forward operator is implemented using the AIRtools package [hansen2012air]. For our algorithm, we set for all such that in each innerloop we make exactly one pass of the data, outerloop stepsize , innerloop stepsize , and the momentum parameter as suggested in [chambolle2015convergence].
We choose the BM3D [dabov2008image] with the denoiserscaling [xu2020boosting] as the denoiser:
(21) 
and maximize the reconstruction performance for each of the compared algorithms via gridsearching the parameter .
We present the numerical results of the algorithms in Figure 1 for lowdose CT inverse problem of size , and in Figure 2 for sparseview CT imaging task of size . We plot the estimation error to the groundtruth image , against the actual run time as well as the number of datapasses. We can observe that both PnPSGD and our method are much faster than PnPFISTA in terms of datapass. The PnPSGD appears to be faster than our method in terms of number of datapasses. However, in terms of actual run time, the PnPSGD is slower than our method, due to the need to compute the costly BM3D at each stochastic gradient iteration.
V Conclusion
In this work we propose a stochastic PnPADMM algorithm which is able to provide practical acceleration with stochastic gradient techniques, for efficiently solving imaging inverse problems. This is an effective approach to make the stochastic PnP schemes truly practical, by reducing the computational overhead of the modern denoisers. We provide a fixedpoint convergence analysis, and demonstrate the effectiveness of our method in numerical experiments.
Acknowledgment
This work is supported by ERC Advanced grant 694888, CSENSE.