I. Introduction
Inverse problems in imaging address the reconstruction of clean images from their corrupted versions. The corruption can be a blur, loss of samples, downscaling, or a more complicated operator (e.g., CT and MRI), accompanied by noise contamination. Roughly speaking, inverse problems are characterized by two main parts: the first is called the forward model, which formulates the relation between the noisy measurement and the desired signal, and the second is the prior, describing the log-probability of the destination signal.
In recent years, we have witnessed a massive advancement in a basic inverse problem referred to as image denoising [1, 2, 3, 4, 5, 6]. Indeed, recent work goes as far as speculating that the performance obtained by leading image denoising algorithms is getting very close to the possible ceiling [7, 8, 9]. This motivated researchers to seek ways to exploit this progress in order to address general inverse problems. Successful attempts, as in [10, 11, 12], suggested an exhaustive manual adaptation of existing denoising algorithms, or the priors used in them, treating specific alternative missions. This line of work has a clear limitation, as it does not offer a flexible and general scheme for incorporating various image denoising achievements for tackling other advanced image processing tasks. This led to the following natural question: is it possible to suggest a general framework that utilizes the abundance of high-performance image denoising algorithms for addressing general inverse problems? Venkatakrishnan et al. gave a positive answer to this question, proposing a framework called the Plug-and-Play Priors (PnP) method [13, 14, 15]. Formulating the inverse problem as an optimization task and handling it via the Alternating Direction Method of Multipliers (ADMM) scheme [16], PnP decomposes the whole problem into a sequence of image denoising sub-problems, coupled with simpler computational steps. The scheme provides a constructive answer to the desire to use denoisers within inverse problems, but it suffers from several key disadvantages: PnP does not define a clear objective function, since the regularization used is implicit; tuning the parameters in PnP is extremely delicate; and since PnP is tightly coupled with the ADMM, it has no flexibility with respect to the numerical scheme.
A novel framework named REgularization by Denoising (RED) [17] proposes an appealing alternative while overcoming all these flaws. The core idea in RED is the use of the given denoiser within an expression of regularization that generalizes a Laplacian smoothness term. The work in [17] carefully shows that the gradient of this regularization is in fact the denoising residual. This, in turn, leads to several iterative algorithms, all guaranteed to converge to the global minimum of the inverse problem’s penalty function, while using a denoising step in each iteration.
The idea of using a state-of-the-art denoising algorithm for constructing an advanced prior for general inverse problems is very appealing. Firstly, it enables utilizing the vast progress in image denoising for solving challenging inverse problems, as explained above. Secondly, RED enables utilizing the denoiser as a black-box. However, a fundamental problem still exists due to the high complexity of typical denoising algorithms, which are required to be activated many times in such a recovery process. Indeed, the evidence from the numerical experiments posed in [17] clearly exposes this problem, in all the three methods proposed, namely the steepest descent, the fixed-point (FP) strategy and the ADMM scheme. Note that the FP method is parameter-free and the most efficient among the three, and yet this approach too requires the activation of the denoising algorithm dozens of times for a completion of the recovery process.
Our main contribution in this paper is to address these difficulties by applying vector extrapolation (VE) [18, 19, 20] to accelerate the FP algorithm shown in [17]. Our simulations illustrate the effectiveness of VE for this acceleration, saving a substantial portion of the overall computations compared with the native FP method.
The rest of this paper is organized as follows. We review RED and its FP method in Section II. Section III recalls the Vector Extrapolation acceleration idea. Several experiments on image deblurring and super-resolution, which follow the ones given in [17], show the effectiveness of VE; these are brought in Section IV. We conclude our paper in Section V.

II. REgularization by Denoising (RED)
This section reviews the framework of RED, which utilizes denoising algorithms as image priors [17]. We also describe its original solver based on the Fixed Point (FP) method.
II-A. Inverse Problems as Optimization Tasks
From an estimation point of view, the signal x is to be recovered from its measurements y using the posterior conditional probability P(x|y). Using the maximum a posteriori probability (MAP) estimator and Bayes' rule, the estimation task is formulated as:

x̂_MAP = argmax_x P(x|y) = argmax_x P(y|x)P(x)/P(y) = argmin_x {−log P(y|x) − log P(x)}.

The third equality is obtained by the fact that P(y) does not depend on x. The term −log P(y|x) is known as the log-likelihood ℓ(y, x). A typical example is
(1)  ℓ(y, x) = (1/(2σ²)) ‖Hx − y‖²,

referring to the case y = Hx + e, where H is any linear degradation operator and e is a white, mean-zero Gaussian noise with variance σ². Note that the expression of ℓ(y, x) depends on the distribution of the noise. (White Gaussian noise is assumed throughout this paper.) Now, we can write the MAP optimization problem as

(2)  x̂_MAP = argmin_x ℓ(y, x) + λR(x),
where λ is a trade-off parameter balancing ℓ(y, x) and R(x). The term R(x) refers to the prior, describing the statistical nature of x. This term is typically referred to as the regularization, as it is used to stabilize the inversion by emphasizing the features of the recovered signal. In the following, we will describe how RED activates denoising algorithms for composing R(x). Note that Equation (2) defines a wide family of inverse problems including, but not limited to, inpainting, deblurring, super-resolution [21], and more.
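As a concrete toy instance of the forward model behind (1) and (2), the following sketch evaluates the Gaussian log-likelihood term and its gradient with numpy. The matrix H, the noise level, and all function names here are illustrative assumptions, not part of the original formulation:

```python
import numpy as np

def data_term(x, y, H, sigma):
    """l(y, x) = ||Hx - y||^2 / (2 sigma^2), the Gaussian log-likelihood term (1)."""
    r = H @ x - y
    return r @ r / (2 * sigma**2)

def data_term_grad(x, y, H, sigma):
    """Gradient of the data term: H^T (Hx - y) / sigma^2."""
    return H.T @ (H @ x - y) / sigma**2

rng = np.random.default_rng(0)
n, sigma = 8, 0.1
H = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # a toy degradation operator
x_true = rng.standard_normal(n)
y = H @ x_true + sigma * rng.standard_normal(n)      # y = Hx + e

# Central-difference check of the gradient along a random direction.
g = data_term_grad(x_true, y, H, sigma)
d = rng.standard_normal(n)
eps = 1e-6
fd = (data_term(x_true + eps * d, y, H, sigma)
      - data_term(x_true - eps * d, y, H, sigma)) / (2 * eps)
assert abs(fd - g @ d) < 1e-4 * max(1.0, abs(fd))
```

The check confirms that the gradient of the data term is Hᵀ(Hx − y)/σ², the quantity any gradient-based solver of (2) must evaluate repeatedly.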
II-B. RED and the Fixed-Point Method
Define f(x) as an abstract and differentiable denoiser. (This denoising function admits a noisy image x, and removes additive Gaussian noise from it, assuming a prespecified noise energy.) RED suggests applying the following form as the prior:
(3)  R(x) = (1/2) xᵀ(x − f(x)),

where (·)ᵀ denotes the transpose operator. The term R(x) is an image-adaptive Laplacian regularizer, which favors either a small residual x − f(x), or a small inner product between x and the residual [17]. Plugged into Equation (2), this leads to the following minimization task:

(4)  min_x E(x) = ℓ(y, x) + (λ/2) xᵀ(x − f(x)).
The prior of RED is convex and easily differentiated if the following two conditions are met:

Local Homogeneity: For any scalar c arbitrarily close to 1, we have f(cx) = cf(x).

Strong Passivity: The Jacobian ∇f(x) is stable in the sense that its spectral radius is upper bounded by one, ρ(∇f(x)) ≤ 1.
Surprisingly, the gradient of R(x) is then given by

(5)  ∇R(x) = x − f(x).
As discussed experimentally and theoretically in [17, Section 3.2], many of the state-of-the-art denoising algorithms satisfy the above-mentioned two conditions, and thus the gradient of (4) is simply evaluated through (5). As a consequence, E(x) in Equation (4) is a convex function if ℓ(y, x) is convex, such as in the case of (1). In such cases any gradient-based algorithm can be utilized to address Equation (4), leading to its global minimum.
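The identity (5) can be verified numerically for a simple linear "denoiser" f(x) = Wx with a symmetric W whose spectral radius is at most one; such an f trivially satisfies local homogeneity and strong passivity. This toy denoiser is our own assumption, standing in for a real algorithm such as TNRD:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
# Linear "denoiser" f(x) = W x, with W symmetric and spectral radius <= 1:
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
W = Q @ np.diag(rng.uniform(0.0, 1.0, n)) @ Q.T

def f(x):            # trivially locally homogeneous and strongly passive
    return W @ x

def R(x):            # RED prior (3): 0.5 * x^T (x - f(x))
    return 0.5 * x @ (x - f(x))

x = rng.standard_normal(n)
analytic = x - f(x)                          # the claimed gradient (5)
d = rng.standard_normal(n)
eps = 1e-6
numeric = (R(x + eps * d) - R(x - eps * d)) / (2 * eps)
assert abs(numeric - analytic @ d) < 1e-6 * max(1.0, abs(numeric))
```

The finite-difference directional derivative matches x − f(x), i.e., the gradient of the prior is exactly the denoising residual for this denoiser.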
Note that evaluating the gradient of E(x) calls for one denoising activation, resulting in an expensive operation, as the complexity of good-performing denoising algorithms is typically high. Because of the slow convergence of the steepest descent and the high complexity of ADMM, the work reported in [17] suggested using the FP method to handle the minimization task posed in Equation (4). The development of the FP method is rather simple, relying on the fact that the global minimum of (4) should satisfy the first-order optimality condition, i.e., ∇ℓ(y, x) + λ(x − f(x)) = 0. The FP method utilizes the following iterative formula to solve this equation:
(6)  ∇ℓ(y, x_{k+1}) + λ(x_{k+1} − f(x_k)) = 0.

The explicit expression of the above for ℓ(y, x) as in (1) is

(7)  x_{k+1} = ((1/σ²) HᵀH + λI)⁻¹ ((1/σ²) Hᵀy + λf(x_k)),

where I represents the identity matrix. (This matrix inversion is calculated in the Fourier domain for a block-circulant H, or using iterative methods for the more general cases.) The convergence of the FP method is guaranteed, since the spectral radius of the matrix ((1/σ²) HᵀH + λI)⁻¹ λ∇f(x), which governs the error transition, is smaller than one [17]. Although the FP method is more efficient than the steepest descent and the ADMM, it still needs hundreds of iterations, and thus hundreds of denoising activations, to reach the desired minimum. This results in a high-complexity algorithm, which we aim to address in this work. In the next section, we introduce an acceleration technique called Vector Extrapolation (VE) to substantially reduce the number of iterations of the FP method.
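A minimal sketch of the FP iteration (7), assuming a 1-D circulant blur H so that the matrix inversion reduces to a pointwise division in the Fourier domain. The 3-tap moving-average "denoiser" and all names here are stand-in assumptions for a real denoising engine:

```python
import numpy as np

def circ_freq(kernel, n):
    """Frequency response of a circulant 1-D convolution with `kernel`."""
    h = np.zeros(n)
    h[:len(kernel)] = kernel
    h = np.roll(h, -(len(kernel) // 2))    # center the kernel at index 0
    return np.fft.fft(h)

n, sigma, lam = 64, 0.1, 0.2
Hf = circ_freq(np.ones(5) / 5, n)          # H: 5-tap uniform blur
Ff = circ_freq(np.ones(3) / 3, n)          # f: 3-tap moving-average "denoiser"

rng = np.random.default_rng(2)
x_true = np.cumsum(rng.standard_normal(n)) / 5.0
y = np.real(np.fft.ifft(Hf * np.fft.fft(x_true))) + sigma * rng.standard_normal(n)

def denoise(x):                            # linear, circulant, passive f(x)
    return np.real(np.fft.ifft(Ff * np.fft.fft(x)))

def fp_step(x):
    """One update of (7), solved exactly in the Fourier domain."""
    num = np.conj(Hf) * np.fft.fft(y) / sigma**2 + lam * np.fft.fft(denoise(x))
    den = np.abs(Hf)**2 / sigma**2 + lam
    return np.real(np.fft.ifft(num / den))

x = y.copy()
for _ in range(50):
    x = fp_step(x)

# At the fixed point, the gradient of the objective (4) vanishes.
grad = np.real(np.fft.ifft(np.conj(Hf) * (Hf * np.fft.fft(x) - np.fft.fft(y)))) / sigma**2 \
       + lam * (x - denoise(x))
assert np.linalg.norm(grad) < 1e-6
```

Note that each call to `fp_step` costs one denoising activation, which is precisely the expense that the VE acceleration of the next section aims to reduce.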
III. Proposed Method via Vector Extrapolation
We begin this section by introducing the philosophy of VE in linear and nonlinear systems, and then discuss three variants of VE, i.e., Minimal Polynomial Extrapolation (MPE), Reduced Rank Extrapolation (RRE) and Singular Value Decomposition Minimal Polynomial Extrapolation (SVD-MPE) [22, 20]. (We refer the interested reader to [20] and the references therein to explore the VE technique further.) Efficient implementation of these three variants is also discussed. We end this section by embedding VE in the FP method for RED, offering an acceleration of this scheme. Finally, we discuss the convergence and stability properties of VE.

III-A. VE in Linear and Nonlinear Systems
Consider a vector sequence {x_k} generated via a linear process,

(8)  x_{k+1} = A x_k + b,

where A ∈ R^{N×N}, b ∈ R^N, and x_0 is the initial vector. If ρ(A) < 1, a limit point x* exists, being the FP of (8), x* = Ax* + b. We turn to describe how VE works on such linear systems [23]. Define the error vector as

(9)  e_i = x_i − x*.

Subtracting x* from both sides of (8) and utilizing the fact that x* is the FP, we have e_{i+1} = A e_i, resulting in

(10)  e_i = Aⁱ e_0.

We define a new extrapolated vector as a weighted average of the form

(11)  x̂ = Σ_{i=0}^{m} γ_i x_i,

where Σ_{i=0}^{m} γ_i = 1. Substituting (9) in (11) and using (10) and the constraint on the weights, we have

(12)  x̂ = x* + Σ_{i=0}^{m} γ_i e_i = x* + Σ_{i=0}^{m} γ_i Aⁱ e_0.

Note that the optimal m and {γ_i} should be chosen so as to force Σ_{i=0}^{m} γ_i Aⁱ e_0 = 0. This way, we attain the FP through only one extrapolation step.
More broadly speaking, given a nonzero matrix A and an arbitrary nonzero vector e_0, we can find a unique polynomial P(z) with smallest degree to yield P(A)e_0 = 0. Such a P(z) is called the minimal polynomial of A with respect to the vector e_0. Notice that the zeros of P(z) are eigenvalues of A. Thus, assume that the minimal polynomial of A with respect to e_0 can be represented as

(13)  P(z) = Σ_{i=0}^{m} c_i zⁱ, with c_m = 1,

resulting in P(A)e_0 = Σ_{i=0}^{m} c_i Aⁱ e_0 = 0. So, we have

(14)  Σ_{i=0}^{m} c_i e_i = 0.

Multiplying both sides of (14) by A results in Σ_{i=0}^{m} c_i A e_i = 0, and thus we receive

(15)  Σ_{i=0}^{m} c_i e_{i+1} = 0.

Subtracting these expressions gives

(16)  Σ_{i=0}^{m} c_i (e_{i+1} − e_i) = Σ_{i=0}^{m} c_i (x_{i+1} − x_i) = 0.

This suggests that {c_i} could be determined by solving the linear equations posed in (16). Once obtaining {c_i}, the weights {γ_i} are calculated through γ_i = c_i / Σ_{j=0}^{m} c_j. Note that Σ_{j=0}^{m} c_j = P(1) ≠ 0 if I − A is not singular, yielding well-defined weights. Assuming m is the degree of the minimal polynomial of A with respect to e_0, we can find a set of {γ_i} satisfying Σ_{i=0}^{m} γ_i Aⁱ e_0 = 0, resulting in x̂ = x*. However, the degree of the minimal polynomial of A can be as large as N, which in our case is very high. Moreover, we also do not have a way to obtain this degree with an easy algorithm. Because of these two difficulties, approximate methods were developed to extrapolate the next vector via the previous ones, and we discuss them in Subsection III-B.
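The exact-annihilation case just described can be demonstrated in a few lines: when A has only a few distinct eigenvalues, the minimal-polynomial degree is small, and one extrapolation from m + 2 iterates lands on the fixed point up to round-off. All names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 10, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.repeat([0.9, 0.5, 0.2], [4, 3, 3])       # only 3 distinct eigenvalues
A = Q @ np.diag(eigs) @ Q.T                        # symmetric, rho(A) < 1
b = rng.standard_normal(n)
x_star = np.linalg.solve(np.eye(n) - A, b)         # fixed point of x = Ax + b

# m + 2 iterates give the m + 1 differences u_i = x_{i+1} - x_i.
xs = [rng.standard_normal(n)]
for _ in range(m + 1):
    xs.append(A @ xs[-1] + b)
U = np.stack([xs[i + 1] - xs[i] for i in range(m + 1)], axis=1)

# Minimal-polynomial coefficients: U c = 0 with c_m = 1 (here consistent).
c_head, *_ = np.linalg.lstsq(U[:, :m], -U[:, m], rcond=None)
c = np.append(c_head, 1.0)
gamma = c / c.sum()                                # c.sum() = P(1) != 0
s = sum(g * x for g, x in zip(gamma, xs[:m + 1]))  # one extrapolation step

assert np.linalg.norm(s - x_star) < 1e-6           # exact up to round-off
```

Here the minimal polynomial of A with respect to e_0 has degree 3 (one root per distinct eigenvalue), so the weighted average of x_0, …, x_3 recovers x* in a single extrapolation, exactly as the derivation predicts.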
Turning to the nonlinear case, denote F(·) as the FP function used to evaluate the next vector,

(17)  x_{k+1} = F(x_k),

where F is an N-dimensional vector-valued function, F: R^N → R^N. We say x* is a FP of F if x* = F(x*). Expanding F(x) in its Taylor series yields

F(x) = F(x*) + J(x*)(x − x*) + O(‖x − x*‖²),

where J(·) is the Jacobian matrix of F. Recalling that F(x*) = x*, we have

F(x) − x* = J(x*)(x − x*) + O(‖x − x*‖²).

Assuming the sequence {x_k} converges to x* (guaranteed if ρ(J(x*)) < 1), it follows that x_k will be close enough to x* for all large k, and hence

x_{k+1} − x* ≈ J(x*)(x_k − x*).

Then, we rewrite this in the form

x_{k+1} ≈ J(x*) x_k + (I − J(x*)) x*.

For large k, the vectors x_k thus behave as in a linear system of the form (8), with A = J(x*) and b = (I − J(x*)) x*. This implies that the nonlinear system yields the same formula as the linear one, and motivates us to extrapolate the next vector from the previous ones as in linear systems. Indeed, such an extension has been shown to be successful in various areas of science and engineering, e.g., computational fluid dynamics, semiconductor research, tomography and geometrical image processing [19, 24, 20].
III-B. Derivations of MPE, RRE and SVD-MPE
We turn to discuss how to obtain the next vector by extrapolating the previous ones in an approximate manner. Since the degree of the minimal polynomial can be as large as N, and we have no practical way to obtain it, an arbitrary positive number m is set as the degree, being much smaller than the true one. With such a replacement, the linear equations in (16) become inconsistent, and a solution for {c_i} no longer exists in the ordinary sense. Alternatively, we solve instead the least-squares problem

(18)  min_{c_0,…,c_{m−1}} ‖Σ_{i=0}^{m} c_i u_i‖,  subject to c_m = 1,

where u_i = x_{i+1} − x_i and U = [u_0, u_1, …, u_m]. Then evaluating the weights through γ_i = c_i / Σ_{j=0}^{m} c_j results in the next vector x̂ = Σ_{i=0}^{m} γ_i x_i as a new approximation. This method is known as Minimal Polynomial Extrapolation (MPE) [25].
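A sketch of the MPE step just described, with the constrained least-squares problem (18) solved through numpy's `lstsq` (a simple stand-in for the QR-based implementation discussed next). Here m = 5 is far below the true minimal-polynomial degree, yet the extrapolated vector already beats the last plain iterate on a linear test problem of our own construction:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 50, 5                       # m is far below the true degree (up to n)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(rng.uniform(0.2, 0.8, n)) @ Q.T    # symmetric, rho(A) < 1
b = rng.standard_normal(n)
x_star = np.linalg.solve(np.eye(n) - A, b)

# m + 2 iterates of x_{k+1} = A x_k + b, and their differences u_i.
xs = [np.zeros(n)]
for _ in range(m + 1):
    xs.append(A @ xs[-1] + b)
U = np.stack([xs[i + 1] - xs[i] for i in range(m + 1)], axis=1)

# (18): least-squares for c_0..c_{m-1} with c_m = 1, then normalize to gamma.
c_head, *_ = np.linalg.lstsq(U[:, :m], -U[:, m], rcond=None)
c = np.append(c_head, 1.0)
gamma = c / c.sum()
s = sum(g * x for g, x in zip(gamma, xs[:m + 1]))  # MPE extrapolation

assert np.isclose(gamma.sum(), 1.0)
assert np.linalg.norm(s - x_star) < np.linalg.norm(xs[-1] - x_star)
```

Even though the annihilation is only approximate, the extrapolated error is noticeably smaller than that of the last plain iterate, which is the behavior VE exploits.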
The detailed steps for obtaining the next vector through MPE are shown in Algorithm 1. To solve the constrained problem in (18), we suggest utilizing a QR decomposition obtained with the modified Gram-Schmidt (MGS) procedure [25, 26]. The MGS procedure for the matrix U is shown in Algorithm 2.

Now, let us discuss the other two variants of VE, i.e., Reduced Rank Extrapolation (RRE) [25] and SVD-MPE [22]. The main difference among RRE, MPE and SVD-MPE is at Step 2 in Algorithm 1, regarding the evaluation of the weights {γ_i}. In RRE and SVD-MPE, we utilize the following methods to obtain {γ_i}:

Solving RᵀRd = 1 (with R being the upper-triangular QR factor of U and 1 the all-ones vector) through forward and backward substitution, we obtain the weights through γ = d / Σ_i d_i. Such a formulation of γ is the solution of:

(19)  min_γ ‖Uγ‖,  subject to Σ_{i=0}^{m} γ_i = 1.
Computing the SVD decomposition of U, U = P S Vᵀ, we set γ_i = v_{i,m+1} / Σ_j v_{j,m+1}, where v_{·,m+1} and v_{i,m+1} represent the last column and the i-th element of the last column of the matrix V, respectively.
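The two weight rules above can be sketched as follows, with all variable names being our own. The RRE weights come from the normal-equations shortcut (solving UᵀUd = 1 and normalizing), and the SVD-MPE weights from the last right-singular vector of U; the final assertion uses the fact that the RRE weights minimize the fixed-point residual over all affine combinations in the window:

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 30, 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(rng.uniform(0.2, 0.8, n)) @ Q.T
b = rng.standard_normal(n)
step = lambda x: A @ x + b

xs = [np.zeros(n)]
for _ in range(m + 1):
    xs.append(step(xs[-1]))
U = np.stack([xs[i + 1] - xs[i] for i in range(m + 1)], axis=1)

# RRE: gamma solves min ||U gamma|| s.t. sum(gamma) = 1, via normal equations.
d = np.linalg.solve(U.T @ U, np.ones(m + 1))
gamma_rre = d / d.sum()
s_rre = sum(g * x for g, x in zip(gamma_rre, xs[:m + 1]))

# SVD-MPE: gamma proportional to the last right-singular vector of U.
_, _, Vt = np.linalg.svd(U, full_matrices=False)
gamma_svd = Vt[-1] / Vt[-1].sum()
s_svd = sum(g * x for g, x in zip(gamma_svd, xs[:m + 1]))

res = lambda x: np.linalg.norm(step(x) - x)        # fixed-point residual
# RRE minimizes the residual over the window, so it cannot lose to x_m.
assert res(s_rre) <= res(xs[m]) * (1 + 1e-8)
assert np.isclose(gamma_rre.sum(), 1.0) and np.isclose(gamma_svd.sum(), 1.0)
```

Since the residual of any weighted average equals ‖Uγ‖, and putting all the weight on x_m is itself a feasible choice, the RRE extrapolation is guaranteed not to exceed the residual of the last window iterate.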
Here are two remarks regarding RRE, MPE and SVD-MPE:

Observing the derivations of MPE, SVD-MPE and RRE, we notice that RRE's solution must exist unconditionally, while those of MPE and SVD-MPE may not exist, because the sum of the c_i in MPE and of the v_{i,m+1} in SVD-MPE may become zero. Thus RRE may be more robust in practice [24]. However, MPE and RRE are related, as revealed in [27]. Specifically, if the MPE solution does not exist, the RRE extrapolation stagnates, i.e., the new RRE vector coincides with the previous one; otherwise, the new RRE vector is a convex combination of the previous RRE vector and the current MPE vector, with positive weights. Furthermore, the performance of MPE and RRE is similar – both of the methods either perform well or work poorly [20].

Observe that only a small, fixed number of vectors needs to be stored in memory at all steps of Algorithm 2: when forming the matrix U, each difference vector u_i is overwritten by the corresponding orthonormal vector q_i of the MGS procedure as soon as the latter is computed, so the two sets never need to be kept simultaneously. This implies that no additional memory is required in running Algorithm 2.
III-C. Embedding VE in the Baseline Algorithm
We introduce VE in its cycling formulation for practical usage. One cycle means that we activate the baseline algorithm to produce the required iterates, and then utilize VE once to evaluate the new vector, which serves as a new initial point. Naturally, we repeat such a cycle many times. The steps of utilizing VE in its cycling mode are shown in Algorithm 3. A few comments are in order:

In practice, we utilize VE in its cycling formulation. Specifically, the iterative form shown in Algorithm 3 is named full cycling [20]. To save the complexity of recomputing the extrapolation weights, one may reuse the previous ones, a method known as cycling with frozen γ. Parallel VE can also be developed if more machines are available. Explaining the details of the last two strategies is beyond the scope of this paper; we refer the reader to [20] for more information.

Numerical experience also indicates that cycling with even moderately large parameters will avoid stalling [28]. Moreover, we also recommend enlarging them when the problem becomes challenging to solve.

In our case, the termination criterion in Algorithm 3 can be the total number of iterations (the number of calls to the baseline algorithm) or the difference between two consecutive vectors. Furthermore, we also recommend running a few additional iterations of the baseline algorithm after terminating the VE, which can stabilize the accelerated algorithm in practice.
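The cycling mode of Algorithm 3 can be sketched as follows: each cycle spends m + 1 baseline activations and one extrapolation, and under the same budget of baseline activations the cycled scheme reaches a much lower error. The linear `step` below is an assumption of ours, standing in for an expensive baseline such as the RED fixed-point iteration:

```python
import numpy as np

def mpe_cycle(x0, step, m):
    """One VE cycle: m + 1 activations of the baseline, then one MPE step."""
    xs = [x0]
    for _ in range(m + 1):
        xs.append(step(xs[-1]))
    U = np.stack([xs[i + 1] - xs[i] for i in range(m + 1)], axis=1)
    c_head, *_ = np.linalg.lstsq(U[:, :m], -U[:, m], rcond=None)
    c = np.append(c_head, 1.0)
    gamma = c / c.sum()
    return sum(g * x for g, x in zip(gamma, xs[:m + 1]))

rng = np.random.default_rng(5)
n, m, cycles = 40, 4, 5
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(rng.uniform(0.5, 0.95, n)) @ Q.T   # slow linear "baseline"
b = rng.standard_normal(n)
x_star = np.linalg.solve(np.eye(n) - A, b)
step = lambda x: A @ x + b

x_plain = np.zeros(n)                              # plain FP, same budget
for _ in range(cycles * (m + 1)):
    x_plain = step(x_plain)

x_ve = np.zeros(n)                                 # cycled MPE
for _ in range(cycles):
    x_ve = mpe_cycle(x_ve, step, m)

assert np.linalg.norm(x_ve - x_star) < np.linalg.norm(x_plain - x_star)
```

In the RED setting, `step` would be the fixed-point update (7), so the saving in baseline activations translates directly into fewer denoiser calls.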
III-D. Convergence and Stability Properties
We mention existing results regarding the convergence and stability properties of VE, for a better understanding of this technique. A rich literature has examined the convergence and stability properties of RRE, MPE and SVD-MPE in linear systems [29, 30]. Assuming the matrix A is diagonalizable, the error in the k-th iteration has the form

e_k = x_k − x* = Σ_i v_i λ_iᵏ,

where (λ_i, v_i) are some or all of the eigenvalues and corresponding eigenvectors of A, with distinct nonzero eigenvalues. Ordering the eigenvalues as |λ_1| ≥ |λ_2| ≥ ⋯, the following asymptotic performance holds for all three variants of VE when k → ∞:

(20)  x̂_{k,m} − x* = O(|λ_{m+1}|ᵏ).

This implies that the sequence {x̂_{k,m}} converges to x* faster than the original sequence, whose error decays like |λ_1|ᵏ.
As shown in (20), for a large k, the plain iteration (8) reduces the contributions of the smaller λ_i to the error e_k, while VE eliminates the contributions of the m largest ones. This indicates that x̂_{k,m} − x* is smaller than each of the errors e_k, …, e_{k+m}, when k is large enough. We mention another observation: increasing m generally results in a faster convergence of VE. However, a larger m increases the storage requirements and also incurs a much higher computational cost. Numerical experiments indicate that a moderate m already works well in practice.
We say VE is stable if the following condition holds:

(21)  sup_k Σ_{i=0}^{m} |γ_i^{(k,m)}| < ∞.

Here, we denote the weights by γ_i^{(k,m)} to indicate their dependence on k and m. If (21) holds true, the error in x_k will not be magnified severely. As shown in [29, 30, 31], MPE and RRE obey such a stability property.
For nonlinear systems, the analysis of convergence and stability becomes extremely challenging. One of the main results is the quadratic convergence theorem [32, 23, 33]. This theorem is built on the special assumption that m is set to be the degree of the minimal polynomial of J(x*). The proof of the quadratic convergence was shown in [32]. In a following work, Smith, Ford and Sidi noticed that there exists a gap in this proof [23]. Jbilou et al. suggested two more conditions in order to close the gap [33]:

The matrix J(x*) − I is nonsingular;

The Jacobian J(·) satisfies the following Lipschitz condition: ‖J(x) − J(y)‖ ≤ L‖x − y‖ for some L > 0.

Surprisingly, these two conditions are met by the RED scheme. The first condition is satisfied by the fact that the spectral radius of the Jacobian of the FP iteration in (7) is smaller than one, so that none of its eigenvalues equals one and J(x*) − I is invertible. The second one is also true, due to the assumption in RED that the denoiser f(x) is differentiable. So we claim that it is possible for VE to solve RED with a quadratic convergence rate.
Although VE can lead to a quadratic convergence rate, trying to achieve such a rate may not be realistic, because m can be as large as N. However, we may obtain a linear but fast convergence in practice even with moderate parameter values, as also demonstrated in the following numerical experiments.
IV. Experimental Results
We follow the same image deblurring and super-resolution experiments as presented in [17] to investigate the acceleration performance of VE. The trainable nonlinear reaction diffusion (TNRD) method [6] is chosen as the denoising engine, and the FP method as our baseline algorithm. For a fair comparison, the same parameters suggested in [17] for the different image processing tasks are set in our experiments. In [17], the authors compared RED with other popular algorithms for image deblurring and super-resolution, showing its superiority; as the main purpose of this paper is to present the acceleration of solving RED, we omit comparisons with other restoration algorithms. In the following, we first show the acceleration obtained by applying MPE with FP for solving RED, and then discuss the choice of the VE parameters. In addition, we compare our method with three other methods: steepest descent (SD), Nesterov's acceleration [34] and Limited-memory BFGS (L-BFGS) [35]. Note that we need to determine a proper step-size for these methods [35]. However, evaluating the objective value or gradient in RED is expensive, implying that any line-search method becomes prohibitive; in contrast, our framework as described in Algorithm 3 does not suffer from such a problem. In the following, we manually choose a fixed step-size yielding a good convergence behavior. Finally, we compare the differences among RRE, MPE and SVD-MPE. All of the experiments are conducted on a workstation with an Intel(R) Xeon(R) CPU E5-2699 @ 2.20GHz.
IV-A. Image Deblurring
In this experiment, we degrade the test images by convolving them with two different point spread functions (PSFs), a uniform blur and a Gaussian blur. In both cases, we add white Gaussian noise to the blurred images. The parameters of Algorithm 3 are fixed for the image deblurring task. Additionally, we apply VE to the case where the baseline algorithm is SD, denoted SD-MPE, with its own choice of the VE parameters. The value of the cost function and the peak signal to noise ratio (PSNR) versus iteration or CPU time are given in Figs. 2 and 3. (The goal of this paper is to investigate the performance of solving RED with VE rather than the restoration results; therefore, we present the recovered PSNR versus iteration or running time. One can utilize no-reference quality metrics like NFERM [36] and ARISMc [37] to further examine the restoration results.) These correspond to both the uniform and the Gaussian blur kernels, all tested on the "starfish" image. Clearly, we observe that SD is the slowest algorithm. Surprisingly, SD-MPE and Nesterov's method yield almost the same convergence speed, despite their totally different schemes. Moreover, we note that FP is faster than L-BFGS, Nesterov's method and SD-MPE. Undoubtedly, FP-MPE is the fastest one, both in terms of iterations and CPU time, which indicates the effectiveness of MPE's acceleration. To provide a visual impression, we show the change in reconstruction quality of the different algorithms in Fig. 1. Clearly, FP-MPE (third column) achieves the best reconstruction fastest, while the other methods need more iterations to obtain a comparable result.






An additional nine test images suggested in [17] are included in our experiments, in order to investigate the performance of VE further. In this experiment we focus on the comparison between FP-MPE and FP for these additional images. We first run the native FP method for 200 iterations and record the final cost value obtained. Then we activate Algorithm 3 with the same initial value as used in the FP method, and examine how many iterations are needed to attain the same or a lower objective value. The final number of iterations for the different images is given in Table I. Clearly, an acceleration is observed for all the test images in the image deblurring task.
Image  Butterfly  Boats  C. Man  House  Parrot  Lena  Barbara  Starfish  Peppers  Leaves 

Deblurring: Uniform kernel  
RED: FPTNRD  200  200  200  200  200  200  200  200  200  200 
RED: FPMPETNRD  60  55  55  80  50  50  55  50  55  55 
Deblurring: Gaussian kernel  
RED: FPTNRD  200  200  200  200  200  200  200  200  200  200 
RED: FPMPETNRD  70  65  55  65  55  80  45  55  95  80 
IV-B. Image Super-Resolution
We generate a low-resolution image by blurring the ground-truth one with a Gaussian kernel and then downsampling. Afterwards, white Gaussian noise is added to the resulting image. The same parameters used in the deblurring task for FP-MPE are adopted here, while SD-MPE is assigned its own parameter setting. We choose "Plants" as our test image because it needs more iterations for FP-MPE to converge. As observed from Fig. 4, while L-BFGS and Nesterov's method are faster than the FP method, our acceleration method (FP-MPE) is quite competitive with both. Furthermore, we investigate all of the test images shown in [17] to see how many iterations are needed for MPE to achieve the same or a lower cost compared with the FP method. The results are shown in Table II. As can be seen, MPE works better than the FP method, indicating an effective acceleration for solving RED.
Super-Resolution  

Image  Butterfly  Flower  Girl  Parth.  Parrot  Raccoon  Bike  Hat  Plants 
RED: FPTNRD  200  200  200  200  200  200  200  200  200 
RED: FPMPETNRD  60  65  50  70  55  60  60  50  70 
IV-C. The Choice of the Parameters and the Differences Among RRE, MPE and SVD-MPE
We conclude by discussing the robustness of MPE to the choice of its parameters. To this end, the single-image super-resolution task is chosen as our case study. Furthermore, we choose to demonstrate this robustness on the "Plants" image, since it required the largest number of iterations in the MPE recovery process. As seen from Figs. 5(a)-5(c), MPE always converges faster than the regular FP method under the different parameter settings. Moreover, we also observe that a lower objective value is attained through MPE. Notice that MPE exhibits some oscillations, because it is not a monotone acceleration technique; however, a lower cost is still achieved when additional iterations are given.
In part (d) of Fig. 5, an optimal pair of parameters is chosen for MPE, RRE and SVD-MPE for the single-image super-resolution task with the "Plants" image. (The optimal parameters are obtained by a grid search, seeking the fastest convergence for each of the three methods.) We see that all three methods yield an acceleration and a lower cost, demonstrating the effectiveness of the various variants of VE. Moreover, we see that SVD-MPE converges faster at the beginning, but MPE yields the lowest eventual cost.
V. Conclusion
The work reported in [17] introduced RED – a flexible framework for using arbitrary image denoising algorithms as priors for general inverse problems. This scheme amounts to iterative algorithms in which the denoiser is called repeatedly. While appealing and quite practical, the RED scheme has one major weakness – the complexity of denoising algorithms is typically high, which implies that the use of RED is likely to be costly in run-time. This work aims at deploying RED efficiently, alleviating the above-described shortcoming. An acceleration technique based on the Vector Extrapolation (VE) methodology is proposed. The proposed algorithms are demonstrated to substantially reduce the number of iterations required for the overall recovery process. We also observe that the choice of the parameters in the VE scheme is robust.
References
 [1] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in Computer Vision and Pattern Recognition, CVPR, IEEE Computer Society Conference on, vol. 2, pp. 60–65, 2005.
 [2] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image processing, vol. 15, no. 12, pp. 3736–3745, 2006.
 [3] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
 [4] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in Computer Vision (ICCV), 2011 IEEE International Conference on, pp. 479–486, IEEE, 2011.
 [5] W. Dong, L. Zhang, G. Shi, and X. Li, “Nonlocally centralized sparse representation for image restoration,” IEEE Transactions on Image Processing, vol. 22, no. 4, pp. 1620–1630, 2013.
 [6] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1256–1272, 2017.
 [7] P. Chatterjee and P. Milanfar, “Is denoising dead?,” IEEE Transactions on Image Processing, vol. 19, no. 4, pp. 895–911, 2010.
 [8] P. Milanfar, “A tour of modern image filtering: New insights and methods, both practical and theoretical,” IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 106–128, 2013.
 [9] A. Levin and B. Nadler, “Natural image denoising: Optimality and inherent bounds,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp. 2833–2840, IEEE, 2011.
 [10] M. Protter, M. Elad, H. Takeda, and P. Milanfar, “Generalizing the nonlocalmeans to superresolution reconstruction,” IEEE Transactions on Image Processing, vol. 18, no. 1, pp. 36–51, 2009.
 [11] A. Danielyan, V. Katkovnik, and K. Egiazarian, “Bm3d frames and variational image deblurring,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1715–1728, 2012.
 [12] C. A. Metzler, A. Maleki, and R. G. Baraniuk, “Optimal recovery from compressive measurements via denoisingbased approximate message passing,” in Sampling Theory and Applications (SampTA), 2015 International Conference on, pp. 508–512, IEEE, 2015.
 [13] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” in Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE, pp. 945–948, IEEE, 2013.

 [14] S. Sreehari, S. V. Venkatakrishnan, B. Wohlberg, G. T. Buzzard, L. F. Drummy, J. P. Simmons, and C. A. Bouman, “Plug-and-play priors for bright field electron tomography and sparse interpolation,” IEEE Transactions on Computational Imaging, vol. 2, no. 4, pp. 408–423, 2016.
 [15] S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-play ADMM for image restoration: Fixed-point convergence and applications,” IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 84–98, 2017.

 [16] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
 [17] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (RED),” SIAM Journal on Imaging Sciences, vol. 10, no. 4, pp. 1804–1844, 2017.
 [18] S. Cabay and L. Jackson, “A polynomial extrapolation method for finding limits and antilimits of vector sequences,” SIAM Journal on Numerical Analysis, vol. 13, no. 5, pp. 734–752, 1976.
 [19] N. Rajeevan, K. Rajgopal, and G. Krishna, “Vectorextrapolated fast maximum likelihood estimation algorithms for emission tomography,” IEEE Transactions on Medical Imaging, vol. 11, no. 1, pp. 9–20, 1992.
 [20] A. Sidi, Vector Extrapolation Methods with Applications, vol. 17. SIAM, 2017.
 [21] A. Kirsch, An introduction to the mathematical theory of inverse problems, vol. 120. Springer Science & Business Media, 2011.
 [22] A. Sidi, “SVD-MPE: An SVD-based vector extrapolation method of polynomial type,” arXiv preprint arXiv:1505.00674, 2015.
 [23] D. A. Smith, W. F. Ford, and A. Sidi, “Extrapolation methods for vector sequences,” SIAM Review, vol. 29, no. 2, pp. 199–233, 1987.
 [24] G. Rosman, L. Dascal, A. Sidi, and R. Kimmel, “Efficient beltrami image filtering via vector extrapolation methods,” SIAM Journal on Imaging Sciences, vol. 2, no. 3, pp. 858–878, 2009.
 [25] A. Sidi, “Efficient implementation of minimal polynomial and reduced rank extrapolation methods,” Journal of Computational and Applied Mathematics, vol. 36, no. 3, pp. 305–337, 1991.
 [26] G. H. Golub and C. F. Van Loan, Matrix Computations, vol. 3. JHU Press, 2012.
 [27] A. Sidi, “Minimal polynomial and reduced rank extrapolation methods are related,” Advances in Computational Mathematics, vol. 43, no. 1, pp. 151–170, 2017.
 [28] A. Sidi and Y. Shapira, “Upper bounds for convergence rates of acceleration methods with initial iterations,” Numerical Algorithms, vol. 18, no. 2, pp. 113–132, 1998.
 [29] A. Sidi, W. F. Ford, and D. A. Smith, “Acceleration of convergence of vector sequences,” SIAM Journal on Numerical Analysis, vol. 23, no. 1, pp. 178–196, 1986.
 [30] A. Sidi and J. Bridger, “Convergence and stability analyses for some vector extrapolation methods in the presence of defective iteration matrices,” Journal of Computational and Applied Mathematics, vol. 22, no. 1, pp. 35–61, 1988.
 [31] A. Sidi, “Convergence and stability properties of minimal polynomial and reduced rank extrapolation algorithms,” SIAM Journal on Numerical Analysis, vol. 23, no. 1, pp. 197–209, 1986.
 [32] S. Skelboe, “Computation of the periodic steadystate response of nonlinear networks by extrapolation methods,” IEEE Transactions on Circuits and Systems, vol. 27, no. 3, pp. 161–175, 1980.
 [33] K. Jbilou and H. Sadok, “Some results about vector extrapolation methods and related fixedpoint iterations,” Journal of Computational and Applied Mathematics, vol. 36, no. 3, pp. 385–398, 1991.
 [34] Y. Nesterov, Lectures on Convex Optimization. Springer, 2018.
 [35] J. Nocedal and S. J. Wright, Numerical Optimization. Springer, 2006.
 [36] K. Gu, G. Zhai, X. Yang, and W. Zhang, “Using free energy principle for blind image quality assessment,” IEEE Transactions on Multimedia, vol. 17, no. 1, pp. 50–63, 2015.
 [37] K. Gu, G. Zhai, W. Lin, X. Yang, and W. Zhang, “Noreference image sharpness assessment in autoregressive parameter space,” IEEE Transactions on Image Processing, vol. 24, no. 10, pp. 3218–3231, 2015.