Acceleration of RED via Vector Extrapolation

05/06/2018 ∙ by Tao Hong, et al. ∙ Stanford University

Models play an important role in inverse problems, serving as the prior for representing the original signal to be recovered. REgularization by Denoising (RED) is a recently introduced general framework for constructing such priors using state-of-the-art denoising algorithms. Using RED, solving inverse problems is shown to amount to an iterated denoising process. However, as the complexity of denoising algorithms is generally high, this might lead to an overall slow algorithm. In this paper, we suggest an accelerated technique based on vector extrapolation (VE) to speed-up existing RED solvers. Numerical experiments validate the obtained gain by VE, leading to nearly 70% savings in computations compared with the original solvers.


I Introduction

Inverse problems in imaging address the reconstruction of clean images from their corrupted versions. The corruption can be a blur, loss of samples, downscaling or a more complicated operator (e.g., CT and MRI), accompanied by noise contamination. Roughly speaking, inverse problems are characterized by two main parts: the first is called the forward model, which formulates the relation between the noisy measurement and the desired signal, and the second is the prior, describing the log-probability of the signal to be recovered.

In recent years, we have witnessed a massive advancement in a basic inverse problem referred to as image denoising [1, 2, 3, 4, 5, 6]. Indeed, recent work goes as far as speculating that the performance obtained by leading image denoising algorithms is getting very close to the possible ceiling [7, 8, 9]. This motivated researchers to seek ways to exploit this progress in order to address general inverse problems. Successful attempts, as in [10, 11, 12], suggested an exhaustive manual adaptation of existing denoising algorithms, or the priors used in them, treating specific alternative missions. This line of work has a clear limitation, as it does not offer a flexible and general scheme for incorporating various image denoising achievements for tackling other advanced image processing tasks. This led to the following natural question: is it possible to suggest a general framework that utilizes the abundance of high-performance image denoising algorithms for addressing general inverse problems? Venkatakrishnan et al. gave a positive answer to this question, proposing a framework called the Plug-and-Play Priors (P3) method [13, 14, 15]. Formulating the inverse problem as an optimization task and handling it via the Alternating Direction Method of Multipliers (ADMM) scheme [16], P3 shows that the whole problem is decomposed into a sequence of image denoising sub-problems, coupled with simpler computational steps. The P3 scheme provides a constructive answer to the desire to use denoisers within inverse problems, but it suffers from several key disadvantages: P3 does not define a clear objective function, since the regularization used is implicit; tuning the parameters in P3 is extremely delicate; and since P3 is tightly coupled with the ADMM, it has no flexibility with respect to the numerical scheme.

A novel framework named REgularization by Denoising (RED) [17] proposes an appealing alternative while overcoming all these flaws. The core idea in RED is the use of the given denoiser within an expression of regularization that generalizes a Laplacian smoothness term. The work in [17] carefully shows that the gradient of this regularization is in fact the denoising residual. This, in turn, leads to several iterative algorithms, all guaranteed to converge to the global minimum of the inverse problem’s penalty function, while using a denoising step in each iteration.

The idea of using a state-of-the-art denoising algorithm for constructing an advanced prior for general inverse problems is very appealing: first, it enables utilizing the vast progress in image denoising for solving challenging inverse problems, as explained above; second, RED enables utilizing the denoiser as a black-box. However, a fundamental problem still exists due to the high complexity of typical denoising algorithms, which are required to be activated many times in such a recovery process. Indeed, the evidence from the numerical experiments posed in [17] clearly exposes this problem in all three of the methods proposed, namely the steepest descent, the fixed-point (FP) strategy and the ADMM scheme. Note that the FP method is parameter-free and the most efficient among the three, and yet this approach too requires activating the denoising algorithm dozens of times for a completion of the recovery process.

Our main contribution in this paper is to address these difficulties by applying vector extrapolation (VE) [18, 19, 20] to accelerate the FP algorithm shown in [17]. Our simulations illustrate the effectiveness of VE for this acceleration, saving nearly 70% of the overall computations involved compared with the native FP method.

The rest of this paper is organized as follows. We review RED and its FP method in Section II. Section III recalls the vector extrapolation acceleration idea. Several experiments on image deblurring and super-resolution, which follow the ones given in [17], show the effectiveness of VE; these are brought in Section IV. We conclude our paper in Section V.

II REgularization by Denoising (RED)

This section reviews the framework of RED, which utilizes denoising algorithms as image priors [17]. We also describe its original solver based on the Fixed Point (FP) method.

II-A Inverse Problems as Optimization Tasks

From an estimation point of view, the signal $x$ is to be recovered from its measurements $y$ using the posterior conditional probability $p(x|y)$. Using the maximum a posteriori (MAP) estimator and Bayes' rule, the estimation task is formulated as:

$$\hat{x}_{\text{MAP}} = \arg\max_x p(x|y) = \arg\max_x \frac{p(y|x)\,p(x)}{p(y)} = \arg\min_x \left\{ -\log p(y|x) - \log p(x) \right\}.$$

The third equality is obtained by the fact that $p(y)$ does not depend on $x$. The term $-\log p(y|x)$ is known as the log-likelihood $\ell(y, x)$. A typical example is

$$\ell(y, x) \triangleq -\log p(y|x) = \frac{1}{2\sigma^2} \| Hx - y \|_2^2, \qquad (1)$$

referring to the case $y = Hx + e$, where $H$ is any linear degradation operator and $e$ is white mean-zero Gaussian noise with variance $\sigma^2$. Note that the expression of $\ell(y, x)$ depends on the distribution of the noise. (White Gaussian noise is assumed throughout this paper.) Now, we can write the MAP optimization problem as

$$\hat{x}_{\text{MAP}} = \arg\min_x \ \ell(y, x) + \lambda R(x), \qquad (2)$$

where $\lambda$ is a trade-off parameter to balance $\ell(y, x)$ and $R(x)$. $R(x)$ refers to the prior that describes the statistical nature of $x$. This term is typically referred to as the regularization, as it is used to stabilize the inversion by emphasizing the features of the recovered signal. In the following, we will describe how RED activates denoising algorithms for composing $R(x)$. Note that Equation (2) defines a wide family of inverse problems including, but not limited to, inpainting, deblurring, super-resolution [21], and more.
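To make the forward model concrete, here is a minimal sketch of synthesizing a measurement according to (1). The helper name `degrade`, the cyclic boundary handling, and the use of SciPy are illustrative choices of this sketch, not part of [17]:

```python
import numpy as np
from scipy.ndimage import convolve

def degrade(x, psf, sigma, seed=0):
    """Form a measurement y = Hx + e as in (1): H blurs x with the given PSF
    (cyclic boundaries), and e is white Gaussian noise of standard deviation sigma."""
    rng = np.random.default_rng(seed)
    return convolve(x, psf, mode="wrap") + sigma * rng.standard_normal(x.shape)
```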

II-B RED and the Fixed-Point Method

Define $f(\cdot)$ as an abstract and differentiable denoiser. (This denoising function admits a noisy image $x$, and removes additive Gaussian noise from it, assuming a prespecified noise energy.) RED suggests applying the following form as the prior:

$$R(x) = \frac{1}{2}\, x^T \left( x - f(x) \right), \qquad (3)$$

where $(\cdot)^T$ denotes the transpose operator. The term $x^T(x - f(x))$ is an image-adaptive Laplacian regularizer, which favors either a small residual $x - f(x)$, or a small inner product between $x$ and the residual [17]. Plugged into Equation (2), this leads to the following minimization task:

$$\min_x \ E(x) = \frac{1}{2\sigma^2} \| Hx - y \|_2^2 + \frac{\lambda}{2}\, x^T \left( x - f(x) \right). \qquad (4)$$

The prior of RED is a convex function and easily differentiated if the following two conditions are met:

  • Local Homogeneity: For any scalar $c$ arbitrarily close to $1$, we have $f(cx) = c f(x)$.

  • Strong Passivity: The Jacobian $\nabla_x f(x)$ is stable in the sense that its spectral radius is upper bounded by one, $\rho(\nabla_x f(x)) \leq 1$.

Surprisingly, the gradient of $E(x)$ is then given by

$$\nabla_x E(x) = \frac{1}{\sigma^2} H^T (Hx - y) + \lambda \left( x - f(x) \right). \qquad (5)$$

As discussed experimentally and theoretically in [17, Section 3.2], many of the state-of-the-art denoising algorithms satisfy the above-mentioned two conditions, and thus the gradient of (4) is simply evaluated through (5). As a consequence, $E(x)$ in Equation (4) is a convex function if $\ell(y, x)$ is convex, such as in the case of (1). In such cases any gradient-based algorithm can be utilized to address Equation (4), leading to its global minimum.
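The key computational message of (5) is that one gradient evaluation costs exactly one denoising activation. A minimal sketch, assuming `apply_H`, `apply_Ht` and `denoise` are user-supplied callables (illustrative names, not from [17]):

```python
def red_gradient(x, y, apply_H, apply_Ht, denoise, sigma2, lam):
    """Gradient (5) of the RED objective (4).

    apply_H / apply_Ht apply the degradation operator H and its adjoint;
    denoise is the plugged-in denoiser f(.). One denoiser call per gradient.
    """
    return apply_Ht(apply_H(x) - y) / sigma2 + lam * (x - denoise(x))
```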

Note that evaluating the gradient of $E(x)$ calls for one denoising activation, resulting in an expensive operation as the complexity of good-performing denoising algorithms is typically high. Because of the slow convergence speed of the steepest descent and the high complexity of ADMM, the work reported in [17] suggested using the FP method to handle the minimization task posed in Equation (4). The development of the FP method is rather simple, relying on the fact that the global minimum of (4) should satisfy the first-order optimality condition, i.e., $\nabla_x E(x) = 0$. For the FP method, we utilize the following iterative formula to solve this equation:

$$\frac{1}{\sigma^2} H^T (H x_{k+1} - y) + \lambda \left( x_{k+1} - f(x_k) \right) = 0. \qquad (6)$$

The explicit expression of the above for $x_{k+1}$ is

$$x_{k+1} = \left( \frac{1}{\sigma^2} H^T H + \lambda I \right)^{-1} \left( \frac{1}{\sigma^2} H^T y + \lambda f(x_k) \right), \qquad (7)$$

where $I$ represents the identity matrix. (This matrix inversion is calculated in the Fourier domain for block-circulant $H$, or using iterative methods for the more general cases.) The convergence of the FP method is guaranteed since, by the strong passivity of $f$, the spectral radius of the iteration operator $\left( \frac{1}{\sigma^2} H^T H + \lambda I \right)^{-1} \lambda \nabla_x f(x)$ is smaller than one.

Although the FP method is more efficient than the steepest descent and the ADMM, it still needs hundreds of iterations, which means hundreds of denoising activations, to reach the desired minimum. This results in a high-complexity algorithm, which we aim to address in this work. In the next section, we introduce an acceleration technique called Vector Extrapolation (VE) to substantially reduce the number of iterations in the FP method.
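For concreteness, here is a sketch of one FP step (7) for the deblurring case, where $H$ is block-circulant and the inversion is diagonal in the Fourier domain. The names (`psf_fft`, `denoise`) and the cyclic boundary model are assumptions of this sketch:

```python
import numpy as np

def fp_step(x, y, psf_fft, denoise, sigma2, lam):
    """One fixed-point iteration (7) for a cyclic deblurring model y = Hx + e.

    psf_fft: 2-D FFT of the blur kernel, zero-padded to the image size, so that
    H^T H is diagonalized by the FFT. One denoising call per step.
    """
    rhs = np.conj(psf_fft) * np.fft.fft2(y) / sigma2 + lam * np.fft.fft2(denoise(x))
    denom = np.abs(psf_fft) ** 2 / sigma2 + lam
    return np.real(np.fft.ifft2(rhs / denom))
```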

III Proposed Method via Vector Extrapolation

We begin this section by introducing the philosophy of VE in linear and nonlinear systems, and then discuss three variants of VE, i.e., Minimal Polynomial Extrapolation (MPE), Reduced Rank Extrapolation (RRE) and Singular Value Decomposition Minimal Polynomial Extrapolation (SVD-MPE) [22, 20]. (We refer the interested reader to [20] and the references therein for further exploration of the VE technique.) Efficient implementation of these three variants is also discussed. We end this section by embedding VE in the FP method for RED, offering an acceleration of this scheme. Finally, we discuss the convergence and stability properties of VE.

III-A VE in Linear and Nonlinear Systems

Consider a vector sequence $\{x_i\}$ generated via a linear process,

$$x_{i+1} = A x_i + b, \qquad (8)$$

where $A \in \mathbb{R}^{N \times N}$, $b \in \mathbb{R}^N$, and $x_0$ is the initial vector. If $\rho(A) < 1$, a limit point $s$ exists, being the FP of (8), $s = As + b$. We turn to describe how VE works on such linear systems [23]. Denote $u_i = x_{i+1} - x_i$ and define the defective vector $\epsilon_i$ as

$$\epsilon_i = x_i - s. \qquad (9)$$

Subtracting $s$ from both sides of (8) and utilizing the fact that $s$ is the FP, we have $\epsilon_{i+1} = A \epsilon_i$, resulting in

$$\epsilon_i = A^i \epsilon_0. \qquad (10)$$

We define a new extrapolated vector $\hat{s}$ as a weighted average of the form

$$\hat{s} = \sum_{i=0}^{m} \gamma_i x_i, \qquad (11)$$

where $\sum_{i=0}^{m} \gamma_i = 1$. Substituting (9) in (11) and using (10) and $\sum_{i=0}^{m} \gamma_i = 1$, we have

$$\hat{s} = \sum_{i=0}^{m} \gamma_i (s + \epsilon_i) = s + \sum_{i=0}^{m} \gamma_i A^i \epsilon_0. \qquad (12)$$

Note that the optimal $m$ and $\{\gamma_i\}$ should be chosen so as to force $\sum_{i=0}^{m} \gamma_i A^i \epsilon_0 = 0$. This way, we attain the FP $s$ through only one extrapolation step.

More broadly speaking, given a nonzero matrix $B$ and an arbitrary nonzero vector $u$, we can find a unique monic polynomial $P(z)$ with smallest degree to yield $P(B)u = 0$. Such a $P(z)$ is called the minimal polynomial of $B$ with respect to the vector $u$. Notice that the zeros of $P(z)$ are eigenvalues of $B$. Thus, assume that the minimal polynomial of $A$ with respect to $\epsilon_0$ can be represented as

$$P(z) = \sum_{i=0}^{m} c_i z^i, \quad c_m = 1, \qquad (13)$$

resulting in $P(A)\epsilon_0 = 0$. So, we have

$$\sum_{i=0}^{m} c_i A^i \epsilon_0 = \sum_{i=0}^{m} c_i \epsilon_i = 0. \qquad (14)$$

Multiplying both sides of (14) by $A$ results in $\sum_{i=0}^{m} c_i A \epsilon_i = 0$, and thus we receive

$$\sum_{i=0}^{m} c_i \epsilon_{i+1} = 0. \qquad (15)$$

Subtracting these expressions gives

$$\sum_{i=0}^{m} c_i (\epsilon_{i+1} - \epsilon_i) = \sum_{i=0}^{m} c_i u_i = 0. \qquad (16)$$

This suggests that $\{c_i\}$ could be determined by solving the linear equations posed in (16). Once obtaining $\{c_i\}$, the weights $\{\gamma_i\}$ are calculated through $\gamma_i = c_i / \sum_{j=0}^{m} c_j$. Note that $\sum_{j=0}^{m} c_j = P(1) \neq 0$ if $I - A$ is not singular, yielding well-defined weights. Assuming $m$ is the degree of the minimal polynomial of $A$ with respect to $\epsilon_0$, we can find a set of $\{\gamma_i\}$ to satisfy $\sum_{i=0}^{m} \gamma_i \epsilon_i = 0$, resulting in $\hat{s} = s$. However, the degree of the minimal polynomial of $A$ can be as large as $N$, which in our case is very high. Moreover, we also do not have a way to obtain this degree with an easy algorithm. Because of these two difficulties, approximate methods were developed to extrapolate the next vector via the previous ones, and we will discuss them in Subsection III-B.
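The following toy experiment, a sketch under stated assumptions (a synthetic diagonalizable $A$ with only $m$ distinct eigenvalues, so the minimal polynomial with respect to any vector has degree at most $m$), verifies that solving (16) and forming the weighted average (11) recovers the fixed point in a single extrapolation:

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 200, 3
# A is diagonalizable with m distinct eigenvalues inside the unit disk, so the
# minimal polynomial of A w.r.t. any vector divides (z-0.9)(z-0.5)(z+0.3).
eigs = rng.choice([0.9, 0.5, -0.3], size=N)
Q = rng.standard_normal((N, N))
A = Q @ np.diag(eigs) @ np.linalg.inv(Q)
b = rng.standard_normal(N)
s = np.linalg.solve(np.eye(N) - A, b)            # the fixed point s = As + b

x = [rng.standard_normal(N)]                     # iterates x_0, ..., x_{m+1}
for _ in range(m + 1):
    x.append(A @ x[-1] + b)
U = np.stack([x[i + 1] - x[i] for i in range(m + 1)], axis=1)

# Solve sum_i c_i u_i = 0 with c_m = 1 (eq. (16)); normalize to the gammas.
c = np.append(np.linalg.lstsq(U[:, :m], -U[:, m], rcond=None)[0], 1.0)
gamma = c / c.sum()
s_hat = sum(g * xi for g, xi in zip(gamma, x[: m + 1]))
print(np.linalg.norm(s_hat - s))                 # tiny (round-off): s recovered
```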

Turning to the nonlinear case, denote by $F(\cdot)$ the FP function used to evaluate the next vector,

$$x_{i+1} = F(x_i), \qquad (17)$$

where $F$ is an $N$-dimensional vector-valued function, $F : \mathbb{R}^N \rightarrow \mathbb{R}^N$. We say $s$ is a FP of $F$ if $s = F(s)$. Expanding $F(x)$ in its Taylor series yields

$$F(x) = F(s) + J(s)(x - s) + O(\|x - s\|^2),$$

where $J(\cdot)$ is the Jacobian matrix of $F$. Recalling $F(s) = s$, we have

$$F(x) - s = J(s)(x - s) + O(\|x - s\|^2).$$

Assuming the sequence $\{x_i\}$ converges to $s$ (if $\rho(J(s)) < 1$), it follows that $x_i$ will be close enough to $s$ for all large $i$, and hence

$$x_{i+1} - s \approx J(s)(x_i - s).$$

Then, we rewrite this in the form

$$x_{i+1} \approx J(s)\, x_i + \left[ I - J(s) \right] s.$$

For large $i$, the vectors $\{x_i\}$ thus behave as in a linear system of the form (8), with $A = J(s)$ and $b = [I - J(s)]s$. This implies that the nonlinear system yields the same formula as the linear one, and motivates us to extrapolate the next vector from the previous ones as in linear systems. Indeed, such an extension has been shown to be successful in various areas of science and engineering, e.g., computational fluid dynamics, semiconductor research, tomography and geometrical image processing [19, 24, 20].

III-B Derivations of MPE, RRE and SVD-MPE

We turn to discuss how to obtain the next vector approximately by extrapolating the previous ones. Due to the fact that the degree of the minimal polynomial can be as large as $N$ and we cannot obtain it, an arbitrary positive number $k$ is set as the degree, being much smaller than the true one. With such a replacement, the linear equations in (16) become inconsistent and there does not exist a solution for $\{c_i\}$ in the ordinary sense. Alternatively, we solve instead

$$\min_{c} \ \| U c \|_2, \quad \text{s.t. } c_k = 1, \qquad (18)$$

where $U = [u_0 \ u_1 \ \cdots \ u_k] \in \mathbb{R}^{N \times (k+1)}$ and $c = [c_0, c_1, \ldots, c_k]^T$. Then evaluating $\{\gamma_i\}$ through $\gamma_i = c_i / \sum_{j=0}^{k} c_j$ results in the next vector $\hat{s} = \sum_{i=0}^{k} \gamma_i x_i$ as a new approximation. This method is known as Minimal Polynomial Extrapolation (MPE) [25].

The detailed steps for obtaining the next vector through MPE are shown in Algorithm 1. To solve the constrained problem in (18), we suggest utilizing a QR decomposition computed with the modified Gram-Schmidt (MGS) procedure [25, 26]. The MGS procedure for the matrix $U$ is shown in Algorithm 2.

0:   A sequence of vectors $\{x_0, x_1, \ldots, x_{k+1}\}$ produced by the baseline algorithm (FP in our case).
0:   A new vector $\hat{s}$.
1:  Construct the matrix $U = [u_0 \ u_1 \ \cdots \ u_k]$, with $u_i = x_{i+1} - x_i$, and then compute its QR factorization via Algorithm 2, $U = QR$.
2:  Denote by $r_k$ the last column of $R$ without the last row, and solve the following upper triangular system for $c' = [c_0, c_1, \ldots, c_{k-1}]^T$: $R_{k-1} c' = -r_k$, where $R_{k-1}$ is the previous $k$ columns of $R$ without the last row. Setting $c_k = 1$, finally evaluate $\{\gamma_i\}$ through $\gamma_i = c_i / \sum_{j=0}^{k} c_j$.
3:  Compute $\xi = [\xi_0, \ldots, \xi_{k-1}]^T$ through $\xi_0 = 1 - \gamma_0$ and $\xi_j = \xi_{j-1} - \gamma_j$, $j = 1, \ldots, k-1$.
4:  Compute $\eta = R_{k-1} \xi$. Then we attain $\hat{s} = x_0 + Q_{k-1} \eta$ as the new initial vector, where $Q_{k-1}$ represents the previous $k$ columns of $Q$.
Algorithm 1 Minimal Polynomial Extrapolation (MPE)
0:   $U = [u_0 \ u_1 \ \cdots \ u_k]$, $Q = [q_0 \ q_1 \ \cdots \ q_k]$ and $R = \{r_{ij}\}$ ($r_{ij}$ denotes the $(i,j)$th element of $R$; $u_i$ and $q_i$ represent the $i$th column of $U$ and $Q$, respectively).
1:  Compute $r_{00} = \|u_0\|_2$ and $q_0 = u_0 / r_{00}$.
2:  for $i = 1, \ldots, k$ do
3:     Set $u_i^{(0)} = u_i$
4:     for $j = 0, \ldots, i-1$ do
5:         $r_{ji} = q_j^T u_i^{(j)}$ and $u_i^{(j+1)} = u_i^{(j)} - r_{ji} q_j$
6:     end for
7:     Compute $r_{ii} = \|u_i^{(i)}\|_2$ and $q_i = u_i^{(i)} / r_{ii}$
8:  end for
Algorithm 2 Modified Gram-Schmidt (MGS)
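A compact sketch of Algorithm 1 follows, with numpy's QR standing in for the explicit MGS of Algorithm 2; the function name `mpe` and this substitution are our choices:

```python
import numpy as np

def mpe(xs):
    """Minimal Polynomial Extrapolation (Algorithm 1).

    xs: list of k+2 iterates [x_0, ..., x_{k+1}] from the baseline algorithm.
    Returns the extrapolated vector s_hat = sum_i gamma_i x_i.
    """
    U = np.stack([xs[i + 1] - xs[i] for i in range(len(xs) - 1)], axis=1)
    k = U.shape[1] - 1
    Q, R = np.linalg.qr(U)                        # Step 1: U = QR
    # Step 2: solve the upper triangular system R_{k-1} c' = -r_k, set c_k = 1.
    c = np.append(np.linalg.solve(R[:k, :k], -R[:k, k]), 1.0)
    gamma = c / c.sum()
    # Steps 3-4: xi_j = 1 - (gamma_0 + ... + gamma_j); s_hat = x_0 + Q_{k-1}(R_{k-1} xi).
    xi = 1.0 - np.cumsum(gamma[:k])
    return xs[0] + Q[:, :k] @ (R[:k, :k] @ xi)
```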

Now, let us discuss the other two variants of VE, i.e., Reduced Rank Extrapolation (RRE) [25] and SVD-MPE [22]. The main difference among RRE, MPE and SVD-MPE is at Step 2 in Algorithm 1, regarding the evaluation of $\{\gamma_i\}$. In RRE and SVD-MPE, we utilize the following methods to obtain $\{\gamma_i\}$ (a short code sketch of both options appears after the remarks below):

  • RRE: Solving $R^T R d = \mathbf{1}$ through forward and backward substitution, we obtain $\{\gamma_i\}$ through $\gamma = d / \sum_{i=0}^{k} d_i$. Actually, such a formulation of $\gamma$ is the solution of:

    $$\min_{\gamma} \ \| U \gamma \|_2, \quad \text{s.t. } \sum_{i=0}^{k} \gamma_i = 1. \qquad (19)$$

  • SVD-MPE: Computing the SVD decomposition of $U$, $U = W \Sigma V^T$, we have $\gamma = v_{k+1} / \sum_{j=1}^{k+1} v_{j,k+1}$, where $v_{k+1}$ and $v_{j,k+1}$ represent the last column and the $(j, k+1)$th element of the matrix $V$, respectively.

Here are two remarks regarding RRE, MPE and SVD-MPE:

  • Observing the derivations of MPE, SVD-MPE and RRE, we notice that RRE's solution must exist unconditionally, while the MPE and SVD-MPE solutions may not exist, because the sum of the $\{c_i\}$ in MPE and of the $\{v_{j,k+1}\}$ in SVD-MPE may become zero. Thus RRE may be more robust in practice [24]. However, MPE and RRE are related, as revealed in [27]. Specifically, if the MPE solution does not exist, the RRE iterate stalls, $\hat{s}_{k}^{RRE} = \hat{s}_{k-1}^{RRE}$; otherwise, $\hat{s}_{k}^{RRE}$ is a combination of $\hat{s}_{k-1}^{RRE}$ and $\hat{s}_{k}^{MPE}$ with positive weights [27]. Furthermore, the performance of MPE and RRE is similar: both methods either perform well or work poorly [20].

  • Observe that we only need to store $k + 2$ vectors in memory at all steps of Algorithm 2. Formulating the matrix $U$, we overwrite the vector $x_{i+1}$ with $u_i$ when the latter is computed, so that only $x_0$ is always kept in memory. Next, $u_i^{(j)}$ is overwritten by $u_i^{(j+1)}$ in computing the matrix $Q$. Thus, we do not need to save the vectors $\{x_i\}$, which implies that no additional memory is required in running Algorithm 2.
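The two Step-2 alternatives admit short sketches as well; as before, numpy routines stand in for the triangular and SVD solvers described above:

```python
import numpy as np

def gammas_rre(R):
    """RRE weights (19): solve R^T R d = 1 and normalize; R comes from U = QR.
    np.linalg.solve stands in for the forward/backward substitutions."""
    d = np.linalg.solve(R.T @ R, np.ones(R.shape[1]))
    return d / d.sum()

def gammas_svd_mpe(U):
    """SVD-MPE weights: the right-singular vector of U associated with its
    smallest singular value (the last column of V), normalized to sum to one."""
    _, _, Vt = np.linalg.svd(U, full_matrices=False)
    return Vt[-1] / Vt[-1].sum()
```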

III-C Embedding VE in the Baseline Algorithm

We introduce VE in its cycling formulation for practical usage. One cycle means that we activate the baseline algorithm to produce the sequence $\{x_i\}$, and then utilize VE once to evaluate the new vector $\hat{s}$ as a novel initial point. Naturally, we repeat such a cycle many times. The steps of utilizing VE in its cycling mode are shown in Algorithm 3. A few comments are in order:

  • In practice, we utilize VE in its cycling formulation. Specifically, the iterative form shown in Algorithm 3 is named full cycling [20]. To save the complexity of computing the weights $\{\gamma_i\}$, one may reuse the previous ones, a method known as cycling with frozen $\{\gamma_i\}$. Parallel VE can also be developed if more machines are available. Explaining the details of the last two strategies is out of the scope of this paper; we refer the reader to [20] for more information.

  • Numerical experience also indicates that cycling with an even moderately large $k$ will avoid stalling from happening [28]. Moreover, we also recommend setting a larger $k$ when the problem becomes challenging to solve.

  • In our case, the termination criterion in Algorithm 3 can be the total number of iterations (the number of calls to the baseline algorithm) or the difference between two consecutive vectors. Furthermore, we also recommend running a few additional iterations of the baseline algorithm after terminating the VE, which can stabilize the accelerated algorithm in practice.

0:   Choose nonnegative integers $n$ and $k$, and an initial vector $x_0$. The baseline algorithm is the FP method, as given in (7).
0:   Final solution $x^*$.
1:  while 1 do
2:     Obtain the series $\{x_i\}$ through the baseline algorithm, where $i = 0, 1, \ldots, n+k+1$, and save $x_i$ for $i \geq n$ to formulate $\{x_n, x_{n+1}, \ldots, x_{n+k+1}\}$.
3:     Call Algorithm 1 to obtain $\hat{s}$.
4:     If the termination criterion of the algorithm is satisfied, set $x^* = \hat{s}$ and break; otherwise, set $x_0 = \hat{s}$ as the new initial point and go to Step 2.
5:  end while
Algorithm 3 Baseline Algorithm + Vector Extrapolation
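A minimal sketch of the full-cycling scheme of Algorithm 3, reusing the `mpe` and `fp_step` sketches above; the fixed cycle count used here as a termination criterion is an illustrative simplification:

```python
def ve_cycling(x0, baseline_step, n=0, k=5, cycles=20):
    """Cycled VE (Algorithm 3): run the baseline n+k+1 times per cycle, then
    extrapolate once with MPE from the last k+2 iterates."""
    x = x0
    for _ in range(cycles):
        xs = [x]
        for _ in range(n + k + 1):         # produce x_0, ..., x_{n+k+1}
            xs.append(baseline_step(xs[-1]))
        x = mpe(xs[n:])                    # new initial point for the next cycle
    return x
```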

III-D Convergence and Stability Properties

We mention existing results regarding the convergence and stability properties of VE, for a better understanding of this technique. A rich literature has examined the convergence and stability properties of RRE, MPE and SVD-MPE in linear systems [29, 30]. Assuming the matrix $A$ in (8) is diagonalizable, the error at the $i$th iteration has the form

$$x_i - s = \sum_{j} v_j \lambda_j^i,$$

where the $(\lambda_j, v_j)$ are some or all of the eigenvalues and corresponding eigenvectors of $A$, with distinct nonzero eigenvalues. Ordering the eigenvalues as $|\lambda_1| \geq |\lambda_2| \geq \cdots$, the following asymptotic performance holds for all three variants of VE when $|\lambda_k| > |\lambda_{k+1}|$:

$$\hat{s}_{n,k} - s = O\left( \lambda_{k+1}^n \right) \quad \text{as } n \rightarrow \infty. \qquad (20)$$

This implies that the extrapolated sequence $\{\hat{s}_{n,k}\}$ converges to $s$ faster than the original sequence $\{x_i\}$.

As shown in (20), for a large $n$, the plain iteration (8) reduces the contributions of the smaller eigenvalues to the error, while VE eliminates the contributions of the $k$ largest ones. This indicates that $\hat{s}_{n,k} - s$ is smaller than each of the errors $x_i - s$, $n \leq i \leq n+k+1$, when $n$ is large enough. We mention another observation: increasing $k$ generally results in a faster convergence of VE. However, a larger $k$ increases the storage requirements and also requires a much higher computational cost. Numerical experiments indicate that a moderate $k$ already works well in practice.

We say VE is stable if the following condition holds:

$$\sup_{n} \sum_{i=0}^{k} |\gamma_{n,i}| < \infty. \qquad (21)$$

Here, we denote the weights by $\gamma_{n,i}$ to show their dependence on $n$ and $k$. If (21) holds true, the errors in the iterates $x_i$ will not be magnified severely. As shown in [29, 30, 31], MPE and RRE obey such a stability property.

For nonlinear systems, the analysis of convergence and stability becomes extremely challenging. One of the main results is the quadratic convergence theorem [32, 23, 33]. This theorem is built on one special assumption, that $k$ is set to be the degree of the minimal polynomial of $J(s)$. The proof of the quadratic convergence was shown in [32]. In a following work, Smith, Ford and Sidi noticed that there exists a gap in this proof [23]. Jbilou et al. suggested two more conditions in order to close the gap [33]:

  • The matrix $J(s) - I$ is nonsingular;

  • $F(\cdot)$ satisfies a Lipschitz condition in the neighborhood of $s$: $\|F(x) - F(y)\| \leq L \|x - y\|$ for some $L > 0$.

Surprisingly, these two conditions are met by the RED scheme. The first condition is satisfied by the fact, shown above, that the spectral radius of the Jacobian of the FP operator (7) is smaller than one, so that $J(s) - I$ is nonsingular. The second one is also true, due to the assumption in RED that the denoiser is differentiable. So we claim that it is possible for VE to solve RED with a quadratic convergence rate.

Although VE can lead to a quadratic convergence rate, trying to achieve such a rate may not be realistic, because $k$ can be as large as $N$. However, we may obtain a linear but fast convergence in practice even with moderate values of $n$ and $k$, as demonstrated in the following numerical experiments.

IV Experimental Results

We follow the same experiments of image deblurring and super-resolution as presented in [17] to investigate the performance of VE in acceleration. The trainable nonlinear reaction diffusion (TNRD) method [6] is chosen as the denoising engine, and the FP method is chosen as our baseline algorithm. For a fair comparison, the same parameters suggested in [17] for the different image processing tasks are set in our experiments. In [17], the authors compared RED with other popular algorithms on image deblurring and super-resolution tasks, showing its superiority. As the main purpose of this paper is to present the acceleration of our method for solving RED, we omit the comparisons with other popular algorithms. In the following, we first show the acceleration obtained by applying MPE to the FP method for solving RED, and then discuss the choice of the parameters in VE, i.e., $n$ and $k$. In addition, we compare our method with three other methods: steepest descent (SD), Nesterov's acceleration [34] and Limited-memory BFGS (L-BFGS) [35]. Note that we need to determine a proper step-size for these methods [35]. However, evaluating the objective value or gradient in RED is expensive, implying that any line-search method becomes prohibitive; in contrast, our framework as described in Algorithm 3 does not suffer from such a problem. In the following, we manually choose a fixed step-size yielding a good convergence behavior. Finally, we compare the differences among RRE, MPE and SVD-MPE. All of the experiments are conducted on a workstation with an Intel(R) Xeon(R) CPU E5-2699 @ 2.20GHz.

IV-A Image Deblurring

In this experiment, we degrade the test images by convolving them with two different point spread functions (PSFs), a uniform blur and a Gaussian blur with a standard deviation of 1.6. In both of these cases, we add white Gaussian noise with $\sigma = \sqrt{2}$ to the blurred images. The parameters $n$ and $k$ in Algorithm 3 are kept fixed across the image deblurring task. Additionally, we apply VE to the case where the baseline algorithm is SD, called SD-MPE, with its own fixed choice of $n$ and $k$. The value of the cost function and the peak signal to noise ratio (PSNR) versus iteration or CPU time are given in Figs. 2 and 3. (The goal of this paper is to investigate the performance of solving RED with VE rather than the restoration results; therefore, we present the recovered PSNR versus iteration or running time. One can utilize no-reference quality metrics like NFERM [36] and ARISMc [37] to further examine the restoration results.) These correspond to both the uniform and the Gaussian blur kernels, all tested on the “starfish” image. Clearly, we observe that SD is the slowest algorithm. Surprisingly, SD-MPE and Nesterov's method yield almost the same convergence speed, despite their totally different schemes. Moreover, we note that FP is faster than L-BFGS, Nesterov's method and SD-MPE. Undoubtedly, FP-MPE is the fastest one, both in terms of iterations and CPU time, which indicates the effectiveness of MPE's acceleration. To provide a visual impression, we show the evolution of the reconstruction quality of the different algorithms in Fig. 1. Clearly, already in its third column FP-MPE achieves the best reconstruction, while the other methods need more iterations to obtain a comparable result.

Fig. 1: Reconstruction results of the different algorithms (SD, SD-MPE, Nesterov, LBFGS, FP, FP-MPE) at various iterations for the uniform kernel, progressing from the blurred image (left) to the ground truth (right).
(a) Cost value versus iteration.
(b) Cost value versus CPU time.
(c) PSNR versus iteration.
(d) PSNR versus CPU time.
Fig. 2: Image Deblurring - Uniform Kernel, “Starfish” Image.
(a) Cost value versus iteration.
(b) Cost value versus CPU time.
(c) PSNR versus iteration.
(d) PSNR versus CPU time.
Fig. 3: Image Deblurring - Gaussian Kernel, “Starfish” Image.

The nine additional test images suggested in [17] are also included in our experiments, in order to investigate the performance of VE further. In this experiment we focus on the comparison between FP-MPE and FP on these images. We first run the native FP method for 200 iterations and denote the final image by $\hat{x}_{FP}$, with corresponding cost value $E(\hat{x}_{FP})$. We then activate Algorithm 3 with the same initial value as used in the FP method, to examine how many iterations are needed to attain an objective value equal to or lower than $E(\hat{x}_{FP})$. The resulting numbers of iterations for the different images are given in Table I. Clearly, an acceleration is observed for all the test images in the image deblurring task.

Image              Butterfly  Boats  C. Man  House  Parrot  Lena  Barbara  Starfish  Peppers  Leaves
Deblurring: uniform kernel, σ = √2
RED: FP-TNRD       200        200    200     200    200     200   200      200       200      200
RED: FP-MPE-TNRD   60         55     55      80     50      50    55       50        55       55
Deblurring: Gaussian kernel, σ = √2
RED: FP-TNRD       200        200    200     200    200     200   200      200       200      200
RED: FP-MPE-TNRD   70         65     55      65     55      80    45       55        95       80
TABLE I: The number of iterations needed by FP and FP-MPE to attain the same cost, for the different images in the image deblurring task.

IV-B Image Super-Resolution

We generate a low-resolution image by blurring the ground truth with a Gaussian kernel of standard deviation 1.6 and then downsampling by a factor of 3. Afterwards, additive Gaussian noise with $\sigma = 5$ is added to the resulting image. The same parameters $n$ and $k$ used in the deblurring task for FP-MPE are adopted here, and likewise a fixed choice of $n$ and $k$ is used for SD-MPE. We choose “Plants” as our test image because it needs more iterations for FP-MPE to converge. As observed from Fig. 4, while L-BFGS and Nesterov's method are faster than the FP method, our acceleration method (FP-MPE) is quite competitive with both. Furthermore, we investigate all of the test images shown in [17] to see how many iterations are needed for MPE to achieve the same or a lower cost compared with the FP method. The results are shown in Table II. As can be seen, MPE works better than the FP method, indicating an effective acceleration for solving RED.

Super-Resolution, scaling = 3, σ = 5
Image              Butterfly  Flower  Girl  Parth.  Parrot  Raccoon  Bike  Hat  Plants
RED: FP-TNRD       200        200     200   200     200     200      200   200  200
RED: FP-MPE-TNRD   60         65      50    70      55      60       60    50   70
TABLE II: The number of iterations needed by FP and FP-MPE to attain the same cost, for the different images in the image super-resolution task.
(a) Cost value versus iteration.
(b) Cost value versus CPU time.
(c) PSNR versus iteration.
(d) PSNR versus CPU time.
Fig. 4: Image Super-Resolution, “Plants” Image.

IV-C The Choice of the Parameters and the Difference Among RRE, MPE and SVD-MPE

We conclude by discussing the robustness of the MPE algorithm to the choice of the parameters $n$ and $k$. To this end, the single-image super-resolution task is chosen as our case study. Furthermore, we choose to demonstrate this robustness on the “Plants” image, since it required the largest number of iterations in the MPE recovery process. As seen from Figs. 5(a)-5(c), MPE always converges faster than the regular FP method for the different choices of $n$ and $k$. Moreover, we also observe that a lower objective value is attained through MPE. Notice that MPE exhibits some oscillations, because it is not a monotone acceleration technique. However, we still see that a lower cost is achieved if additional iterations are given.

In part (d) of Fig. 5, an optimal pair of $n$ and $k$ is chosen for each of MPE, RRE and SVD-MPE for the single-image super-resolution task with the “Plants” image. (The optimal $n$ and $k$ are obtained by a grid search, seeking the fastest convergence for each of the three methods.) We see that all three methods yield an acceleration and a lower cost, demonstrating the effectiveness of the various variants of VE. Moreover, we see that SVD-MPE converges faster at the beginning, but MPE yields the lowest final cost.

(a)-(c) Different choices of $n$ and $k$.
(d) Comparison among MPE, RRE and SVD-MPE.
Fig. 5: Exploring the robustness to the choice of the parameters in MPE, and the difference among the three VE schemes. All these graphs correspond to the test image “Plants” for the single-image super-resolution task.

V Conclusion

The work reported in [17] introduced RED, a flexible framework for using arbitrary image denoising algorithms as priors for general inverse problems. This scheme amounts to iterative algorithms in which the denoiser is called repeatedly. While appealing and quite practical, there is one major weakness to the RED scheme: the complexity of denoising algorithms is typically high, which implies that the use of RED is likely to be costly in run-time. This work aims at deploying RED efficiently, alleviating the above-described shortcoming. An acceleration technique based on the Vector Extrapolation (VE) methodology was proposed in this paper. The proposed algorithms are demonstrated to substantially reduce the number of iterations required for the overall recovery process. We also observe that the choice of the parameters in the VE scheme is robust.

References

  • [1] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in Computer Vision and Pattern Recognition, CVPR, IEEE Computer Society Conference on, vol. 2, pp. 60–65, 2005.
  • [2] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image processing, vol. 15, no. 12, pp. 3736–3745, 2006.
  • [3] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
  • [4] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in Computer Vision (ICCV), 2011 IEEE International Conference on, pp. 479–486, IEEE, 2011.
  • [5] W. Dong, L. Zhang, G. Shi, and X. Li, “Nonlocally centralized sparse representation for image restoration,” IEEE Transactions on Image Processing, vol. 22, no. 4, pp. 1620–1630, 2013.
  • [6] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1256–1272, 2017.
  • [7] P. Chatterjee and P. Milanfar, “Is denoising dead?,” IEEE Transactions on Image Processing, vol. 19, no. 4, pp. 895–911, 2010.
  • [8] P. Milanfar, “A tour of modern image filtering: New insights and methods, both practical and theoretical,” IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 106–128, 2013.
  • [9] A. Levin and B. Nadler, “Natural image denoising: Optimality and inherent bounds,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp. 2833–2840, IEEE, 2011.
  • [10] M. Protter, M. Elad, H. Takeda, and P. Milanfar, “Generalizing the nonlocal-means to super-resolution reconstruction,” IEEE Transactions on Image Processing, vol. 18, no. 1, pp. 36–51, 2009.
  • [11] A. Danielyan, V. Katkovnik, and K. Egiazarian, “BM3D frames and variational image deblurring,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1715–1728, 2012.
  • [12] C. A. Metzler, A. Maleki, and R. G. Baraniuk, “Optimal recovery from compressive measurements via denoising-based approximate message passing,” in Sampling Theory and Applications (SampTA), 2015 International Conference on, pp. 508–512, IEEE, 2015.
  • [13] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” in Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE, pp. 945–948, IEEE, 2013.
  • [14] S. Sreehari, S. V. Venkatakrishnan, B. Wohlberg, G. T. Buzzard, L. F. Drummy, J. P. Simmons, and C. A. Bouman, “Plug-and-play priors for bright field electron tomography and sparse interpolation,” IEEE Transactions on Computational Imaging, vol. 2, no. 4, pp. 408–423, 2016.
  • [15] S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-play ADMM for image restoration: Fixed-point convergence and applications,” IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 84–98, 2017.
  • [16] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
  • [17] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (red),” SIAM Journal on Imaging Sciences, vol. 10, no. 4, pp. 1804–1844, 2017.
  • [18] S. Cabay and L. Jackson, “A polynomial extrapolation method for finding limits and antilimits of vector sequences,” SIAM Journal on Numerical Analysis, vol. 13, no. 5, pp. 734–752, 1976.
  • [19] N. Rajeevan, K. Rajgopal, and G. Krishna, “Vector-extrapolated fast maximum likelihood estimation algorithms for emission tomography,” IEEE Transactions on Medical Imaging, vol. 11, no. 1, pp. 9–20, 1992.
  • [20] A. Sidi, Vector Extrapolation Methods with Applications, vol. 17. SIAM, 2017.
  • [21] A. Kirsch, An introduction to the mathematical theory of inverse problems, vol. 120. Springer Science & Business Media, 2011.
  • [22] A. Sidi, “SVD-MPE: An SVD-based vector extrapolation method of polynomial type,” arXiv preprint arXiv:1505.00674, 2015.
  • [23] D. A. Smith, W. F. Ford, and A. Sidi, “Extrapolation methods for vector sequences,” SIAM Review, vol. 29, no. 2, pp. 199–233, 1987.
  • [24] G. Rosman, L. Dascal, A. Sidi, and R. Kimmel, “Efficient beltrami image filtering via vector extrapolation methods,” SIAM Journal on Imaging Sciences, vol. 2, no. 3, pp. 858–878, 2009.
  • [25] A. Sidi, “Efficient implementation of minimal polynomial and reduced rank extrapolation methods,” Journal of Computational and Applied Mathematics, vol. 36, no. 3, pp. 305–337, 1991.
  • [26] G. H. Golub and C. F. Van Loan, Matrix Computations, vol. 3. JHU Press, 2012.
  • [27] A. Sidi, “Minimal polynomial and reduced rank extrapolation methods are related,” Advances in Computational Mathematics, vol. 43, no. 1, pp. 151–170, 2017.
  • [28] A. Sidi and Y. Shapira, “Upper bounds for convergence rates of acceleration methods with initial iterations,” Numerical Algorithms, vol. 18, no. 2, pp. 113–132, 1998.
  • [29] A. Sidi, W. F. Ford, and D. A. Smith, “Acceleration of convergence of vector sequences,” SIAM Journal on Numerical Analysis, vol. 23, no. 1, pp. 178–196, 1986.
  • [30] A. Sidi and J. Bridger, “Convergence and stability analyses for some vector extrapolation methods in the presence of defective iteration matrices,” Journal of Computational and Applied Mathematics, vol. 22, no. 1, pp. 35–61, 1988.
  • [31] A. Sidi, “Convergence and stability properties of minimal polynomial and reduced rank extrapolation algorithms,” SIAM Journal on Numerical Analysis, vol. 23, no. 1, pp. 197–209, 1986.
  • [32] S. Skelboe, “Computation of the periodic steady-state response of nonlinear networks by extrapolation methods,” IEEE Transactions on Circuits and Systems, vol. 27, no. 3, pp. 161–175, 1980.
  • [33] K. Jbilou and H. Sadok, “Some results about vector extrapolation methods and related fixed-point iterations,” Journal of Computational and Applied Mathematics, vol. 36, no. 3, pp. 385–398, 1991.
  • [34] Y. Nesterov, Lectures on Convex Optimization. Springer, 2018.
  • [35] J. Nocedal and S. J. Wright, Numerical Optimization. Springer, 2006.
  • [36] K. Gu, G. Zhai, X. Yang, and W. Zhang, “Using free energy principle for blind image quality assessment,” IEEE Transactions on Multimedia, vol. 17, no. 1, pp. 50–63, 2015.
  • [37] K. Gu, G. Zhai, W. Lin, X. Yang, and W. Zhang, “No-reference image sharpness assessment in autoregressive parameter space,” IEEE Transactions on Image Processing, vol. 24, no. 10, pp. 3218–3231, 2015.