1 Introduction
Denoising images is a classical problem in image processing. Representative approaches for solving this problem include local methods such as linear filtering and anisotropic diffusion [13], global methods such as total variation [14] and wavelet shrinkage [3], and discrete methods such as Markov random field denoising [16] and the discrete universal denoiser [12].
As is often the case with inverse problems, many algorithms for image denoising involve priors on the solution. Commonly, the only prior knowledge assumed about the image is that it is of natural origin. In that case, the sought-after prior should represent the statistics of a natural image. Natural image statistics is a research topic on its own [10, 20], with importance for image processing problems beyond denoising: segmentation, texture synthesis, image inpainting, super-resolution, and more. An important observation in natural image statistics is that images contain repetitive local patterns. This observation is the basis of patch-based denoising methods such as Non-Local Means (NLM) [1] and block-matching and 3D filtering (BM3D) [4].

In NLM, the denoising operator is constructed from patches of the corrupted image itself. As such, the denoising operator is affected by the noise. It has been noted [11, 15] that the larger eigenvalues of the operator are less sensitive to noise than the smaller ones, creating an opportunity to improve the operator. In this work we propose a method for computing such a modified operator, by filtering the noisy one with a low-pass filter applied to its eigenvalues. In other words, we pose the problem as a filter design task, which is a classical task in signal processing. Our chosen filter function suppresses eigenvalues of small magnitude and accurately approximates the remaining low-rank operator, while preserving the fundamental properties of the original operator. The filter is applied efficiently to the NLM operator using Chebyshev polynomials for matrices.
We further study the concept of low-rank NLM denoising operators by numerical experiments. In these experiments we investigate our method with respect to two main questions. The first is the dependence of the low-rank operator on its two tuning parameters: the width of the kernel of the original NLM operator, and the rank of the low-rank operator. The second is the performance of our method compared to other advanced denoising algorithms. We provide a comprehensive comparison which includes a few popular state-of-the-art methods and a two-stage denoising scheme based on our low-rank operator.
The paper is organized as follows. In Section 2 we present the notation, the problem formulation, and the NLM method. In Section 3 we introduce our method of constructing low-rank approximations of NLM operators, which is based on matrix filtering functions. Next, in Section 4, we show how to efficiently implement this low-rank approximation using Chebyshev polynomials. Section 5 provides numerical experiments with our algorithm, where we discuss its parameters and compare it to other advanced denoising methods. We summarize the paper in Section 6.
2 Preliminaries
We start by introducing the notations and the model for the denoising problem. Let be a discrete grid of dimension (typically, or ) and denote by a clean signal defined on . Let be a signal contaminated with noise, that is , where ,
, are independent, identically distributed Gaussian random variables. The goal is to estimate
given .

The non-local means (NLM) estimator is given as follows. Denote by the indices of the pixels within a hypercube of dimension and side length centered around the pixel . Let , be the values of the pixels of at the indices
, treated as a column vector. The NLM algorithm estimates
as

where is the Gaussian kernel function with width , and
is a normalization factor.
In matrix notation the NLM estimator is written as
(1) 
where is the original noisy signal represented as a vector, is the denoised signal (the vector of estimates), and is an NLM operator of , defined as follows.
Definition 1.
A matrix is an NLM operator of if , where
with and , and is a diagonal matrix given by .
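To make Definition 1 concrete, the following sketch builds a small dense NLM operator in Python. The names `K`, `D`, `W`, the kernel width `h`, and the patch radius are our own labels for the quantities of Definition 1; this is an illustrative dense implementation, not the paper's actual code.

```python
import numpy as np

def nlm_operator(img, patch_radius=1, h=0.5):
    """Row-stochastic NLM operator W = D^{-1} K for a small 2-D image.

    K_{ij} = exp(-||p_i - p_j||^2 / (2 h^2)) compares the patches around
    pixels i and j, and D = diag(K 1) normalizes each row to sum to one.
    """
    H, Wd = img.shape
    pad = np.pad(img, patch_radius, mode='reflect')
    # Collect the patch around every pixel as a row vector.
    patches = np.array([
        pad[i:i + 2 * patch_radius + 1, j:j + 2 * patch_radius + 1].ravel()
        for i in range(H) for j in range(Wd)
    ])
    d2 = ((patches[:, None, :] - patches[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * h ** 2))       # symmetric Gaussian kernel
    D = K.sum(axis=1)                      # row sums
    return K / D[:, None]                  # W = D^{-1} K, row stochastic

rng = np.random.default_rng(0)
W = nlm_operator(rng.random((8, 8)), patch_radius=1, h=0.5)
print(np.allclose(W.sum(axis=1), 1.0))     # True: rows sum to one
```

Because `W` is obtained from a symmetric positive definite Gaussian kernel matrix by row normalization, its eigenvalues are real, positive, and bounded by 1, in line with Lemma 2.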
Remark 1.
The pseudocode of all algorithms described in this paper is given in Appendix B. In these algorithms, we denote by the NLM operator of image with patch size and kernel width .
3 Denoising the NLM operator
As the NLM operator in (1) is constructed from noisy data, we would like to replace it with an operator which is less noisy. One way is to replace with its low-rank approximation [11, 15]. A low-rank approximation obtained by a truncated singular-value decomposition (SVD) is optimal under the
norm [8, Chapter 2.5]; however, directly computing the SVD of is computationally expensive and often impractical. Computing a truncated eigenvalue (spectral) decomposition is an alternative, typically implemented by a variant of the power iteration algorithm, such as Lanczos iterations. Unfortunately, all existing spectral decomposition methods are often too slow to be applied to an NLM matrix. Low-rank approximation of the NLM matrix via truncated spectral decomposition is given in Algorithm 1 in Appendix B.

Our approach is to construct a low-rank approximation of an NLM operator by using a matrix function , and computing , where is an NLM operator, as in Definition 1. We next characterize functions of NLM operators and suggest properties of such functions which are suitable for our purposes. Then, we present a particular family of functions with controlled low rank that have these properties.
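As a reference point, the truncated spectral decomposition used as the baseline (the role of Algorithm 1) can be sketched as follows. We exploit the conjugation of the NLM operator to a symmetric matrix, so that the stable symmetric eigensolver applies; the naming is ours, and this is an illustrative sketch rather than the paper's pseudocode.

```python
import numpy as np

def lowrank_nlm(K, r):
    """Rank-r approximation of the NLM operator W = D^{-1} K by truncated
    spectral decomposition.

    W is conjugate to the symmetric matrix A = D^{-1/2} K D^{-1/2},
    so we diagonalize A, truncate, and map back.
    """
    d = K.sum(axis=1)                       # D = diag(K 1)
    s = 1.0 / np.sqrt(d)
    A = s[:, None] * K * s[None, :]         # symmetric, same spectrum as W
    lam, U = np.linalg.eigh(A)              # eigenvalues in ascending order
    lam_r, U_r = lam[-r:], U[:, -r:]        # keep the r largest
    A_r = (U_r * lam_r) @ U_r.T             # rank-r symmetric approximation
    return s[:, None] * A_r / s[None, :]    # W_r = D^{-1/2} A_r D^{1/2}
```

Keeping all eigenvalues recovers the original operator exactly, while a small `r` yields the low-rank surrogate whose cost motivates the filtering approach of this section.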
3.1 Matrix functions of NLM operators
Before studying matrix functions on NLM operators, we summarize a few properties of these NLM matrices.
Lemma 2.
Let be an NLM operator, as defined in Definition 1. Then has the following properties:

is positive diagonalizable, namely there exists an invertible matrix
that satisfies , where is a diagonal matrix , and .
Corollary 3.
In the notation of Lemma 2, any matrix function satisfies
The proof follows directly from the diagonalizability of and the definition of matrix functions, see e.g., [8, Corollary 11.1.2].
We design so that is a low-rank approximation of the NLM operator . Moreover, we wish to guarantee that retains the properties of listed in Lemma 2. The first property in Lemma 2 follows directly from Corollary 3 and allows for repeated applications of the operator , as the eigenvalues are nonnegative and bounded by 1. The advantages of repeated applications of a denoising operator have been demonstrated in [15]. The second property in Lemma 2, that is (row stochastic normalization of ), is useful because then the elements of are affine combinations (linear combinations whose coefficients sum to one) of the elements of . In particular, it is desired for a denoising operator to map a constant signal to itself, see e.g., [3].
Summarizing the above discussion of desired properties of the matrix , we define the notion of an extended NLM operator.
Definition 4.
A matrix is an extended NLM operator if is diagonalizable, such that , with all its eigenvalues in .
While it is possible to define an extended NLM operator such that its eigenvalues are in , we prefer for simplicity to use Definition 4 above, as this allows for a richer family of functions to be used in Section 3.2 below. This is consistent with the Gaussian kernel used in Definition 1, which ensures that the resulting operator is nonnegative definite.
Equipped with the latter, we have:
Theorem 5.
Let be an NLM operator. The following conditions on are sufficient for to be an extended NLM operator.

.

is defined on the interval and .
Proof.
By Corollary 3, acts solely on the diagonal matrix , meaning that
where the eigenvalues of are ordered such that . Therefore, the second condition guarantees that the eigenvalues of are in . In addition, the first condition ensures the maximal eigenvalue is indeed one, and is given by , that is . The eigenvector corresponding to is the first column of , which is (up to a scalar scale), namely, . ∎
The following corollary allows composition of functions satisfying the conditions of Theorem 5.
Corollary 6.
Let be an extended NLM operator, and let be a function satisfying the conditions of Theorem 5. Then, is also an extended NLM operator.
The proof is elementary and thus omitted.
3.2 Constructing a low-rank extended NLM operator
We would like to construct a function that satisfies the two conditions from Theorem 5, and moreover, suppresses the noise in the NLM operator . A natural option is to choose a function that retains the eigenvalues above a given threshold. By Corollary 3, choosing such a function reduces to choosing an appropriate scalar function.
Our first prototype for a scalar thresholding function is
(2) 
This function satisfies the conditions of Theorem 5 for , zeroing values lower than and acting as the identity on higher values. However, this function is not smooth, which eventually results in slow evaluation for matrices (a detailed discussion is given in Section 4.2).
Inspired by the gain function of the Butterworth filter [2] (also known as the maximally flat filter), which approximates the shifted Heaviside function on the segment , we propose to use
(3) 
We term this function the Slanted Butterworth (SB) function. The SB function serves as an approximation to of (2). The responses of both functions and for various values of are shown in Figure 1.
In its matrix version, the SB function becomes
(4) 
where and is the identity matrix. Combining Theorem 5 and Corollary 6, one can verify that , , is indeed an extended NLM operator and thus is applicable for our purposes. The parameter is termed the filter cutoff and is the order of the filter.
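In scalar form, the hard threshold (2) and a smooth surrogate can be sketched as below. The exact SB formula (3) is not reproduced in this text, so `f_sb` is only a hypothetical "identity slanted by a Butterworth-style gain" with cutoff `rho` and order `n`, consistent with the description above but an assumption on our part.

```python
import numpy as np

# Hypothetical sketch: the paper's exact SB formula (3) is not shown here,
# so f_sb is merely one plausible smooth surrogate of the hard threshold.

def f_hard(lam, rho):
    """Prototype threshold (2): zero below the cutoff rho, identity above."""
    return np.where(lam >= rho, lam, 0.0)

def f_sb(lam, rho, n):
    """Smooth surrogate: lam times a Butterworth-type gain centered at rho."""
    lam = np.asarray(lam, dtype=float)
    return lam / np.sqrt(1.0 + (rho / np.maximum(lam, 1e-12)) ** (2 * n))
```

As the order `n` grows, the smooth gain approaches the hard threshold; the trade-off, discussed in Section 4.2, is that steeper functions have larger derivatives and hence slower Chebyshev convergence.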
4 Computing low-rank approximation based on Chebyshev polynomials
The evaluation of the matrix function for a large matrix can be challenging. While the function essentially operates on the eigenvalues of the NLM operator , the spectral decomposition of is assumed to be unavailable for computational reasons. Furthermore, evaluating the function directly according to its definition (4) requires computing the square root of a matrix, which is prohibitively expensive, see e.g. [9]; thus, a direct evaluation of is also infeasible. In addition, an important observation is that one does not need the resulting matrix but only the vector , as can be seen from (1).
4.1 Evaluating the SB function based on Chebyshev polynomials
The Chebyshev polynomials of the first kind of degree are defined as
These polynomials satisfy the three-term recursion
(5) 
with and , and form an orthogonal basis for with the inner product
(6) 
The Chebyshev expansion for any is thus given by
(7) 
where the equality above holds in the sense for any , and becomes pointwise equality under additional regularity assumptions on .
We will evaluate the function by truncating the Chebyshev expansion (7), that is
where are the corresponding Chebyshev coefficients for , and maps from to , as required by the definition of above.
As shown in Corollary 3, applying a Chebyshev expansion to an NLM matrix reduces to applying it to the diagonal form of . Thus,
(8)  
where the second equality holds for any since is a continuous function [18, Chapter 8]. In other words, one can see that the coefficients required to evaluate the matrix function of (4) are the same as those required to evaluate the scalar function of (3).
For square matrices, substituting a matrix into a polynomial is well-defined (see e.g. [8, Chapter 11]), and so, given a truncation parameter , the matrix SB function (4) is approximated by
(9) 
The common practice for calculating Chebyshev expansions is to use a Gauss quadrature formula for (6), combined with the discrete orthogonality of the Chebyshev polynomials. Explicitly, the coefficients are calculated by (7) combined with
(10) 
For more details see [6, Chapter 5.8].
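The quadrature rule for the coefficients can be sketched as follows; this is a standard implementation in the spirit of [6, Chapter 5.8], with our own function names, together with a scalar Clenshaw evaluator for checking the expansion.

```python
import numpy as np

def cheb_coeffs(f, N):
    """Chebyshev coefficients of f on [-1, 1] via Chebyshev-Gauss
    quadrature and the discrete orthogonality of the polynomials.

    Returns c such that f(x) ~= c[0]/2 + sum_{j>=1} c[j] T_j(x).
    """
    k = np.arange(N)
    theta = np.pi * (k + 0.5) / N           # quadrature nodes in angle
    x = np.cos(theta)                       # Chebyshev points of the 1st kind
    fx = f(x)
    j = np.arange(N)[:, None]
    return (2.0 / N) * (fx[None, :] * np.cos(j * theta[None, :])).sum(axis=1)

def cheb_eval(c, x):
    """Clenshaw evaluation of the truncated expansion at scalar/array x."""
    b1 = b2 = 0.0
    for cj in c[:0:-1]:                     # downward recurrence
        b1, b2 = cj + 2.0 * x * b1 - b2, b1
    return c[0] / 2.0 + x * b1 - b2

c = cheb_coeffs(lambda x: x ** 3, 8)
# c[1] ~ 0.75 and c[3] ~ 0.25, since x^3 = (3 T_1(x) + T_3(x)) / 4
```

The rule is exact for polynomials of sufficiently low degree, which is why the cubic example above is recovered to machine precision.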
Having obtained the Chebyshev expansion coefficients of (9), we show how to efficiently evaluate . We do so using a variant of Clenshaw’s algorithm (see e.g., [6, Chapter 5.5]), which is based on the three-term recursion (5), as described in [6, p. 193]. This algorithm is adapted to matrices using the fact that each polynomial consists only of powers of , which means that any two such matrix polynomials commute. In addition, we exploit the fact that we need to compute only the vector and not the matrix itself, which has much larger dimensions. The pseudocode for evaluating for some vector is given in Algorithm 2 in Appendix B. Note that this algorithm does not require any matrix-matrix operations, and is thus feasible even for large matrix sizes.
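The matrix-Clenshaw step can be sketched as follows, assuming the spectrum of the operator lies in [0, 1] so that the affine map λ → 2λ − 1 carries it to the Chebyshev interval [−1, 1]. This is an illustrative sketch of the role of Algorithm 2 with our own naming, not the paper's pseudocode.

```python
import numpy as np

def cheb_matvec(c, W, y):
    """Evaluate p(W) y by Clenshaw's recurrence, where p is the truncated
    Chebyshev expansion with coefficients c (in the convention
    f ~= c[0]/2 + sum_{j>=1} c[j] T_j) of a function of the spectrum of W.

    The spectrum of W is assumed to lie in [0, 1]; the map
    lambda -> 2*lambda - 1 becomes A = 2 W - I on matrices, applied
    below only through matrix-vector products, so the total cost is
    O(len(c)) matvecs and no matrix-matrix operation is ever formed.
    """
    def Av(v):                       # A v = (2 W - I) v: one matvec with W
        return 2.0 * (W @ v) - v
    b1 = np.zeros_like(y)
    b2 = np.zeros_like(y)
    for cj in c[:0:-1]:              # Clenshaw downward recurrence
        b1, b2 = cj * y + 2.0 * Av(b1) - b2, b1
    return 0.5 * c[0] * y + Av(b1) - b2
```

Only the denoised vector is produced, matching the observation at the end of Section 4's introduction that the filtered matrix itself is never needed.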
4.2 Error bound for NLM operators
We study the approximation error of the truncated Chebyshev expansion for matrix functions of NLM operators, which are diagonalizable, non-symmetric matrices. The use of truncated Chebyshev expansions for matrices (9) and their approximation order have already been studied in the context of solutions of partial differential equations [17, 18]. However, most results assume that the approximated function is analytic, and so are not applicable in our case.

In this section we use the following notation. Denote by the Frobenius norm of and by its spectral norm (the induced norm), that is, the largest singular value of . In addition, denote by the condition number of an invertible matrix (with respect to the spectral norm), that is .
Recall that if is an NLM operator, then can be decomposed as (see Lemma 2). In addition, is conjugated to a symmetric matrix via , with being the diagonal matrix of Definition 1. Therefore, , where
is an orthogonal matrix.
The next theorem presents the main error bound.
Theorem 7.
Let and let be an NLM operator. Denote by
the approximation error of the truncated Chebyshev expansion of degree . Then,
where is a constant that depends on the derivative of but is independent of and , with
Proof.
Since is diagonalizable, and similarly to Corollary 3 and (8),
where is the diagonal matrix . For all submultiplicative norms, including the spectral norm, we have that
(11) 
By the Chebyshev approximation bound [19, Theorem 4.3] for scalar functions,
where . The spectral norm of a diagonal matrix is given by its element on the diagonal having maximal absolute value and thus
(12) 
Now, it remains to find a bound on . Indeed,
(13) 
Using again the submultiplicativity property we get
(14) 
where in the last equality we have used the orthogonality of . By the same property, we have that and . Combining the latter with (11), (12), and (14) concludes the proof. ∎
The proof of Theorem 7 holds, with minor adjustments, for various submultiplicative norms. However, for one particular norm a good bound can be achieved directly, as explained in the following remark.
Remark 2.
The error bound of the truncated Chebyshev expansion for matrices, as it appears in Theorem 7, can also be expressed in terms of the Frobenius norm, since . Therefore, we immediately get
where .
The bound of Theorem 7 indicates that the approximation error decays as , where is related to the smoothness class of the function , and is the number of terms in the truncated Chebyshev expansion (9). However, two additional factors appear on top of this decay rate. The first is the constant , governed by ; this constant can be large for a function whose derivative has large magnitude. The second is the condition number of . This condition number can easily be calculated numerically, since is a diagonal matrix, and thus
(15) 
Similarly, for the Frobenius norm we have
(16) 
Moreover, we can bound (15) and (16) a priori, as seen next.
Lemma 8.
Let be the diagonal matrix from Definition 1. Then,
Proof.
For the Frobenius norm, it follows that and thus and . Thus, by (16), . ∎
Equipped with Lemma 8, we conclude the following.
Corollary 9.
In the notation of Theorem 7 we have
Note that by Remark 2, a bound on the Frobenius norm of the approximation error is given by
To conclude the above discussion, we evaluate the approximation error numerically, as depicted in Figure 2, where we use random, positive diagonalizable, non-symmetric matrices whose rank equals . We average the relative approximation errors, defined as , over all 50 matrices, where the truncated Chebyshev series is evaluated by Algorithm 2. The cutoff parameter of has been set to . The plotted curves correspond to the orders , and of the function . The results clearly show that as the derivative grows, the error increases: as gets larger, so do the derivatives of .
5 Numerical experiments
The advantage of improving the NLM operator via its low-rank approximation has already been argued in Section 3. In this section we demonstrate this advantage with numerical examples and, in addition, study the effect of the two main parameters of our method: the kernel width from Definition 1 and the filter cutoff from (4). As a reference, we use the naive approach for denoising an NLM operator, namely computing its truncated eigenvalue decomposition (Algorithm 1).
The numerical experiments are performed on real images corrupted by synthetically added white Gaussian noise (as described in Section 2) of varying levels. The clean test images of size pixels are shown in Figure 3.
The denoising performance in all of the following experiments is measured by the peak signal-to-noise ratio (PSNR), which for images with values in the range is given by
(17) 
In this metric, two similar images, for example an image and a good approximation of it, yield a high PSNR value, while two different images yield a low PSNR value.
In addition to the PSNR, which measures the difference between two images, we use the signal-to-noise ratio (SNR) to measure the noise level in a given (single) image. The SNR is given by
where is the clean image,
is the standard deviation of the intensity values of
and is the standard deviation of the noise added to .

The remainder of this section is organized as follows. First, we examine the gain in using a low-rank approximation for different noise levels and kernel widths. Then, we compare the naive approach of truncated spectral decomposition with our method with respect to the cutoff point, namely, the effect of different low-rank values on these two methods. We then proceed to compare our method with a few popular denoising methods. Finally, we explore the performance of our method when combined into a state-of-the-art denoising scheme.
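The two quality measures used in this section can be sketched as follows. The peak value in the PSNR and the exact SNR convention are assumptions on our part: the formulas are standard, but the constants used in the paper's experiments are not reproduced here.

```python
import numpy as np

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio between images x and y, assuming values
    in [0, peak] (the standard form of (17); the peak value is assumed)."""
    mse = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def snr(clean, noise_std):
    """SNR of a noisy observation: the standard deviation of the clean
    intensities divided by the noise standard deviation, matching the
    description in the text (the exact convention is assumed)."""
    return np.std(clean) / noise_std
```

With this convention, a smaller MSE between the clean and denoised images directly translates to a higher PSNR, which is the quantity reported in all the experiments below.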
5.1 Effectiveness of low-rank NLM operators
We explore the improvement achieved by low-rank approximations of the NLM operator compared to the original NLM operator, as a function of the noise level (SNR) of the image and the kernel width of the operator.
For this experiment we used four of the test images given in Figure 3, resized to
pixels by bicubic interpolation. We computed NLM operators for a range of kernel widths with patch size
. For each operator corresponding to a given kernel width, we constructed several low-rank approximations of it using the method of eigenvalue truncation, as given in Algorithm 1 in Appendix B. The low-rank values are given in the set

In Figure 4, we present three charts for every image, corresponding to three SNR levels (for out of the images of Figure 3). Each chart shows the PSNR between the clean image and its denoised version, as a function of the kernel width, for two operators. The two operators are the NLM operator and the best performing low-rank approximation, that is
(18) 
where and are the clean and corrupted images, respectively.
From the results of this experiment we see that the performance gap between the optimal low-rank operator (18) and the NLM operator can be very large, in favor of the low-rank approximation, for kernel widths that are much smaller than the ideal width. This advantage diminishes as the kernel width approaches its best performing value. Note that as the level of noise increases (the SNR value decreases), this optimal width spreads over a broader range of values. In addition, one observes that the performance gain of the low-rank operator naturally diminishes as the SNR increases, since less noise is involved.
5.2 The cutoff point
We investigate the effect of the cutoff point on the performance of two denoising methods. For the naive method of truncated eigenvalues (Algorithm 1), the cutoff point corresponds to the number of leading eigenvalues preserved in the low-rank operator. For our method based on the SB function (Algorithm 3), the cutoff point corresponds to the filter’s cutoff parameter , which is a value between zero and one (Section 3.2). In the experiment of this section, we measure the “PSNR gain”, which is the difference between the PSNR of the low-rank operator and that of the original NLM operator it was created from, as a function of the cutoff parameter.
The original NLM operators are constructed with a patch size and kernel widths , and for the different SNR values , and , respectively. These values were chosen based on the experiments in Section 5.1, as the kernel widths which result in the highest PSNR, on average. For our method based on the SB function we use coefficients in the truncated Chebyshev expansion (9). In addition, we select as the order of the filter (4) for the highest level of noise, that is an SNR of , while taking for the other two SNRs of and .
Figures 5 and 6 show the results of the experiment, per image and noise level, using eigenvalue truncation and our method, respectively. One can observe that for both methods, the best-performing cutoff, given the SNR, is far from being the same for all images. For cutoffs that are very low (resulting in extremely low-rank operators), the PSNR gain may be negative. For cutoffs that are very high, the PSNR of the low-rank operator converges to the PSNR achieved by the original NLM operator, since in both methods the modified operator itself converges to the original NLM operator.
For a different perspective, we show in Figure 7 the mean improvement per noise level for the two methods. Indeed, there is a range of cutoffs that yields an improvement on the dataset for both methods.
5.3 Comparison with other algorithms
In Figure 7 we show the optimal average cutoff value for each noise level, averaged over the entire set of test images. In this section, we compare the results achieved using these cutoffs to the results of other denoising algorithms.
The denoising algorithms we compare are: NLM, truncated eigenvalue NLM (NLM-Eig), our NLM based on the SB function (NLM-SB), NLM-SB2 (a two-stage denoising scheme to be described in Section 5.4 below), K-SVD [5], the shape-adaptive DCT transform (SA-DCT) [7], and block-matching and 3D filtering (BM3D) [4]. The latter three have publicly available implementations. The parameters used for these algorithms are the defaults given by the respective authors in their implementations.
5.4 Two-stage denoising schemes
Meyer and Shen [11] have proposed to compute a second NL-means operator from the denoised image and apply it again. Beyond introducing the two stages, there are a few additional differences between their algorithm and the NLM algorithm given in Definition 1.
First, it contains a nearest neighbours search for constructing the denoising operators. Instead of constructing the operator , where the elements of are given as in Definition 1, they construct as , where the matrix contains nonzero elements in row , which are the largest values of the set
Second, the way the denoising operator is applied differs from the NLM algorithm. Instead of computing the denoised image as , where each pixel is denoised using only the patch of which it is the center, they average the values of each pixel over all patches in which it is included.
For comparison purposes, we have modified Meyer’s scheme to use our low-rank operator based on the SB function. The resulting algorithm is henceforth referred to as SB-Meyer. In order to use in this scheme, we had to modify the way the denoising operator is computed: in contrast to the NLM operator, the operator in Meyer’s scheme is not positive semi-definite (not all of its eigenvalues are non-negative), due to the use of nearest neighbours in its construction. Since our framework, as given in Section 3, requires a non-negative spectrum, we approximated the denoising operator in Meyer’s scheme by a new positive semi-definite matrix, shifting its spectrum so that the most negative eigenvalue becomes zero. Then, we normalized its rows so that they sum to unity. This normalization transforms the largest positive eigenvalue back to unity (see also the proof of Lemma 2 in Appendix A).
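The spectrum shift and row renormalization described above can be sketched as follows. This is an illustrative version for a generic matrix; the paper applies it to the k-nearest-neighbour operator of Meyer's scheme.

```python
import numpy as np

def make_extended_nlm(W):
    """Shift the spectrum of a (possibly indefinite) operator so that the
    most negative eigenvalue maps to zero, then renormalize the rows to
    sum to one, as in the SB-Meyer adaptation.
    """
    lam = np.linalg.eigvals(W).real
    shift = max(0.0, -lam.min())                  # 0 if already PSD
    Ws = W + shift * np.eye(W.shape[0])           # lifts every eigenvalue
    return Ws / Ws.sum(axis=1, keepdims=True)     # rows sum to unity again
```

Note that the row normalization preserves the non-negativity of the spectrum when the shifted matrix is symmetric positive semi-definite, by Sylvester's law of inertia (cf. the proof of Lemma 2 in Appendix A), and the resulting row-stochastic matrix with positive entries has largest eigenvalue one.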
The original algorithm of [11] uses parameters given by the authors in their source code, which were tuned only for an SNR level of . Thus, we report in Table 3 its results (in the column “Meyer”), and also those of SB-Meyer, only for that noise level. The comparison is over the full set of images (as given in Figure 3), where the parameters of SB-Meyer are given in Table 4. In addition to SB-Meyer, we have implemented a simpler two-stage scheme employing our operator based on the SB function, which is given in Algorithm 4 and is referred to as NLM-SB2. Its parameters are given in Table 5. From Table 3 we see that using the method of the current paper improves the resulting PSNR (on average). Using a two-stage scheme further improves the resulting PSNR.
Table 4 (parameters of SB-Meyer):
SNR=1: 200, 200, 7, 3, 1, 1, 0.6, 0.4, 50, 16, 0.2, 150

Table 5 (parameters of NLM-SB2):
SNR=0.5: 5, 1.5, 1, 0.3, 0.3, 50, 50, 0.5, 150
SNR=0.75: 5, 1.05, 0.35, 0.3, 0.3, 15, 15, 0.15, 150
SNR=1: 5, 0.5, 0.3, 0.3, 0.3, 4, 4, 0.15, 150
For a more visual perspective on the comparison, we provide Figures 8 and 9, where two images are denoised by the various tested algorithms. The first image is part of the Mandril image, whose clean version is shown in Figure 8(a). Its denoised versions are presented in Figures 8(c)–(f). One can see that SB-Meyer retained the texture much better than BM3D, and it also contains fewer artifacts than Meyer’s original scheme (the finding regarding texture preservation is consistent with the conclusions in [11]). The second image is part of the Barbara image, shown in Figure 9(a). The denoising results for the Barbara image are presented in Figures 9(c)–(f), where we can see that the outputs of NLM-SB and NLM-SB2 are considerably less noisy than the output of NLM, while retaining nearly the same amount of detail. In addition, it appears that some small image regions that were incorrectly estimated by NLM were also incorrectly estimated by NLM-SB. A few of these artifacts are visible on the Barbara image; they are marked by red rectangles in Figures 9(d) and 9(e). This effect of a remaining noisy region was amplified by the application of the second denoising stage in NLM-SB2. We believe that these artifacts can be removed by a simple heuristic, but this is outside the scope of this work.
6 Summary
In this paper, we have investigated the idea of improving a Non-Local Means operator by manipulating its spectrum. We have shown a method to do so without explicitly computing either the eigenvectors of the original operator or the matrix of the modified operator.
Our method operates by applying a filtering function to the original Non-Local Means operator. To that end, we have derived sufficient conditions on such filtering functions and an efficient procedure for their application.
In this work we also show the connection between spectral shaping of the operator and the application of a matrix function on the operator. In the implementation of our approach, we demonstrate the well-known efficiency of the Chebyshev polynomials for matrix functions. Moreover, a bound on the approximation error of the truncated Chebyshev expansion for the class of Non-Local Means matrices is proved.
We present numerical experiments from which we learn a few important observations about the improvement that can be gained by a low-rank approximation of a Non-Local Means operator. First, the improvement depends on the choice of the kernel width parameter and the noise level. Second, we find that the optimal cutoff of the spectrum of the Non-Local Means operator varies for different images and noise levels. Nevertheless, the methods of eigenvalue truncation and SB filtering, as given in Algorithms 1 and 3 respectively, both achieve a non-negligible improvement, on average, on our dataset of images. It is also demonstrated that a further PSNR improvement can be achieved by running SB filtering via a two-stage scheme, as suggested in [11].
References
 [1] Antoni Buades, Bartomeu Coll, and Jean-Michel Morel. A non-local algorithm for image denoising. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 2, pages 60–65. IEEE, 2005.
 [2] Stephen Butterworth. On the theory of filter amplifiers. Wireless Engineer, 7:536–541, 1930.
 [3] Ronald R Coifman and David L Donoho. Translation-invariant denoising. Springer New York, 1995.

 [4] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, Karen Egiazarian, et al. BM3D image denoising with shape-adaptive principal component analysis. In SPARS’09 - Signal Processing with Adaptive Sparse Structured Representations, 2009.
 [5] Michael Elad and Michal Aharon. Image denoising via sparse and redundant representations over learned dictionaries. Image Processing, IEEE Transactions on, 15(12):3736–3745, 2006.
 [6] Brian P Flannery, William H Press, Saul A Teukolsky, and William Vetterling. Numerical recipes in C. Press Syndicate of the University of Cambridge, New York, 1992.
 [7] Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. Pointwise shape-adaptive DCT for high-quality denoising and deblocking of grayscale and color images. Image Processing, IEEE Transactions on, 16(5):1395–1411, 2007.
 [8] Gene H Golub and Charles F Van Loan. Matrix computations, volume 3. JHU Press, 2012.
 [9] Nicholas J Higham. Stable iterations for the matrix square root. Numerical Algorithms, 15(2):227–242, 1997.
 [10] Aapo Hyvärinen, Jarmo Hurri, and Patrik O Hoyer. Natural Image Statistics: A Probabilistic Approach to Early Computational Vision., volume 39. Springer, 2009.
 [11] François G Meyer and Xilin Shen. Perturbation of the eigenvectors of the graph Laplacian: Application to image denoising. Applied and Computational Harmonic Analysis, 36(2):326–334, 2014.
 [12] Giovanni Motta, Erik Ordentlich, Ignacio Ramirez, Gadiel Seroussi, and Marcelo J Weinberger. The iDUDE framework for grayscale image denoising. Image Processing, IEEE Transactions on, 20(1):1–21, 2011.
 [13] Pietro Perona and Jitendra Malik. Scale-space and edge detection using anisotropic diffusion. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 12(7):629–639, 1990.
 [14] Leonid I Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1):259–268, 1992.
 [15] Amit Singer, Yoel Shkolnisky, and Boaz Nadler. Diffusion interpretation of nonlocal neighborhood filters for signal denoising. SIAM Journal on Imaging Sciences, 2(1):118–139, 2009.
 [16] Richard Szeliski, Ramin Zabih, Daniel Scharstein, Olga Veksler, Vladimir Kolmogorov, Aseem Agarwala, Marshall Tappen, and Carsten Rother. A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(6):1068–1080, 2008.
 [17] Hillel TalEzer. Polynomial approximation of functions of matrices and applications. Journal of Scientific Computing, 4(1):25–60, 1989.
 [18] Lloyd N Trefethen. Spectral methods in MATLAB, volume 10. SIAM, 2000.
 [19] Lloyd N Trefethen. Is Gauss quadrature better than Clenshaw–Curtis? SIAM Review, 50(1):67–87, 2008.
 [20] Maria Zontak and Michal Irani. Internal statistics of a single natural image. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 977–984. IEEE, 2011.
Appendix A Proof of Lemma 2
Lemma 2 summarizes known properties of NLM operators. We provide its proof to keep the paper self-contained.
Proof.
The NLM operator of Definition 1 is in general not symmetric, but it is conjugate via to the symmetric matrix , namely . Therefore, the NLM operator is diagonalizable and has the same eigenvalues as . On the other hand, is the Gaussian kernel function, which implies that is a symmetric positive definite matrix. Since is obtained from by multiplication by on both sides, by Sylvester’s law of inertia, the eigenvalues of are positive as well. Thus, since and are conjugate, all eigenvalues of are also positive.
Since is element-wise positive, it follows from the Perron–Frobenius theorem that
Moreover, since for any , , we get that . Thus, and from the positivity of the eigenvalues we conclude that , . ∎