I Introduction
In the last few years, some very effective frameworks for image restoration have been proposed that exploit nonlocality (longdistance correlations) in images, and/or use patches instead of pixels to robustly compare photometric similarities. The archetype algorithm in this regard is the NonLocal Means (NLM) [1]. The success of NLM triggered a huge amount of research, leading to stateoftheart algorithms that exploit nonlocality and/or the patch model in specialized ways; e.g., see [3, 4, 9, 5, 6, 7, 8], to name a few. We refer the interested reader to [2, 7] for detailed reviews. Of these, the best performing method till date is perhaps the hybrid BM3D algorithm [9], which effectively combines the NLM framework with other classical algorithms.
To setup notations, we recall the working of NLM. Let be some linear indexing of the input noisy image. The standard setting is that is the corrupted version of some clean image ,
(1) 
where is iid
. The goal is to estimate (approximate)
from the noisy measurement , possibly given a good estimate of the noise floor . In NLM, the restored image is computed using the simple formula(2) 
where is some weight (affinity) assigned to pixels and . Here is the neighborhood of pixel over which the averaging is performed. To exploit nonlocal correlations, is ideally set to the whole image domain. In practice, however, one restricts to a geometric neighborhood, e.g., to a sufficiently large window of size around [1]. The other idea in NLM is to set the weights using image patches centered around each pixel. In particular, for a given pixel , let denote the restriction of to a square window around . Letting be the length of this window, this associates every pixel with a point in (the patch space). The weights in standard NLM are set to be
(3) 
where is the Euclidean distance between and as points in , and is a smoothing parameter. Along with nonlocality, it is the use of patches that makes NLM more robust in comparison to pixelbased neighborhood filters [12, 11, 10].
Recently, it was demonstrated in [13] that the denoising performance of NLM can be improved (often substantially for images with sharp edges) by replacing the regression in NLM with the more robust regression. More precisely, given weights , note that (2) is equivalent to performing the following regression (on the patch space):
(4) 
and then setting to be the center pixel in . Indeed, this reduces to (2) once we write the regression in terms of the center pixel . The idea in [13] was to use regression instead, namely, to compute
(5) 
and then set to be the center pixel in . Note that (5) is a convex optimization, and the minimizer (the Euclidean median) is unique when [14]. The resulting estimator was called the NonLocal Euclidean Medians (NLEM). A numerical scheme was proposed in [13] for computing the Euclidean median using a sequence of weighted leastsquares. It was demonstrated that NLEM performed consistently better than NLM on a large class of synthetic and natural images, as soon as the noise was above a certain threshold. More specifically, it was shown that the bulk of the improvement in NLEM came from pixels situated close to edges. An inlieroutlier model of the patch space around an edge was proposed, and the improvement was attributed to the robustness of (5) in the presence of outliers [17].
In this paper, we show how a simple extension of the above idea can dramatically improve the denoising performance of NLM, and even that of NLEM. This is the content of Section II. In particular, a general optimization and algorithmic framework is provided that includes NLM and NLEM as special cases. Some numerical results on synthetic and natural images are provided in Section III to justify our claims. Possible extensions of the present work are discussed in Section IV.
Ii NonLocal Patch Regression
Iia Robust patch regression
It is wellknown that minimization is more robust to outliers than minimization. A simple argument is that the unsquared residuals in (5) are better guarded against the aberrant data points compared to the squared residuals . The former tends to better suppress the large residuals that may result from outliers. This basic principle of robust statistics can be traced back to the works of von Neumann, Tukey [16], and Huber [17], and lies at the heart of several recent work on the design of robust estimators; e.g., see [15], and the references therein.
A natural question is what happens if we replace the regression in (5) by regression? In general, one could consider the following class of problems:
(6) 
The intuitive idea here is that, by taking smaller values of , we can better suppress the residuals induced by the outliers. This should make the regression even more robust to outliers, compared to what we get with . We note that a flip side of setting is that (6) will no longer be convex (this is essentially because is convex if and only if ), and it is in general difficult to find the global minimizer of a nonconvex functional. However, we do have a good chance of finding the global optimum if we can initialize the solver close to the global optimum. The purpose of this note is to numerically demonstrate that, for all sufficiently large , the obtained by solving (6) (and letting to be the center pixel in ) results in a more robust approximation of as , than what is obtained using NLM. Henceforth, we will refer to (6) as NonLocal Patch Regression (NLPR), where is generally allowed to take values in the range .
IiB Iterative solver
The usefulness of the above idea actually stems from the fact that there exists a simple iterative solver for (6). In fact, the idea was influenced by the wellknown connection between ‘sparsity’ and ‘robustness’, particularly the use of minimization for bestbasis selection and exact sparse recovery [18, 19, 22]. We were particularly motivated by the iteratively reweighted least squares (IRLS) approach of Daubechies et al. [21], and a regularized version of IRLS developed by Chartrand for nonconvex optimization [19, 20]. We will adapt the regularized IRLS algorithm in [19, 20] for solving (6). The exact working of this iterative solver is as follows. We use the NLM estimate to initialize the algorithm, that is, we set
(7) 
Then, at every iteration , we write in (6), and use the current estimate to approximate this by . This gives us the surrogate leastsquares problem
(8) 
Here is used as a guard against division by zero, and is gradually shrunk to zero as the iteration progresses. We refer the reader to [19] for details. The solution of (8) is explicitly given by
(9) 
where
The minimizer of (6) is taken to be the limit of the iterates, assuming that it exists. While we cannot provide any guarantee on local convergence at this point, we note that (9) can be expressed as a gradient descent step (with appropriate step size) of a smooth surrogate of (6). This interpretation leads to the wellknown Weiszfeld algorithm (for the special case ), which is known to converge linearly [26, 27]. Alternatively, one could adapt more sophisticated IRLS algorithms (e.g., the one in [21]), which come with proven guarantees on local convergence, to the case .
The overall computational complexity of NLPR is per pixel, where is the average number of iterations. For NLM, the complexity is per pixel. For a given convergence accuracy, we have noticed that increases as decreases. In particular, a large number of iterations are required in the nonconvex regime . In this case, we halt the computation after a sufficiently large number of iterations.
IiC Robustness using nearest neighbors
We noticed in [13]
that a simple heuristic often provides a remarkable improvement in the performance of NLM. In (
2), one considers all patches drawn from the geometric neighborhood of pixel . However, notice that when a patch is close to an edge, then roughly half of its neighboring patches are on one side (the correct side) of the edge. Following this observation, we consider only the top of the the neighboring patches that have the largest weights. That is, the selected patches correspond to the nearest neighbors of in the patch space, where . While this tends to inhibit the diffusion at low noise levels (in smooth regions), it was demonstrated in [13] that it can significantly improve the robustness of NLM and NLEM at large . We will also use this heuristic in NLPR. The overall scheme is summarized in Algorithm 1. We use to denote a window of size centered at pixel in the algorithm.Iii Numerical Experiments
To understand the denoising performance of NLPR, we provide some limited results on synthetic and natural images. The main theme of our investigation would be to understand how the performance of NLPR changes with the regression index . For a quantitative comparison of the denoising results, we will use the standard peaksignaltonoise ratio (PSNR). For an pixel image, with intensity scaled to , this is defined to be , where .
We first consider the test image of Checker used in [13]. This serves as a good model for simultaneously testing the denoising quality in smooth regions and in the vicinity of edges. We corrupt Checker as per the noise model in (1). We then compute the denoised image using Algorithm 1, with the exception that we skip steps (b) and (c), that is, we use the full neighborhood . We initialize the iterations of the IRLS solver using (7). For all the experiments in this paper, we fix the parameters to be and . These are the settings originally proposed in [1]. The results obtained using these settings are not necessarily optimal, and other settings could have been used as well. The point is to fix all the parameters in Algorithm 1, except . This means that the same are used for different . We now run the above denoising experiment for , and for .
The results are shown in Figure 1. We notice that, beyond a certain noise level, NLPR performs better when is close to zero. In fact, the PSNR increases gradually from to , for a fixed . At lower noise levels, the situation reverses completely, and NLPR tends to perform better around . A possible explanation is that the true neighbors in patch space are well identified at low noise levels, and since the noise is Gaussian, regression gives statistically optimal results.
An analysis of the above results shows us that, as , the bulk of the improvement comes from pixels situated in the vicinity of edges. A similar observation was also made in [13] for NLEM. To understand this better, we recall the ideal  edge model used in [13]. This is shown in Figure (a)a. We add noise of strength to the edge, and denoise it using NLPR. We examine the regression at a reference point situated just right to the edge (cf. Figure (b)b). The patch space at this point is specified using and . The distribution of patches is shown in Figure 3. Note that the patches are clustered around the centers and . For the reference point, the points around are the outliers, while the ones around are the inliers. We now perform regression on this distribution for and . The results obtained (Algorithm 1, steps (b) and (c) skipped) from a single noise realization are shown in Figure 3. The exact values of the estimate in this case are (), (), and (). The average estimate over noise realizations are (), (), and ().
We note that the working of the IRLS algorithm provides some insight into the robustness of regression. Note that when (NLM), the reconstruction in (6) is linear; the contribution of each noisy patch is controlled by the corresponding weight . On the other hand, the reconstruction is nonlinear when . The contribution of each is controlled not only by the respective weights, but also by the multipliers . In particular, the limiting value of the multipliers dictate the contribution of each in the final reconstruction. Figure (4) gives the distribution of the sorted multipliers (at convergence) for the experiment described above. In this case, the large multipliers correspond to the inliers, and the small multipliers correspond to the outliers. Notice that when , the tail part of the multipliers (outliers) has much smaller values (close to zero) compared to the leading part (inliers). In a sense, the iterative algorithm gradually ‘learns’ the outliers from the patch distribution as the iteration progresses, which are finally taken out of estimation.




Iv Discussion
Image  Method  PSNR (dB)  

NLM  34.25  29.76  26.88  25.21  24.08  23.34  22.81  22.42  22.05  21.80  
House  NLPR  33.23  30.23  27.86  26.40  25.45  24.69  24.10  23.52  22.93  22.41 
NLM  32.38  27.38  24.94  23.53  22.65  22.03  21.62  21.30  21.07  20.87  
Barbara  NLPR  31.50  28.42  26.51  25.39  24.57  23.84  23.21  22.60  22.06  21.56 
NLM  30.78  26.71  24.73  23.64  22.95  22.48  22.12  21.88  21.65  21.45  
Boat  NLPR  30.54  27.23  25.50  24.50  23.87  23.40  22.95  22.54  22.11  21.68 
NLM  31.39  27.90  24.78  22.93  21.89  21.14  20.62  20.20  19.88  19.61  
Cameraman  NLPR  31.17  27.46  25.15  25.15  22.68  22.12  21.67  21.36  20.97  20.63 
NLM  32.34  27.66  24.95  23.13  21.89  21.01  20.43  19.98  19.63  19.40  
Peppers  NLPR  31.20  27.67  25.56  24.18  23.03  22.15  21.62  21.13  20.70  20.34 
We compare the PSNRs obtained using NLPR () with that of NLM for some standard natural images in Table I. We notice that, for each of the images, NLPR consistently outperforms NLM at large noise levels. The gain in PSNR is often as large as dB. The results obtained for Barbara using NLM and NLPR are compared in Figure 5
. Note that, as expected, robust regression provides a much better restoration of the sharp edges in the image than NLM. What is probably surprising is that the restoration is superior even in the textured regions. Note, however, that NLM tends to perform better in the smooth regions. For example, we some more noise grains in the smooth regions in Figure
(d)d compared that in Figure (c)c. This suggests that an ‘adaptive’ optimization framework, which combines regression (in smooth regions) and regression (in the vicinity of edges), might possibly perform better than a fixed regression. Some other possible extensions of the present work are as follows: (i) Local convergence analysis of the present IRLS algorithm, and ways of improving it; (ii) Possibility of using more efficient numerical algorithms for solving (6); (iii) Finding better ways of estimating the denoised pixel from the estimated patch (the projection method used here is probably the simplest); (iv) Use of ‘better’ weights than the ones used in standard NLM [24, 25]; and (v) Formulation of a ‘joint’ optimization framework for (6), where the optimization is performed with respect to and [6].References
 [1] A. Buades, B. Coll, J. M. Morel, “A review of image denoising algorithms, with a new one,” Multiscale Modeling and Simulation, vol. 4, pp. 490530, 2005.
 [2] A. Buades, B. Coll, J. M. Morel, “Image denoising methods. A new nonlocal principle,” SIAM Review, vol. 52, pp. 113147, 2010.
 [3] C. Kervrann, J. Boulanger, “Optimal spatial adaptation for patchbased image denoising,” IEEE Transactions on Image Processing, vol. 15(10), 28662878, 2006.
 [4] G. Gilboa, S. Osher, “Nonlocal operators with applications to image processing,” Multiscale Modeling and Simulation, vol. 7, no. 3, 10051028, 2008.
 [5] M. Aharon, M. Elad, A. Bruckstein, “KSVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 43114322, 2006.
 [6] G. Peyré, S. Bougleux, L. D. Cohen, “Nonlocal Regularization of Inverse Problems,” Inverse Problems and Imaging, vol. 5(2), pp. 511530. 2011.
 [7] P. Chatterjee, P. Milanfar, “Patchbased nearoptimal image denoising,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 16351649, 2012.
 [8] C. Deledalle, V. Duval, J. Salmon, “Nonlocal methods with shapeadaptive patches (NLMSAP),” Journal of Mathematical Imaging and Vision, vol. 43, no. 2, pp. 103120, 2012.
 [9] K. Dabov, A. Foi, V. Katkovnik, K. Egiazarian, “Image denoising by sparse 3D transformdomain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, pp. 20802095, 2007.

[10]
C. Tomasi, R. Manduchi, “Bilateral filtering for gray and color images,” IEEE International Conference on Computer Vision, pp. 839846, 1998.
 [11] S. M. Smith, J. M. Brady, “SUSAN  A new approach to low level image processing,” International Journal of Computer Vision, vol. 23, no. 1, pp. 4578, 1997.
 [12] L. P. Yaroslavsky, Digital Picture Processing. Secaucus, NJ: SpringerVerlag, 1985.
 [13] K. N. Chaudhury, A. Singer, ”Nonlocal Euclidean medians,” IEEE Signal Processing Letters, vol. 19, no. 11, pp. 745  748, 2012.
 [14] P. Milasevic, G. R. Ducharme, “Uniqueness of the Spatial Median”, Annals of Statistics, vol. 15, no. 3, 13321333, 1987.
 [15] G. Lerman, M. McCoy, J. A. Tropp, T. Zhang, “Robust computation of linear models, or how to find a needle in a haystack,” arXiv:1202.4044 [cs.IT].
 [16] J. W. Tukey, Exploratory Data Analysis. AddisonWesley, Reading, Mass., 1977.
 [17] P. J. Huber, E. M. Ronchetti, Robust Statistics. Wiley Series in Probability and Statistics, Wiley, 2009.
 [18] B. D. Rao, K. KreutzDelgado, “An affine scaling methodology for best basis selection,” IEEE Transactions on Signal Process., vol. 47, pp. 187200, 1999.
 [19] R. Chartrand, “Exact reconstruction of sparse signals via nonconvex minimization,” IEEE Signal Processing Letters, vol. 14, pp. 707710, 2007.
 [20] R. Chartrand, “Iteratively reweighted algorithms for compressive sensing,” IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 38693872, 2008.
 [21] I. Daubechies, R. Devore, M. Fornasier, C. S. Gunturk “Iteratively reweighted least squares minimization for sparse recovery,” Communications on Pure and Applied Mathematics, vol. 63, pp. 138, 2009.
 [22] R. Saaba, O. Yilmazb, “Sparse recovery by nonconvex optimization – instance optimality,” Applied and Computational Harmonic Analysis, vol. 29(1), pp. 3048, 2010.
 [23] J. Ji, “Robust inversion using biweight norm and its application to seismic inversion,” Exploration Geophysics, vol. 43(2), 7076, 2012.
 [24] T. Tasdizen, “Principal neighborhood dictionaries for nonlocal means image denoising,” IEEE Transaction on Image Processing, vol. 18, pp. 26492660, 2009.
 [25] D. Van De Ville, M. Kocher, “Nonlocal means with dimensionality reduction and SUREbased parameter selection,” IEEE Transactions on Image Processing, vol. 20, pp. 26832690, 2011.
 [26] E. Weiszfeld, “Sur le point par lequel le somme des distances de points donnes est minimum,” Tohoku Mathematical Journal, vol. 43, pp. 355386, 1937.

[27]
R. Hartley, K. Aftab, J. Trumpf, “L1 rotation averaging using the Weiszfeld algorithm,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 30413048, 2011.