1 Introduction
We consider the problem of denoising grayscale images corrupted with additive white Gaussian noise. A popular denoising method is the nonlocal means (NLM) algorithm [1], where image patches are used to perform pixel aggregation. While NLM is no longer the stateoftheart, it is still used in the image processing community due to its simplicity, decent denoising performance, and the availability of fast implementations. The NLM of an image , where , is given by [1]
(1) 
where is a search window around the pixel of interest. The weights are set to be
(2) 
where is a smoothing parameter and is a twodimensional patch.
A direct implementation of (1) has the perpixel complexity of , where and are typically in the range and .[1] Several computational tricks and approximations have been proposed to speedup the direct implementation. [2, 3, 4, 5, 6, 7, 8]. A particular means to speed up NLM is using a separable approximation, which in fact is a standard trick in the image processing literature [9, 11, 12, 10]. In separable filtering, the rows are processed first followed by the columns (or in the reverse order). Of course, if the original filter is nonseparable, then the output of separable filtering is generally different from that of the original filter, since a natural image typically contains diagonal details [12]. This is the case with NLM since expression (2) is not separable. The present focus is on a recent separable approximation of NLM.[13] At the core of this proposal is a method called lifting, which computes the NLM of a onedimensional signal using operations per sample. In other words, the complexity of lifting is independent of the patch length . Extending lifting for NLM denoising of images, however, turns out to be a difficult task. Therefore, we proposed a separable approximation, called separable NLM (SNLM)[13], in which the rows and columns of the image are independently filtered using lifting. In particular, we separately computed the “rowsthencolumns” and “columnsthenrows” filtering, which were then optimally combined. The perpixel complexity of SNLM is , which is a dramatic reduction compared to the complexity of NLM.
A flip side of SNLM (as is the case with other separable formulations[14]) is that often vertical and horizontal stripes are induced in the processed image. The stripes are more prominent along the last filtered dimension.[14] In SNLM, this problem was alleviated using the optimal recombination mentioned above followed by a bilateral filterbased postsmoothing. In this work, we demonstrate that the stripes can be mitigated in the first place simply by involving the neighboring rows (or columns) in the filtering. In other words, we use a twodimensional search (similar to classical NLM[1]), while still using onedimensional patches (as done previously [13]). The present novelty is in the observation that one can use lifting for performing a twodimensional search. In particular, the perpixel complexity of the proposed approach is , which is higher than our previous proposal, but still substantially lower than that of classical NLM. Importantly, the proposed approach no longer exhibits the visible artifacts that are otherwise obtained using SNLM.
The rest of the paper is organized as follows. We recall the SNLM algorithm in Section 2 and its fast implementation using lifting. We also illustrate the artifact problem with an example. The proposed solution is presented in Section 3, along with some algorithmic details. In Section 4, we report the denoising performance of our approach and compare it with classical NLM and SNLM. We end the paper with some concluding remarks in Section 5.
2 Separable NonLocal Means
To set up the context, we briefly recall the SNLM algorithm[13]. Suppose we have a onedimensional signal , corresponding to a row or column. The onedimensional analogue of (1) is given by
(3) 
and
(4) 
where and is a smoothing parameter. In other words, both the search window and patch are onedimensional in this case. It was observed in our previous work that the weights can be computed using operations with respect to . In particular, consider the matrices:
(5) 
and
(6) 
We see that is the smoothed version of , obtained by box filtering along its subdiagonals. The important observation[13] is that we can write
(7) 
In particular, using this socalled lifting, we can compute the patch distance using just three samples of , one multiplication, and two additions. The computational gain comes from the fact that the box filtering in (6) can be computed using operations with respect to using recursions[13]. Moreover, following the observation that not all samples of are used in (3), an efficient mechanism for computing (and storing) just the required samples was proposed[13]. The perpixel complexity of computing (3) using lifting reduces to from the bruteforce complexity of . Unfortunately, extending lifting to handle twodimensional patches turns out to be difficult. Instead, we proposed to use separable filtering, where the rows (columns) are filtered using (3) followed by the columns (row). The two distinct outputs are then optimally combined to get the final image. In fact, the reason behind the averaging was to suppress artifacts in the form of stripes arising from the separable filtering. This is demonstrated with an example in Fig. 1, where we have compared NLM, SNLM, and the proposed approach. We used bilateral filtering to remove the stripes in SNLM, at an additional cost. However, the final image still has some residual artifacts.
3 Proposed Approach
We see less stripes in Fig. 1(d) precisely because we use a twodimensional search. In other words, we use a cross between classical NLM and SNLM in which we use (8) for the aggregation and (4) for the weights. The twodimensional search results in the averaging of pixels from across rows (and columns). This does not happen in SNLM, which causes the stripes to appear in Fig. 1(c).
The working of our proposal is explained in Fig 2. The pixel of interest in this case is the pixel at position marked with a red dot. The search window of length is marked with a green bounding box. Two neighboring pixels at locations and are marked with red dots. The former pixel is on a neighboring row, while the latter is on the same row as the pixel of interest. Similar to SNLM [13], we can consider either horizontal or vertical patches. For our example, the patches (of length ) are aligned with the image rows; they are marked with light blue rectangles. For our proposal, the denoising at is performed using the formula:
(8) 
and
(9) 
where and . To compute (8), we group the neighboring patches into two categories: (i) patches with row index , e.g., patch in Fig. 2, and (ii) patches with a different row index, e.g., patch in the figure. Let and be the th and th row, where is the length of a row (see Fig 2). Similar to (5) and (6), we define the matrices:
(10) 
and the corresponding matrices , and , where, for example,
(11) 
As in (7), the (squared) distance between patches centered at and is
(12) 
On the other hand, the distance between patches centered at and is
(13) 
In other words, we can compute the distance between patches centered at and using . To compute the distance between patches centered at and , we require the matrices , , and . Moreover, using these matrices, we can compute patch distances for different , and , provided the row index of and is , and the row index of is . Thus, an efficient way of computing (8) is to sequentially process the rows. For each row (fixed ), we compute , , and , where corresponds to neighboring rows that are separated by at most . We compute matrices of the form and another matrices of the form . As mentioned in Section 2, we can compute each matrix using operations with respect to . Moreover, as per the sum in (3), we only require entries within the diagonal band of each matrix. The cost of computing the banded entries is thus for each matrix. The overall cost of processing rows is . The perpixel complexity of computing (8) using the proposed approach is thus . We can efficiently compute (and store) the banded entries using the method in Section 2.2 of the original paper[13]. The main difference with SNLM is that we require a total of matrices for processing each row; whereas, just one matrix is required in SNLM. As shown in Fig. 1(d), some residual noise can still be seen after the processing mentioned above. We perform a similar processing once more, except this time we use onedimensional patches along columns. The visual quality and PSNR of the final image (Fig. 1(e)) are comparable to NLM (Fig. 1(h)). Moreover, we see from Figs. 1(e) and 1(f) that if we first use onedimensional patches along columns and then along rows, then the outputs are similar. We empirically corroborate these observations in the next section. Therefore, we propose to first process the rows using (8) and then process the columns of the intermediate image using (8). A precise description of the proposed approach for processing the (noisy) image along rows using lifting is provided in Algorithm 1. We then perform column processing on the intermediate image to obtain the final output of our algorithm. That is, we simply apply Algorithm 1 on the intermediate image, where we logically switch the rows and columns in the algorithm. Suppose and are the corresponding search windows for the rowaligned and columnaligned processing. Then we set the search parameter in Algorithm 1 as: for the rowaligned processing, and for the columnaligned processing.
5  10  20  30  50  5  10  20  30  50  
Method  House ()  Montage ()  
Noisy  34.1/83  28.1/60  22.1/34  18.6/22  14.2/12  34.2/83  28.1/61  22.1/36  18.6/25  14.1/15 
NLM [1]  36.9/90  34.1/87  29.7/82  26.8/77  24.0/69  39.1/97  34.3/94  29.6/89  26.5/85  22.2/76 
Darbon et al. [4]  36.1/90  31.4/75  26.1/51  22.8/36  18.6/21  38.4/89  30.9/76  25.8/52  22.6/38  18.6/24 
SNLM [13]  36.6/89  33.6/86  29.4/81  26.5/76  23.7/69  39.3/97  34.6/94  29.7/89  26.7/84  22.5/77 
Proposed  36.6/89  34.1/86  30.4/82  27.3/77  24.1/70  39.4/97  34.8/94  30.2/90  27.3/86  23.4/79 
BM3D [19]  38.6/95  34.7/93  31.3/88  29.3/85  26.6/78  41.1/98  37.3/96  33.5/94  31.2/91  27.4/85 
Method  Boat ()  Man ()  
Noisy  34.1/97  28.1/90  22.1/73  18.6/59  14.2/41  34.1/99  28.1/97  22.1/91  18.6/85  14.1/70 
NLM [1]  35.1/96  30.8/89  26.7/78  24.7/70  23.0/62  35.3/98  31.1/95  27.5/88  25.8/83  24.2/76 
Darbon et al. [4]  34.4/97  30.3/94  25.4/82  22.4/71  18.4/53  35.1/97  30.4/98  25.6/95  22.5/90  18.5/80 
SNLM [13]  35.0/96  30.7/89  26.6/77  24.5/69  22.7/61  35.3/98  31.0/95  27.2/87  25.4/83  23.8/75 
Proposed  34.9/96  30.7/89  26.8/77  24.7/70  22.9/62  35.1/98  31.1/95  27.5/88  25.8/ 84  24.2/78 
BM3D [19]  37.3/98  33.9/96  30.8/92  29.0/88  26.7/81  37.3/99  34.1/98  31.2/96  29.5/94  27.4/89 
7  10  12  7  10  12  
Method  
NLM [1]  44  87  124  45  88  126 
Darbon et al. [4]  0.33  0.60  0.84  0.33  0.62  0.85 
SNLM [13]  0.31  0.39  0.45  0.32  0.40  0.46 
Proposed  1.20  2.30  3.20  1.30  2.40  3.30 
4 Experiments
The denoising performance of the proposed method is compared with NLM and SNLM in Table 1. We have used standard grayscale images from [15, 16] for our experiments. The Matlab implementation used to generate the results in this section is publicly available^{2}^{2}2http://in.mathworks.com/matlabcentral/fileexchange/64856. The search windows for the three methods were set as follows. Suppose be the search window for NLM (which we take as reference). Following the original proposal[13], the window for SNLM is also set as . For a fair comparison with NLM, we ensure that equal number of pixel are averaged in both methods. This is achieved if . Moreover, following[14], we set . These equations uniquely determine and (up to an integer rounding). Moreover, we normalize the smoothing parameters in (2) and (9) using the relation . For the results in Table 1, we set , , , , and . We notice from Table 1 that the proposed approach gives comparable results in terms of PSNR and SSIM [17]. A visual comparison of the denoising results is provided in Fig. 3 and 4. We can clearly see some stripes in the images obtained using SNLM, both with and without postprocessing (see the boxed areas). In contrast, there is hardly any artifacts present in the denoised image obtained using our method. A timing comparison is provided in Table 2. While the proposed method is slower than SNLM (this is the price we pay for removing the stripes), it is nevertheless significantly faster than NLM.
We note that though Darbon et al. [4]
is generally faster than our current proposal, its denoising performance starts deteriorating with the increase in noise variance. This is evident from Table
1 and Fig. 4. We also note that NLM and SNLM fall short of KSVD [18] and BM3D [19] in terms of denoising performance. Nevertheless, NLM continues to be of interest due to its decent denoising capability[20, 21, 22, 23], and importantly, the availability of fast approximations. As reported by other authors[24], NLM is quite effective in preserving fine details, while successfully removing noise.5 Conclusion
We proposed a method that uses the idea of lifting from previous work[13] to perform fast nonlocal means denoising of images. The proposed method does not give rise to undesirable artifacts (as was the case with the original proposal), and produces images whose denoising quality and PSNR/SSIM are comparable to nonlocal means. While this comes at the expense of added computation, the proposed method nevertheless is much faster than nonlocal means. In fact, the speedup is about x for practical parameter settings.
6 Acknowledgements
The last author was supported by a Startup Grant from IISc and EMR Grant SB/S3/EECE/281/2016 from DST, Government of India.
References

[1]
A. Buades, B. Coll, and J.M. Morel, “A nonlocal algorithm for image denoising,”
Proc. IEEE Conference on Computer Vision and Pattern Recognition
, 2, pp. 6065 (2005).  [2] M. Mahmoudi and G. Sapiro, “Fast image and video denoising via nonlocal means of similar neighborhoods,” IEEE Signal Processing Letters, 12(12), pp. 839842 (2005).
 [3] J. Wang, Y. Guo, Y. Ying, Y. Liu, and Q. Peng, “Fast nonlocal algorithm for image denoising,” Proc. IEEE International Conference on Image Processing, pp. 14291432 (2006).
 [4] J. Darbon, A. Cunha, T. F. Chan, S. Osher, and G. J. Jensen, “Fast nonlocal filtering applied to electron cryomicroscopy,” Proc. IEEE International Symposium on Biomedical Imaging, pp. 13311334 (2008).
 [5] A. Dauwe, B. Goossens, H. Luong, and W. Philips, “A fast nonlocal image denoising algorithm,” Proc. SPIE Electronic Imaging, 68(12), pp. 13311334 (2008).
 [6] J. Orchard, M. Ebrahimi, and A. Wong, “Efficient nonlocalmeans denoising using the SVD,” Proc. IEEE International Conference on Image Processing, pp. 17321735 (2008).
 [7] V. Karnati, M. Uliyar, and S. Dey, “Fast nonlocal algorithm for image denoising,” Proc. IEEE International Conference on Image Processing, pp. 38733876 (2009).
 [8] L. Condat, “A simple trick to speed up and improve the nonlocal means,” Research Report, HAL00512801, (2010).
 [9] P. M. Narendra, “A separable median filter for image noise smoothing,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 3, pp. 2029 (1981).
 [10] N. Fukushima, S. Fujita, and Y. Ishibashi, “Switching dual kernels for separable edgepreserving filtering,” IEEE International Conference on Acoustics, Speech and Signal Processing, (2015).
 [11] T. Q. Pham and L. J. Van Vliet, “Separable bilateral filtering for fast video preprocessing,” Proc. IEEE International Conference on Multimedia and Expo, (2005).
 [12] Y. S. Kim, H. Lim, O. Choi, K. Lee, J. D. K. Kim, and C. Kim, “Separable bilateral nonlocal means,” Proc. IEEE International Conference on Image Processing, pp. 15131516 (2011).
 [13] S. Ghosh and K. N. Chaudhury, “Fast separable nonlocal means,” SPIE Journal of Electronic Imaging, 25(2), 023026 (2016).
 [14] E. S. Gastal and M. M. Oliveira. “Domain transform for edgeaware image and video processing,” ACM Transactions on Graphics (ToG), 30(4), 69 (2011).
 [15] BM3D Image Database, http://www.cs.tut.fi/~foi/GCFBM3D.
 [16] KODAK Image Database, http://r0k.us/graphics/kodak/.
 [17] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, 13(4), pp. 600612 (2004).
 [18] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image Processing, 15(12), pp. 37363745 (2006).
 [19] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3D transformdomain collaborative filtering,” IEEE Transactions on Image Processing, 16(8), pp. 20802095 (2007).
 [20] J. M. Batikian and M. Liebling, “Multicycle nonlocal means denoising of cardiac image sequences,” IEEE International Symposium on Biomedical Imaging, pp. 10711074 (2014).
 [21] C. Chan, R. Fulton, R. Barnett, D.D. Feng, and S. Meikle, “Postreconstruction nonlocal means filtering of wholebody PET with an anatomical prior,” IEEE Transactions on Medical Imaging, 33(3), pp. 636650 (2014).
 [22] G. Chen, P. Zhang, Y. Wu, D. Shen, and P.T. Yap, “Collaborative nonlocal means denoising of magnetic resonance images,” IEEE International Symposium on Biomedical Imaging, pp. 564567 (2015).
 [23] D. Zeng, J. Huang, H. Zhang, Z. Bian, S. Niu, Z. Zhang, Q. Feng, W. Chen, and J. Ma, “Spectral CT image restoration via an average imageinduced nonlocal means filter,” IEEE Transactions on Biomedical Engineering, 63(5), pp. 10441057 (2016).
 [24] G. Treece, “The bitonic filter: linear filtering in an edgepreserving morphological framework,” IEEE Transactions on Image Processing, 25(11), pp. 51995211 (2016).
Comments
There are no comments yet.