We consider the problem of denoising grayscale images corrupted with additive white Gaussian noise. A popular denoising method is the non-local means (NLM) algorithm [1], where image patches are used to perform pixel aggregation. While NLM is no longer the state of the art, it is still used in the image processing community due to its simplicity, decent denoising performance, and the availability of fast implementations. The NLM of an image , where , is given by 
where is a search window around the pixel of interest. The weights are set to be
where is a smoothing parameter and is a two-dimensional patch.
A direct implementation of (1) has the per-pixel complexity of , where and are typically in the range and . Several computational tricks and approximations have been proposed to speed up the direct implementation [2, 3, 4, 5, 6, 7, 8]. One particular way to speed up NLM is to use a separable approximation, which is a standard trick in the image processing literature [9, 10, 11, 12]. In separable filtering, the rows are processed first, followed by the columns (or in the reverse order). Of course, if the original filter is non-separable, then the output of separable filtering is generally different from that of the original filter, since a natural image typically contains diagonal details. This is the case with NLM, since expression (2) is not separable. The present focus is on a recent separable approximation of NLM [13]. At the core of this proposal is a method called lifting, which computes the NLM of a one-dimensional signal using operations per sample. In other words, the complexity of lifting is independent of the patch length . Extending lifting to NLM denoising of images, however, turns out to be a difficult task. Therefore, we proposed a separable approximation, called separable NLM (SNLM), in which the rows and columns of the image are independently filtered using lifting. In particular, we separately computed the “rows-then-columns” and “columns-then-rows” filtering, which were then optimally combined. The per-pixel complexity of SNLM is , which is a dramatic reduction compared to the complexity of NLM.
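To make the cost of the direct implementation concrete, the following is a minimal sketch of brute-force NLM in Python with NumPy. The parameter names `S` (search half-width), `K` (patch half-width), and `h` (smoothing parameter), the weight form `exp(-d2 / h**2)`, and the symmetric boundary padding are assumptions made for illustration; they are not taken from the paper's notation or from the authors' Matlab code.

```python
import numpy as np

def nlm_direct(f, S=3, K=1, h=0.3):
    """Brute-force NLM with a (2S+1)x(2S+1) search window and
    (2K+1)x(2K+1) patches. Work per pixel grows as S^2 * K^2."""
    f = np.asarray(f, dtype=float)
    m, n = f.shape
    P = S + K                               # padding needed for search + patch
    g = np.pad(f, P, mode="symmetric")      # assumed boundary handling
    num = np.zeros((m, n))
    den = np.zeros((m, n))
    for dy in range(-S, S + 1):             # loop over search offsets
        for dx in range(-S, S + 1):
            d2 = np.zeros((m, n))
            for py in range(-K, K + 1):     # loop over patch offsets
                for px in range(-K, K + 1):
                    a = g[P + py:P + py + m, P + px:P + px + n]
                    b = g[P + dy + py:P + dy + py + m,
                          P + dx + px:P + dx + px + n]
                    d2 += (a - b) ** 2      # squared patch distance
            w = np.exp(-d2 / h ** 2)        # assumed weight form
            num += w * g[P + dy:P + dy + m, P + dx:P + dx + n]
            den += w
    return num / den                        # den >= 1 (zero offset has weight 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
    noisy = clean + 0.1 * rng.normal(size=clean.shape)
    print(nlm_direct(noisy).shape)
```

The four nested loops make the dependence on both the search window and the patch size explicit, which is the cost that lifting and separable filtering aim to reduce.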
A downside of SNLM (as is the case with other separable formulations) is that vertical and horizontal stripes are often induced in the processed image. The stripes are more prominent along the last filtered dimension. In SNLM, this problem was alleviated using the optimal recombination mentioned above, followed by a bilateral filter-based post-smoothing. In this work, we demonstrate that the stripes can be mitigated in the first place simply by involving the neighboring rows (or columns) in the filtering. In other words, we use a two-dimensional search (similar to classical NLM), while still using one-dimensional patches (as done in SNLM [13]). The present novelty is in the observation that one can use lifting for performing a two-dimensional search. In particular, the per-pixel complexity of the proposed approach is , which is higher than our previous proposal, but still substantially lower than that of classical NLM. Importantly, the proposed approach no longer exhibits the visible artifacts that are otherwise obtained using SNLM.
The rest of the paper is organized as follows. We recall the SNLM algorithm in Section 2 and its fast implementation using lifting. We also illustrate the artifact problem with an example. The proposed solution is presented in Section 3, along with some algorithmic details. In Section 4, we report the denoising performance of our approach and compare it with classical NLM and SNLM. We end the paper with some concluding remarks in Section 5.
2 Separable Non-Local Means
where and is a smoothing parameter. In other words, both the search window and patch are one-dimensional in this case. It was observed in our previous work that the weights can be computed using operations with respect to . In particular, consider the matrices:
We see that is the smoothed version of , obtained by box filtering along its sub-diagonals. The important observation is that we can write
In particular, using this so-called lifting, we can compute the patch distance using just three samples of , one multiplication, and two additions. The computational gain comes from the fact that the box filtering in (6) can be computed using operations with respect to using recursions. Moreover, following the observation that not all samples of are used in (3), an efficient mechanism for computing (and storing) just the required samples was proposed. The per-pixel complexity of computing (3) using lifting reduces to from the brute-force complexity of . Unfortunately, extending lifting to handle two-dimensional patches turns out to be difficult. Instead, we proposed to use separable filtering, where the rows (columns) are filtered using (3), followed by the columns (rows). The two distinct outputs are then optimally combined to get the final image. In fact, the reason behind the averaging was to suppress artifacts in the form of stripes arising from the separable filtering. This is demonstrated with an example in Fig. 1, where we have compared NLM, SNLM, and the proposed approach. We used bilateral filtering to remove the stripes in SNLM, at an additional cost. However, the final image still has some residual artifacts.
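The lifting idea described above can be illustrated with a short sketch, under stated assumptions. Here the outer product of a (padded) one-dimensional signal with itself is box-filtered along its diagonals by a running-sum recursion, and every squared patch distance is then read off from three filtered samples. The function names, the symmetric padding, and the choice to build the full matrix (rather than only the banded entries that the paper's method stores) are simplifications for illustration.

```python
import numpy as np

def lifted_patch_distances(f, K):
    """Squared distances between all pairs of length-(2K+1) patches of a
    1D signal, obtained from a diagonally box-filtered outer product."""
    f = np.asarray(f, dtype=float)
    n = f.size
    g = np.pad(f, K, mode="symmetric")      # assumed boundary handling
    G = np.outer(g, g)                      # "lifted" signal
    N = g.size
    # Running sums along diagonals: C[i+1, j+1] = C[i, j] + G[i, j].
    C = np.zeros((N + 1, N + 1))
    for i in range(N):
        C[i + 1, 1:] = C[i, :-1] + G[i, :]
    # Box filter of length 2K+1 along each diagonal; cost independent of K.
    Sbar = C[2 * K + 1:, 2 * K + 1:] - C[:n, :n]
    d = np.diag(Sbar)                       # patch energies
    # Three samples, one multiplication, two additions per distance.
    return d[:, None] + d[None, :] - 2.0 * Sbar

def brute_patch_distances(f, K):
    """Reference computation that explicitly forms every patch."""
    f = np.asarray(f, dtype=float)
    g = np.pad(f, K, mode="symmetric")
    patches = np.stack([g[i:i + 2 * K + 1] for i in range(f.size)])
    return ((patches[:, None, :] - patches[None, :, :]) ** 2).sum(axis=2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=50)
    fast = lifted_patch_distances(x, K=4)
    slow = brute_patch_distances(x, K=4)
    print(np.max(np.abs(fast - slow)))
```

The running-sum recursion touches each matrix entry once, so the work per distance does not grow with the patch length; the brute-force reference is included only to check the identity numerically.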
3 Proposed Approach
We see fewer stripes in Fig. 1(d) precisely because we use a two-dimensional search. In other words, we use a cross between classical NLM and SNLM in which we use (8) for the aggregation and (4) for the weights. The two-dimensional search results in the averaging of pixels from across rows (and columns). This does not happen in SNLM, which causes the stripes to appear in Fig. 1(c).
The working of our proposal is explained in Fig. 2. The pixel of interest in this case is the pixel at position marked with a red dot. The search window of length is marked with a green bounding box. Two neighboring pixels at locations and are marked with red dots. The former pixel is on a neighboring row, while the latter is on the same row as the pixel of interest. Similar to SNLM, we can consider either horizontal or vertical patches. For our example, the patches (of length ) are aligned with the image rows; they are marked with light blue rectangles. For our proposal, the denoising at is performed using the formula:
where and . To compute (8), we group the neighboring patches into two categories: (i) patches with row index , e.g., patch in Fig. 2, and (ii) patches with a different row index, e.g., patch in the figure. Let and be the -th and -th row, where is the length of a row (see Fig. 2). Similar to (5) and (6), we define the matrices:
and the corresponding matrices , and , where, for example,
As in (7), the (squared) distance between patches centered at and is
On the other hand, the distance between patches centered at and is
In other words, we can compute the distance between patches centered at and using . To compute the distance between patches centered at and , we require the matrices , , and . Moreover, using these matrices, we can compute patch distances for different , and , provided the row index of and is , and the row index of is . Thus, an efficient way of computing (8) is to sequentially process the rows. For each row (fixed ), we compute , , and , where corresponds to neighboring rows that are separated by at most . We compute matrices of the form and another matrices of the form .
As mentioned in Section 2, we can compute each matrix using operations with respect to . Moreover, as per the sum in (3), we only require entries within the diagonal band of each matrix. The cost of computing the banded entries is thus for each matrix. The overall cost of processing rows is . The per-pixel complexity of computing (8) using the proposed approach is thus . We can efficiently compute (and store) the banded entries using the method in Section 2.2 of the original paper. The main difference from SNLM is that we require a total of matrices for processing each row, whereas just one matrix is required in SNLM.
As shown in Fig. 1(d), some residual noise can still be seen after the processing mentioned above. We perform a similar processing once more, except this time we use one-dimensional patches along columns. The visual quality and PSNR of the final image (Fig. 1(e)) are comparable to NLM (Fig. 1(h)). Moreover, we see from Figs. 1(e) and 1(f) that if we first use one-dimensional patches along columns and then along rows, then the outputs are similar. We empirically corroborate these observations in the next section. Therefore, we propose to first process the rows using (8) and then process the columns of the intermediate image using (8). A precise description of the proposed approach for processing the (noisy) image along rows using lifting is provided in Algorithm 1. 
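The row-wise processing described above can be sketched as follows. This is not Algorithm 1 from the paper: it builds full matrices instead of only the banded entries, truncates the search at the image border, and uses hypothetical parameter names `Sv` (maximum row offset), `Sh` (maximum column offset), `K` (patch half-length), and `h` (smoothing parameter) together with an assumed weight form `exp(-d2 / h**2)`. It is meant only to show how one cross-row matrix and two vectors of patch energies give all distances between row-aligned patches on two rows.

```python
import numpy as np

def patch_energy(u, K):
    """Sum of squares over each length-(2K+1) patch of row u."""
    g = np.pad(u, K, mode="symmetric")
    return np.convolve(g ** 2, np.ones(2 * K + 1), mode="valid")

def diag_box(u, v, K):
    """Outer product of padded rows u and v, box-filtered along diagonals."""
    gu = np.pad(u, K, mode="symmetric")
    gv = np.pad(v, K, mode="symmetric")
    G = np.outer(gu, gv)
    N, n = gu.size, u.size
    C = np.zeros((N + 1, N + 1))
    for i in range(N):                      # running sums along diagonals
        C[i + 1, 1:] = C[i, :-1] + G[i, :]
    return C[2 * K + 1:, 2 * K + 1:] - C[:n, :n]

def filter_rows(f, Sv, Sh, K, h):
    """2D search (rows within Sv, columns within Sh) with 1D row patches."""
    f = np.asarray(f, dtype=float)
    m, n = f.shape
    idx = np.arange(n)
    band = np.abs(idx[:, None] - idx[None, :]) <= Sh   # horizontal search band
    energy = [patch_energy(f[r], K) for r in range(m)]
    out = np.empty_like(f)
    for r in range(m):                      # process rows sequentially
        num = np.zeros(n)
        den = np.zeros(n)
        for q in range(max(0, r - Sv), min(m, r + Sv + 1)):
            cross = diag_box(f[r], f[q], K)
            d2 = energy[r][:, None] + energy[q][None, :] - 2.0 * cross
            w = np.where(band, np.exp(-np.maximum(d2, 0.0) / h ** 2), 0.0)
            num += w @ f[q]                 # aggregate pixels of row q
            den += w.sum(axis=1)
        out[r] = num / den
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = 0.5 + 0.1 * rng.normal(size=(16, 24))
    print(filter_rows(noisy, Sv=2, Sh=3, K=2, h=0.3).shape)
```

Setting `Sv = 0` restricts the search to the current row, which corresponds to the purely one-dimensional filtering used in SNLM.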
We then perform column processing on the intermediate image to obtain the final output of our algorithm. That is, we simply apply Algorithm 1 on the intermediate image, where we logically switch the rows and columns in the algorithm. Suppose and are the corresponding search windows for the row-aligned and column-aligned processing. Then we set the search parameter in Algorithm 1 as: for the row-aligned processing, and for the column-aligned processing.
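Logically switching the rows and columns amounts to applying the same row routine to the transposed intermediate image and transposing the result back. The sketch below shows only this wiring; `row_box_filter` is a hypothetical stand-in for the row-processing step (it is a plain moving average, not the proposed filter), and the argument tuples are placeholders for the row-aligned and column-aligned search parameters.

```python
import numpy as np

def row_box_filter(f, radius):
    """Stand-in row filter: moving average along each row (illustration only)."""
    f = np.asarray(f, dtype=float)
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    g = np.pad(f, ((0, 0), (radius, radius)), mode="symmetric")
    return np.stack([np.convolve(row, kernel, mode="valid") for row in g])

def rows_then_columns(f, row_filter, row_args, col_args):
    """Apply row_filter along rows, then along columns via transposition."""
    intermediate = row_filter(f, *row_args)            # row-aligned pass
    return row_filter(intermediate.T, *col_args).T     # column-aligned pass

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(6, 9))
    print(rows_then_columns(img, row_box_filter, (1,), (2,)).shape)
```

Any routine with the same calling convention can be passed as `row_filter`, so the second pass needs no separate column-wise implementation.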
|Method|House ()|||||Montage ()|||||
|Darbon et al. [4]|36.1/90|31.4/75|26.1/51|22.8/36|18.6/21|38.4/89|30.9/76|25.8/52|22.6/38|18.6/24|
|Method|Boat ()|||||Man ()|||||
|Darbon et al. [4]|34.4/97|30.3/94|25.4/82|22.4/71|18.4/53|35.1/97|30.4/98|25.6/95|22.5/90|18.5/80|
|Darbon et al. [4]|0.33|0.60|0.84|0.33|0.62|0.85|
The denoising performance of the proposed method is compared with NLM and SNLM in Table 1. We have used standard grayscale images from [15, 16] for our experiments. The Matlab implementation used to generate the results in this section is publicly available at http://in.mathworks.com/matlabcentral/fileexchange/64856. The search windows for the three methods were set as follows. Let be the search window for NLM (which we take as reference). Following the original proposal, the window for SNLM is also set as . For a fair comparison with NLM, we ensure that an equal number of pixels is averaged in both methods. This is achieved if . Moreover, following, we set . These equations uniquely determine and (up to an integer rounding). Moreover, we normalize the smoothing parameters in (2) and (9) using the relation . For the results in Table 1, we set , , , , and . We notice from Table 1 that the proposed approach gives comparable results in terms of PSNR and SSIM [17]. A visual comparison of the denoising results is provided in Figs. 3 and 4. We can clearly see some stripes in the images obtained using SNLM, both with and without post-processing (see the boxed areas). In contrast, there are hardly any artifacts present in the denoised image obtained using our method. A timing comparison is provided in Table 2. While the proposed method is slower than SNLM (this is the price we pay for removing the stripes), it is nevertheless significantly faster than NLM.
We note that though the method of Darbon et al. [4] is generally faster than our current proposal, its denoising performance starts deteriorating as the noise variance increases. This is evident from Table 1 and Fig. 4. We also note that NLM and SNLM fall short of KSVD [18] and BM3D [19] in terms of denoising performance. Nevertheless, NLM continues to be of interest due to its decent denoising capability [20, 21, 22, 23] and, importantly, the availability of fast approximations. As reported by other authors, NLM is quite effective in preserving fine details, while successfully removing noise.
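For readers who wish to reproduce comparisons of this kind, PSNR can be computed with the standard definition sketched below. The peak value of 255 (8-bit grayscale), the helper names, and the noise level in the demo are assumptions for illustration; they are not taken from the authors' Matlab code.

```python
import numpy as np

def add_gaussian_noise(clean, sigma, seed=0):
    """Corrupt an image with additive white Gaussian noise of std sigma."""
    rng = np.random.default_rng(seed)
    return np.asarray(clean, dtype=float) + rng.normal(0.0, sigma, size=np.shape(clean))

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit images."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")                 # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

if __name__ == "__main__":
    clean = np.tile(np.linspace(0.0, 255.0, 64), (64, 1))
    noisy = add_gaussian_noise(clean, sigma=20.0)
    print(psnr(clean, noisy))
```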
We proposed a method that uses the idea of lifting from previous work to perform fast non-local means denoising of images. The proposed method does not give rise to undesirable artifacts (as was the case with the original proposal), and produces images whose denoising quality and PSNR/SSIM are comparable to non-local means. While this comes at the expense of added computation, the proposed method nevertheless is much faster than non-local means. In fact, the speedup is about x for practical parameter settings.
The last author was supported by a Startup Grant from IISc and EMR Grant SB/S3/EECE/281/2016 from DST, Government of India.
- [1] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2, pp. 60-65 (2005).
- [2] M. Mahmoudi and G. Sapiro, “Fast image and video denoising via nonlocal means of similar neighborhoods,” IEEE Signal Processing Letters, 12(12), pp. 839-842 (2005).
- [3] J. Wang, Y. Guo, Y. Ying, Y. Liu, and Q. Peng, “Fast non-local algorithm for image denoising,” Proc. IEEE International Conference on Image Processing, pp. 1429-1432 (2006).
- [4] J. Darbon, A. Cunha, T. F. Chan, S. Osher, and G. J. Jensen, “Fast nonlocal filtering applied to electron cryomicroscopy,” Proc. IEEE International Symposium on Biomedical Imaging, pp. 1331-1334 (2008).
- [5] A. Dauwe, B. Goossens, H. Luong, and W. Philips, “A fast non-local image denoising algorithm,” Proc. SPIE Electronic Imaging, 68(12), pp. 1331-1334 (2008).
- [6] J. Orchard, M. Ebrahimi, and A. Wong, “Efficient nonlocal-means denoising using the SVD,” Proc. IEEE International Conference on Image Processing, pp. 1732-1735 (2008).
- [7] V. Karnati, M. Uliyar, and S. Dey, “Fast non-local algorithm for image denoising,” Proc. IEEE International Conference on Image Processing, pp. 3873-3876 (2009).
- [8] L. Condat, “A simple trick to speed up and improve the non-local means,” Research Report, HAL-00512801 (2010).
- [9] P. M. Narendra, “A separable median filter for image noise smoothing,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 3, pp. 20-29 (1981).
- [10] N. Fukushima, S. Fujita, and Y. Ishibashi, “Switching dual kernels for separable edge-preserving filtering,” Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (2015).
- [11] T. Q. Pham and L. J. Van Vliet, “Separable bilateral filtering for fast video preprocessing,” Proc. IEEE International Conference on Multimedia and Expo (2005).
- [12] Y. S. Kim, H. Lim, O. Choi, K. Lee, J. D. K. Kim, and C. Kim, “Separable bilateral non-local means,” Proc. IEEE International Conference on Image Processing, pp. 1513-1516 (2011).
- [13] S. Ghosh and K. N. Chaudhury, “Fast separable nonlocal means,” SPIE Journal of Electronic Imaging, 25(2), 023026 (2016).
- [14] E. S. Gastal and M. M. Oliveira, “Domain transform for edge-aware image and video processing,” ACM Transactions on Graphics (ToG), 30(4), 69 (2011).
- [15] BM3D Image Database, http://www.cs.tut.fi/~foi/GCF-BM3D.
- [16] KODAK Image Database, http://r0k.us/graphics/kodak/.
- [17] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, 13(4), pp. 600-612 (2004).
- [18] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image Processing, 15(12), pp. 3736-3745 (2006).
- [19] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, 16(8), pp. 2080-2095 (2007).
- [20] J. M. Batikian and M. Liebling, “Multicycle non-local means denoising of cardiac image sequences,” Proc. IEEE International Symposium on Biomedical Imaging, pp. 1071-1074 (2014).
- [21] C. Chan, R. Fulton, R. Barnett, D. D. Feng, and S. Meikle, “Post-reconstruction nonlocal means filtering of whole-body PET with an anatomical prior,” IEEE Transactions on Medical Imaging, 33(3), pp. 636-650 (2014).
- [22] G. Chen, P. Zhang, Y. Wu, D. Shen, and P. T. Yap, “Collaborative non-local means denoising of magnetic resonance images,” Proc. IEEE International Symposium on Biomedical Imaging, pp. 564-567 (2015).
- [23] D. Zeng, J. Huang, H. Zhang, Z. Bian, S. Niu, Z. Zhang, Q. Feng, W. Chen, and J. Ma, “Spectral CT image restoration via an average image-induced nonlocal means filter,” IEEE Transactions on Biomedical Engineering, 63(5), pp. 1044-1057 (2016).
- [24] G. Treece, “The bitonic filter: linear filtering in an edge-preserving morphological framework,” IEEE Transactions on Image Processing, 25(11), pp. 5199-5211 (2016).