1 Introduction
Digital images are often subject to noise degradation during acquisition, due to sensor characteristics and complex camera processing pipelines. Removing the noise from the acquired images is an indispensable step for image quality enhancement in low-level vision tasks. In general, image denoising aims to recover a clean image x from its noisy observation y = x + n, where n is the corrupting noise. One popular assumption on n is additive white Gaussian noise (AWGN) with standard deviation (std) σ. Recently, increasing attention has been paid to removing realistic noise, which is more complex than AWGN.
From the Bayesian perspective, image priors are of central importance for image denoising. Numerous methods [1-51] have been developed to exploit image priors for noise removal over the past decades. These methods can be roughly divided into nonlocal self-similarity (NSS) based methods [2, 3, 4, 5, 1, 6, 7, 8, 10, 11, 12, 9, 13, 14, 15, 16, 17, 18, 19], sparsity or low-rankness based methods [22, 3, 4, 5, 6, 20, 7, 11, 16, 8, 9, 10, 13, 14, 15], dictionary learning based methods [22, 23, 24, 25, 6, 21], generative learning based methods [26, 27, 28, 29, 11, 31, 30, 32, 33], and discriminative learning based methods [34, 35, 36, 37, 38, 39, 40, 17, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51].
Among the above-mentioned methods, the NSS prior arises from the fact that, in a natural image, a local patch has many nonlocal similar patches across the image; here, similarity is usually measured by Euclidean distance. The NSS prior has been successfully utilized by state-of-the-art image denoising methods such as BM3D [3], WNNM [10], and N3Net [18]. However, most existing NSS-based methods [2, 3, 4, 5, 1, 6, 7, 8, 10, 11, 9, 13, 14, 15, 16, 17, 18, 19] perform identical noise removal on similar but nuanced patches, which results in artifacts. Despite its capability to enhance denoising performance, the patch-level NSS prior employed in these methods suffers from one major bottleneck: it is very challenging to find closely similar patches for all the reference patches in a natural image, especially when the number of similar patches is large. To break through this bottleneck, the strategy of searching for shape-adaptive similar patches was proposed in BM3D-SAPCA [5]; however, it introduces shape artifacts into the denoised image. Multi-scale techniques [52] have also been proposed to enhance similarity in a multi-scale space, but details are degraded at coarse scales, where similar counterparts may fail to be detected.
In this work, we propose a pixel-level NSS prior for image denoising. Our motivation is that, since the pixel is the smallest component of a natural image, lifting the NSS prior from the patch level to the pixel level allows it to be exploited to a greater extent. We illustrate this point with an example on the commonly used “House” image (Fig. 1 (a)). For each reference patch in “House”, we search for its most similar patch in the image and compute their pixel-wise distance (i.e., the distance apportioned to each pixel). In Fig. 1 (b), we draw a histogram showing, for each pixel-wise distance, the number of reference patches at that distance from their most similar patches. We observe that relatively few reference patches (the darker bars) closely match their corresponding similar patches. Then, for each reference patch, we search for a group of most similar patches (including the reference patch itself). For the first pixel in each of the patches, we search for its most similar pixels in the same patch group. We again compute the pixel-wise distances between the first pixels and their similar ones, and plot the histogram in Fig. 1 (c). We observe that a much larger proportion of reference patches contain closely matched pixels. We then add AWGN to Fig. 1 (a), compute the pixel-wise distances at the patch level (as in (b)) and at the pixel level (as in (c)), and draw the histograms in Figs. 1 (d) and (e), respectively. We observe that the histogram in Fig. 1 (e) is shifted to the left by a large margin compared to that in Fig. 1 (d). All these results demonstrate that the proposed pixel-level NSS exploits the capability of the NSS prior to a greater extent than the previous patch-level NSS.
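The patch-level vs. pixel-level comparison above can be made concrete with a small illustrative sketch (toy numbers, not the paper's experiment): patch-level NSS must accept one whole best-matching patch, while pixel-level NSS lets every pixel position pick its own best match, so its per-pixel distance can only be smaller.

```python
# Toy illustration (not the paper's code): pixel-level matching achieves
# smaller per-pixel distances than patch-level matching.
# We use small 1-D "patches" (lists) for clarity; values are made up.

def pixelwise_dist(a, b):
    """Squared Euclidean distance apportioned to each element."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

reference = [10.0, 12.0, 50.0, 11.0]          # one odd pixel (an edge)
candidates = [
    [10.5, 11.5, 20.0, 11.2],                 # similar except the edge pixel
    [30.0, 12.1, 49.0, 40.0],                 # matches mainly the edge pixel
    [10.1, 12.2, 25.0, 10.9],
]

# Patch-level NSS: pick the single most similar whole patch.
best_patch = min(candidates, key=lambda c: pixelwise_dist(reference, c))
patch_apd = pixelwise_dist(reference, best_patch)

# Pixel-level NSS: for every position, pick the closest pixel across patches.
pixel_apd = sum(
    min((c[i] - reference[i]) ** 2 for c in candidates)
    for i in range(len(reference))
) / len(reference)

print(patch_apd, pixel_apd)
assert pixel_apd <= patch_apd   # pixel-level matching is never worse
```

Here the edge pixel (value 50.0) ruins every whole-patch match, but pixel-level matching recovers a close counterpart for it from a different candidate.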
With the proposed pixel-level NSS prior, we develop an accurate noise level estimation method, and then propose a blind image denoising method based on simple Haar transform and Wiener filtering techniques. Experimental results show that the proposed method achieves much better performance than state-of-the-art image denoising methods on commonly tested real-world datasets.
Our contributions are manifold:

We introduce a pixel-level NSS prior for image denoising, in which we find similar pixels instead of patches.

With the pixel-level NSS prior, we propose an accurate noise level estimation method. Based on this, we propose a blind pixel-level image denoising method, and extend it to real-world image denoising.

Extensive experiments on benchmark datasets demonstrate that the proposed method achieves much better performance than state-of-the-art methods on real-world image denoising.
2 Related Work
Nonlocal Self-Similarity (NSS): The NSS image prior is essential to the success of texture synthesis [53], image denoising [3, 54], inpainting [55], and video classification [56]. In the domain of image denoising, the NSS prior was first employed by the Non-local Means (NLM) method [2]. NLM estimates each pixel by computing a weighted average of all pixels in the image, where the weights are determined by the similarity between the image patches centered at these pixels. Though NLM outputs pixel estimates, it performs denoising based on the patch-level NSS prior. The patch-level NSS prior later flourished in the BM3D method [3], and also in [6, 7, 10, 11, 13, 16, 17]. These methods perform denoising on groups of similar patches searched in nonlocal regions, and usually assume that the collected similar patches are fully matched. However, it is challenging to find closely similar patches for all the reference patches in a natural image. In this work, instead of searching for similar patches, we attempt to search for closely similar pixels and perform pixel-level noise removal accordingly.
Real-world Image Denoising: Many real-world image denoising methods have been developed in the past decade [4, 57, 58, 59, 37, 60, 13, 32, 16, 42]. The CBM3D method [4] first transforms an input RGB image into a luminance-chrominance space (e.g., YCbCr) and then applies the BM3D method [3] to each channel separately. The method of [57] introduces a “noise level function” to estimate the noise of the input image and then removes the noise accordingly. The methods of [58, 59] perform blind image denoising by estimating the noise level in image patches. The method of [37] employs a multivariate Gaussian to fit the noise in a noisy image and performs denoising accordingly. Neat Image [60] is a commercial software package that removes noise according to noise parameters estimated in a sufficiently large flat region. MCWNNM [13] is a patch-level NSS prior based method, demanding a large number of similar patches for low-rank approximation.
GCBD [32] is a blind image denoising method that uses a Generative Adversarial Network [61]. TWSC [16] introduces a weighting scheme into the sparse coding model [62] for real-world image denoising; it requires many similar patches for accurate weight calculation and denoising. Almost all these methods remove the noise in similar patches identically, ignoring their internal variance. Besides, since the realistic noise in real-world images is pixel-dependent [37, 63, 64], patch-level NSS operations generate artifacts when treating all the pixels alike. As such, real-world image denoising remains a very challenging problem [63, 64, 65, 66].
3 Proposed Blind Pixel-level Nonlocal Denoising Method
In this section, we present the proposed pixel-level Nonlocal Haar transform (NLH) based method for blind image denoising. The overall method includes three parts: 1) searching nonlocal similar pixels (§3.1), 2) noise level estimation (§3.2), and 3) a two-stage framework for image denoising (§3.3). The overall denoising framework is summarized in Fig. 2. In the first stage, we employ the lifting Haar transform [67, 68] and bi-hard thresholding for local signal intensity estimation, which is then combined with the global noise level estimate for image denoising via Wiener filtering [69] in the second stage. We also extend the proposed NLH method to real-world image denoising.
3.1 Searching Nonlocal Similar Pixels
Given a grayscale noisy image y, we extract its local patches (assume there are N patches in total). We stretch each local patch of size p × p into a vector, denoted by y_n (n = 1, ..., N). For each y_n, we search for its W most similar patches (including itself) by Euclidean distance in a sufficiently large window (of size S × S) around it. We stack these vectors column by column to form a noisy patch matrix Y.
To apply the NSS prior at the pixel level, we further search for similar pixels in Y by computing the Euclidean distances among its rows. Each row of Y contains W pixels at the same relative position of the W different patches. The patch-level NSS prior guarantees that the pixels in the same row are similar to some extent. However, for rare textures and details, some pixels would suffer from large variance due to shape shifts, and processing these pixels identically would generate artifacts. To resolve this problem, we carefully select the pixels that are most similar to each other. Specifically, denoting by r_j the j-th row of Y, we compute the distance between the 1st row and the j-th row (j = 1, ..., p^2) as
d_{1,j} = ||r_1 - r_j||_2^2.   (1)
Note that d_{1,1} = 0 ≤ d_{1,j} for each row j. We then select the m (m is a power of 2) rows, i.e., r_{j_k} (k = 1, ..., m), in Y with the smallest distances to r_1, and finally aggregate these similar pixel rows into a matrix Y_1:
Y_1 = [r_{j_1}; r_{j_2}; ...; r_{j_m}],   (2)
where Y_1 is of size m × W. The N noisy pixel matrices Y_1^(n) (n = 1, ..., N) in the whole image are used for noise level estimation, which is described as follows.
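The grouping and row-selection procedure above can be sketched as follows; this is our simplified reading of §3.1, with toy sizes and plain Python lists rather than the paper's implementation (the names Y, W, and m mirror the notation above):

```python
# Sketch of the similar-pixel selection (our reading, not the authors'
# code): Y groups W similar patches, with each row holding the pixels at
# one position across the patches; we keep the m rows closest to row 1.

def row_dist(a, b):
    """Squared Euclidean distance between two rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def select_similar_rows(Y, m):
    """Return the m rows of Y with the smallest distance to the first
    (reference) row, the reference row included."""
    d = [(row_dist(Y[0], Y[i]), i) for i in range(len(Y))]
    d.sort()                      # d[0] is (0.0, 0): the reference itself
    keep = sorted(i for _, i in d[:m])
    return [Y[i] for i in keep]

# 4 pixel positions (rows) x 3 patches (columns), toy values
Y = [
    [1.0, 1.1, 0.9],   # reference row
    [1.0, 1.0, 1.1],   # very similar
    [5.0, 4.8, 5.2],   # a shifted structure: not similar
    [0.9, 1.2, 1.0],   # similar
]
Y1 = select_similar_rows(Y, m=2)
print(Y1)   # keeps the reference row and its closest row
```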
3.2 Noise Level Estimation
Accurate and fast estimation of the noise level is an essential step for efficient image denoising. The introduced pixel-level NSS prior can help achieve this goal. The rationale is that, since the pixels in the selected rows of Y_1 are very similar to each other, the standard deviation (std) of the differences among them can be viewed as the noise level. For simplicity, we assume that the noise follows a Gaussian distribution with std σ. Since the distances between the 1st row of Y_1 and its m most similar rows are d_{1,j_k} (k = 1, ..., m, with d_{1,j_1} = 0), the local noise level σ_n can be computed as

σ_n = ( Σ_{k=2}^{m} d_{1,j_k} / (2W(m − 1)) )^{1/2}.   (3)
Initial experiments indicate that Eqn. (3) performs well in smooth areas but is problematic for textures and structures. This is because, in these areas, the signal and noise are difficult to distinguish, and the noise level would thus be overestimated. To make the noise level estimation more robust, we extend it from a local region to a global one: we estimate the local noise levels σ_n for all the N noisy pixel matrices in the image and simply set the global noise level as
σ = (1/N) Σ_{n=1}^{N} σ_n.   (4)
Discussion. The proposed pixel-based noise level estimation method assumes that the noise in the selected rows follows a Gaussian distribution, which is consistent with the assumptions in [37, 16]. The proposed method is very simple, since it only computes the distances among the most similar pixels extracted from the image. As will be shown in the experimental section (§4), the proposed noise level estimation method is very accurate, which makes it feasible to develop a blind image denoising method for real-world applications. We now introduce the proposed two-stage denoising framework.
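A minimal numerical sketch of this estimation idea, under the stated assumption that the selected rows are the same clean signal corrupted by i.i.d. Gaussian noise (so that E||r_1 − r_j||² = 2σ²W); the normalization below follows that identity and may differ in constants from the paper's exact formula:

```python
# Hedged sketch of the noise-level estimator: read sigma off the average
# row-to-row squared distance. All constants here are illustrative only.

import math, random

random.seed(0)
sigma_true = 10.0
W = 4096                      # row length (number of grouped patches)
m = 4                         # number of selected similar rows

# m rows: the same clean row corrupted by independent Gaussian noise
clean = [random.uniform(0, 255) for _ in range(W)]
rows = [[c + random.gauss(0, sigma_true) for c in clean] for _ in range(m)]

# squared distances from the reference row to the other m - 1 rows
dists = [sum((a - b) ** 2 for a, b in zip(rows[0], rows[j]))
         for j in range(1, m)]

# E||r_1 - r_j||^2 = 2 * sigma^2 * W  =>  solve for sigma
sigma_est = math.sqrt(sum(dists) / (2.0 * W * (m - 1)))

print(sigma_est)              # close to sigma_true = 10.0
```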
3.3 Two-stage Denoising Framework
The proposed denoising method consists of two stages. In the first stage, we estimate the local signal intensities via nonlocal Haar (NLH) transform based bi-hard thresholding. In the second stage, with the results from the first stage, we perform blind image denoising by employing Wiener filtering based soft thresholding. We now introduce the two stages in more detail.
Stage 1: Local Intensity Estimation by Lifting Haar Transform based Bi-hard Thresholding. We have grouped a set of similar pixel matrices Y_1^(n) (n = 1, ..., N; for simplicity, we drop the index n below) and estimated the global noise level σ. We perform denoising on the similar pixel matrices in the Haar transformed domain [70]. Here, we utilize the lifting Haar wavelet transform (LHWT) [67, 68] due to its flexible operation, fast speed, and light memory footprint.
The LHWT matrices we employ here are two orthogonal matrices, H_m (of size m × m) and H_W (of size W × W). We set m and W as powers of 2 so that the noisy pixel matrices are compatible with the Haar transform. Applying the LHWT to the nonlocal similar pixel matrix Y_1 yields the transformed noisy coefficient matrix C via
C = H_m Y_1 H_W^T.   (5)
Due to limited space, we put the detailed LHWT transforms with specific m and W in the Supplementary File.
After the LHWT, we restore the elements of the noisy coefficient matrix C via hard thresholding:
Ĉ = C ⊙ 1(|C| > λσ),   (6)
where ⊙ denotes the element-wise product, 1(·) is the element-wise indicator function, and λ is the threshold parameter. According to wavelet theory [67], the coefficients in the last two rows of C (except those in the 1st column) lie in the high-frequency bands of the LHWT, and should largely be noise. To remove this noise in Ĉ, we introduce a structural hard thresholding strategy and set all the coefficients in the high-frequency bands of Ĉ to 0:
Ĉ_{ij} = 0, for i ∈ {m − 1, m} and j = 2, ..., W,   (7)
where Ĉ_{ij} and C_{ij} are the (i, j)-th entries of the coefficient matrices Ĉ and C, respectively. We then apply the inverse LHWT [67, 68] to Ĉ to obtain the denoised pixel matrix X̂_1 via
X̂_1 = H_m^T Ĉ H_W,   (8)
where H_m^T and H_W serve as the inverse LHWT matrices, since the LHWT matrices are orthogonal. The detailed inverse LHWT with specific m and W is given in the Supplementary File. Finally, we aggregate all the denoised pixel matrices to form the denoised image. The elements of the denoised matrices can be viewed as local signal intensities, which are used in Stage 2 for precise denoising with the globally estimated noise level σ. To obtain a more accurate estimation of the local signal intensities, we perform the above LHWT based bi-hard thresholding for K iterations. In the k-th (k = 1, ..., K) iteration, we add the denoised image back to the original noisy image and obtain the noisy input for the next iteration as
y^(k) = x̂^(k−1) + δ (y − x̂^(k−1)),   (9)

where δ is the regularization parameter, y is the original noisy image, and x̂^(k−1) is the denoised image from the previous iteration.
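For concreteness, here is a toy sketch of the Stage 1 pipeline using an ordinary 2 × 2 orthonormal Haar matrix; the lifting implementation, the structural thresholding of the high-frequency rows, and the iteration loop are simplified away, so this is an illustration rather than the authors' code:

```python
# Hedged sketch of Stage 1: a 2x2 orthonormal Haar transform, element-wise
# hard thresholding, and the inverse transform.

import math

s = 1.0 / math.sqrt(2.0)
H = [[s, s], [s, -s]]         # orthonormal Haar matrix; H is its own inverse

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

def hard_threshold(C, tau):
    """Zero out coefficients with magnitude at or below the threshold tau."""
    return [[c if abs(c) > tau else 0.0 for c in row] for row in C]

Y1 = [[10.2, 9.8],            # two similar pixel rows with mild noise
      [10.1, 9.9]]

C = matmul(matmul(H, Y1), transpose(H))     # forward transform: H Y1 H^T
C_hat = hard_threshold(C, tau=0.5)          # keep only strong coefficients
X1 = matmul(matmul(transpose(H), C_hat), H) # inverse: H^T C_hat H

print(X1)   # all four entries pulled toward the common intensity 10.0
```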


Table 1. Average estimated noise std on the BSD68 dataset, for different true noise levels (“–” marks entries not available).
Noise std  5  15  25  35  50  75  100
Zoran [71]  4.74  14.42  –  –  49.23  74.33  –
Liu [72]  5.23  15.18  25.13  34.83  49.54  74.36  98.95
Chen [73]  8.66  16.78  26.26  36.00  50.82  75.75  101.62
Our Method (Eqn. (4))  5.91  15.88  25.64  35.50  50.45  75.40  100.97
Stage 2: Blind Denoising by Iterative Wiener Filtering. Although the noise can be roughly removed through the bi-hard thresholding described in Stage 1, some noise may still remain in smooth areas, or the details may be over-smoothed. In order to remove the noise more carefully while preserving the details, we employ Wiener filtering [69] based soft thresholding for finer denoising. We use the local signal intensities estimated above and the globally estimated noise level σ to perform Wiener filtering on the coefficients obtained by the LHWT of the original noisy pixel matrices. To further improve the denoising performance, in all experiments we conduct the Wiener filtering based soft thresholding for two iterations. In the first iteration, we perform Wiener filtering on C in Eqn. (5) as
C^(1)_{ij} = (Ĉ_{ij}^2 / (Ĉ_{ij}^2 + σ^2)) · C_{ij},   (10)

where Ĉ contains the local signal intensities estimated in Stage 1 (in the transform domain),
and then we perform the second Wiener filtering as
C^(2)_{ij} = ((C^(1)_{ij})^2 / ((C^(1)_{ij})^2 + σ^2)) · C_{ij}.   (11)
Experiments on image denoising demonstrate that the proposed method with two iterations performs the best, while using more iterations brings little improvement. We then apply the inverse LHWT (please see the details in the Supplementary File) to C^(2) to obtain the denoised pixel matrix. Finally, we aggregate all the denoised pixel matrices to form the final denoised image.
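The filtering step can be illustrated with the standard empirical Wiener gain S²/(S² + σ²) applied per coefficient, where S is the Stage 1 signal estimate; this is our reading of the stage, not necessarily the authors' exact formula:

```python
# Hedged sketch of the Stage-2 idea: shrink each noisy transform
# coefficient by S^2 / (S^2 + sigma^2), where S is the signal intensity
# estimated in Stage 1. All numbers below are illustrative.

def wiener_shrink(noisy_coeff, signal_est, sigma):
    """Empirical Wiener gain applied to one transform coefficient."""
    g = signal_est ** 2 / (signal_est ** 2 + sigma ** 2)
    return g * noisy_coeff

sigma = 2.0
# (noisy coefficient, Stage-1 estimate) pairs: one strong, one weak
strong = wiener_shrink(20.0, signal_est=20.0, sigma=sigma)
weak   = wiener_shrink(1.5,  signal_est=0.5,  sigma=sigma)

print(strong, weak)
# strong coefficients are nearly preserved; weak ones are heavily damped
```

Unlike the hard thresholding of Stage 1, this shrinkage is soft: coefficients are attenuated in proportion to their estimated signal-to-noise ratio rather than being zeroed outright.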
3.4 Complexity Analysis
The proposed NLH contains three parts. 1) In §3.1, for each of the N reference patches, searching similar patches in an S × S window costs O(S^2 p^2), while searching similar pixels among the p^2 rows of Y costs only O(p^2 W); since W is much smaller than S^2, the patch search dominates, giving O(N S^2 p^2) for this part. 2) In §3.2, the noise level estimation reuses the row distances already computed, so its complexity can be ignored. 3) In §3.3, the two stages perform LHWT transforms and thresholding on N matrices of size m × W, whose cost is small compared to the similar patch search. Overall, the complexity of the proposed NLH is O(N S^2 p^2).
3.5 Extension to Real-world Image Denoising
To adapt the proposed NLH method to real-world RGB images, we first transform the RGB images into a luminance-chrominance (e.g., YCbCr) space [3], and then perform similar pixel searching in the Y channel. The similar pixels in the other two channels (i.e., Cb and Cr) are grouped correspondingly. We denoise each channel separately and aggregate the denoised channels to form the denoised YCbCr image. Finally, we transform it back to the RGB space for visualization.
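A minimal sketch of the color-space conversion used in this extension; the ITU-R BT.601 (JPEG/JFIF) full-range coefficients below are an assumption, since the paper does not specify which YCbCr variant it uses:

```python
# RGB <-> YCbCr conversion (BT.601 full-range coefficients, assumed):
# similar pixels can be searched in the Y channel and the same grouping
# reused for Cb and Cr.

def rgb_to_ycbcr(r, g, b):
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return r, g, b

y, cb, cr = rgb_to_ycbcr(200.0, 120.0, 80.0)
r, g, b = ycbcr_to_rgb(y, cb, cr)
print(round(r), round(g), round(b))   # round-trips back to (200, 120, 80)
```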
4 Experiments and Results
In this section, we first evaluate the developed noise level estimation method on synthetic noisy images; the goal of this experiment is to validate the pixel-level nonlocal self-similarity (NSS) prior. We then evaluate the proposed NLH method on both synthetic images corrupted by additive white Gaussian noise (AWGN) and real-world noisy images. Finally, we perform comprehensive ablation studies to gain a deeper insight into the proposed NLH method.


Table 2. Average PSNR (dB) and SSIM results of different methods on 20 widely used grayscale images corrupted by AWGN.
Noise std  15  25  35  50  75
Metric  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM 
NLM [2]  31.20  0.8483  28.64  0.7602  26.82  0.6762  24.80  0.5646  22.43  0.4224 
BM3D [5]  32.42  0.8860  30.02  0.8364  28.48  0.7969  26.85  0.7481  24.74  0.6649 
LSSC [6]  32.27  0.8849  29.84  0.8329  28.26  0.7908  26.64  0.7405  24.77  0.6746 
NCSR [7]  32.19  0.8814  29.76  0.8293  28.17  0.7855  26.55  0.7391  24.66  0.6793 
WNNM [10]  32.43  0.8841  30.05  0.8365  28.51  0.7958  26.92  0.7499  25.15  0.6903 
TNRD [74]  32.48  0.8845  30.07  0.8366  28.53  0.7957  26.95  0.7495  25.10  0.6901 
DnCNN [40]  32.59  0.8879  30.22  0.8415  28.66  0.8021  27.08  0.7563  25.24  0.6931 
NLH (Blind)  32.28  0.8796  30.09  0.8355  28.60  0.7988  27.11  0.7524  25.31  0.6932 
4.1 Implementation Details
The proposed NLH method has 7 main parameters: the patch size p, the window size S for searching similar patches, the number of similar patches W, the number of similar pixels m, the regularization parameter δ, the hard threshold parameter λ, and the iteration number K (δ, λ, and K only exist in Stage 1). In all experiments, we set p = 7, S = 40, W = 16, m = 4, and K = 2. For synthetic AWGN corrupted image denoising, λ is set according to the noise level σ (one value for smaller σ and another for larger σ) in both stages. For real-world image denoising, we set λ = 2 and δ = 0.6 in both stages.
4.2 Results on Noise Level Estimation
The proposed pixel-level NSS prior can be used to estimate the noise level of an input noisy image. We compare our method (Eqn. (4)) with leading noise level estimation methods: Zoran [71], Liu [72], and Chen [73]. The comparison is performed on the 68 images of the commonly tested BSD68 dataset. We generate synthetic noisy images by adding AWGN with different std values σ to the clean images. The comparison results are listed in Table 1, from which one can see that the proposed method accurately estimates different noise levels for various noisy images. Note that the proposed method only utilizes the introduced pixel-level NSS prior, and the results validate its effectiveness for noise level estimation.
4.3 Results on Synthetic AWGN Corrupted Images
On 20 grayscale images widely used in [3, 10, 11], we compare the proposed NLH method with several competing AWGN denoising methods: BM3D [3], LSSC [6], NCSR [7], WNNM [10], TNRD [74], and DnCNN [40]. For BM3D, we employ its extension BM3D-SAPCA [5], which usually performs better than BM3D on grayscale images. We employ Non-Local Means (NLM) [2] as a baseline to validate the effectiveness of the pixel-level NSS prior. The source codes of these methods are downloaded from the corresponding authors' websites, and we use the default parameter settings. TNRD and DnCNN are discriminative learning based methods, and we use the models originally trained by their authors. Each noisy image is generated by adding AWGN with standard deviation (std) σ to the corresponding clean image; in this paper we set σ ∈ {15, 25, 35, 50, 75}.
From Table 2, we can see that the proposed NLH is comparable with the leading denoising methods on average PSNR (dB) and SSIM [75]. Note that TNRD and DnCNN are trained on pairs of clean and synthetic noisy images, while NLH blindly removes the noise with the introduced pixel-level NSS prior. By comparing the performance of NLM and NLH, one can see that the proposed pixel-level denoising method performs much better than simply averaging the central pixels of similar patches. Visual quality comparisons can be found in the Supplementary File.
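For reference, the PSNR values in Table 2 follow the standard definition for 8-bit images, PSNR = 10 log₁₀(255²/MSE); a minimal sketch (not code from the paper):

```python
# Standard PSNR for 8-bit images, illustrated on a tiny made-up signal.

import math

def psnr(clean, denoised, peak=255.0):
    mse = sum((c - d) ** 2 for c, d in zip(clean, denoised)) / len(clean)
    return 10.0 * math.log10(peak ** 2 / mse)

clean    = [100.0, 120.0, 130.0, 90.0]
denoised = [101.0, 119.0, 131.0, 89.0]   # off by 1 everywhere -> MSE = 1
print(round(psnr(clean, denoised), 2))   # 48.13 dB
```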
4.4 Results on Real-World Noisy Images


Table 3. PSNR (dB) results of different methods on the 15 cropped real-world noisy images of the CC dataset [37].
Camera Settings  #  CBM3D  NI  NC  CC  MCWNNM  TWSC  DnCNN+  FFDNet+  CBDNet  NLH
Canon 5D M3  1  39.76  35.68  36.20  38.37  41.13  40.76  38.02  39.35  36.68  41.57 
ISO = 3200  2  36.40  34.03  34.35  35.37  37.28  36.02  35.87  36.99  35.58  37.39 
3  36.37  32.63  33.10  34.91  36.52  34.99  35.51  36.50  35.27  36.68  
Nikon D600  4  34.18  31.78  32.28  34.98  35.53  35.32  34.75  34.96  34.01  35.50 
ISO = 3200  5  35.07  35.16  35.34  35.95  37.02  37.10  35.28  36.70  35.19  37.21 
6  37.13  39.98  40.51  41.15  39.56  40.90  37.43  40.94  39.80  41.34  
Nikon D800  7  36.81  34.84  35.09  37.99  39.26  39.23  37.63  38.62  38.03  39.67 
ISO = 1600  8  37.76  38.42  38.65  40.36  41.43  41.90  38.79  41.45  40.40  42.66 
9  37.51  35.79  35.85  38.30  39.55  39.06  37.07  38.76  36.86  40.04  
Nikon D800  10  35.05  38.36  38.56  39.01  38.91  40.03  35.45  40.09  38.75  40.21 
ISO = 3200  11  34.07  35.53  35.76  36.75  37.41  36.89  35.43  37.57  36.52  37.30 
12  34.42  40.05  40.59  39.06  39.39  41.49  34.98  41.10  38.42  42.02  
Nikon D800  13  31.13  34.08  34.25  34.61  34.80  35.47  31.12  34.11  34.13  36.19 
ISO = 6400  14  31.22  32.13  32.38  33.21  33.95  34.05  31.93  33.64  33.45  34.70 
15  30.97  31.52  31.76  33.22  33.94  33.88  31.79  33.68  33.45  34.83  
Average    35.19  35.33  35.65  36.88  37.71  37.81  35.40  37.63  36.44  38.49 
Comparison Methods. We compare the proposed NLH method with CBM3D [4], the commercial software Neat Image (NI) [60], “Noise Clinic” (NC) [58], Cross-Channel (CC) [37], MCWNNM [13], and TWSC [16]. CBM3D can directly deal with color images, and the std of the input noise is estimated by [73]. For MCWNNM and TWSC, we use [73] to estimate the noise std for each channel and perform denoising accordingly. We also compare the proposed NLH method with DnCNN+ [40], FFDNet+ [41], and CBDNet [42], which are state-of-the-art convolutional neural network (CNN) based image denoising methods. FFDNet+ is a multi-scale extension of FFDNet [41] with a manually selected uniform noise level map. DnCNN+ is based on the color version of DnCNN [40] for blind denoising, but fine-tuned with the results of FFDNet+ [41]. Note that for FFDNet+ and DnCNN+, there is no need to estimate the noise std. For the three CNN based methods, we asked the authors to run the experiments for us; we also ran the codes on our machine for speed comparisons.


Table 4. Average PSNR (dB), SSIM, and run time (seconds per image) of different methods on the 1000 cropped images of the DND dataset [63]; GPU times are given in parentheses.
Metric  CBM3D  NI  NC  MCWNNM  TWSC  DnCNN+  FFDNet+  CBDNet  NLH
PSNR  34.51  35.11  35.43  37.38  37.96  37.90  37.61  38.06  38.81 
SSIM  0.8507  0.8778  0.8841  0.9294  0.9416  0.9430  0.9415  0.9421  0.9520 
CPU (GPU) Time  8.4  1.2  18.5  251.2  233.6  106.2 (0.05)  49.9 (0.03)  5.4 (0.40)  5.3 
Datasets and Results. We evaluate the proposed NLH method on two commonly used real-world image denoising datasets: the Cross-Channel (CC) dataset [37] and the Darmstadt Noise Dataset (DND) [63].
The CC dataset [37] includes noisy images of 11 static scenes captured by Canon 5D Mark 3, Nikon D600, and Nikon D800 cameras. The real-world noisy images were collected under a controlled indoor environment. Each scene was shot 500 times using the same camera and settings, and the average of the 500 shots is taken as the “ground truth”. The authors cropped 15 images of size 512 × 512 to evaluate different denoising methods. The PSNR comparisons are listed in Table 3. It can be seen that the proposed NLH method achieves the highest PSNR on most images. Fig. 3 shows the denoised images yielded by different methods on a scene captured by a Nikon D800 at ISO = 1600. As can be seen, NLH also achieves better visual quality than the other methods. More comparisons on SSIM and visual quality can be found in the Supplementary File.
The DND dataset [63] includes 50 different scenes captured by Sony A7R, Olympus E-M10, Sony RX100 IV, and Huawei Nexus 6P cameras. Each scene contains a pair of noisy and “ground truth” clean images. The noisy images are collected under higher ISO values with shorter exposure times, while the “ground truth” images are captured under lower ISO values with adjusted longer exposure times. For each scene, the authors cropped 20 bounding boxes of size 512 × 512, generating 1000 test crops in total. The “ground truth” images are not released, but the performance can be evaluated by submitting the denoised images to the DND website. In Table 4, we list the average PSNR (dB) and SSIM [75] results of different methods. Fig. 4 shows visual comparisons on the image “0001_18” captured by a Nexus 6P camera. It can be seen that the proposed NLH method achieves much higher PSNR and SSIM results, with more visually pleasing outputs than the other methods. More visual quality comparisons can be found in the Supplementary File.
Speed. We also compare the speed of all competing methods. All experiments are run in the Matlab 2016a environment on a machine with a quad-core 3.4GHz CPU and 8GB RAM. We also run DnCNN+, FFDNet+, and CBDNet on a Titan XP GPU. Table 4 also shows the average run time (in seconds) of the different methods on the 1000 RGB images of size 512 × 512 in [63], with the fastest result highlighted in bold. It can be seen that Neat Image needs only 1.2 seconds on average to process one image. The proposed NLH method needs 5.3 seconds (using parallel computing), which is much faster than the other methods, including the patch-level NSS based methods MCWNNM and TWSC and the CNN based methods DnCNN+, FFDNet+, and CBDNet. The majority of the time in the proposed NLH method is spent on searching similar patches, which takes 2.8 seconds on average, while the further search for similar pixels takes only 0.3 seconds on average. This demonstrates that the introduced pixel-level NSS prior adds only a small amount of computation compared to its patch-level counterpart.
4.5 Validation of the Proposed NLH Method
We now conduct a more detailed examination of the proposed method. We assess: 1) the accuracy of pixel-level NSS vs. patch-level NSS; 2) the contribution of the proposed pixel-level NSS prior to NLH on real-world image denoising; 3) the necessity of the two-stage framework; and 4) the individual influence of the 7 major parameters on NLH.
1. Is pixel-level NSS more accurate than patch-level NSS? To answer this question, we compute the average pixel-wise distances (APDs, i.e., the distance apportioned to each pixel) of nonlocal similar pixels and patches on the CC dataset [37]. From Table 5, we can see that, on the 15 mean images and 15 noisy images (normalized into [0, 1]), the APDs of pixel-level NSS are smaller than those of patch-level NSS. In other words, pixel-level NSS is more accurate than patch-level NSS at measuring similarity.


Table 5. Average pixel-wise distances (APDs) of patch-level and pixel-level NSS on the CC dataset [37].
Aspect  Mean Image  Noisy Image
Patch-level NSS  0.0043
Pixel-level NSS  0.0026
2. Does the pixel-level NSS prior contribute to image denoising? Here, we study the contribution of the proposed pixel-level NSS prior. To this end, we remove the pixel-level NSS search from NLH, obtaining the baseline “w/o Pixel NSS”. From Table 6, we observe a clear drop in PSNR (dB) and SSIM on both datasets, which demonstrates the effectiveness of the proposed pixel-level NSS prior.


Table 6. Ablation study: average PSNR (dB) and SSIM of NLH and its variants on the CC [37] and DND [63] datasets.
 CC [37]  DND [63]
Variant  PSNR  SSIM  PSNR  SSIM 
NLH  38.49  0.9647  38.81  0.9520 
w/o Pixel NSS  38.14  0.9602  38.27  0.9414 
w/o Stage 2  37.64  0.9572  37.27  0.9355 
3. Is Stage 2 necessary? We also study the effect of Stage 2 in NLH. To do so, we remove Stage 2 from NLH, obtaining the baseline “w/o Stage 2”. From Table 6, we can see a large performance drop on both datasets. This shows that Stage 2 complements Stage 1 with soft Wiener filtering and is essential to the proposed NLH.
4. How does each parameter influence NLH's denoising performance? The proposed NLH mainly has 7 parameters (see §4.1 for details). We change one parameter at a time to assess its individual influence on NLH. Table 7 lists the average PSNR results of NLH with different parameter values on the CC dataset [37]. It can be seen that: 1) when changing individual parameters, the variations of the PSNR results range from 0.02dB (for the iteration number K) to 0.16dB (for the number of similar patches W); 2) the PSNR performance increases with increasing patch size p, window size S, or iteration number K; for a performance-speed tradeoff, we set p = 7, S = 40, and K = 2 in NLH for efficient image denoising; 3) the number of similar pixels m is novel in NLH. To our surprise, even with m = 2 similar pixels, NLH still performs very well, dropping only 0.01dB in PSNR compared to the case with m = 4. However, with m > 4, the performance of NLH decreases gradually. The reason is that searching for more (e.g., m = 16) pixels in the grouped patches may decrease the accuracy of pixel-level NSS and hence degrade the performance of NLH. Similar trends can be observed when changing the number of similar patches W. In summary, all the parametric analyses demonstrate that NLH is robust for real-world image denoising, as long as the 7 parameters are set in reasonable ranges.


Table 7. Average PSNR (dB) of NLH on the CC dataset [37] when varying one parameter at a time.
Patch size p  5  6  7  8  Margin
PSNR  38.41  38.47  38.49  38.51  0.10
Window size S  20  30  40  50  Margin
PSNR  38.39  38.43  38.49  38.51  0.12
Similar pixels m  2  4  8  16  Margin
PSNR  38.48  38.49  38.47  38.43  0.06
Similar patches W  8  16  32  64  Margin
PSNR  38.33  38.49  38.48  38.43  0.16
Hard threshold λ  1.5  2  2.5  3  Margin
PSNR  38.39  38.49  38.51  38.50  0.12
Iteration number K  2  3  4  5  Margin
PSNR  38.49  38.51  38.51  38.51  0.02
Regularization δ  0.2  0.4  0.6  0.8  Margin
PSNR  38.46  38.47  38.49  38.49  0.03
5 Conclusion
How to utilize the nonlocal self-similarity (NSS) prior for image denoising is an open problem. In this paper, we attempted to utilize the NSS prior to a greater extent by lifting the patch-level NSS prior to the pixel level. With the pixel-level NSS prior, we developed an accurate noise level estimation method, based on which we proposed a blind image denoising method. We estimated the local signal intensities via nonlocal Haar (NLH) transform based bi-hard thresholding, and then performed denoising via Wiener filtering based soft thresholding. Experiments on benchmark datasets demonstrate that the proposed NLH method significantly outperforms previous state-of-the-art methods on the real-world image denoising task.
References
 [1] C. Liu, R. Szeliski, S. Kang, C. L. Zitnick, and W. T. Freeman. Automatic estimation and removal of noise from a single image. IEEE TPAMI, 30(2):299–314, 2008.
 [2] A. Buades, B. Coll, and J. M. Morel. A nonlocal algorithm for image denoising. In CVPR, pages 60–65, 2005.
 [3] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095, 2007.
 [4] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Color image denoising via sparse 3D collaborative filtering with grouping constraint in luminance-chrominance space. In ICIP, pages 313–316. IEEE, 2007.
 [5] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. BM3D image denoising with shape-adaptive principal component analysis. In SPARS, 2009.
 [6] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Nonlocal sparse models for image restoration. In ICCV, pages 2272–2279, 2009.
 [7] W. Dong, L. Zhang, G. Shi, and X. Li. Nonlocally centralized sparse representation for image restoration. IEEE Transactions on Image Processing, 22(4):1620–1630, 2013.
 [8] W. Dong, G. Shi, and X. Li. Nonlocal image restoration with bilateral variance estimation: A low-rank approach. IEEE Transactions on Image Processing, 22(2):700–711, 2013.
 [9] H. Ji, C. Liu, Z. Shen, and Y. Xu. Robust video denoising using low rank matrix completion. In CVPR, pages 1791–1798, 2010.
 [10] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In CVPR, pages 2862–2869, 2014.
 [11] J. Xu, L. Zhang, W. Zuo, D. Zhang, and X. Feng. Patch group based nonlocal self-similarity prior learning for image denoising. In ICCV, pages 244–252, 2015.
 [12] J. Xu, D. Ren, L. Zhang, and D. Zhang. Patch group based Bayesian learning for blind image denoising. In ACCV Workshops, pages 79–95, 2016.
 [13] J. Xu, L. Zhang, D. Zhang, and X. Feng. Multi-channel weighted nuclear norm minimization for real color image denoising. In ICCV, 2017.
 [14] N. Yair and T. Michaeli. Multiscale weighted nuclear norm image restoration. In CVPR, pages 3165–3174, 2018.
 [15] B. Wen, Y. Li, L. Pfister, and Y. Bresler. Joint adaptive sparsity and low-rankness on the fly: An online tensor reconstruction scheme for video denoising. In ICCV, pages 241–250, 2017.
 [16] J. Xu, L. Zhang, and D. Zhang. A trilateral weighted sparse coding scheme for real-world image denoising. In ECCV, 2018.
 [17] S. Lefkimmiatis. Nonlocal color image denoising with convolutional neural networks. In CVPR, pages 3587–3596, 2017.
 [18] T. Plötz and S. Roth. Neural nearest neighbors networks. In NIPS, 2018.
 [19] D. Liu, B. Wen, Y. Fan, C. C. Loy, and T. S. Huang. Nonlocal recurrent network for image restoration. In NIPS, pages 1680–1689, 2018.
 [20] A. Foi, M. Trimeche, V. Katkovnik, and K. Egiazarian. Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data. IEEE Transactions on Image Processing, 17(10):1737–1754, 2008.
 [21] P. Chatterjee and P. Milanfar. Clustering-based denoising with locally learned dictionaries. IEEE Transactions on Image Processing, 18(7):1438–1451, 2009.
 [22] M. Elad and M. Aharon. Image denoising via learned dictionaries and sparse representation. In CVPR, volume 1, pages 895–900, 2006.
 [23] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, 2006.
 [24] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, 2006.
 [25] J. Mairal, M. Elad, and G. Sapiro. Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1):53–69, 2008.
 [26] I. Mosseri, M. Zontak, and M. Irani. Combining the power of internal and external denoising. In ICCP, pages 1–9, 2013.
 [27] S. Roth and M. J. Black. Fields of experts. International Journal of Computer Vision, 82(2):205–229, 2009.
 [28] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In ICCV, pages 479–486, 2011.
 [29] H. Talebi and P. Milanfar. Global image denoising. IEEE Transactions on Image Processing, 23(2):755–768, 2014.
 [30] J. Xu, L. Zhang, and D. Zhang. External prior guided internal prior learning for real-world noisy image denoising. IEEE Transactions on Image Processing, 27(6):2996–3010, 2018.
 [31] F. Zhu, G. Chen, and P. A. Heng. From noise modeling to blind image denoising. In CVPR, pages 420–429, 2016.
 [32] J. Chen, J. Chen, H. Chao, and M. Yang. Image blind denoising with generative adversarial network based noise modeling. In CVPR, pages 3155–3164, 2018.
 [33] A. Pajot, E. Bezenac, and P. Gallinari. Unsupervised adversarial image reconstruction. In ICLR, 2019.
 [34] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with BM3D? In CVPR, pages 2392–2399, 2012.
 [35] U. Schmidt and S. Roth. Shrinkage fields for effective image restoration. In CVPR, pages 2774–2781, 2014.
 [36] Y. Chen, W. Yu, and T. Pock. On learning optimized reaction diffusion processes for effective image restoration. In CVPR, pages 5261–5269, 2015.
 [37] S. Nam, Y. Hwang, Y. Matsushita, and S. J. Kim. A holistic approach to cross-channel image noise modeling and its application to image denoising. In CVPR, pages 1683–1691, 2016.
 [38] X. Mao, C. Shen, and Y. Yang. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In NIPS, pages 2802–2810, 2016.
 [39] M. Gharbi, G. Chaurasia, S. Paris, and F. Durand. Deep joint demosaicking and denoising. ACM Transactions on Graphics, 35(6):191, 2016.
 [40] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 2017.
 [41] K. Zhang, W. Zuo, and L. Zhang. FFDNet: Toward a fast and flexible solution for CNN based image denoising. IEEE Transactions on Image Processing, 2018.
 [42] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang. Toward convolutional blind denoising of real photographs. arXiv:1807.04686, 2018.
 [43] Y. Tai, J. Yang, X. Liu, and C. Xu. MemNet: A persistent memory network for image restoration. In ICCV, pages 4539–4547, 2017.
 [44] S. Lefkimmiatis. Nonlocal color image denoising with convolutional neural networks. In CVPR, pages 3587–3596, 2017.
 [45] S. Lefkimmiatis. Universal denoising networks: A novel CNN architecture for image denoising. In CVPR, 2018.
 [46] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila. Noise2Noise: Learning image restoration without clean data. In ICML, 2018.
 [47] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Deep image prior. In CVPR, pages 9446–9454, 2018.
 [48] C. Chen, Q. Chen, J. Xu, and V. Koltun. Learning to see in the dark. In CVPR, 2018.
 [49] B. Mildenhall, J. T. Barron, J. Chen, D. Sharlet, R. Ng, and R. Carroll. Burst denoising with kernel prediction networks. In CVPR, pages 2502–2510, 2018.
 [50] X. Zhang, Y. Lu, J. Liu, and B. Dong. Dynamically unfolding recurrent restorer: A moving endpoint control method for image restoration. In ICLR, 2019.
 [51] J. Xu, Y. Huang, L. Liu, F. Zhu, X. Hou, and L. Shao. Noisy-as-clean: Learning unsupervised denoising from the corrupted image, 2019.
 [52] M. Zontak, I. Mosseri, and M. Irani. Separating signal from noise using patch recurrence across scales. In CVPR, 2013.
 [53] A. A. Efros and T. K. Leung. Texture synthesis by nonparametric sampling. In ICCV, 1999.
 [54] D. Glasner, S. Bagon, and M. Irani. Super-resolution from a single image. In ICCV, 2009.
 [55] C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics, 28(3):24, 2009.
 [56] X. Wang, R. Girshick, A. Gupta, and K. He. Nonlocal neural networks. In CVPR, 2018.
 [57] C. Liu, R. Szeliski, S. Bing Kang, C. L. Zitnick, and W. T. Freeman. Automatic estimation and removal of noise from a single image. IEEE TPAMI, 30(2):299–314, 2008.
 [58] M. Lebrun, M. Colom, and J. M. Morel. Multiscale image blind denoising. IEEE Transactions on Image Processing, 24(10):3149–3161, 2015.
 [59] F. Zhu, G. Chen, and P. A. Heng. From noise modeling to blind image denoising. In CVPR, 2016.
 [60] Neatlab ABSoft. Neat Image. https://ni.neatvideo.com/home.
 [61] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, pages 2672–2680, 2014.
 [62] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
 [63] T. Plötz and S. Roth. Benchmarking denoising algorithms with real photographs. In CVPR, 2017.
 [64] A. Abdelhamed, S. Lin, and M. S. Brown. A high-quality denoising dataset for smartphone cameras. In CVPR, 2018.
 [65] J. Xu, H. Li, Z. Liang, D. Zhang, and L. Zhang. Realworld noisy image denoising: A new benchmark. arXiv:1804.02603, 2018.
 [66] J. Anaya and A. Barbu. RENOIR: A dataset for real low-light image noise reduction. JVCIR, 51:144–154, 2018.
 [67] W. Sweldens. The lifting scheme: A custom-design construction of biorthogonal wavelets. Applied and Computational Harmonic Analysis, 3(2):186–200, 1996.
 [68] I. Daubechies and W. Sweldens. Factoring wavelet transforms into lifting steps. Journal of Fourier Analysis and Applications, 4(3):247–269, 1998.
 [69] N. Wiener. Extrapolation, interpolation, and smoothing of stationary time series: with engineering applications. 1949.
 [70] A. Haar. Zur theorie der orthogonalen funktionensysteme. Mathematische Annalen, 69(3):331–371, 1910.
 [71] D. Zoran and Y. Weiss. Scale invariance and noise in natural images. In ICCV, pages 2209–2216, 2009.
 [72] X. Liu, M. Tanaka, and M. Okutomi. Single-image noise level estimation for blind denoising. IEEE Transactions on Image Processing, 22(12):5226–5237, 2013.
 [73] G. Chen, F. Zhu, and P. A. Heng. An efficient statistical method for image noise level estimation. In ICCV, 2015.
 [74] Y. Chen and T. Pock. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE TPAMI, 39(6):1256–1272, 2017.
 [75] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.