NLH: A Blind Pixel-level Non-local Method for Real-world Image Denoising

06/17/2019, by Yingkun Hou, et al.

Non-local self similarity (NSS) is a powerful prior of natural images for image denoising. Most existing denoising methods employ similar patches, i.e., a patch-level NSS prior. In this paper, we take one step forward by introducing a pixel-level NSS prior, i.e., searching for similar pixels across a non-local region. This is motivated by the fact that finding closely similar pixels is more feasible than finding closely similar patches in natural images, which can be used to enhance image denoising performance. With the introduced pixel-level NSS prior, we propose an accurate noise level estimation method, and then develop a blind image denoising method based on the lifting Haar transform and Wiener filtering techniques. Experiments on benchmark datasets demonstrate that the proposed method achieves much better performance than state-of-the-art methods on real-world image denoising. The code will be released.


1 Introduction

Digital images are often subject to noise degradation during acquisition, due to sensor characteristics and complex camera processing pipelines. Removing the noise from the acquired images is an indispensable step for image quality enhancement in low-level vision tasks. In general, image denoising aims to recover a clean image $\mathbf{x}$ from its noisy observation $\mathbf{y} = \mathbf{x} + \mathbf{n}$, where $\mathbf{n}$ is the corrupting noise. One popular assumption on $\mathbf{n}$ is additive white Gaussian noise (AWGN) with standard deviation (std) $\sigma$. Recently, increasing attention has been paid to removing realistic noise, which is more complex than AWGN.

From the Bayesian perspective, image priors are of central importance for image denoising. Numerous methods[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51] have been developed to exploit image priors for noise removal over the past decades. These methods can be roughly divided into non-local self-similarity (NSS) based methods [2, 3, 4, 5, 1, 6, 7, 8, 10, 11, 12, 9, 13, 14, 15, 16, 17, 18, 19], sparsity or low-rankness based methods [22, 3, 4, 5, 6, 20, 7, 11, 16, 8, 9, 10, 13, 14, 15], dictionary learning based methods [22, 23, 24, 25, 6, 21], generative learning based methods [26, 27, 28, 29, 11, 31, 30, 32, 33], and discriminative learning based methods [34, 35, 36, 37, 38, 39, 40, 17, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51], etc.

Among the above-mentioned methods, the NSS prior arises from the fact that, in a natural image, a local patch has many non-local similar patches across the image. Here, the similarity is often measured by Euclidean distance. The NSS prior has been successfully utilized by state-of-the-art image denoising methods, such as BM3D [3], WNNM [10], and N3Net [18]. However, most existing NSS-based methods [2, 3, 4, 5, 1, 6, 7, 8, 10, 11, 9, 13, 14, 15, 16, 17, 18, 19] perform identical noise removal on similar but nuanced patches, which results in artifacts. Despite its capability to enhance denoising performance, the patch-level NSS prior employed in these methods suffers from one major bottleneck: it is very challenging to find closely similar patches for all the reference patches in a natural image, especially when the number of similar patches is large. To break through this bottleneck, the strategy of searching shape-adaptive similar patches is proposed in BM3D-SAPCA [5]. However, this introduces shape artifacts into the denoised image. Multi-scale techniques [52] have been proposed to enhance similarity in a multi-scale space, but details are degraded at the coarse scales, so similar counterparts may fail to be detected.

Figure 1: (a) The “House” image; (b) patch matching on the clean image; (c) pixel matching on the clean image; (d) patch matching on the noisy image; (e) pixel matching on the noisy image. Each histogram shows the number of reference patches (or pixels) as a function of the pixel-wise distance to their corresponding most similar patches (or pixels). The noisy image is generated by adding AWGN noise to (a), and the images are normalized. The pixel-wise distance maps are also plotted in the top-right corners of the corresponding histograms.

In this work, we propose a pixel-level NSS prior for image denoising. Our motivation is that, since the pixel is the smallest component of natural images, lifting the NSS prior from the patch level to the pixel level allows it to be exploited to a greater extent. We evaluate this point through an example on the commonly used “House” image (Fig. 1 (a)). For each reference patch in “House”, we search for its most similar patch in the image and compute their pixel-wise distance (i.e., the distance apportioned to each pixel). In Fig. 1 (b), we draw a histogram to show the relationship between the pixel-wise distance and the number of reference patches with a given pixel-wise distance to their corresponding most similar patches. We observe that only a small fraction of the reference patches (the darker bar) closely match their corresponding similar patches. Then, for each reference patch, we search for its most similar patches (including the reference one). For the first pixel in each of these patches, we search for its most similar pixels among the grouped patches. We also compute the pixel-wise distance between the first pixels and their similar ones, and plot the histogram in Fig. 1 (c). We observe that a much larger fraction of the reference patches contain closely matched pixels. We then add AWGN noise to Fig. 1 (a), compute the pixel-wise distances in patch-level NSS (as in (b)) and pixel-level NSS (as in (c)), and draw the histograms in Figs. 1 (d) and (e), respectively. We observe that the histogram in Fig. 1 (e) is shifted to the left by a large margin when compared to that in Fig. 1 (d). All these results demonstrate that the proposed pixel-level NSS can exploit the capability of the NSS prior to a greater extent than the previous patch-level NSS.
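To make this measurement concrete, the following Python/NumPy sketch (not the authors' code, written only for illustration) computes, for a single reference patch, the per-pixel distance to its best-matching patch in a local window, and the per-pixel distance of its first pixel to the best-matching pixel among a group of similar patches. The patch size, group size, and window radius used here are illustrative placeholders.

```python
import numpy as np

def patch_vs_pixel_match(img, i, j, p=7, M=16, win=20):
    """Return (best patch-level per-pixel distance, best pixel-level distance)
    for the reference patch whose top-left corner is (i, j)."""
    H, W = img.shape
    ref = img[i:i + p, j:j + p].ravel()
    cands, dists = [], []
    # Collect candidate patches inside a local search window.
    for di in range(max(0, i - win), min(H - p + 1, i + win)):
        for dj in range(max(0, j - win), min(W - p + 1, j + win)):
            if (di, dj) == (i, j):
                continue
            v = img[di:di + p, dj:dj + p].ravel()
            cands.append(v)
            dists.append(np.mean((ref - v) ** 2))   # distance apportioned per pixel
    order = np.argsort(dists)
    patch_apd = dists[order[0]]                      # best patch match
    # Group the reference with its M most similar patches; match the first pixel
    # of the reference against the first pixels of the grouped patches.
    group = np.stack([ref] + [cands[k] for k in order[:M]], axis=1)
    first_pixels = group[0]                          # one pixel per grouped patch
    pixel_apd = np.min((first_pixels[1:] - first_pixels[0]) ** 2)
    return patch_apd, pixel_apd
```

Sweeping the reference position over the image and histogramming the two returned distances reproduces the kind of comparison illustrated in Fig. 1.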

Figure 2: The overall framework of the proposed NLH method for real-world image denoising.

With the proposed pixel-level NSS prior, we develop an accurate noise level estimation method, and then propose a blind image denoising method based on the simple Haar transform and Wiener filtering techniques. Experimental results show that the proposed method achieves much better performance than state-of-the-art image denoising methods on commonly tested real-world datasets.

Our contributions are threefold:

  • We introduce a pixel-level NSS prior for image denoising, in which we find similar pixels instead of patches.

  • With the pixel-level NSS prior, we propose an accurate noise level estimation method. Based on this, we propose a blind pixel-level image denoising method, and extend it for real-world image denoising.

  • Extensive experiments on benchmark datasets demonstrate that, the proposed method achieves much better performance than the state-of-the-art methods on real-world image denoising.

2 Related Work

Non-local Self Similarity (NSS): The NSS image prior is essential to the success of texture synthesis [53], image denoising [3], super-resolution [54], inpainting [55], and video classification [56]. In the domain of image denoising, the NSS prior was first employed by the Non-local Means (NLM) method [2]. NLM estimates each pixel by computing a weighted average of all pixels in the image, where the weights are determined by the similarity between the image patches centered at these pixels. Though this is a pixel-level method, NLM performs denoising based on the patch-level NSS prior. The patch-level NSS prior later flourished in the BM3D method [3], and also in [6, 7, 10, 11, 13, 16, 17]. These methods perform denoising on groups of similar patches searched in non-local regions, and usually assume that the collected similar patches are fully matched. However, it is challenging to find closely similar patches for all the reference patches in a natural image. In this work, instead of searching for similar patches, we attempt to search for closely similar pixels and perform pixel-level noise removal accordingly.

Real-world Image Denoising: Many real-world image denoising methods have been developed in the past decade [4, 57, 58, 59, 37, 60, 13, 32, 16, 42]. The CBM3D method [4] first transforms an input RGB image into a luminance-chrominance space (e.g., YCbCr) and then applies the BM3D method [3] to each channel separately. The method of [57] introduces a “noise level function” to estimate the noise of the input image and then removes the noise accordingly. The methods of [58, 59] perform blind image denoising by estimating the noise level in image patches. The method of [37] employs a multivariate Gaussian to fit the noise in a noisy image and performs denoising accordingly. Neat Image [60] is a commercial software that removes noise according to the noise parameters estimated in a large enough flat region. MCWNNM [13] is a patch-level NSS prior based method, demanding a large number of similar patches for low-rank approximation. GCBD [32] is a blind image denoising method that uses a Generative Adversarial Network [61]. TWSC [16] introduces a weighting scheme into the sparse coding model [62] for real-world image denoising; it requires many similar patches for accurate weight calculation and denoising. Almost all these methods remove the noise in similar patches identically, ignoring their internal variance. Besides, since the realistic noise in real-world images is pixel-dependent [37, 63, 64], patch-level NSS operations would generate artifacts when treating all the pixels alike. As such, real-world image denoising remains a very challenging problem [63, 64, 65, 66].

3 Proposed Blind Pixel-level Non-local Denoising Method

In this section, we present the proposed pixel-level Non-local Haar transform (NLH) based method for blind image denoising. The overall method includes three parts: 1) searching non-local similar pixels (§3.1), 2) noise level estimation (§3.2), and 3) a two-stage framework for image denoising (§3.3). The overall denoising framework is summarized in Fig. 2. In the first stage, we employ the lifting Haar transform [67, 68] and bi-hard thresholding for local signal intensity estimation, which is later combined with the global noise level estimation for image denoising using Wiener filtering [69] in the second stage. We then extend the proposed NLH method for real-world image denoising.

3.1 Searching Non-local Similar Pixels

Given a grayscale noisy image $\mathbf{Y}$, we extract its local patches (assume there are $N$ patches in total). We stretch each local patch of size $p \times p$ into a vector, denoted by $\mathbf{y}_i$ ($i = 1, \dots, N$). For each $\mathbf{y}_i$, we search for its $M$ most similar patches (including itself) by Euclidean distance in a large enough window (of size $W \times W$) around it. We stack these $M$ vectors column by column to form a noisy patch matrix $\mathbf{Y}_i \in \mathbb{R}^{p^2 \times M}$.

To apply the NSS prior at the pixel level, we further search for similar pixels in $\mathbf{Y}_i$ by computing the Euclidean distances among its rows. Each row of $\mathbf{Y}_i$ contains the $M$ pixels at the same relative position in the different patches. The patch-level NSS prior guarantees that the pixels in the same row are similar to some extent. However, for rare textures and details, some pixels would suffer from large variance due to shape shifts, and processing these pixels identically would generate artifacts. To resolve this problem, we carefully select the pixels that are most similar to each other. Specifically, for the $j$-th row $\mathbf{r}_j$ of $\mathbf{Y}_i$, we compute the distance between it and the $k$-th row $\mathbf{r}_k$ ($k = 1, \dots, p^2$) as

$d_{j,k} = \|\mathbf{r}_j - \mathbf{r}_k\|_2^2.$   (1)

Note that $d_{j,j} = 0$ for each row $j$. We then select the $m$ ($m$ is a power of $2$) rows, i.e., $\mathbf{r}_{j_k}$ ($k = 1, \dots, m$), in $\mathbf{Y}_i$ with the smallest distances to $\mathbf{r}_j$, and finally aggregate the similar pixel rows as a matrix $\mathbf{P}_i$:

$\mathbf{P}_i = [\mathbf{r}_{j_1}; \mathbf{r}_{j_2}; \dots; \mathbf{r}_{j_m}],$   (2)

where $\mathbf{P}_i \in \mathbb{R}^{m \times M}$. The noisy pixel matrices $\{\mathbf{P}_i\}_{i=1}^{N}$ in the whole image are used for noise level estimation, which is described in §3.2. A sketch of this grouping procedure is given below.
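The following Python/NumPy sketch (not the authors' implementation) illustrates the grouping of this subsection. For brevity, only the first row of the patch matrix is used as the reference row, and $p$, $M$, $m$, and the window size are illustrative placeholders following the notation above.

```python
import numpy as np

def group_similar_pixels(img, i, j, p=7, M=16, m=4, win=20):
    """Build the p^2 x M patch matrix Y_i for the reference patch at (i, j),
    then return the m x M similar-pixel matrix P_i of Eqn. (2) together with
    the row distances of Eqn. (1)."""
    H, W = img.shape
    ref = img[i:i + p, j:j + p].ravel()
    # 1) Collect candidate patches in a local search window.
    cands, dists = [], []
    for di in range(max(0, i - win), min(H - p + 1, i + win)):
        for dj in range(max(0, j - win), min(W - p + 1, j + win)):
            v = img[di:di + p, dj:dj + p].ravel()
            cands.append(v)
            dists.append(np.sum((ref - v) ** 2))
    # 2) Keep the M most similar patches (the reference itself has distance 0).
    order = np.argsort(dists)[:M]
    Y_i = np.stack([cands[k] for k in order], axis=1)        # p^2 x M
    # 3) Eqn. (1): distances between the reference row and every row of Y_i.
    ref_row = 0
    row_d = np.sum((Y_i - Y_i[ref_row]) ** 2, axis=1)
    # 4) Eqn. (2): keep the m rows with the smallest distances (m a power of 2).
    rows = np.argsort(row_d)[:m]
    return Y_i[rows], row_d[rows]                            # P_i and its row distances
```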

3.2 Noise Level Estimation

Accurate and fast estimation of the noise level is an essential step for efficient image denoising. The introduced pixel-level NSS prior can help achieve this goal. The rationale is that, since the pixels in the selected rows of $\mathbf{P}_i$ are very similar to each other, the standard deviation (std) among them can be viewed as the noise level. For simplicity, we assume that the noise follows a Gaussian distribution with std $\sigma$. Since the distances between the $j$-th row of $\mathbf{Y}_i$ and its $m$ most similar rows are $d_{j,j_k}$ ($k = 1, \dots, m$), the local noise level $\sigma_i$ can be computed as

$\sigma_i = \sqrt{\tfrac{1}{2mM} \sum_{k=1}^{m} d_{j,j_k}}.$   (3)

Initial experiments indicate that Eqn. (3) performs well for smooth areas, but is problematic for textures and structures. This is because, in these areas, the signal and noise are difficult to distinguish, and thus the noise level would be over-estimated. To make our method more robust for noise level estimation, we extend the noise level estimation from a local region to a global one. To do so, we estimate the local noise levels $\sigma_i$ for all the noisy pixel matrices in the image, and simply set the global noise level $\sigma$ as

$\sigma = \tfrac{1}{N} \sum_{i=1}^{N} \sigma_i.$   (4)

Discussion. The proposed pixel-based noise level estimation method assumes the noise in the selected rows follows a Gaussian distribution, which is consistent with the assumptions in [37, 16]. The proposed method is very simple, since it only computes the distances among the most similar pixels extracted from the image. As will be shown in the experimental section (§4), the proposed noise level estimation method is very accurate, which makes it feasible to develop a blind image denoising method for real-world applications. Now we introduce the proposed two-stage denoising framework below.
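A minimal sketch of this estimator is given below (Python/NumPy, not the authors' code). The 1/(2mM) normalization and the averaging over all pixel matrices follow the reconstruction of Eqns. (3) and (4) above, and should be read as assumptions rather than a verified re-implementation.

```python
import numpy as np

def local_noise_std(P_i, row_dists):
    """P_i: m x M similar-pixel matrix; row_dists: squared distances of the m
    selected rows to the reference row (Eqn. (1)). Returns sigma_i of Eqn. (3)."""
    m, M = P_i.shape
    return np.sqrt(np.sum(row_dists) / (2.0 * m * M))

def global_noise_std(local_stds):
    """Aggregate the local estimates over all pixel matrices (Eqn. (4))."""
    return float(np.mean(local_stds))
```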

3.3 Two-stage Denoising Framework

The proposed denoising method consists of two stages. In the first stage, we estimate the local signal intensities via non-local Haar (NLH) transform based bi-hard thresholding. With the results from the first stage, we perform blind image denoising in the second stage by employing Wiener filtering based soft thresholding. We now introduce the two stages in more detail.

Stage 1: Local Intensity Estimation by Lifting Haar Transform based Bi-hard Thresholding. We have grouped a set of similar pixel matrices $\{\mathbf{P}_i\}$ (for simplicity, we ignore the index $i$ below) and estimated the global noise level $\sigma$. We perform denoising on the similar pixel matrices in the Haar transformed domain [70]. Here, we utilize the lifting Haar wavelet transform (LHWT) [67, 68] due to its flexible operation, fast speed, and low memory footprint.

The LHWT matrices we employ here are two orthogonal matrices $\mathbf{W}_m \in \mathbb{R}^{m \times m}$ and $\mathbf{W}_M \in \mathbb{R}^{M \times M}$. We set $m$ and $M$ as powers of $2$ to accommodate the noisy pixel matrices with the Haar transform. The LHWT of the non-local similar pixel matrix $\mathbf{P}$ yields the transformed noisy coefficient matrix $\mathbf{B}$ via

$\mathbf{B} = \mathbf{W}_m \mathbf{P} \mathbf{W}_M^{\top}.$   (5)

Due to limited space, we put the detailed LHWT transforms with specific $m$ and $M$ in the Supplementary File.

After the LHWT, we restore the $(j,k)$-th element ($j = 1, \dots, m$; $k = 1, \dots, M$) of the noisy coefficient matrix $\mathbf{B}$ via hard thresholding:

$\hat{B}_{j,k} = B_{j,k} \cdot \mathbb{1}(|B_{j,k}| > \lambda\sigma),$   (6)

where $\cdot$ means element-wise product, $\mathbb{1}(\cdot)$ is the indicator function, and $\lambda$ is the threshold parameter. According to wavelet theory [67], the coefficients in the last two rows of $\mathbf{B}$ (except the 1st column) are in the high frequency bands of the LHWT, and should largely be noise. To remove this noise in $\mathbf{B}$, we introduce a structural hard thresholding strategy and completely set to $0$ all the coefficients in the high frequency bands of $\mathbf{B}$:

$\hat{B}_{j,k} = 0, \quad j \in \{m-1, m\},\ k \neq 1,$   (7)

where $\hat{B}_{j,k}$ and $B_{j,k}$ are the $(j,k)$-th entries of the coefficient matrices $\hat{\mathbf{B}}$ and $\mathbf{B}$, respectively. We then employ the inverse LHWT [67, 68] on $\hat{\mathbf{B}}$ to obtain the denoised pixel matrix $\hat{\mathbf{P}}$ via

$\hat{\mathbf{P}} = \mathbf{W}_m^{-1} \hat{\mathbf{B}} (\mathbf{W}_M^{-1})^{\top},$   (8)

where $\mathbf{W}_m^{-1}$ and $\mathbf{W}_M^{-1}$ are the inverse LHWT matrices. The detailed inverse LHWT with specific $m$ and $M$ is given in the Supplementary File. Finally, we aggregate all the denoised pixel matrices to form the denoised image. The elements in $\hat{\mathbf{P}}$ can be viewed as local signal intensities, which are used in Stage 2 for precise denoising with the globally estimated noise level $\sigma$. To obtain a more accurate estimation of the local signal intensities, we perform the above LHWT based bi-hard thresholding for $K$ iterations. For the $t$-th ($t = 1, \dots, K$) iteration, we add the denoised image $\hat{\mathbf{X}}^{(t-1)}$ back to the original noisy image $\mathbf{Y}$ and obtain the noisy image $\mathbf{Y}^{(t)}$ as

$\mathbf{Y}^{(t)} = \rho\,\mathbf{Y} + (1-\rho)\,\hat{\mathbf{X}}^{(t-1)},$   (9)

where $\rho$ is the regularization parameter and $\hat{\mathbf{X}}^{(0)}$ is the first-pass denoised image.
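The following Python/NumPy sketch illustrates Stage 1 on a single pixel matrix (not the authors' lifting implementation): the orthonormal Haar transform is applied in matrix form, which gives the same result as the lifting scheme for power-of-two sizes, and the default threshold value lam is only an illustrative placeholder for $\lambda$.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar matrix of size n x n (n must be a power of 2)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    low = np.kron(h, [1.0, 1.0])                   # scaling (low-pass) rows
    high = np.kron(np.eye(n // 2), [1.0, -1.0])    # wavelet (high-pass) rows
    return np.vstack([low, high]) / np.sqrt(2.0)

def stage1_bi_hard_threshold(P, sigma, lam=2.7):
    """Eqns. (5)-(8): Haar transform, bi-hard thresholding, inverse transform."""
    m, M = P.shape
    Wm, WM = haar_matrix(m), haar_matrix(M)
    B = Wm @ P @ WM.T                              # Eqn. (5)
    B_hat = B * (np.abs(B) > lam * sigma)          # Eqn. (6): hard thresholding
    B_hat[-2:, 1:] = 0.0                           # Eqn. (7): zero the last two rows
    return Wm.T @ B_hat @ WM                       # Eqn. (8): inverse (orthonormal) LHWT
```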

 

Noise std 5 15 25 35 50 75 100
Zoran [71] 4.74 14.42 - - 49.23 74.33 -
Liu [72] 5.23 15.18 25.13 34.83 49.54 74.36 98.95
Chen [73] 8.66 16.78 26.26 36.00 50.82 75.75 101.62
Our Method (Eqn. (4)) 5.91 15.88 25.64 35.50 50.45 75.40 100.97
Table 1: Estimated noise levels of different methods on the BSD68 dataset corrupted by AWGN noise with std $\sigma \in \{5, 15, 25, 35, 50, 75, 100\}$. “-” indicates that the results cannot be obtained due to internal errors of the code. Our method (Eqn. (4)) achieves better results when $\sigma \ge 50$.

Stage 2: Blind Denoising by Iterative Wiener Filtering. Although the noise can be roughly removed through the bi-hard thresholding described in Stage 1, some noise may still remain in smooth areas, or the details may be over-smoothed. In order to remove the noise more carefully while preserving the details, we employ Wiener filtering [69] based soft thresholding for finer denoising. We use the estimated local signal intensities from Stage 1 and the globally estimated noise level $\sigma$ to perform Wiener filtering on the coefficients obtained by the LHWT of the original noisy pixel matrices. To further improve the denoising performance, in all experiments, we conduct the Wiener filtering based soft thresholding for two iterations. In the first iteration, we perform Wiener filtering on $\mathbf{B}$ in Eqn. (5) as

$\hat{B}^{(1)}_{j,k} = \frac{\tilde{B}_{j,k}^2}{\tilde{B}_{j,k}^2 + \sigma^2}\, B_{j,k},$   (10)

where $\tilde{B}_{j,k}$ denotes the $(j,k)$-th LHWT coefficient of the Stage 1 estimate $\hat{\mathbf{P}}$ (the local signal intensity),

and then we perform the second Wiener filtering as

$\hat{B}^{(2)}_{j,k} = \frac{(\hat{B}^{(1)}_{j,k})^2}{(\hat{B}^{(1)}_{j,k})^2 + \sigma^2}\, B_{j,k}.$   (11)

Experiments on image denoising demonstrate that the proposed method with two iterations performs the best, while using more iterations brings little improvement. We then perform the inverse LHWT (please see details in the Supplementary File) on $\hat{\mathbf{B}}^{(2)}$ to obtain the denoised pixel matrix $\hat{\mathbf{P}}$. Finally, we aggregate all the denoised pixel matrices to form the final denoised image.
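A corresponding sketch of the Stage 2 Wiener shrinkage is shown below (Python/NumPy, not the authors' code). It assumes the empirical Wiener weights of Eqns. (10)-(11) as reconstructed above, and reuses haar_matrix from the Stage 1 sketch; P_noisy is the original noisy pixel matrix and P_stage1 its Stage 1 estimate.

```python
import numpy as np
# haar_matrix() is the orthonormal Haar construction from the Stage 1 sketch above.

def stage2_wiener(P_noisy, P_stage1, sigma, iters=2):
    """Wiener filtering based soft thresholding in the LHWT domain."""
    m, M = P_noisy.shape
    Wm, WM = haar_matrix(m), haar_matrix(M)
    B = Wm @ P_noisy @ WM.T                        # noisy coefficients, Eqn. (5)
    guide = Wm @ P_stage1 @ WM.T                   # local signal intensities from Stage 1
    for _ in range(iters):                         # Eqn. (10), then Eqn. (11)
        shrink = guide ** 2 / (guide ** 2 + sigma ** 2)
        guide = shrink * B
    return Wm.T @ guide @ WM                       # inverse LHWT -> denoised pixel matrix
```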

3.4 Complexity Analysis

The proposed NLH contains three parts: 1) In §3.1, searching for similar patches in a local window dominates the cost, while the subsequent search for similar pixels only compares the rows within each small patch matrix and adds little overhead. 2) In §3.2, noise level estimation reuses the distances already computed among the most similar pixels, so its cost can be ignored. 3) In §3.3, the two stages only perform LHWT transforms and (bi-hard or Wiener) thresholding on each pixel matrix, which is cheap compared with the patch search. Hence, the overall complexity of the proposed NLH is dominated by the similar patch search.

3.5 Extension to Real-world Image Denoising

To apply the proposed NLH method to real-world RGB images, we first transform the RGB images into a luminance-chrominance (e.g., YCbCr) space [3], and then perform similar pixel searching in the Y channel. The similar pixels in the other two channels (i.e., Cb and Cr) are grouped correspondingly. We perform denoising for each channel separately and aggregate the denoised channels to form the denoised YCbCr image. Finally, we transform it back to the RGB space for visualization.
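A minimal sketch of this color extension is given below (Python, not the authors' code). It uses scikit-image for the RGB/YCbCr conversion; denoise_channel stands for the grayscale NLH pipeline described above and is a placeholder, and the detail of reusing the Y-channel pixel groups for Cb and Cr is omitted.

```python
import numpy as np
from skimage import color

def denoise_rgb(rgb_uint8, denoise_channel):
    """Denoise an RGB image channel-wise in the YCbCr space."""
    ycbcr = color.rgb2ycbcr(rgb_uint8.astype(np.float64) / 255.0)
    # Similar pixels would be searched in the Y channel and the same groups
    # reused for Cb and Cr; here each channel is simply denoised in turn.
    denoised = np.stack(
        [denoise_channel(ycbcr[..., c]) for c in range(3)], axis=-1)
    rgb = color.ycbcr2rgb(denoised)                 # back to RGB in [0, 1]
    return np.clip(rgb * 255.0, 0, 255).astype(np.uint8)
```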

4 Experiments and Results

In this section, we first evaluate the developed noise level estimation method on synthetic noisy images. The goal of this experiment is to validate the introduced pixel-level non-local self similarity (NSS) prior. We then evaluate the proposed NLH method on both synthetic images corrupted by additive white Gaussian noise (AWGN) and real-world noisy images. Finally, we perform comprehensive ablation studies to gain a deeper insight into the proposed NLH method.

 

Noise std 15 25 35 50 75
Metric PSNR SSIM PSNR SSIM PSNR SSIM PSNR SSIM PSNR SSIM
NLM   [2] 31.20 0.8483 28.64 0.7602 26.82 0.6762 24.80 0.5646 22.43 0.4224
BM3D   [5] 32.42 0.8860 30.02 0.8364 28.48 0.7969 26.85 0.7481 24.74 0.6649
LSSC   [6] 32.27 0.8849 29.84 0.8329 28.26 0.7908 26.64 0.7405 24.77 0.6746
NCSR   [7] 32.19 0.8814 29.76 0.8293 28.17 0.7855 26.55 0.7391 24.66 0.6793
WNNM [10] 32.43 0.8841 30.05 0.8365 28.51 0.7958 26.92 0.7499 25.15 0.6903
TNRD [74] 32.48 0.8845 30.07 0.8366 28.53 0.7957 26.95 0.7495 25.10 0.6901
DnCNN [40] 32.59 0.8879 30.22 0.8415 28.66 0.8021 27.08 0.7563 25.24 0.6931
NLH (Blind) 32.28 0.8796 30.09 0.8355 28.60 0.7988 27.11 0.7524 25.31 0.6932
Table 2: Average PSNR(dB)/SSIM results of different denoising methods on 20 grayscale images corrupted by AWGN noise.

4.1 Implementation Details

The proposed NLH method has 7 main parameters: the patch size $p$, the window size $W$ for searching similar patches, the number of similar patches $M$, the number of similar pixels $m$, the regularization parameter $\rho$, the hard threshold parameter $\lambda$, and the iteration number $K$ ($\rho$, $\lambda$, and $K$ only exist in Stage 1). Five of these parameters are fixed in all experiments. For synthetic AWGN corrupted image denoising, the threshold $\lambda$ is set according to the noise level $\sigma$ and shared by both stages; for real-world image denoising, the remaining parameters are set to fixed values shared by both stages.

4.2 Results on Noise Level Estimation

The proposed pixel-level NSS prior can be used to estimate the noise level of an input noisy image. We compare our method (Eqn. (4)) with leading noise level estimation methods, namely Zoran [71], Liu [72], and Chen [73]. The comparison is performed on the 68 images of the commonly tested BSD68 dataset. We generate synthetic noisy images by adding AWGN with std $\sigma \in \{5, 15, 25, 35, 50, 75, 100\}$ to the clean images. The comparison results are listed in Table 1, from which one can see that the proposed method can accurately estimate different noise levels for various noisy images. Note that the proposed method only utilizes the introduced pixel-level NSS prior, and the results indeed validate its effectiveness on noise level estimation.

4.3 Results on Synthetic AWGN Corrupted Images

On the 20 grayscale images widely used in [3, 10, 11], we compare the proposed NLH method with several competing AWGN denoising methods: BM3D [3], LSSC [6], NCSR [7], WNNM [10], TNRD [74], and DnCNN [40]. For BM3D, we employ its extension BM3D-SAPCA [5], which usually performs better than BM3D on grayscale images. We employ Non-Local Means (NLM) [2] as a baseline to validate the effectiveness of the pixel-level NSS prior. The source codes of these methods are downloaded from the corresponding authors' websites, and we use the default parameter settings. TNRD and DnCNN are discriminative learning based methods, and we use the models originally trained by their authors. The noisy images are generated by adding AWGN noise with standard deviation (std) $\sigma \in \{15, 25, 35, 50, 75\}$ to the corresponding clean images.

From Table 2 we can see that, the proposed NLH is comparable with the leading denoising methods on average PSNR (dB) and SSIM [75]. Note that TNRD and DnCNN are trained on clean and synthetic noisy image pairs, while NLH can blindly remove the noise with the introduced pixel-level NSS prior. By comparing the performance of NLM and NLH, one can see that the proposed pixel-level denoising method performs much better than simply averaging the central pixels of similar patches. The visual quality comparisons can be found in the Supplementary File.

4.4 Results on Real-World Noisy Images

 

Camera Settings # CBM3D NI NC CC MCWNNM TWSC DnCNN+ FFDNet+ CBDNet NLH
Canon 5D M3 1 39.76 35.68 36.20 38.37 41.13 40.76 38.02 39.35 36.68 41.57
ISO = 3200 2 36.40 34.03 34.35 35.37 37.28 36.02 35.87 36.99 35.58 37.39
3 36.37 32.63 33.10 34.91 36.52 34.99 35.51 36.50 35.27 36.68
Nikon D600 4 34.18 31.78 32.28 34.98 35.53 35.32 34.75 34.96 34.01 35.50
ISO = 3200 5 35.07 35.16 35.34 35.95 37.02 37.10 35.28 36.70 35.19 37.21
6 37.13 39.98 40.51 41.15 39.56 40.90 37.43 40.94 39.80 41.34
Nikon D800 7 36.81 34.84 35.09 37.99 39.26 39.23 37.63 38.62 38.03 39.67
ISO = 1600 8 37.76 38.42 38.65 40.36 41.43 41.90 38.79 41.45 40.40 42.66
9 37.51 35.79 35.85 38.30 39.55 39.06 37.07 38.76 36.86 40.04
Nikon D800 10 35.05 38.36 38.56 39.01 38.91 40.03 35.45 40.09 38.75 40.21
ISO = 3200 11 34.07 35.53 35.76 36.75 37.41 36.89 35.43 37.57 36.52 37.30
12 34.42 40.05 40.59 39.06 39.39 41.49 34.98 41.10 38.42 42.02
Nikon D800 13 31.13 34.08 34.25 34.61 34.80 35.47 31.12 34.11 34.13 36.19
ISO = 6400 14 31.22 32.13 32.38 33.21 33.95 34.05 31.93 33.64 33.45 34.70
15 30.97 31.52 31.76 33.22 33.94 33.88 31.79 33.68 33.45 34.83
Average - 35.19 35.33 35.65 36.88 37.71 37.81 35.40 37.63 36.44 38.49
Table 3: PSNR(dB) results of different methods on the 15 cropped real-world noisy images in CC dataset [37].
Figure 3: Comparison of denoised images and PSNR(dB)/SSIM by different methods on “Nikon D800 ISO=1600 2” [37]. (a) Noisy: 35.71/0.8839; (b) NC: 38.65/0.9591; (c) CC: 40.36/0.9767; (d) MCWNNM: 41.43/0.9683; (e) TWSC: 41.90/0.9804; (f) DnCNN+: 38.79/0.9547; (g) FFDNet+: 41.45/0.9800; (h) CBDNet: 40.40/0.9781; (i) NLH: 42.66/0.9833; (j) Mean Image.

Comparison Methods. We compare the proposed NLH method with CBM3D [4], the commercial software Neat Image (NI) [60], “Noise Clinic” (NC) [58], Cross-Channel (CC) [37], MCWNNM [13], and TWSC [16]. CBM3D can directly deal with color images, and the std of the input noise is estimated by [73]. For MCWNNM and TWSC, we use [73] to estimate the noise std $\sigma$ for each channel and perform denoising accordingly. We also compare the proposed NLH method with DnCNN+ [40], FFDNet+ [41], and CBDNet [42], which are state-of-the-art convolutional neural network (CNN) based image denoising methods. FFDNet+ is a multi-scale extension of FFDNet [41] with a manually selected uniform noise level map. DnCNN+ is based on the color version of DnCNN [40] for blind denoising, but fine-tuned with the results of FFDNet+ [41]. Note that for FFDNet+ and DnCNN+, there is no need to estimate the noise std. For the three CNN based methods, we asked the authors to run the experiments for us; we also ran the codes on our machine for speed comparisons.

Figure 4: Comparison of denoised images and PSNR(dB)/SSIM by different methods on “0001_18”, captured by a Nexus 6P [63]. (a) Noisy: 18.77/0.3015; (b) CBM3D: 23.95/0.5078; (c) NI: 27.28/0.6330; (d) NC: 28.32/0.7186; (e) MCWNNM: 31.74/0.8748; (f) TWSC: 32.97/0.9163; (g) DnCNN+: 32.26/0.8906; (h) FFDNet+: 32.14/0.9162; (i) CBDNet: 31.40/0.8364; (j) NLH: 32.85/0.9202. The “ground-truth” image is not released, but the PSNR(dB)/SSIM results are publicly provided on the DND website.

 

Metric CBM3D NI NC MCWNNM TWSC DnCNN+ FFDNet+ CBDNet NLH
PSNR 34.51 35.11 35.43 37.38 37.96 37.90 37.61 38.06 38.81
SSIM 0.8507 0.8778 0.8841 0.9294 0.9416 0.9430 0.9415 0.9421 0.9520
CPU (GPU) Time 8.4 1.2 18.5 251.2 233.6 106.2 (0.05) 49.9 (0.03) 5.4 (0.40) 5.3
Table 4: Average results of PSNR(dB), SSIM, and CPU Time (in seconds) of different methods on 1000 cropped real-world noisy images in DND dataset [63]. The GPU Time of DnCNN+, FFDNet+, and CBDNet are also reported in parentheses.

Datasets and Results. We evaluate the proposed NLH on two commonly used real-world image denoising datasets, i.e., the Cross-Channel (CC) dataset [37] and the Darmstadt Noise Dataset (DND) [63].

The CC dataset [37] includes noisy images of 11 static scenes captured by Canon 5D Mark 3, Nikon D600, and Nikon D800 cameras. The real-world noisy images were collected under a controlled indoor environment. Each scene was shot 500 times using the same camera and settings, and the average of the 500 shots is taken as the “ground truth”. The authors cropped 15 regions from these scenes to evaluate different denoising methods. The comparisons in terms of PSNR are listed in Table 3. It can be seen that the proposed NLH method achieves the highest PSNR results on most images. Fig. 3 shows the denoised images yielded by different methods on a scene captured by a Nikon D800 at ISO=1600. As can be seen, NLH also achieves better visual quality than the other methods. More comparisons on SSIM and visual quality can be found in the Supplementary File.

The DND dataset [63] includes 50 different scenes captured by Sony A7R, Olympus E-M10, Sony RX100 IV, and Huawei Nexus 6P cameras. Each scene contains a pair of noisy and “ground truth” clean images. The noisy images are collected under higher ISO values with shorter exposure times, while the “ground truth” images are captured under lower ISO values with adjusted longer exposure times. For each scene, the authors cropped 20 bounding boxes, generating a total of 1000 test crops. The “ground truth” images are not released, but we can evaluate the performance by submitting the denoised images to the DND website. In Table 4, we list the average PSNR (dB) and SSIM [75] results of different methods. Fig. 4 shows the visual comparisons on the image “0001_18” captured by a Nexus 6P camera. It can be seen that the proposed NLH method achieves much higher PSNR and SSIM results, with more visually pleasing images than the other methods. More visual quality comparisons can be found in the Supplementary File.

Speed. We also compare the speed of all competing methods. All experiments are run under the Matlab 2016a environment on a machine with a quad-core 3.4GHz CPU and 8GB RAM. We also run DnCNN+, FFDNet+, and CBDNet on a Titan XP GPU. In Table 4, we also show the average run time (in seconds) of different methods on the 1000 cropped RGB images in [63]; the fastest result is highlighted in bold. It can be seen that Neat Image only needs an average of 1.2 seconds to process an RGB image. The proposed NLH method needs 5.3 seconds (using parallel computing), which is much faster than the other methods, including the patch-level NSS based methods MCWNNM and TWSC, and the CNN based methods DnCNN+, FFDNet+, and CBDNet. The majority of the time in the proposed NLH method is spent on searching similar patches, which takes an average of 2.8 seconds; further searching similar pixels only takes an average of 0.3 seconds. This demonstrates that the introduced pixel-level NSS prior adds only a small amount of computation compared to its patch-level counterpart.

4.5 Validation of the Proposed NLH Method

We now conduct a more detailed examination of our proposed method. We assess 1) the accuracy of pixel-level NSS vs. patch-level NSS; 2) the contribution of the proposed pixel-level NSS prior for NLH on real-world image denoising; 3) the necessity of the two-stage framework; and 4) the individual influence of the 7 major parameters on NLH.

1. Is pixel-level NSS more accurate than patch-level NSS? To answer this question, we compute the average pixel-wise distances (APDs, i.e., the distance apportioned to each pixel) of non-local similar pixels and patches on the CC dataset [37]. From Table 5, we can see that, on the 15 mean images and 15 noisy images (after normalization), the APDs of pixel-level NSS are smaller than those of patch-level NSS. In other words, pixel-level NSS is more accurate than patch-level NSS at measuring similarity.

 

Aspect Mean Image Noisy Image
Patch-level NSS 0.0043
Pixel-level NSS 0.0026
Table 5: Average pixel-wise distances of pixel-level NSS and patch-level NSS, on the 15 cropped mean images and corresponding noisy images in CC dataset [37].

2. Does the pixel-level NSS prior contribute to image denoising? Here, we study the contribution of the proposed pixel-level NSS prior. To this end, we remove the pixel-level NSS search from NLH, which gives the baseline “w/o Pixel NSS”. From Table 6, we observe a clear drop in the PSNR (dB) and SSIM results on both datasets, which demonstrates the effectiveness of the proposed pixel-level NSS prior.

 

CC [37] DND [63]
Variant PSNR SSIM PSNR SSIM
NLH 38.49 0.9647 38.81 0.9520
w/o Pixel NSS 38.14 0.9602 38.27 0.9414
w/o Stage 2 37.64 0.9572 37.27 0.9355
Table 6: Ablation study on the CC [37] and DND [63] datasets. We change one component at a time to assess its individual contributions to the proposed NLH method.

3. Is Stage 2 necessary? We also study the effect of Stage 2 in NLH. To do so, we remove Stage 2 from NLH, which gives the baseline “w/o Stage 2”. From Table 6, we can see a large performance drop on both datasets. This shows that Stage 2 complements Stage 1 with soft Wiener filtering and is essential to the proposed NLH.

4. How does each parameter influence NLH's denoising performance? The proposed NLH mainly has 7 parameters (please see §4.1 for details). We change one parameter at a time to assess its individual influence on NLH. Table 7 lists the average PSNR results of NLH with different parameter values on the CC dataset [37]. It can be seen that: 1) The variations of the PSNR results range from 0.02dB (for the iteration number) to 0.16dB (for the number of similar patches) when changing individual parameters. 2) The PSNR increases with increasing patch size $p$, window size $W$, or iteration number $K$; for a performance-speed tradeoff, we use moderate values of these parameters in NLH for efficient image denoising. 3) The number of similar pixels $m$ is novel in NLH. To our surprise, even with the smallest number of similar pixels, NLH still performs very well, dropping only 0.01dB in PSNR compared to the best setting. However, with larger $m$, the performance of NLH decreases gradually. The reason is that searching for more pixels within the grouped patches may decrease the accuracy of the pixel-level NSS, and hence degrade the performance of NLH. Similar trends can be observed when changing the number of similar patches $M$. In summary, all the parametric analyses demonstrate that NLH is very robust on real-world image denoising, as long as the 7 parameters are set in reasonable ranges.

 

Value 5 6 7 8 Margin
PSNR 38.41 38.47 38.49 38.51 0.10
Value 20 30 40 50 Margin
PSNR 38.39 38.43 38.49 38.51 0.12
Value 2 4 8 16 Margin
PSNR 38.48 38.49 38.47 38.43 0.06
Value 8 16 32 64 Margin
PSNR 38.33 38.49 38.48 38.43 0.16
Value 1.5 2 2.5 3 Margin
PSNR 38.39 38.49 38.51 38.50 0.12
Value 2 3 4 5 Margin
PSNR 38.49 38.51 38.51 38.51 0.02
Value 0.2 0.4 0.6 0.8 Margin
PSNR 38.46 38.47 38.49 38.49 0.03
Table 7: PSNR (dB) of NLH with different parameters over the 15 noisy images in CC dataset [37]. We change one parameter at a time to assess its individual influence on NLH.

5 Conclusion

How to utilize the non-local self similarity (NSS) prior for image denoising is an open problem. In this paper, we attempted to utilize the NSS prior to a greater extent by lifting the patch-level NSS prior to a pixel-level NSS prior. With the pixel-level NSS prior, we developed an accurate noise level estimation method, based on which we proposed a blind image denoising method. We estimated the local signal intensities via non-local Haar (NLH) transform based bi-hard thresholding, and performed denoising accordingly by Wiener filtering based soft thresholding. Experiments on benchmark datasets demonstrated that the proposed NLH method significantly outperforms previous state-of-the-art methods on the real-world image denoising task.

References

  • [1] C. Liu, R. Szeliski, S. Kang, C. L. Zitnick, and W. T. Freeman. Automatic estimation and removal of noise from a single image. IEEE TPAMI, 30(2):299–314, 2008.
  • [2] A. Buades, B. Coll, and J. M. Morel. A non-local algorithm for image denoising. In CVPR, pages 60–65, 2005.
  • [3] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095, 2007.
  • [4] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Color image denoising via sparse 3D collaborative filtering with grouping constraint in luminance-chrominance space. In ICIP, pages 313–316. IEEE, 2007.
  • [5] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. BM3D image denoising with shape-adaptive principal component analysis. In SPARS, 2009.
  • [6] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Non-local sparse models for image restoration. In ICCV, pages 2272–2279, 2009.
  • [7] W. Dong, L. Zhang, G. Shi, and X. Li. Nonlocally centralized sparse representation for image restoration. IEEE Transactions on Image Processing, 22(4):1620–1630, 2013.
  • [8] W. Dong, G. Shi, and X. Li. Nonlocal image restoration with bilateral variance estimation: A low-rank approach. IEEE Transactions on Image Processing, 22(2):700–711, 2013.
  • [9] H. Ji, C. Liu, Z. Shen, and Y. Xu. Robust video denoising using low rank matrix completion. In CVPR, pages 1791–1798. IEEE, 2010.
  • [10] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In CVPR, pages 2862–2869. IEEE, 2014.
  • [11] J. Xu, L. Zhang, W. Zuo, D. Zhang, and X. Feng. Patch group based nonlocal self-similarity prior learning for image denoising. In ICCV, pages 244–252, 2015.
  • [12] J. Xu, D. Ren, L. Zhang, and D. Zhang. Patch group based Bayesian learning for blind image denoising. In Asian Conference on Computer Vision Workshops, pages 79–95, 2016.
  • [13] J. Xu, L. Zhang, D. Zhang, and X. Feng. Multi-channel weighted nuclear norm minimization for real color image denoising. In ICCV, 2017.
  • [14] N. Yair and T. Michaeli. Multi-scale weighted nuclear norm image restoration. In CVPR, pages 3165–3174, 2018.
  • [15] B. Wen, Y. Li, L. Pfister, and Y. Bresler. Joint adaptive sparsity and low-rankness on the fly: An online tensor reconstruction scheme for video denoising. In ICCV, pages 241–250, 2017.
  • [16] J. Xu, L. Zhang, and D. Zhang. A trilateral weighted sparse coding scheme for real-world image denoising. In ECCV, 2018.
  • [17] S. Lefkimmiatis. Non-local color image denoising with convolutional neural networks. In CVPR, pages 3587–3596, 2017.
  • [18] T. Plötz and S. Roth. Neural nearest neighbors networks. In NIPS, 2018.
  • [19] D. Liu, B. Wen, Y. Fan, C. C. Loy, and T. S. Huang. Non-local recurrent network for image restoration. In NIPS, pages 1680–1689, 2018.
  • [20] A. Foi, M. Trimeche, V. Katkovnik, and K. Egiazarian. Practical poissonian-gaussian noise modeling and fitting for single-image raw-data. IEEE Transactions on Image Processing, 17(10):1737–1754, 2008.
  • [21] P. Chatterjee and P. Milanfar. Clustering-based denoising with locally learned dictionaries. IEEE Transactions on Image Processing, 18(7):1438–1451, 2009.
  • [22] M. Elad and M. Aharon. Image denoising via learned dictionaries and sparse representation. In CVPR, volume 1, pages 895–900, 2006.
  • [23] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image processing, 15(12):3736–3745, 2006.
  • [24] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, 2006.
  • [25] J. Mairal, M. Elad, and G. Sapiro. Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1):53–69, 2008.
  • [26] I. Mosseri, M. Zontak, and M. Irani. Combining the power of internal and external denoising. In ICCP, pages 1–9, 2013.
  • [27] S. Roth and M. J. Black. Fields of experts. International Journal of Computer Vision, 82(2):205–229, 2009.
  • [28] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In ICCV, pages 479–486, 2011.
  • [29] H. Talebi and P. Milanfar. Global image denoising. IEEE Transactions on Image Processing, 23(2):755–768, 2014.
  • [30] J. Xu, L. Zhang, and D. Zhang. External prior guided internal prior learning for real-world noisy image denoising. IEEE Transactions on Image Processing, 27(6):2996–3010, 2018.
  • [31] F. Zhu, G. Chen, and P. A. Heng. From noise modeling to blind image denoising. In CVPR, pages 420–429, 2016.
  • [32] J. Chen, J. Chen, H. Chao, and M. Yang. Image blind denoising with generative adversarial network based noise modeling. In CVPR, pages 3155–3164, 2018.
  • [33] A. Pajot, E. Bezenac, and P. Gallinari. Unsupervised adversarial image reconstruction. In ICLR, 2019.
  • [34] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with BM3D? In CVPR, pages 2392–2399, 2012.
  • [35] U. Schmidt and S. Roth. Shrinkage fields for effective image restoration. In CVPR, pages 2774–2781, June 2014.
  • [36] Y. Chen, W. Yu, and T. Pock. On learning optimized reaction diffusion processes for effective image restoration. In CVPR, pages 5261–5269, 2015.
  • [37] S. Nam, Y. Hwang, Y. Matsushita, and S. J. Kim. A holistic approach to cross-channel image noise modeling and its application to image denoising. In CVPR, pages 1683–1691, 2016.
  • [38] X. Mao, C. Shen, and Y. Yang. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In NIPS, pages 2802–2810, 2016.
  • [39] M. Gharbi, G. Chaurasia, S. Paris, and F. Durand. Deep joint demosaicking and denoising. ACM Transactions on Graphics, 35(6):191, 2016.
  • [40] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing, 2017.
  • [41] K. Zhang, W. Zuo, and L. Zhang. Ffdnet: Toward a fast and flexible solution for cnn based image denoising. IEEE Transactions on Image Processing, 2018.
  • [42] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang. Toward convolutional blind denoising of real photographs. arXiv:1807.04686, 2018.
  • [43] Y. Tai, J. Yang, X. Liu, and C. Xu. Memnet: A persistent memory network for image restoration. In ICCV, pages 4539–4547, 2017.
  • [44] S. Lefkimmiatis. Non-local color image denoising with convolutional neural networks. In CVPR, pages 3587–3596, 2017.
  • [45] S. Lefkimmiatis. Universal denoising networks: A novel cnn architecture for image denoising. In CVPR, 2018.
  • [46] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila. Noise2noise: Learning image restoration without clean data. In ICML, 2018.
  • [47] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Deep image prior. In CVPR, pages 9446–9454, 2018.
  • [48] C. Chen, Q. Chen, J. Xu, and V. Koltun. Learning to see in the dark. In CVPR, 2018.
  • [49] B. Mildenhall, J. T. Barron, J. Chen, D. Sharlet, R. Ng, and R. Carroll. Burst denoising with kernel prediction networks. In CVPR, pages 2502–2510, 2018.
  • [50] X. Zhang, Y. Lu, J. Liu, and B. Dong. Dynamically unfolding recurrent restorer: A moving endpoint control method for image restoration. In ICLR, 2019.
  • [51] J. Xu, Y. Huang, L. Liu, F. Zhu, X. Hou, and L. Shao. Noisy-as-clean: Learning unsupervised denoising from the corrupted image, 2019.
  • [52] M. Zontak, I. Mosseri, and M. Irani. Separating signal from noise using patch recurrence across scales. In CVPR, 2013.
  • [53] A. A. Efros and T. K. Leung. Texture synthesis by non-parametric sampling. In ICCV, 1999.
  • [54] D. Glasner, S. Bagon, and M. Irani. Super-resolution from a single image. In ICCV, 2009.
  • [55] C. Barnes, E. Shechtman, A. Finkelstein, and D.B. Goldman. Patchmatch: A randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics, 28(3):24, 2009.
  • [56] X. Wang, R. Girshick, A. Gupta, and K. He. Non-local neural networks. In CVPR, 2018.
  • [57] C. Liu, R. Szeliski, S. Bing Kang, C. L. Zitnick, and W. T. Freeman. Automatic estimation and removal of noise from a single image. IEEE TPAMI, 30(2):299–314, 2008.
  • [58] M. Lebrun, M. Colom, and J.-M. Morel. Multiscale image blind denoising. IEEE Transactions on Image Processing, 24(10):3149–3161, 2015.
  • [59] F. Zhu, G. Chen, and P.-A. Heng. From noise modeling to blind image denoising. In CVPR, June 2016.
  • [60] Neatlab ABSoft. Neat Image. https://ni.neatvideo.com/home.
  • [61] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, pages 2672–2680, 2014.
  • [62] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
  • [63] T. Plötz and S. Roth. Benchmarking denoising algorithms with real photographs. In CVPR, 2017.
  • [64] A. Abdelhamed, S. Lin, and M. S. Brown. A high-quality denoising dataset for smartphone cameras. In CVPR, June 2018.
  • [65] J. Xu, H. Li, Z. Liang, D. Zhang, and L. Zhang. Real-world noisy image denoising: A new benchmark. arXiv:1804.02603, 2018.
  • [66] J. Anaya and A. Barbu. RENOIR: A dataset for real low-light image noise reduction. JVCIR, 51:144 – 154, 2018.
  • [67] W. Sweldens. The lifting scheme: A custom-design construction of biorthogonal wavelets. Applied and Computational Harmonic Analysis, 3(2):186 – 200, 1996.
  • [68] I. Daubechies and W. Sweldens. Factoring wavelet transforms into lifting steps. Journal of Fourier Analysis and Applications, 4(3):247–269, 1998.
  • [69] N. Wiener. Extrapolation, Interpolation, and Smoothing of Stationary Time Series: With Engineering Applications. 1949.
  • [70] A. Haar. Zur theorie der orthogonalen funktionensysteme. Mathematische Annalen, 69(3):331–371, Sep 1910.
  • [71] D. Zoran and Y. Weiss. Scale invariance and noise in natural images. In ICCV, pages 2209–2216, 2009.
  • [72] X. Liu, M. Tanaka, and M. Okutomi. Single-image noise level estimation for blind denoising. IEEE Transactions on Image Processing, 22(12):5226–5237, 2013.
  • [73] G. Chen, F. Zhu, and A. H. Pheng. An efficient statistical method for image noise level estimation. In ICCV, 2015.
  • [74] Y. Chen and T. Pock. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE TPAMI, 39(6):1256–1272, 2017.
  • [75] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.