External Prior Guided Internal Prior Learning for Real Noisy Image Denoising

05/12/2017 ∙ by Jun Xu, et al. ∙ 0

Most existing image denoising methods learn image priors from either external data or the noisy image itself to remove noise. However, priors learned from external data may not be adaptive to the image to be denoised, while priors learned from the given noisy image may not be accurate due to the interference of noise. Meanwhile, the noise in real-world noisy images is very complex and is hard to describe with simple distributions such as the Gaussian distribution, making real noisy image denoising a very challenging problem. We propose to exploit the information in both external data and the given noisy image, and develop an external prior guided internal prior learning method for real noisy image denoising. We first learn external priors from an independent set of clean natural images. With the aid of the learned external priors, we then learn internal priors from the given noisy image to refine the prior model. The external and internal priors are formulated as a set of orthogonal dictionaries to efficiently reconstruct the desired image. Extensive experiments are performed on several real noisy image datasets. The proposed method demonstrates highly competitive denoising performance, outperforming state-of-the-art denoising methods including those designed for real noisy images.

I Introduction

Image denoising is a crucial and indispensable step to improve image quality in digital imaging systems. In particular, as CMOS/CCD sensors shrink, images are more easily corrupted by noise, and hence denoising is becoming increasingly important for high resolution imaging. The problem of image denoising has been extensively studied in the literature, and numerous image denoising methods [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44] have been proposed in the past decades. Most existing denoising methods focus on the scenario of additive white Gaussian noise (AWGN) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 19, 15, 16, 20, 21, 22, 23, 24, 25], where the observed noisy image $\mathbf{y}$ is modeled as the sum of the clean image $\mathbf{x}$ and AWGN $\mathbf{n}$, i.e., $\mathbf{y} = \mathbf{x} + \mathbf{n}$. There are also methods proposed for removing Poisson noise [26, 27], mixed Poisson and Gaussian noise [28, 29, 30, 31], mixed Gaussian and impulse noise [32, 33, 34], and realistic noise in real photography [35, 36, 37, 38, 41, 39, 40, 42, 44, 43].

Natural images have many properties, such as sparsity and nonlocal self-similarity, which can be employed as useful priors for designing image denoising methods. Based on the fact that natural images are sparsely distributed in some transformed domains, wavelet [1] and curvelet [2] transforms have been widely adopted for image denoising. The sparse representation based methods [3, 4, 5, 6, 7, 8] encode image patches over a dictionary by using $\ell_1$-norm minimization to enforce sparsity. The well-known bilateral filters [9] employ the prior information that image pixels exhibit similarity in both the spatial domain and the intensity domain. Other image priors such as multiscale self-similarity [10] and nonlocal self-similarity [11, 12], or the combination of multiple image priors [13, 14], have also been successfully used in image denoising. For example, by using low-rank minimization to characterize the image nonlocal self-similarity, the WNNM [13] method achieves state-of-the-art performance for AWGN denoising.

Instead of using predefined image priors, methods have also been proposed to learn priors from natural images for denoising. The generative image prior learning methods usually learn prior models from a set of external clean images and apply the learned prior models to the given noisy image [15, 16, 14, 17, 18, 19], or learn priors from the given noisy image to perform denoising [3]. Recently, the discriminative image prior learning methods [20, 21, 22, 25, 23, 24], which learn denoising models from pairs of clean and noisy images, have become popular. The representative methods include the neural network based methods [20, 21, 22], random fields based methods [23, 24], and reaction diffusion based methods [25].

(a) Noisy [42]: 33.30dB (b) CBM3D [7]: 34.55dB (c) WNNM [13]: 35.85dB (d) CSF [24]: 35.39dB (e) TNRD [25]: 35.97dB
(f) DnCNN [22]: 34.14dB (g) NI [44]: 34.39dB (h) NC [39, 40]: 35.33dB (i) Ours: 37.49dB (j) Mean Image [42]
Figure 1: Denoised images of a region cropped from the real-world noisy image “Nikon D800 ISO 3200 A3” [42] by different methods. The scene was shot 500 times with the same camera and camera setting. The mean image of the 500 shots is roughly taken as the “ground truth”, with which the PSNR can be computed. The images are better viewed by zooming in on screen.

Most of the above-mentioned methods focus on AWGN removal. However, the AWGN assumption is too idealized for real-world noisy images, where the noise is much more complex and varies with different scenes, cameras and camera settings (ISO, shutter speed, aperture, etc.) [42, 45]. As a result, many denoising methods in the literature, including the learning based ones, become less effective when applied to real-world noisy images. Fig. 1 shows an example, where we apply some representative and state-of-the-art denoising methods, including CBM3D [7], WNNM [13], DnCNN [22], CSF [24], and TNRD [25], to a real-world noisy image (captured by a Nikon D800 camera at ISO 3200) provided in [42]. One can see that these methods either leave much of the noise or over-smooth the image details.

There have been a few methods [35, 36, 37, 38, 42, 41, 39, 40, 43] and software toolboxes [44] developed for real-world noisy image denoising. Almost all of these methods follow a two-stage framework: first estimate the parameters of the noise model (usually assumed to be Gaussian or a mixture of Gaussians (MoG)), and then perform denoising with the estimated noise model. However, the noise in real-world noisy images is very complex and is hard to model with explicit distributions such as Gaussian and MoG. According to [45], the noise introduced in the in-camera imaging process [46, 47, 42, 48] is signal dependent and comes from five main sources: photon shot, fixed pattern, dark current, readout, and quantization noise. The existing methods [35, 36, 37, 38, 42, 41, 39, 40, 44] mentioned above may not perform well on real-world noisy image denoising tasks. Fig. 1 also shows the denoising results of two real-world noisy image denoising methods, Noise Clinic [39, 40] and Neat Image [44]. One can see that these two methods still generate many noise-induced artifacts.

This work aims to develop a new paradigm for real-world noisy image denoising. Different from existing real-world noisy image denoising methods [35, 36, 37, 38, 42, 41, 39, 40] which focus on noise modeling, we focus on image prior learning. We argue that with a strong and adaptive prior learning scheme, robust denoising performance on real-world noisy images can still be obtained. To achieve this goal, we propose to first learn image priors from external clean images, and then employ the learned external priors to guide the learning of internal priors from the given noisy image. The flowchart of the proposed method is illustrated in Fig. 2. We first extract millions of patch groups (PGs) from a set of high quality natural images, with which a Gaussian Mixture Model (GMM) is learned as the external image prior. The learned GMM prior model is used to assign each PG extracted from the given noisy image to its most suitable cluster via maximum a posteriori (MAP) estimation, and then an external-internal hybrid orthogonal dictionary is learned as the final prior for each cluster, with which denoising can be readily performed by weighted sparse coding with a closed-form solution. The external priors learned from clean images preserve fine-scale image structural information, which is hard to reproduce from noisy images. Therefore, the external dictionary can serve as a good supplement to the internal dictionary. Our proposed denoising method is simple and efficient, yet our extensive experiments on real-world noisy images demonstrate that it outperforms the current state of the art.

Figure 2: Flowchart of the proposed external prior guided internal prior learning and denoising framework.

II Related Work

II-A Internal and External Prior Learning

Learning natural image priors plays a key role in image denoising [10, 8, 3, 4, 5, 15, 16, 14, 17, 18, 19, 20, 21, 22, 23, 24, 25]. There are mainly four categories of prior learning based methods. 1) External prior learning methods [15, 16, 14] learn priors (e.g., dictionaries) from a set of external clean images, and the learned priors are used to recover the latent clean image from the given noisy image. 2) Internal prior learning methods [10, 8, 3, 4, 5] directly learn priors from a given noisy image, and image denoising is often done simultaneously with the prior learning process. 3) Discriminative prior learning methods [20, 21, 22, 23, 24, 25] learn discriminative models or mapping functions from clean and noisy image pairs, and the learned models or mapping functions are applied to a noisy image for denoising. 4) Hybrid methods [17, 18, 19] combine the external and internal priors to denoise the given input image.

It has been shown [15, 16, 14] that the external priors learned from natural clean images are effective and efficient for universal image denoising problems, but they are not adaptive to the given noisy image, and some fine-scale image structures may not be well recovered. By contrast, the internal priors learned from the given noisy image are adaptive to image content, but the learned priors can be strongly affected by noise and the learning process is usually slow [10, 8, 3, 4, 5]. Besides, most of the internal prior learning methods [10, 8, 3, 4, 5] assume additive white Gaussian noise (AWGN), making the learned priors less robust for real-world noisy images. In this paper, we use external priors to guide the internal prior learning. Our method is not only much faster than the traditional internal learning methods, but also very robust when denoising real-world noisy images.

In [17], the authors employed external clean patches to denoise noisy patches with high individual Signal-to-Noise-Ratio (PatchSNR), and employed internal noisy patches to denoise noisy patches with low PatchSNR. This is essentially different from our work, which employs the external patch group based prior to guide the clustering and dictionary learning of the internal noisy patch groups. In [18], the external priors are only used to guide the internal patch clustering for image denoising, while in our work, the learned external priors are employed to guide not only the internal clustering, but also the internal dictionary learning. Besides, the method of [18] follows a patch based framework for AWGN removal, while in our work we employ a patch group based framework for real-world noisy image denoising. In addition, some technical details are also different. For example, the method in [18] utilizes low-rank minimization for denoising, while we use dictionary learning and sparse coding. In the Targeted Image Denoising (TID) method [19], targeted images are selected from a large dataset for each patch in the input noisy image for denoising, which is computationally expensive.

II-B Real-World Noisy Image Denoising

Most of the denoising methods in the literature [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 21, 22, 23, 24, 25] assume AWGN and use simulated noisy images for algorithm design and evaluation. Recently, several denoising methods have been proposed to remove unknown noise from real-world noisy images [35, 36, 37, 38, 42, 41, 39, 40]. Portilla [35] employed a correlated Gaussian model to estimate the noise of each wavelet subband. Rabie [36] modeled the noisy pixels as outliers and performed denoising via a Lorentzian robust estimator. Liu et al. [37] proposed the “noise level function” to estimate the noise and performed denoising by learning a Gaussian conditional random field. Gong et al. [38] proposed to model the data fitting term via a weighted sum of $\ell_1$ and $\ell_2$ norms and performed denoising with a simple sparsity regularization term in the wavelet transform domain. The “Noise Clinic” [39, 40] estimates the noise distribution by using a multivariate Gaussian model and removes the noise by using a generalized version of the nonlocal Bayesian model [12]. Zhu et al. [41] proposed a Bayesian method to approximate and remove the noise via a low-rank mixture of Gaussians (MoG) model. The method in [42] models the cross-channel noise in real-world noisy images as a multivariate Gaussian, and the noise is removed by the Bayesian nonlocal means filter [49]. The commercial software Neat Image [44] estimates the noise parameters from a flat region of the given noisy image and filters the noise correspondingly.

The methods [35, 36, 37, 38, 42, 41, 39, 40] place much emphasis on noise modeling, and they use Gaussian or MoG to model the noise in real-world noisy images. Nonetheless, the noise in real-world noisy images is very complex and hard to model with explicit distributions [45]. These works ignore the importance of learning image priors, which can actually be easier to model than the complex realistic noise. In this paper, we propose a simple yet effective image prior learning method for real-world noisy image denoising. Due to its strong prior modeling ability, the proposed method simply models the noise as locally Gaussian, and it achieves highly competitive performance on real-world noisy image denoising.

III External Prior Guided Internal Prior Learning for Image Denoising

In this section, we first describe the learning of external prior, and then describe in detail the guided internal prior learning method, followed by the denoising algorithm.

III-A Learn External Patch Group Priors

The nonlocal self-similarity based patch group (PG) prior learning [14] has proved to be very effective for image denoising. In this work, we extract PGs from natural clean images to learn external priors. A PG is a group of patches similar to a local patch. In our method, each local patch is extracted from an RGB image with patch size $p \times p \times 3$. We search for the $M$ most similar (i.e., smallest Euclidean distance) patches to this local patch (including the local patch itself) in a $W \times W$ region around it. Each patch is stretched to a patch vector $\mathbf{x}_m \in \mathbb{R}^{3p^2 \times 1}$ to form the PG, denoted by $\{\mathbf{x}_m\}_{m=1}^{M}$. The mean vector of this PG is $\boldsymbol{\mu} = \frac{1}{M}\sum_{m=1}^{M}\mathbf{x}_m$, and the group mean subtracted PG is defined as $\bar{\mathbf{X}} \triangleq \{\bar{\mathbf{x}}_m = \mathbf{x}_m - \boldsymbol{\mu}\}_{m=1}^{M}$.
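The patch group construction described above can be sketched as follows. This is a minimal illustration, assuming a floating-point H x W x 3 array and an exhaustive search inside the window; the function name `extract_patch_group` and the argument values used in the example are hypothetical and are not the settings of the paper.

```python
import numpy as np

def extract_patch_group(img, top, left, p, M, W):
    """Return (group_mean, mean_subtracted_pg) for the local patch at (top, left).

    img : H x W x 3 float array; p : patch side; M : patches per group;
    W : side of the square search window (all values are illustrative).
    """
    H, Wd, _ = img.shape
    ref = img[top:top + p, left:left + p, :].reshape(-1)
    half = W // 2
    # Clip the search window so every candidate patch lies inside the image.
    r0, r1 = max(0, top - half), min(H - p, top + half)
    c0, c1 = max(0, left - half), min(Wd - p, left + half)
    cands = [img[i:i + p, j:j + p, :].reshape(-1)
             for i in range(r0, r1 + 1) for j in range(c0, c1 + 1)]
    cands = np.stack(cands, axis=1)                     # (3p^2, n_candidates)
    dists = np.sum((cands - ref[:, None]) ** 2, axis=0) # squared Euclidean distance
    idx = np.argsort(dists, kind="stable")[:M]          # M nearest, reference first
    group = cands[:, idx]
    mean = group.mean(axis=1, keepdims=True)
    return mean, group - mean

rng = np.random.default_rng(0)
image = rng.random((20, 20, 3))
mu, pg = extract_patch_group(image, top=5, left=5, p=4, M=8, W=15)
```

Each column of `pg` is one mean-subtracted patch vector of length 3p^2, so stacking the groups from many clean images gives the training set for the GMM.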

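Assume that $N$ PGs are extracted from a set of external natural images, and the $n$-th PG is $\bar{\mathbf{X}}_n \triangleq \{\bar{\mathbf{x}}_{n,m}\}_{m=1}^{M}$, $n = 1, \dots, N$. A Gaussian Mixture Model (GMM) is learned to model the PG prior. The overall log-likelihood function is

$$\ln \mathcal{L} = \sum_{n=1}^{N} \ln \Big( \sum_{k=1}^{K} \pi_k \prod_{m=1}^{M} \mathcal{N}(\bar{\mathbf{x}}_{n,m} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \Big). \qquad (1)$$

The learning process is similar to the GMM learning in [14, 16, 50]. Finally, a GMM model with $K$ Gaussian components is learned, and the learned parameters include mixture weights $\{\pi_k\}_{k=1}^{K}$, mean vectors $\{\boldsymbol{\mu}_k\}_{k=1}^{K}$, and covariance matrices $\{\boldsymbol{\Sigma}_k\}_{k=1}^{K}$. Note that the mean vector of each cluster is naturally zero, i.e., $\boldsymbol{\mu}_k = \mathbf{0}$.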

To better describe the subspace of each Gaussian component, we perform singular value decomposition (SVD) [51] on the covariance matrix:

$$\boldsymbol{\Sigma}_k = \mathbf{U}_k \mathbf{S}_k \mathbf{U}_k^{\top}. \qquad (2)$$

The eigenvector matrices $\{\mathbf{U}_k\}_{k=1}^{K}$ will be employed as the external orthogonal dictionaries to guide the internal sub-dictionary learning in the next sub-section. The singular values in $\mathbf{S}_k$ reflect the significance of the singular vectors in $\mathbf{U}_k$. They will also be utilized as prior weights for weighted sparse coding in our denoising algorithm.
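As a small illustration of this step, the sketch below decomposes one covariance matrix with NumPy. The covariance is built from random stand-in data, and the name `external_dictionary` is hypothetical; only `np.linalg.svd` is a real library call.

```python
import numpy as np

def external_dictionary(cov):
    """Split one GMM covariance into an orthogonal dictionary and its singular values."""
    # For a symmetric positive semi-definite matrix the SVD coincides with the
    # eigendecomposition; NumPy returns the singular values in descending order,
    # so the leading columns of U are the most significant atoms.
    U, s, _ = np.linalg.svd(cov)
    return U, s

rng = np.random.default_rng(0)
samples = rng.standard_normal((12, 200))      # stand-in for mean-subtracted patch vectors
cov = samples @ samples.T / samples.shape[1]
U, s = external_dictionary(cov)
```

Because the atoms come out sorted by significance, keeping the first few columns of `U` is enough to form an external sub-dictionary.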

III-B Guided Internal Prior Learning

After the external PG prior model is learned from external natural clean images, we employ it to guide the internal PG prior learning for a given real-world noisy image. The guidance lies in two aspects. First, the external prior will guide the subspace clustering [52, 53] of internal noisy PGs. Second, the external prior will guide the orthogonal dictionary learning of internal noisy PGs.

III-B1 Internal Subspace Clustering

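Given a real-world noisy image, we extract $N$ (overlapped) local patches from it. Similar to the external prior learning stage, for the $n$-th ($n = 1, \dots, N$) local patch we search for its $M$ most similar (by Euclidean distance) patches around it to form a noisy PG, denoted by $\mathbf{Y}_n = \{\mathbf{y}_{n,1}, \dots, \mathbf{y}_{n,M}\}$. Then the group mean of $\mathbf{Y}_n$, denoted by $\boldsymbol{\mu}_n$, is subtracted from each patch by $\bar{\mathbf{y}}_{n,m} = \mathbf{y}_{n,m} - \boldsymbol{\mu}_n$, leading to the mean subtracted noisy PG $\bar{\mathbf{Y}}_n \triangleq \{\bar{\mathbf{y}}_{n,m}\}_{m=1}^{M}$.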

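The external GMM prior models basically characterize the subspaces of natural high quality PGs. Therefore, we project each noisy PG $\bar{\mathbf{Y}}_n$ into the subspaces of $\{\mathbf{U}_k\}_{k=1}^{K}$ and assign it to the most suitable subspace based on the posterior probability:

$$P(k \mid \bar{\mathbf{Y}}_n) = \frac{\prod_{m=1}^{M} \mathcal{N}(\bar{\mathbf{y}}_{n,m} \mid \mathbf{0}, \boldsymbol{\Sigma}_k)}{\sum_{l=1}^{K} \prod_{m=1}^{M} \mathcal{N}(\bar{\mathbf{y}}_{n,m} \mid \mathbf{0}, \boldsymbol{\Sigma}_l)} \qquad (3)$$

for $k = 1, \dots, K$. Then $\bar{\mathbf{Y}}_n$ is assigned to the subspace with the maximum a posteriori (MAP) probability $\max_k P(k \mid \bar{\mathbf{Y}}_n)$.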
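The assignment step can be sketched in log space, which avoids numerical underflow when the densities of all patches in a group are multiplied. The sketch assumes zero-mean components and equal mixture weights, and adds a small diagonal `jitter` for numerical stability; those choices and the name `assign_subspace` are assumptions made for illustration.

```python
import numpy as np

def assign_subspace(pg, covs, jitter=1e-6):
    """Pick the Gaussian component with the highest posterior for one PG.

    pg   : (d, M) mean-subtracted patch group, one patch per column.
    covs : list of (d, d) covariance matrices of zero-mean components.
    """
    d, M = pg.shape
    log_lik = []
    for cov in covs:
        c = cov + jitter * np.eye(d)                  # keep the matrix well conditioned
        _, logdet = np.linalg.slogdet(c)
        maha = np.sum(pg * np.linalg.solve(c, pg))    # sum of y^T C^{-1} y over patches
        log_lik.append(-0.5 * (M * (d * np.log(2.0 * np.pi) + logdet) + maha))
    log_lik = np.array(log_lik)
    post = np.exp(log_lik - log_lik.max())            # normalise in a stable way
    post /= post.sum()
    return int(np.argmax(post)), post

covs = [np.diag([10.0, 0.01]), np.diag([0.01, 10.0])]
pg = np.array([[3.0, -3.0, 2.0], [0.05, -0.02, 0.01]])
k, post = assign_subspace(pg, covs)
```

In the toy example the patches vary mostly along the first axis, so the component whose covariance is large along that axis is selected.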

III-B2 Guided Orthogonal Dictionary Learning

Assume that we have assigned all the internal noisy PGs to their corresponding most suitable subspaces in $\{\mathbf{U}_k\}_{k=1}^{K}$. For the $k$-th subspace, the noisy PGs assigned to it are $\{\bar{\mathbf{Y}}_{k_n}\}_{n=1}^{N_k}$, where $\bar{\mathbf{Y}}_{k_n} = [\bar{\mathbf{y}}_{k_n,1}, \dots, \bar{\mathbf{y}}_{k_n,M}]$ and $\sum_{k=1}^{K} N_k = N$. We propose to learn an orthogonal dictionary $\mathbf{D}_k$ from each set of PGs $\bar{\mathbf{Y}}_k \triangleq [\bar{\mathbf{Y}}_{k_1}, \dots, \bar{\mathbf{Y}}_{k_{N_k}}]$ to characterize the internal PG prior with the guidance of the corresponding external orthogonal dictionary $\mathbf{U}_k$ (Eq. (2)). The reasons that we learn orthogonal dictionaries are two-fold. Firstly, the PGs lie in a subspace of the whole space of all PGs; therefore, there is no need to learn a redundant over-complete dictionary to characterize it, while an orthonormal dictionary naturally has zero mutual coherence [54]. Secondly, the orthogonality of the dictionary makes the patch encoding in the testing stage very efficient, leading to an efficient denoising algorithm (please refer to sub-section III-C for more details).

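We let the orthogonal dictionary be

$$\mathbf{D}_k \triangleq [\mathbf{D}_{k,E} \ \ \mathbf{D}_{k,I}] \in \mathbb{R}^{3p^2 \times 3p^2}, \qquad (4)$$

where $\mathbf{D}_{k,E} = \mathbf{U}_k(:, 1{:}r) \in \mathbb{R}^{3p^2 \times r}$ is the external sub-dictionary, which includes the first $r$ most important eigenvectors of $\mathbf{U}_k$, and the internal sub-dictionary $\mathbf{D}_{k,I} \in \mathbb{R}^{3p^2 \times (3p^2 - r)}$ is to be adaptively learned from the noisy PGs $\bar{\mathbf{Y}}_k$. The rationale for designing $\mathbf{D}_k$ as a hybrid dictionary is as follows. The external sub-dictionary $\mathbf{D}_{k,E}$ is pre-trained from external clean data, and it represents the $k$-th latent subspace of natural images, which is helpful to reconstruct the common latent structures of images. However, $\mathbf{D}_{k,E}$ is general to all images but not adaptive to the given noisy image. Some fine-scale details specific to the given image may not be well characterized by $\mathbf{D}_{k,E}$. Therefore, we learn an internal sub-dictionary $\mathbf{D}_{k,I}$ to supplement $\mathbf{D}_{k,E}$. In other words, $\mathbf{D}_{k,I}$ is to reveal the latent subspace adaptive to the input noisy image, which cannot be effectively represented by $\mathbf{D}_{k,E}$.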

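For notational simplicity, in the following development we drop the subspace index $k$ from $\mathbf{D}_k$, $\mathbf{D}_{k,E}$, $\mathbf{D}_{k,I}$, etc. The learning of the hybrid orthogonal dictionary is performed under the following weighted sparse coding framework:

$$\min_{\mathbf{D}_I, \{\boldsymbol{\alpha}_{n,m}\}} \sum_{n=1}^{N}\sum_{m=1}^{M}\Big(\|\bar{\mathbf{y}}_{n,m} - \mathbf{D}\boldsymbol{\alpha}_{n,m}\|_2^2 + \sum_{j=1}^{3p^2}\lambda_j|\alpha_{n,m,j}|\Big) \quad \text{s.t.} \quad \mathbf{D} = [\mathbf{D}_E \ \ \mathbf{D}_I],\ \mathbf{D}_I^{\top}\mathbf{D}_I = \mathbf{I}_{3p^2-r},\ \mathbf{D}_E^{\top}\mathbf{D}_I = \mathbf{0}, \qquad (5)$$

where $\mathbf{I}_{3p^2-r}$ is the $(3p^2-r)$ dimensional identity matrix, $\boldsymbol{\alpha}_{n,m}$ is the sparse coding vector of the $m$-th patch $\bar{\mathbf{y}}_{n,m}$ in the $n$-th PG $\bar{\mathbf{Y}}_n$, and $\alpha_{n,m,j}$ is the $j$-th element of $\boldsymbol{\alpha}_{n,m}$. $\lambda_j$ is the $j$-th regularization parameter defined as

$$\lambda_j = \frac{\lambda}{\sqrt{\mathbf{S}_k(j)} + \varepsilon}, \qquad (6)$$

where $\mathbf{S}_k(j)$ is the $j$-th singular value of the diagonal singular value matrix $\mathbf{S}_k$ (please refer to Eq. (2)) and $\varepsilon$ is a small positive number to avoid a zero denominator. Note that $\mathbf{D}_E = \emptyset$ if $r = 0$ and $\mathbf{D}_E = \mathbf{U}_k$ if $r = 3p^2$.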

In the dictionary learning model (5), we use the $\ell_2$ norm to model the representation residual of PGs. This is because the patches in those PGs have similar content, and we assume that the noise therein will have similar statistics, which can be roughly modeled as locally Gaussian. In addition, this choice makes the dictionary learning problem much easier to solve. We employ an alternating iterative approach to solve the optimization problem (5). Specifically, we initialize the orthogonal dictionary as $\mathbf{D}^{(0)} = \mathbf{U}_k$ and, for $t = 0, 1, \dots, T-1$, alternately update $\boldsymbol{\alpha}_{n,m}$ and $\mathbf{D}_I$ as follows.

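Updating Sparse Coding Coefficients: Given the orthogonal dictionary $\mathbf{D}$, we update each sparse coding vector $\boldsymbol{\alpha}_{n,m}$ by solving

$$\hat{\boldsymbol{\alpha}}_{n,m} = \arg\min_{\boldsymbol{\alpha}_{n,m}} \|\bar{\mathbf{y}}_{n,m} - \mathbf{D}\boldsymbol{\alpha}_{n,m}\|_2^2 + \sum_{j=1}^{3p^2}\lambda_j|\alpha_{n,m,j}|. \qquad (7)$$

Since the dictionary $\mathbf{D}$ is orthogonal, problem (7) has a closed-form solution

$$\hat{\boldsymbol{\alpha}}_{n,m} = \mathrm{sgn}(\mathbf{D}^{\top}\bar{\mathbf{y}}_{n,m}) \odot \max\big(|\mathbf{D}^{\top}\bar{\mathbf{y}}_{n,m}| - \boldsymbol{\lambda}/2,\ \mathbf{0}\big), \qquad (8)$$

where $\boldsymbol{\lambda} = [\lambda_1, \lambda_2, \dots, \lambda_{3p^2}]^{\top}$ is the vector of regularization parameters, $\mathrm{sgn}(\cdot)$ is the sign function and $\odot$ means element-wise multiplication. The detailed derivation of Eq. (8) can be found in Appendix A.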
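The closed-form update is a weighted soft-thresholding of the projection coefficients, which the sketch below spells out. It assumes the objective is the squared error plus the weighted absolute values with no extra factor of one half on the error term, which is why the threshold is half of each weight; the function name `weighted_soft_threshold` is hypothetical.

```python
import numpy as np

def weighted_soft_threshold(D, Y, lam):
    """Minimise ||y - D a||_2^2 + sum_j lam[j] * |a[j]| for every column y of Y.

    D   : (d, d) orthogonal dictionary.
    Y   : (d, n) mean-subtracted patch vectors.
    lam : (d,) non-negative per-atom regularization weights.
    """
    B = D.T @ Y                                   # coefficients before shrinkage
    return np.sign(B) * np.maximum(np.abs(B) - lam[:, None] / 2.0, 0.0)

D = np.eye(3)
Y = np.array([[3.0], [-0.2], [1.0]])
A = weighted_soft_threshold(D, Y, np.ones(3))     # shrinks 3 to 2.5, zeroes -0.2, shrinks 1 to 0.5
```

Because `D` is orthogonal, the problem decouples over coefficients, so the whole batch is coded with one matrix product and one element-wise shrinkage.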

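Updating Internal Sub-dictionary: Given the sparse coding vectors $\{\hat{\boldsymbol{\alpha}}_{n,m}\}$, we update the internal sub-dictionary $\mathbf{D}_I$ by solving

$$\hat{\mathbf{D}}_I = \arg\min_{\mathbf{D}_I} \|\bar{\mathbf{Y}} - \mathbf{D}\mathbf{A}\|_F^2 \quad \text{s.t.} \quad \mathbf{D} = [\mathbf{D}_E \ \ \mathbf{D}_I],\ \mathbf{D}_I^{\top}\mathbf{D}_I = \mathbf{I}_{3p^2-r},\ \mathbf{D}_E^{\top}\mathbf{D}_I = \mathbf{0}, \qquad (9)$$

where $\mathbf{A} = [\hat{\boldsymbol{\alpha}}_{1,1}, \dots, \hat{\boldsymbol{\alpha}}_{N,M}]$ and $\mathbf{I}_{3p^2-r}$ is the $(3p^2-r)$ dimensional identity matrix. The sparse coefficient matrix can be written as $\mathbf{A} = [\mathbf{A}_E^{\top} \ \ \mathbf{A}_I^{\top}]^{\top}$, where the external part $\mathbf{A}_E$ and the internal part $\mathbf{A}_I$ represent the coding coefficients of $\bar{\mathbf{Y}}$ over the external sub-dictionary $\mathbf{D}_E$ and the internal sub-dictionary $\mathbf{D}_I$, respectively. According to the following Theorem 1, problem (9) has a closed-form solution $\hat{\mathbf{D}}_I = \mathbf{U}_I\mathbf{V}_I^{\top}$, where $\mathbf{U}_I$ and $\mathbf{V}_I$ are the orthogonal matrices obtained by the following SVD [51]

$$(\mathbf{I} - \mathbf{D}_E\mathbf{D}_E^{\top})\,\bar{\mathbf{Y}}\mathbf{A}_I^{\top} = \mathbf{U}_I\mathbf{S}_I\mathbf{V}_I^{\top}. \qquad (10)$$

The orthogonality of the internal sub-dictionary can be checked by $\hat{\mathbf{D}}_I^{\top}\hat{\mathbf{D}}_I = \mathbf{V}_I\mathbf{U}_I^{\top}\mathbf{U}_I\mathbf{V}_I^{\top} = \mathbf{I}$. In fact, Theorem 1 provides a necessary and sufficient condition to guarantee the existence of the closed-form solution for the internal sub-dictionary of problem (9).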
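The SVD-based update can be sketched as an orthogonal Procrustes step restricted to the complement of the external sub-dictionary. The sketch below assumes the projected correlation matrix has full column rank, so that the reduced SVD factors stay orthogonal to the external atoms; the function name `update_internal_dictionary` and the random test data are illustrative only.

```python
import numpy as np

def update_internal_dictionary(D_E, Y, A_I):
    """Closed-form internal sub-dictionary for fixed codes (full-rank case).

    D_E : (d, r) external atoms with orthonormal columns.
    Y   : (d, n) mean-subtracted patch vectors.
    A_I : (d - r, n) coefficients over the internal atoms.
    """
    d = D_E.shape[0]
    P = np.eye(d) - D_E @ D_E.T                    # projector onto the complement of span(D_E)
    U, _, Vt = np.linalg.svd(P @ Y @ A_I.T, full_matrices=False)
    return U @ Vt                                  # (d, d - r) with orthonormal columns

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
D_E = Q[:, :2]
Y = rng.standard_normal((6, 40))
A_I = rng.standard_normal((4, 40))
D_I = update_internal_dictionary(D_E, Y, A_I)
```

When many internal coefficients are zero the projected matrix can lose rank, in which case the extra singular vectors are not determined by the data and the orthogonality to `D_E` should be enforced explicitly.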

Theorem 1.

Let , be two given data matrices. is a given matrix satisfying , then is the necessary condition of

(11)

where and are the orthogonal matrices obtained by performing economy (a.k.a. reduced) SVD [51]:

(12)

Besides, if , is also the sufficient condition of problem (11).

The proof of the Theorem 1 can be found in Appendix B. Though the problem (9) has a closed-form solution by SVD [51], the uniqueness of solution cannot be guaranteed since the matrices as well as and may be reduced to matrices of lower rank. Hence, we also analyze the uniqueness of the solution by the following Theorem 2, whose proof can be found in Appendix C.

Theorem 2.

(a) If is nonsingular, i.e., , then the solution of is unique; (b) If is singular, i.e., , then the number of possible solutions of is for fixed and .

The above alternating updating steps are repeated until the number of iterations exceeds a preset threshold. In each step, the energy value of the objective function (5) decreases, and we empirically found that the proposed model usually converges in 10 iterations. We summarize the procedures in Algorithm 1.

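Algorithm 1: External Prior Guided Internal Prior Learning
Input: PG matrix $\bar{\mathbf{Y}}$, external sub-dictionary $\mathbf{D}_E$, parameter vector $\boldsymbol{\lambda}$
Initialization: initialize $\mathbf{D}^{(0)}$ by Eq. (2);
for $t = 0, 1, \dots, T-1$ do
    1. Update $\mathbf{A}$ by Eq. (7);
    2. Update $\mathbf{D}_I$ by Eq. (9);
end for
Output: Internal orthogonal dictionary $\mathbf{D}_I$ and sparse codes $\mathbf{A}$.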
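The alternating procedure of Algorithm 1 can be sketched end to end as below. Several details are assumptions made for illustration: the per-atom weights use an inverse square-root form of the singular values, the dictionary update is carried out in the coordinates of a fixed complement basis (a reparameterisation that keeps the constraints satisfied even when the codes are rank deficient), and the default values of `lam` and `n_iter` as well as the name `learn_hybrid_dictionary` are hypothetical.

```python
import numpy as np

def learn_hybrid_dictionary(Y, U, s, r, lam=0.1, eps=1e-8, n_iter=5):
    """Alternate weighted sparse coding and internal sub-dictionary updates.

    Y    : (d, n) mean-subtracted patch vectors assigned to one cluster.
    U, s : external orthogonal dictionary and singular values of that cluster.
    r    : number of leading external atoms that stay fixed.
    """
    D_E, D_I = U[:, :r], U[:, r:]
    Q = U[:, r:]                                   # fixed basis of the complement of span(D_E)
    w = lam / (np.sqrt(s) + eps)                   # assumed form of the per-atom weights
    energies = []
    for _ in range(n_iter):
        # Step 1: closed-form weighted soft-thresholding for the codes.
        D = np.hstack([D_E, D_I])
        B = D.T @ Y
        A = np.sign(B) * np.maximum(np.abs(B) - w[:, None] / 2.0, 0.0)
        # Step 2: Procrustes update D_I = Q R with R orthogonal.
        Ur, _, Vrt = np.linalg.svd(Q.T @ Y @ A[r:].T)
        D_I = Q @ (Ur @ Vrt)
        D = np.hstack([D_E, D_I])
        energies.append(np.sum((Y - D @ A) ** 2) + np.sum(w[:, None] * np.abs(A)))
    return D, A, energies

rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 60))
U, s, _ = np.linalg.svd(Y @ Y.T / Y.shape[1])
D, A, energies = learn_hybrid_dictionary(Y, U, s, r=4)
```

Since each step exactly minimises the objective over its own block of variables, the recorded energies cannot increase from one iteration to the next.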

III-C The Denoising Algorithm

The denoising of the given noisy image can be done simultaneously with the guided internal sub-dictionary learning process. Once we obtain the solutions of the sparse coding vectors $\{\hat{\boldsymbol{\alpha}}_{n,m}\}$ in Eq. (8) and the orthogonal dictionary $\hat{\mathbf{D}} = [\mathbf{D}_E \ \ \hat{\mathbf{D}}_I]$ in Eq. (9), the latent clean patch $\hat{\mathbf{y}}_{n,m}$ of the $m$-th noisy patch in PG $\mathbf{Y}_n$ is reconstructed as

$$\hat{\mathbf{y}}_{n,m} = \hat{\mathbf{D}}\hat{\boldsymbol{\alpha}}_{n,m} + \boldsymbol{\mu}_n, \qquad (13)$$

where $\boldsymbol{\mu}_n$ is the group mean of $\mathbf{Y}_n$. The latent clean image is then reconstructed by aggregating all the reconstructed patches in all PGs. We perform the above denoising procedures for several iterations for better denoising outputs. The proposed denoising algorithm is summarized in Algorithm 2.
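The reconstruction and aggregation steps can be sketched as follows. Averaging the overlapping pixels is an assumption, since the text only states that the patches are aggregated; the names `reconstruct_patches` and `aggregate_patches` are hypothetical.

```python
import numpy as np

def reconstruct_patches(D, A, group_mean):
    """Rebuild the patches of one PG from their codes and add the group mean back."""
    return D @ A + group_mean                      # (d, M); group_mean has shape (d, 1)

def aggregate_patches(patch_vectors, positions, image_shape, p):
    """Average overlapping p x p x C patches back into an image.

    patch_vectors : iterable of flattened patches of length p * p * C.
    positions     : iterable of (row, col) top-left corners, one per patch.
    """
    acc = np.zeros(image_shape, dtype=float)
    count = np.zeros(image_shape, dtype=float)
    for vec, (i, j) in zip(patch_vectors, positions):
        acc[i:i + p, j:j + p, :] += np.asarray(vec).reshape(p, p, image_shape[2])
        count[i:i + p, j:j + p, :] += 1.0
    return acc / np.maximum(count, 1.0)            # pixels never covered stay at zero

rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))
positions = [(i, j) for i in range(4) for j in range(4)]
patches = [image[i:i + 3, j:j + 3, :].reshape(-1) for i, j in positions]
restored = aggregate_patches(patches, positions, image.shape, p=3)
```

In the example the patches are taken unchanged from the image, so the aggregated result reproduces the image exactly; with denoised patches the averaging also smooths the disagreement between overlapping estimates.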

Algorithm 2: External Prior Guided Internal Prior Learning for Real-World Noisy Image Denoising
Input: Noisy image $\mathbf{y}$, external PG prior GMM model
Initialization: $\hat{\mathbf{x}}^{(0)} = \mathbf{y}$;
for $Ite = 1 : IteNum$ do
    1. Extract internal PGs from $\hat{\mathbf{x}}^{(Ite-1)}$;
    Guided Internal Subspace Clustering:
    for each PG $\mathbf{Y}_n$ do
        2. Calculate the group mean $\boldsymbol{\mu}_n$ and form the mean subtracted PG $\bar{\mathbf{Y}}_n$;
        3. Subspace clustering via Eq. (3);
    end for
    Guided Internal Orthogonal Dictionary Learning:
    for the PGs in each subspace do
        4. External PG prior guided internal orthogonal dictionary learning by solving Eq. (5);
        5. Recover each patch in all PGs via Eq. (13);
    end for
    6. Aggregate the recovered PGs of all subspaces to form the recovered image $\hat{\mathbf{x}}^{(Ite)}$;
end for
Output: The denoised image $\hat{\mathbf{x}}^{(IteNum)}$.

IV Experiments

IV-A Implementation Details

The noise in real-world images is very complex due to many factors such as sensors, lighting conditions and camera settings. It is difficult to evaluate one algorithm by tuning its parameters for all these different settings. In this work, we fix the parameters of our algorithm and apply it to all the testing datasets, though they were captured by different types of sensors and under different camera settings. The parameters of our method include the patch size $p$, the number $M$ of similar patches in a patch group (PG), the window size $W$ for PG searching, the number $K$ of Gaussian components in the GMM, the number $r$ of atoms in the external sub-dictionaries, the sparse regularization parameter $\lambda$, and the iteration numbers for solving problem (5) and for Alg. 2.

The performance of our proposed method varies little when we set patch size between and , and we fix the patch size as to save computational cost. The search window is fixed to to balance computational cost and denoising accuracy of the proposed method. The number of patches in a patch group is set as , while using more patches will not bring clear benefits. We learn the external GMM prior with 3.6 million PGs extracted from the Kodak PhotoCD Dataset (http://r0k.us/graphics/kodak/), which includes 24 high quality color images. The number of Gaussians in GMM is set as , while using more Gaussians can only bring slightly better performance but cost more computational resources. The number of atoms in the external sub-dictionaries affects little the performance when it is set between and , and we set it as to make the external and internal sub-dictionaries have the same number of atoms. We set the number of iterations as for solving the problem (5), while the number of iterations for Alg. 2 is set as .

One key parameter of our model is the regularization parameter $\lambda$. Fig. 3 plots the curves of PSNR/SSIM results w.r.t. $\lambda$ on the 15 cropped images in dataset [42]. One can see that our proposed method achieves good PSNR/SSIM performance within a certain range of $\lambda$. Similar observations can be made on other datasets. We fix $\lambda$ in the paper, and it works well across the three datasets used in our experiments.

All the parameters of our method are fixed in all experiments, which are run under the Matlab2014b environment on a machine with Intel(R) Core(TM) i7-5930K CPU of 3.5GHz and 32GB RAM. We will release the code with the publication of this work.

Figure 3: The influence of parameter $\lambda$ on the average PSNR (dB)/SSIM results of the proposed method on dataset [42].

IV-B The Testing Datasets

We evaluate the proposed method on three real-world noisy image datasets, where the images were captured under indoor or outdoor lighting conditions by different types of cameras and camera settings.

Dataset 1. The first dataset is provided in [42], which includes noisy images of 11 static scenes. The noisy images were collected under a controlled indoor environment. Each scene was shot 500 times with the same camera and camera setting. The mean image of the 500 shots is roughly taken as the “ground truth”, with which the PSNR and SSIM [55] can be computed.
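The evaluation protocol described above, averaging repeated shots into a reference and scoring a result against it, can be sketched as follows. The peak value of 255 assumes 8-bit images, and the function names `mean_image` and `psnr` are illustrative.

```python
import numpy as np

def mean_image(shots):
    """Average a stack of aligned shots of one static scene into a reference image."""
    return np.mean(np.asarray(shots, dtype=float), axis=0)

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

reference = np.zeros((4, 4, 3))
estimate = np.full((4, 4, 3), 25.5)      # constant error of one tenth of the peak
score = psnr(reference, estimate)        # 20 dB for this constant error
```

Averaging many shots reduces zero-mean noise, which is why the mean image is only an approximation of the true clean scene.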

Since the image size is very large (about ) and the 11 scenes share repetitive contents, the authors of [42] cropped 15 smaller images (of size ) to perform experiments. In order to evaluate the proposed method more comprehensively, we cropped 60 images of size from the dataset for experiments. Some samples are shown in Fig. 4. Note that our 60 cropped images and the 15 images cropped by the authors of [42] are from different shots.

Dataset 2 is called the Darmstadt Noise Dataset (DND) [56], which includes 50 different pairs of images of the same scenes captured by Sony A7R, Olympus E-M10, Sony RX100 IV, and Huawei Nexus 6P. The real-world noisy images are collected under higher ISO values with shorter exposure times, while the “ground truth” images are captured under lower ISO values with longer exposure times. Since the captured images are of megapixel size, the authors cropped 20 bounding boxes of pixels from each image in the dataset, yielding 1000 test crops in total. Some samples are shown in Fig. 5. Note that the “ground truth” images of this dataset have not been released yet, but one can submit the denoised images to the Project Website and get the average PSNR (dB) and SSIM results.

Dataset 3. On one hand, the scenes of Dataset 1 are mostly printed photos, and they cannot represent realistic objects and scenes with different reflectance properties. On the other hand, Dataset 2 contains repetitive contents in the 20 cropped images for each of the 50 scenes. To remedy the limitations of Dataset 1 and Dataset 2, we construct another dataset which contains images of 10 different scenes captured by Canon 80D and Sony A7II cameras with more ISO settings and more comprehensive scenes. The ISO settings in our dataset are 800, 1600, 3200, 6400, and 12800, while those of Dataset 1 are 1600, 3200, and 6400. Compared to Dataset 2, our new dataset is more comprehensive in scene content. Similar to Dataset 1, each scene was shot 500 times, and the mean image of these 500 shots can be used as a kind of ground truth to evaluate the denoising algorithms. Fig. 6 shows some cropped images of the scenes in our dataset. One can see that the images contain many different realistic objects with varying colors, shapes, materials, etc.

Our dataset provides real-world noisy images of realistic objects with different ISO settings. It can be used to more fairly evaluate the performance of different real-world noisy image denoising methods. Considering that the image resolution is very high (about ), for the convenience of experimental studies, we cropped 100 (10 for each scene) smaller images (of size ) from it to perform experiments. The whole dataset will be made publicly available with the publication of this paper.

Figure 4: Some sample images from the Dataset 1 [42].
Figure 5: Some sample images from the Dataset 2 [56].
Figure 6: Some sample images from our dataset (Dataset 3).
(a) Noisy [42]: 35.89dB (b) External: 39.05dB (c) Internal: 38.75dB (d) Guided Internal: 39.39dB (e) Mean Image [42]
Figure 7: Denoised images of a region cropped from the real-world noisy image “Nikon D600 ISO 3200 C1” [42] by different methods. The images are better viewed by zooming in on screen.
(a) Noisy [42]: 33.77dB (b) External: 36.97dB (c) Internal: 37.40dB (d) Guided Internal: 38.01dB (e) Mean Image [42]
Figure 8: Denoised images of a region cropped from the real-world noisy image “Nikon D600 ISO 3200 C1” [42] by different methods. The images are better viewed by zooming in on screen.

IV-C Comparison among External, Internal and Guided Internal Priors

To demonstrate the advantages of external prior guided internal prior learning, we perform real-world noisy image denoising by using external priors only (denoted by “External”), internal priors only (denoted by “Internal”), and the proposed guided internal priors (denoted by “Guided Internal”), respectively. For the “External” method, we utilize the full external dictionaries (i.e., $r = 3p^2$ in Eq. (5)) for denoising. For the “Internal” method, the overall framework is similar to the method of [5]. A GMM model (with Gaussians) is directly learned from the PGs extracted from the given noisy image without using any external data, and then the internal orthogonal dictionaries are obtained via Eq. (2) to perform denoising. All parameters of the “External” and “Internal” methods are tuned to achieve their best performance.

We compare the three methods on the 60 cropped images from Dataset 1 [42]. The average PSNR and run time are listed in Table I. The best results are highlighted in bold. It can be seen that “Guided Internal” method achieves better PSNR than both “External” and “Internal” methods. In addition, the “Internal” method is very slow because it involves online GMM learning, while the “Guided Internal” method is only a little slower than the “External” method. Figs. 7 and 8 show the denoised images of two noisy images by the three methods. One can see that the “External” method is good at recovering large-scale structures (see Fig. 7) while the “Internal” method is good at recovering fine-scale textures (see Fig. 8). By utilizing external priors to guide the internal prior learning, our proposed method can effectively recover both the large-scale structures and fine-scale textures.

        Noisy    External   Internal   Guided Internal
PSNR    34.51    38.21      38.07      38.75
Time    -        21.19      312.67     22.26
Table I: Average PSNR (dB) and Run Time (seconds) of the “External”, “Internal”, and “Guided Internal” methods on 60 real-world noisy images (of size ) cropped from Dataset 1 [42].

IV-D Comparison with State-of-the-Art Denoising Methods

Comparison methods. We compare the proposed method with state-of-the-art image denoising methods, including GAT-BM3D [30], CBM3D [7], WNNM [13], TID [19], MLP [20], DnCNN [22], CSF [24], TNRD [25], Noise Clinic (NC) [39, 40], Cross-Channel (CC) [42], and Neat Image (NI) [44]. Among these methods, GAT-BM3D [30] is a state-of-the-art Poisson noise reduction method. The method CBM3D [7] is a state-of-the-art method for color image denoising and the noise on color images is assumed to be additive white Gaussian. The methods of WNNM, MLP, DnCNN, CSF, and TNRD are state-of-the-art Gaussian noise removal methods for grayscale images, and we apply them to each channel of color images for denoising. NC is a blind image denoising method, and NI is a set of commercial software for image denoising, which has been embedded into Photoshop and Corel PaintShop. The code of CC is not released but its results on the 15 cropped images are available at [42]. Therefore, we only compare with it on the 15 cropped images in Dataset 1 [42].

Noise level of comparison methods. For the CBM3D method, the standard deviation of noise on color images should be given as a parameter. For the methods of WNNM, MLP, CSF, and TNRD, the noise level in each color channel should be input. The DnCNN method is trained to deal with noise in a range of levels. We retrain the models of the discriminative denoising methods MLP, CSF, and TNRD (using the codes released by the authors) at different noise levels from to with a gap of . The denoising is performed by processing each channel with the model trained at the same (or nearest) noise level. The noise in the R, G, B channels is assumed to be Gaussian, and its level in each channel can be estimated via noise estimation methods [57, 58]. In this paper, we employ the method of [58] to estimate the noise level for each color channel.
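The "same (or nearest) noise level" rule above can be illustrated with a minimal sketch. The grid of trained levels and the per-channel estimates below are hypothetical values chosen only for illustration (the actual range and gap are not recoverable from the text), and the helper names are not taken from any released code.

```python
import numpy as np

def nearest_trained_level(sigma_est, trained_levels):
    """Return the trained noise level closest to an estimated sigma."""
    levels = np.asarray(trained_levels, dtype=float)
    return float(levels[np.argmin(np.abs(levels - sigma_est))])

def select_models_per_channel(sigmas_rgb, trained_levels):
    """Map each color channel to the noise level of the model applied to it."""
    return {ch: nearest_trained_level(s, trained_levels)
            for ch, s in zip("RGB", sigmas_rgb)}

# Hypothetical grid of trained levels (5, 10, ..., 50) and hypothetical
# per-channel estimates, as a noise estimator might return them.
trained = np.arange(5, 55, 5)
choice = select_models_per_channel((7.2, 12.6, 23.9), trained)
print(choice)
```

Each channel is then denoised independently with the model indexed by its selected level.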

Setting                  GAT-BM3D  CBM3D  WNNM    TID      MLP    CSF    TNRD   DnCNN  NI     NC     CC     Ours
Canon 5D, ISO = 3200     31.23     39.76  37.51   37.22    39.00  35.68  39.51  37.26  37.68  38.76  38.37  40.50
                         30.55     36.40  33.86   34.54    36.34  34.03  36.47  34.13  34.87  35.69  35.37  37.05
                         27.74     36.37  31.43   34.25    36.33  32.63  36.45  34.09  34.77  35.54  34.91  36.11
Nikon D600, ISO = 3200   28.55     34.18  33.46   32.99    34.70  31.78  34.79  33.62  34.12  35.57  34.98  34.88
                         32.01     35.07  36.09   34.20    36.20  35.16  36.37  34.48  35.36  36.70  35.95  36.31
                         39.78     37.13  39.86   35.58    39.33  39.98  39.49  35.41  38.68  39.28  41.15  39.23
Nikon D800, ISO = 1600   32.24     36.81  36.35   34.94    37.95  34.84  38.11  35.79  37.34  38.01  37.99  38.40
                         33.86     37.76  39.99   35.19    40.23  38.42  40.52  36.08  38.57  39.05  40.36  40.92
                         33.90     37.51  37.15   35.26    37.94  35.79  38.17  35.48  37.87  38.20  38.30  38.97
Nikon D800, ISO = 3200   36.49     35.05  38.60   33.70    37.55  38.36  37.69  34.08  36.95  38.07  39.01  38.66
                         32.91     34.07  36.04   31.04    35.91  35.53  35.90  33.70  35.09  35.72  36.75  37.07
                         40.20     34.42  39.73   33.07    38.15  40.05  38.21  33.31  36.91  36.76  39.06  38.52
Nikon D800, ISO = 6400   29.84     31.13  33.29   29.40    32.69  34.08  32.81  29.83  31.28  33.49  34.61  33.76
                         27.94     31.22  31.16   29.86    32.33  32.13  32.33  30.55  31.38  32.79  33.21  33.43
                         29.15     30.97  31.98   29.21    32.29  31.52  32.29  30.09  31.40  32.86  33.22  33.58
Average                  32.43     35.19  35.77   33.36    36.46  35.33  36.61  33.86  35.49  36.43  36.88  37.15
Time (s)                 10.9      6.9    151.5   7353.2   16.8   19.3   5.1    79.2   0.6    15.3   NA     23.9
Table II: PSNR (dB) results and speed (sec.) of different methods on the 15 cropped real-world noisy images used in [42].

Results on Dataset 1. As described in Section IV-B, there is a mean image for each of the 11 scenes used in Dataset 1 [42], and those mean images can be roughly taken as “ground truth” images for quantitative evaluation of denoising algorithms. We first perform a quantitative comparison on the 15 cropped images used in [42]. The PSNR (dB) and speed (seconds) results of GAT-BM3D, CBM3D, WNNM, TID, MLP, CSF, TNRD, DnCNN, NC, NI, CC and the proposed method are listed in Table II (the results of CC are copied from the original paper [42]). The best PSNR result for each image is highlighted in bold. One can see that our method achieves the best PSNR on 8 of the 15 images, while CC achieves the best PSNR on 3 of the 15 images. It should be noted that in the CC method, a specific model is trained for each camera and camera setting, while our method uses the same model for all images. On average, our proposed method improves PSNR by 0.27dB over the second best method, CC, and by larger margins over the other competing methods. GAT-BM3D does not work well on most images, because real-world noise is much more complex than Poisson noise.
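As a reminder of how such scores are obtained, the following is a minimal sketch of PSNR computed against a reference such as the mean image. It assumes 8-bit intensities (peak value 255) and the standard definition of PSNR; the exact evaluation script used for the tables is not given in the text, so the function name and defaults here are illustrative.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between a reference image and an estimate of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Synthetic stand-ins for a "ground truth" mean image and a noisy observation.
rng = np.random.default_rng(0)
mean_image = rng.uniform(0, 255, size=(64, 64, 3))
noisy = mean_image + rng.normal(0.0, 5.0, size=mean_image.shape)
print(f"PSNR of the synthetic noisy image: {psnr(mean_image, noisy):.2f} dB")
```

Averaging this score over all cropped images of a dataset gives per-method figures of the kind reported in the tables.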

Figs. 9 and 10 show the denoised images of one scene captured by Canon 5D Mark 3 at ISO = 3200 and Nikon D800 at ISO = 6400, respectively. We can see that GAT-BM3D, CBM3D, TID, DnCNN, NC, NI and CC either leave residual noise or generate artifacts, while TNRD over-smooths the image. By using the external prior guided internal priors, our proposed method preserves edges and textures better than the other methods while removing the noise, leading to visually more pleasant outputs. Specifically, Fig. 10 illustrates the denoising performance of our method on fine-scale textures such as hair, which is a very challenging task. Even the “ground truth” mean image cannot show very clear details of the hair. Though our method cannot clearly reproduce the details (e.g., the local direction of hair in some regions), it demonstrates the best visual results among the competing methods. More comparisons on visual quality and the SSIM [55] index can be found in the supplementary file.

(a) Noisy [42]: 37.00dB (b) CBM3D [7]: 39.76dB (c) TID [19]: 37.22dB (d) TNRD [25]: 39.51dB (e) DnCNN [22]: 37.26dB
(f) NI [44]: 37.68dB (g) NC [39, 40]: 38.76dB (h) CC [42]: 38.37dB (i) Ours: 40.50dB (j) Mean Image [42]
Figure 9: Denoised images of a region cropped from the real-world noisy image “Canon 5D Mark 3 ISO 3200 1” [42] by different methods. The images are better viewed by zooming in on screen.
(a) Noisy [42]: 37.00dB (b) CBM3D [7]: 39.76dB (c) TID [19]: 37.51dB (d) TNRD [25]: 39.51dB (e) DnCNN [22]: 37.26dB
(f) NI [44]: 37.68dB (g) NC [39, 40]: 38.76dB (h) CC [42]: 38.37dB (i) Ours: 40.50dB (j) Mean Image [42]
Figure 10: Denoised images of a region cropped from the real-world noisy image “Nikon D800 ISO 6400 1” [42] by different methods. The images are better viewed by zooming in on screen.

We then perform denoising experiments on the 60 images we cropped from [42]. The average PSNR results are listed in Table III (CC is not compared since its code is not available). Again, our proposed method achieves better PSNR results than the other methods. The improvement of our method over the second best method (TNRD) is 0.43dB in PSNR. Fig. 11 shows the denoised images of one scene captured by Nikon D800 at ISO = 3200. We can see again that the proposed method obtains better visual quality than the other competing methods. More comparisons on visual quality and SSIM can be found in the supplementary file.

(a) Noisy [42]: 33.60dB (b) CBM3D [7]: 35.23dB (c) WNNM [13]: 36.50dB (d) CSF [24]: 36.21dB (e) TNRD [25]: 37.10dB
(f) DnCNN [22]: 34.43dB (g) NI [44]: 35.02dB (h) NC [40, 39]: 36.07dB (i) Ours: 37.50dB (j) Mean Image [42]
Figure 11: Denoised images of a region cropped from the real-world noisy image “Nikon D800 ISO 3200 A3” [42] by different methods. The images are better viewed by zooming in on screen.

Results on Dataset 2. In Table IV, we list the average PSNR (dB) results of the competing methods on the 1000 cropped images in the DND dataset [56]. We can see again that the proposed method achieves better performance than the other competing methods. Note that the “ground truth” images of this dataset have not been released yet, so we are not able to calculate the PSNR and SSIM results for each noisy image in this dataset, nor to compare with the “ground truth” mean image. However, one can submit the denoised images to the project website and get the average PSNR and SSIM results over all 1000 images. Fig. 12 shows the denoised images of a scene “0001_2” captured by a Nexus 6P phone [56]. The noise level in this image is relatively high. Hence, this image can be used to assess the performance of the proposed method on real-world noisy images with lower PSNR (around 20dB). One can see that the proposed method achieves visually more pleasing results than the other denoising methods. More comparisons on visual quality and SSIM can be found in the supplementary file.

(a) Noisy [56] (b) CBM3D [7] (c) WNNM [13] (d) MLP [20] (e) CSF [24]
(f) TNRD [25] (g) DnCNN [22] (h) NI [44] (i) NC [39, 40] (j) Ours
Figure 12: Denoised images by different methods of the real-world noisy image “0001_2” captured by a Huawei Nexus 6P phone [56]. Note that the ground-truth clean image of the noisy input is not publicly released yet.

Results on Dataset 3. Similar to Dataset 1 [42], there is a “ground truth” image for each of the 10 scenes used in our constructed Dataset 3. We perform a quantitative comparison on the 100 cropped images. The average PSNR results of the competing methods are listed in Table V. We can see that our proposed method achieves better PSNR results than the other methods. The improvement of our method over the second best method (TNRD) is 0.16dB in PSNR. Fig. 13 shows the denoised images of one scene captured by Canon 80D at ISO = 12800. We can see again that the proposed method removes the noise while maintaining better details (such as the vertical black shadow area) than the other competing methods. More comparisons on visual quality and SSIM can be found in the supplementary file.

Comparison on speed. Efficiency is an important aspect in evaluating denoising algorithms. We compare the speed of all competing methods except for CC. All experiments are run under the Matlab 2014b environment on a machine with an Intel(R) Core(TM) i7-5930K CPU at 3.5GHz and 32GB RAM. The average running time (in seconds) of the compared methods on the 100 real-world noisy images is shown in Table VI. The least average running time is highlighted in bold. One can easily see that the commercial software Neat Image (NI), with its highly optimized code, is the fastest method, costing about 0.6 second per image. The other methods cost from 5.2 (TNRD) to 152.2 (WNNM) seconds, while the proposed method costs about 24.1 seconds. It should be noted that GAT-BM3D, CBM3D, TNRD, and NC are implemented with compiled C++ mex-functions and with parallelization, while WNNM, TID, MLP, CSF, DnCNN, and the proposed method are implemented purely in Matlab.

Methods  GAT-BM3D  CBM3D  WNNM   MLP    CSF    TNRD   DnCNN  NI     NC     Ours
PSNR     34.33     36.34  37.67  38.13  37.40  38.32  34.99  36.53  37.57  38.75
Table III: Average PSNR (dB) results of different methods on 60 real-world noisy images cropped from [42].

Methods  GAT-BM3D  CBM3D  WNNM   MLP    CSF    TNRD   DnCNN  NI     NC     Ours
PSNR     30.07     32.14  33.28  34.02  33.87  34.15  32.41  35.11  36.07  36.41
Table IV: Average PSNR (dB) results of different methods on the 1000 real-world noisy images from the DND dataset [56].

Methods  GAT-BM3D  CBM3D  WNNM   MLP    CSF    TNRD   DnCNN  NI     NC     Ours
PSNR     33.54     37.14  35.18  37.34  37.07  37.48  34.74  35.70  36.76  37.64
Table V: Average PSNR (dB) results of different methods on 100 real-world noisy images cropped from our new dataset.

Methods  GAT-BM3D  CBM3D  WNNM   MLP    CSF    TNRD   DnCNN  NI     NC     Ours
Time     11.1      6.9    152.2  17.1   19.5   5.2    79.5   0.6    15.6   24.1
Table VI: Average speed (sec.) results of different methods on 100 real-world noisy images cropped from our new dataset.

(a) Noisy [42]: 36.51dB (b) CBM3D [7]: 37.91dB (c) WNNM [13]: 38.23dB (d) CSF [24]: 39.02dB (e) TNRD [25]: 39.26dB
(f) DnCNN [22]: 36.52dB (g) NI [44]: 37.52dB (h) NC [40, 39]: 37.53dB (i) Ours: 39.41dB (j) Mean Image
Figure 13: Denoised images of a region cropped from the real-world noisy image “Canon 80D ISO 12800 IMG 2321” in our new dataset by different methods. The images are better viewed by zooming in on screen.

V Conclusion

We proposed a new prior learning method for the real-world noisy image denoising problem by exploiting the useful information in both external and internal data. We first learned Gaussian Mixture Models (GMMs) from a set of clean external images as a general image prior, and then employed the learned GMM model to guide the learning of an adaptive internal prior from the given noisy image. Finally, a set of orthogonal dictionaries was output as the external-internal hybrid prior model for image denoising. Extensive experiments on three real-world noisy image datasets, including a new dataset that we constructed using different types of cameras and camera settings, demonstrated that our proposed method achieves much better performance than state-of-the-art image denoising methods in terms of both quantitative measures and visual perceptual quality.

Appendix A Closed-Form Solution of the Weighted Sparse Coding Problem (7)

For notational simplicity, we ignore the indices in problem (7). It turns into the following weighted sparse coding problem:

(14)

Since the dictionary is an orthogonal matrix, problem (14) is equivalent to:

(15)

For simplicity, we denote . Here we have , , then problem (15) can be written as:

(16)

The problem (16) is separable w.r.t. each and hence can be simplified to independent scalar minimization problems:

(17)

where . Taking the derivative of the objective in problem (17) and setting it to zero, there are two cases for the solution.

(a) If , we have and the solution is So , and the solution can be written as where is the sign function.

(b) If , we have and the solution is So , and the solution can be written as

In summary, we have the final solution of the weighted sparse coding problem (14) as:

(18)

where is the vector of regularization parameter and means element-wise multiplication.
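The closed-form solution can be sketched numerically as follows. Since the equations are not recoverable here, the sketch assumes that the problem has the standard weighted form min over alpha of ||y - D alpha||_2^2 + sum_j lambda_j |alpha_j| with an orthogonal D; under that assumption each coefficient is soft-thresholded at lambda_j / 2. The symbols `D`, `y`, `lam` and the function names are illustrative, not taken from the authors' code.

```python
import numpy as np

def weighted_sparse_code(D, y, lam):
    """Closed-form minimizer of ||y - D a||_2^2 + sum_j lam_j |a_j|.

    Assumes D is orthogonal (D.T @ D = I), so the problem separates into
    scalar problems that are solved by soft-thresholding D.T @ y.
    """
    b = D.T @ y
    return np.sign(b) * np.maximum(np.abs(b) - lam / 2.0, 0.0)

def objective(D, y, alpha, lam):
    """Value of the assumed weighted sparse coding objective."""
    return np.sum((y - D @ alpha) ** 2) + np.sum(lam * np.abs(alpha))

rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # random orthogonal dictionary
y = rng.standard_normal(6)
lam = rng.uniform(0.1, 1.0, size=6)               # per-coefficient weights
alpha = weighted_sparse_code(D, y, lam)
print(alpha)
```

Because the objective is convex, any perturbation of the returned `alpha` should not decrease it, which gives a simple numerical sanity check.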

Appendix B Proof of Theorem 1

Let be two given data matrices. Denote by the external subdictionary and the internal subdictionary. For simplicity, we assume . The problem in Theorem 1 is as follows:

(19)
Proof.

We first prove the necessary condition. Since , we have

(20)

The Lagrange function is , where and are the Lagrange multipliers. Taking the derivative of w.r.t. and setting it to be the zero matrix of conformal dimensions, we get

(21)

Since and , by left multiplying both sides of Eq. (21) by , we have

(22)

Putting Eq. (22) back into Eq. (21), we have

(23)

Right multiplying both sides of Eq. (23) by , we have

(24)

This shows that is a symmetric matrix of order . Performing the economy (or reduced) singular value decomposition (SVD) [51] on , we have

(25)

Hence, we have , or equivalently . The necessary condition is proved.

Now we prove the sufficient condition. If , then . To prove , we left multiply both sides of Eq. (25) by and have . It means that . This only happens when since and is positive definite. Then .
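The two conditions established so far can be checked numerically with a minimal sketch. It assumes that the problem in Theorem 1 has the form min over D_i of ||Y - [D_e D_i] A||_F^2 subject to D_i^T D_i = I and D_e^T D_i = 0, with A split row-wise into A_e and A_i, and that the SVD in Eq. (25) is taken of (I - D_e D_e^T) Y A_i^T. These forms and all names below are assumptions for illustration, since the equations themselves are not recoverable here.

```python
import numpy as np

def internal_subdictionary(Y, D_e, A_i):
    """Candidate solution D_i = U V^T from an economy SVD (assumed form).

    Assumes (I - D_e D_e^T) Y A_i^T has full column rank, so that the
    columns of U lie in the orthogonal complement of span(D_e).
    """
    d = D_e.shape[0]
    P = np.eye(d) - D_e @ D_e.T  # projector onto the complement of span(D_e)
    U, _, Vt = np.linalg.svd(P @ Y @ A_i.T, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
d, r, n = 8, 3, 50
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
D_e = Q[:, :r]                          # external subdictionary (orthonormal)
Y = rng.standard_normal((d, n))         # data matrix
A_e = rng.standard_normal((r, n))       # coefficients on D_e
A_i = rng.standard_normal((d - r, n))   # coefficients on D_i
D_i = internal_subdictionary(Y, D_e, A_i)

def residual(D_i_candidate):
    """Squared Frobenius reconstruction error for a candidate D_i."""
    return np.linalg.norm(Y - D_e @ A_e - D_i_candidate @ A_i) ** 2

print(residual(D_i))
```

In this sketch, any other feasible candidate (an orthonormal basis of the complement of span(D_e) rotated by an orthogonal matrix) should give a residual no smaller than that of `D_i`.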

Finally we prove that is the solution of

(26)

Note that, by cyclic permutation, which leaves the trace unchanged, and due to , we have For every satisfying , , we have