Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation

07/12/2020 ∙ by Zongsheng Yue, et al. ∙ Xi'an Jiaotong University

Real-world image noise removal is a long-standing yet very challenging task in computer vision. The success of deep neural networks in denoising has stimulated research on noise generation, which aims at synthesizing more clean-noisy image pairs to facilitate the training of deep denoisers. In this work, we propose a novel unified framework to simultaneously deal with the noise removal and noise generation tasks. Instead of only inferring the posterior distribution of the latent clean image conditioned on the observed noisy image, as in the traditional MAP framework, our proposed method learns the joint distribution of the clean-noisy image pairs. Specifically, we approximate the joint distribution with two different factorized forms, which can be formulated as a denoiser mapping the noisy image to the clean one and a generator mapping the clean image to the noisy one. The learned joint distribution implicitly contains all the information between the noisy and clean images, avoiding the need to manually design image priors and noise assumptions as in traditional methods. Besides, the performance of our denoiser can be further improved by augmenting the original training dataset with the learned generator. Moreover, we propose two metrics to assess the quality of the generated noisy images; to the best of our knowledge, such metrics are proposed for the first time along this research line. Extensive experiments demonstrate the superiority of our method over the state of the art in both the real noise removal and noise generation tasks. The training and testing code is available at https://github.com/zsyOAOA/DANet.

1 Introduction

Image denoising is an important research problem in low-level vision, aiming at recovering the latent clean image $x$ from its noisy observation $y$. Despite the significant advances in the past decades [8, 14, 57, 56], real image denoising remains a challenging task due to the complicated processing steps within the camera system, such as demosaicing, Gamma correction and compression [46].

From the Bayesian perspective, most of the traditional denoising methods can be interpreted within the Maximum A Posteriori (MAP) framework, i.e., $p(x|y) \propto p(y|x)\,p(x)$, which involves one likelihood term $p(y|x)$ and one prior term $p(x)$. Under this framework, two methodologies have been considered. The first attempts to model the likelihood term with proper distributions, e.g., Gaussian, Laplacian, MoG [33, 59, 55] and MoEP [10], which represent different understandings of the noise generation mechanism, while the second mainly focuses on exploiting better image priors, such as total variation [40], non-local similarity [8], low-rankness [15, 17, 47, 52] and sparsity [31, 58, 53]. Despite the better interpretability afforded by the Bayesian framework, these MAP-based methods are still limited by manual assumptions on the noise and image priors, which may largely deviate from real images.

Figure 1: Illustration of our proposed dual adversarial framework. The solid lines denote the forward process, and the dotted lines mark the gradient interaction between the denoiser and generator during the backward pass.

In recent years, deep learning (DL)-based methods have achieved impressive success in the image denoising task [57, 4, 56]. However, as is well known, training a deep denoiser requires a large number of clean-noisy image pairs, which are time-consuming and expensive to collect. To address this issue, several noise generation approaches (throughout this paper, the phrase “noise generation” refers to the process of generating a noisy image from a clean one) were proposed to simulate more clean-noisy image pairs to facilitate the training of deep denoisers. The main idea behind them is to unfold the in-camera processing pipeline [19, 7], or to directly learn the distribution $p(y|x)$ as in [11, 25] using a generative adversarial network (GAN) [16]. However, the former methods involve many hyper-parameters that need to be carefully tuned for specific cameras, while the latter struggle to simulate very realistic noisy images with high-dimensional, signal-dependent noise distributions. Besides, to the best of our knowledge, there is still no metric to quantitatively assess the quality of generated noisy images w.r.t. the real ones.

To address these issues, we propose a new framework that models the joint distribution $p(x, y)$ instead of only inferring the conditional posterior $p(x|y)$ as in the conventional MAP framework. Specifically, we first factorize the joint distribution from two opposite directions, i.e., $p(x, y) = p(x|y)\,p(y)$ and $p(x, y) = p(y|x)\,p(x)$, which can be well approximated by an image denoiser and a noise generator, respectively. Then we simultaneously train the denoiser and generator in a dual adversarial manner, as illustrated in Fig. 1. After that, the learned denoiser can either be directly used for the real noise removal task, or be further enhanced with new clean-noisy image pairs simulated by the learned generator. In summary, the contributions of this work are as follows:

  • Different from the traditional MAP framework, our method approximates the joint distribution $p(x, y)$ from two different factorized forms in a dual adversarial manner, which subtly avoids manually designing image priors and noise distributions. Moreover, the joint distribution $p(x, y)$ theoretically contains more complete information about the data set than the conditional posterior $p(x|y)$.

  • Our proposed method can simultaneously deal with both the noise removal and noise generation tasks in one unified Bayesian framework, and outperforms the state-of-the-art methods on both tasks. Moreover, the performance of our denoiser can be further improved after retraining on the augmented training data set with additional clean-noisy image pairs simulated by our learned generator.

  • To assess the quality of the noisy images simulated by a noise generation method, we design two metrics which, to the best of our knowledge, are the first metrics for this purpose.

2 Related Work

2.1 Noise Removal

Image denoising is an active research topic in computer vision. Under the MAP framework, rational priors need to be pre-assumed to enforce desired properties of the recovered image. Total variation [40] was first introduced to deal with the denoising task. Later, the non-local similarity prior, meaning that small patches in a large non-local area may share similar patterns, was considered in NLM [8] and followed by many other denoising methods [14, 15, 30, 28]. Low-rankness [15, 17, 54, 52] and sparsity [31, 50, 30, 58, 53] are two other well-known image priors, which are often used together within dictionary learning methods. Besides, discriminative learning methods represent another research line, mainly including Markov random field (MRF) methods [6, 41, 44], cascade of shrinkage fields (CSF) methods [42, 43] and the trainable nonlinear reaction diffusion (TNRD) [12] method. Different from the above prior-based methods, noise modeling approaches focus on the other important component of MAP, i.e., the likelihood or fidelity term. E.g., Meng and De La Torre [33] proposed to model the noise distribution as a mixture of Gaussians (MoG), while Zhu et al. [59] and Yue et al. [55] both introduced the non-parametric Dirichlet Process into MoG to expand its flexibility. Furthermore, Cao et al. [10] proposed the mixture of exponential power (MoEP) distributions to fit more complex noise.

In recent years, DL-based methods have achieved significant advances in the image denoising task. Jain and Seung [23] first adopted a five-layer network to deal with the denoising task. Then Burger et al. [9] obtained performance comparable to BM3D using a plain multi-layer perceptron (MLP). Later, several auto-encoder based methods [49, 2] were proposed. It is worth mentioning that Zhang et al. [57] proposed the convolutional denoising network DnCNN and achieved state-of-the-art performance on Gaussian denoising. Following DnCNN, many different network architectures were designed for the denoising task, including RED [32], MemNet [45], NLRN [29], N3Net [37], RIDNet [4] and VDN [56].

2.2 Noise Generation

As is well known, the expensive cost of collecting pairs of training data is a critical limitation for deep learning based denoising methods. Therefore, several methods were proposed to explore the generation mechanism of image noise so that more training pairs can easily be simulated. One common idea was to generate image pairs by “unprocessing” and “processing” each step of the in-camera processing pipeline, e.g., [19, 7, 24]. However, these methods involve many hyper-parameters to be tuned for specific cameras. Another, simpler way was to learn the real noise distribution directly using GAN [16], as demonstrated in [11] and [25]. Due to the complexity of real noise and the instability of GAN training, it is very difficult to train a good generator for simulating realistic noise.

3 Proposed Method

Like most supervised deep learning denoising methods, our approach is built on a given training data set containing pairs of real noisy images $y_j$ and clean images $x_j$, which are accessible thanks to the contributions of [3, 1, 51]. Instead of directly learning a mapping from $y$ to $x$, we attempt to approximate the underlying joint distribution $p(x, y)$ of the clean-noisy image pairs. In the following, we present our method from the Bayesian perspective.

3.1 Two Factorizations of Joint Distribution

In this part, we factorize the joint distribution $p(x, y)$ from two different perspectives, and discuss how they relate to the noise removal and noise generation tasks, respectively.

Noise removal perspective: Under the Bayesian framework, the noise removal task can be regarded as inferring the conditional distribution $p(x|y)$. The denoiser $R$ learned in this task represents an implicit distribution $p_R(x|y)$ that approximates the true distribution $p(x|y)$, and its output $\hat{x} = R(y)$ can be seen as an image sampled from $p_R(x|y)$. Based on this understanding, we can obtain a pseudo clean image pair $(\hat{x}, y)$ as follows (we mildly assume that sampling $y \sim p(y)$ is easily implemented by sampling from the empirical distribution of the training data set, and likewise for $x \sim p(x)$):

$$y \sim p(y), \quad \hat{x} = R(y), \tag{1}$$

which can be seen as one example sampled from the following pseudo joint distribution:

$$p_R(x, y) = p_R(x|y)\, p(y). \tag{2}$$

Clearly, the better the denoiser $R$, the more accurately the pseudo joint distribution $p_R(x, y)$ approximates the true joint distribution $p(x, y)$.

Noise generation perspective: In a real camera system, image noise is derived from multiple hardware-related random noises (e.g., shot noise, thermal noise) and is further affected by the in-camera processing pipeline (e.g., demosaicing, compression). After introducing an additional latent variable $z$, representing the fundamental elements underlying the hardware-related random noises, the generation process from $x$ to $y$ can be depicted by the conditional distribution $p(y|x, z)$. The generator $G$ in this task expresses an implicit distribution $p_G(y|x, z)$ to approximate the true distribution $p(y|x, z)$, and its output $\hat{y} = G(x, z)$ can be seen as an example sampled from $p_G(y|x, z)$. Similar to Eq. (1), a pseudo noisy image pair $(x, \hat{y})$ is easily obtained:

$$x \sim p(x), \quad z \sim p(z), \quad \hat{y} = G(x, z), \tag{3}$$

where $p(z)$ denotes the distribution of the latent variable $z$, which can simply be set as an isotropic Gaussian distribution $\mathcal{N}(0, I)$.

Theoretically, we can marginalize out the latent variable $z$ to obtain the following pseudo joint distribution $p_G(x, y)$ as an approximation to $p(x, y)$:

$$p_G(x, y) = \int_z p_G(y|x, z)\, p(z)\, p(x)\, dz \approx \frac{1}{L} \sum_{i=1}^{L} p_G(y|x, z_i)\, p(x), \tag{4}$$

where $z_i \sim p(z)$. As suggested in [26], the number of samples $L$ can be set to 1 as long as the minibatch size is large enough. Under this setting, the pseudo noisy image pair $(x, \hat{y})$ obtained from the generation process in Eq. (3) can be roughly regarded as an example sampled from $p_G(x, y)$.
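To make the sampling view of Eqs. (1), (3) and (4) concrete, here is a minimal Python sketch that draws one minibatch of real pairs, denoiser-side pseudo pairs and generator-side pseudo pairs. It assumes toy images stored as flat lists of floats, a single latent sample per clean image ($L = 1$), and hypothetical callables `denoiser` and `generator` standing in for $R$ and $G$; it is an illustration, not the authors' implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple

Image = List[float]         # toy stand-in for an H x W x C image tensor
Pair = Tuple[Image, Image]  # (clean x, noisy y)


def sample_pseudo_pairs(
    dataset: Sequence[Pair],
    denoiser: Callable[[Image], Image],
    generator: Callable[[Image, Image], Image],
    batch_size: int,
    rng: random.Random,
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Return (real, denoiser-side, generator-side) minibatches of pairs."""
    real, from_r, from_g = [], [], []
    for _ in range(batch_size):
        # (x, y) ~ p(x, y): sample from the empirical training distribution,
        # so y and x are also samples of the marginals p(y) and p(x).
        x, y = rng.choice(dataset)
        real.append((x, y))
        # Eq. (1): x_hat = R(y) gives a pseudo clean pair (x_hat, y) ~ p_R(x, y).
        from_r.append((denoiser(y), y))
        # Eq. (3): z ~ N(0, I), y_hat = G(x, z) gives (x, y_hat) ~ p_G(x, y);
        # drawing a single z per image is the L = 1 Monte Carlo estimate of Eq. (4).
        z = [rng.gauss(0.0, 1.0) for _ in x]
        from_g.append((x, generator(x, z)))
    return real, from_r, from_g
```

With a toy denoiser such as `lambda y: [0.9 * v for v in y]` and a toy generator that adds `0.1 * z` to `x`, each returned list holds `batch_size` pairs; the denoiser-side pairs reuse the sampled `y` and the generator-side pairs reuse the sampled `x`.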

3.2 Dual Adversarial Model

In the previous subsection, we derived two pseudo joint distributions from the perspectives of noise removal and noise generation, i.e., $p_R(x, y)$ and $p_G(x, y)$. The problem now becomes how to effectively train the denoiser $R$ and the generator $G$ so that they approximate the joint distribution $p(x, y)$ well. Fortunately, the tractability of the sampling processes defined in Eqs. (1) and (3) makes it possible to train them in an adversarial manner as in GAN [16], which gradually pushes $p_R(x, y)$ and $p_G(x, y)$ toward the true distribution $p(x, y)$. Specifically, inspired by Triple-GAN [13], we formulate this idea as the following dual adversarial problem:

$$\min_{R}\min_{G}\max_{D}\ \mathcal{L}_{gan}(R, G, D) = \mathbb{E}_{(x, y)}\big[D(x, y)\big] - \alpha\, \mathbb{E}_{(\hat{x}, y)}\big[D(\hat{x}, y)\big] - (1 - \alpha)\, \mathbb{E}_{(x, \hat{y})}\big[D(x, \hat{y})\big], \tag{5}$$

where $\hat{x} = R(y)$, $\hat{y} = G(x, z)$, and $D$ denotes the discriminator, which tries to distinguish the real clean-noisy image pair $(x, y)$ from the fake ones $(\hat{x}, y)$ and $(x, \hat{y})$. The hyper-parameter $\alpha$ controls the relative importance of the denoiser $R$ and the generator $G$. As in [5], we use the Wasserstein-1 distance to measure the difference between two distributions in Eq. (5).
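As a rough illustration of Eq. (5), the sketch below computes a minibatch estimate of the dual adversarial value from critic scores, assuming the WGAN-style form $\mathbb{E}[D(x, y)] - \alpha\,\mathbb{E}[D(\hat{x}, y)] - (1 - \alpha)\,\mathbb{E}[D(x, \hat{y})]$. The function names are hypothetical; in practice the scores would come from a discriminator network and the losses would be minimized with automatic differentiation.

```python
from statistics import fmean
from typing import Sequence


def dual_adversarial_value(
    d_real: Sequence[float],
    d_fake_r: Sequence[float],
    d_fake_g: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Minibatch estimate of the Eq. (5) value.

    d_real:   critic scores D(x, y) on real pairs
    d_fake_r: critic scores D(x_hat, y) on denoiser-side pseudo pairs
    d_fake_g: critic scores D(x, y_hat) on generator-side pseudo pairs
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return fmean(d_real) - alpha * fmean(d_fake_r) - (1.0 - alpha) * fmean(d_fake_g)


def discriminator_loss(d_real, d_fake_r, d_fake_g, alpha=0.5):
    # D maximizes the value, i.e. minimizes its negative.
    return -dual_adversarial_value(d_real, d_fake_r, d_fake_g, alpha)


def denoiser_adversarial_loss(d_fake_r, alpha=0.5):
    # Only the alpha-weighted term depends on R, and R minimizes the value.
    return -alpha * fmean(d_fake_r)


def generator_adversarial_loss(d_fake_g, alpha=0.5):
    # Only the (1 - alpha)-weighted term depends on G, and G minimizes the value.
    return -(1.0 - alpha) * fmean(d_fake_g)
```

Splitting the value this way reflects that the real-pair term carries no gradient for $R$ or $G$, so each network only sees its own weighted fake-pair term.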

The working mechanism of our dual adversarial network can be intuitively explained with Fig. 1. On one hand, the denoiser $R$, which encodes the knowledge of $p(x|y)$, is expected to induce the joint distribution $p_R(x, y)$ of Eq. (2), while the noise generator $G$, which conveys the information of $p(y|x, z)$, is expected to induce the joint distribution $p_G(x, y)$ of Eq. (4). Through the adversarial effect of the discriminator $D$, the denoiser and generator are both gradually optimized so as to pull $p_R(x, y)$ and $p_G(x, y)$ toward the true joint distribution $p(x, y)$ during training. On the other hand, the capabilities of $R$ and $G$ are mutually enhanced by the dual regularization between them. Given any real image pair $(x, y)$ and one pseudo image pair, either $(x, \hat{y})$ from the generator $G$ or $(\hat{x}, y)$ from the denoiser $R$, the discriminator $D$ is updated according to the adversarial loss. Then $D$ is fixed as a criterion to update both $R$ and $G$ simultaneously, as illustrated by the dotted lines in Fig. 1, which means that $R$ and $G$ keep interacting with and guiding each other in each iteration.

Previous studies [22, 60] have shown that it is beneficial to mix the adversarial objective with traditional losses, which speeds up and stabilizes the training of GAN. For the noise removal task, we adopt the $L_1$ loss, i.e., $\|\hat{x} - x\|_1$, which enforces the output of the denoiser $R$ to be close to the groundtruth. For the generator $G$, however, a direct $L_1$ loss would not be beneficial because of the randomness of noise. Therefore, we propose to apply the $L_1$ constraint to the statistical features of the noise distribution:

$$\mathcal{L}_G(G) = \big\|\mathcal{GF}(\hat{y} - x) - \mathcal{GF}(y - x)\big\|_1, \tag{6}$$

where $\mathcal{GF}(\cdot)$ represents the Gaussian filter used to extract the first-order statistical information of the noise. Integrating these two regularizers into the adversarial loss of Eq. (5), we obtain the final objective:

$$\min_{R}\min_{G}\max_{D}\ \mathcal{L}_{gan}(R, G, D) + \tau_1 \|\hat{x} - x\|_1 + \tau_2\, \mathcal{L}_G(G), \tag{7}$$

where $\tau_1$ and $\tau_2$ are hyper-parameters that balance the different losses. A sensitivity analysis of them is provided in Sec. 5.2.
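The regularizers in Eqs. (6) and (7) can be sketched in plain Python on small 2-D grayscale images. The Gaussian filter parameters (`sigma`, `radius`), the edge-replicate padding, and the use of a per-pixel mean instead of a sum for the $L_1$ norm are illustrative assumptions (the mean only rescales $\tau_1$ and $\tau_2$). Because only the adversarial term couples $R$ and $G$, the objective of Eq. (7) is split here into the part each network minimizes.

```python
import math
from typing import List

Matrix = List[List[float]]


def gaussian_kernel(sigma: float, radius: int) -> List[float]:
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def gaussian_filter(img: Matrix, sigma: float = 1.0, radius: int = 2) -> Matrix:
    """Separable Gaussian blur with edge-replicate padding."""
    k = gaussian_kernel(sigma, radius)
    h, w = len(img), len(img[0])

    def clamp(i: int, n: int) -> int:
        return min(max(i, 0), n - 1)

    offsets = range(-radius, radius + 1)
    # Horizontal pass, then vertical pass.
    tmp = [[sum(k[d + radius] * img[r][clamp(c + d, w)] for d in offsets)
            for c in range(w)] for r in range(h)]
    return [[sum(k[d + radius] * tmp[clamp(r + d, h)][c] for d in offsets)
             for c in range(w)] for r in range(h)]


def subtract(a: Matrix, b: Matrix) -> Matrix:
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def mean_abs(a: Matrix, b: Matrix) -> float:
    """Per-pixel mean of |a - b| (an L1 distance up to a constant factor)."""
    n = len(a) * len(a[0])
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n


def generator_regularizer(x: Matrix, y: Matrix, y_hat: Matrix) -> float:
    """Eq. (6): compare Gaussian-filtered (locally averaged) noise maps."""
    return mean_abs(gaussian_filter(subtract(y_hat, x)), gaussian_filter(subtract(y, x)))


def denoiser_objective(adv_r: float, x: Matrix, x_hat: Matrix, tau1: float) -> float:
    # Terms of Eq. (7) that depend on R: adversarial part + tau1 * ||x_hat - x||_1.
    return adv_r + tau1 * mean_abs(x_hat, x)


def generator_objective(adv_g: float, x: Matrix, y: Matrix, y_hat: Matrix, tau2: float) -> float:
    # Terms of Eq. (7) that depend on G: adversarial part + tau2 * L_G.
    return adv_g + tau2 * generator_regularizer(x, y, y_hat)
```

Filtering before comparing means $G$ is only asked to match the local mean of the real noise, not its exact per-pixel realization, which is why the regularizer tolerates the randomness that makes a direct $L_1$ loss unsuitable.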

3.3 Training Strategy

In the dual adversarial model of Eq. (7), there are three networks to be optimized, i.e., the denoiser $R$, the generator $G$ and the discriminator $D$. As in most GAN-related papers [16, 5, 13], we jointly train $R$, $G$ and $D$ but update them in an alternating manner, as shown in Algorithm 1. To stabilize the training, we adopt the gradient penalty technique of WGAN-GP [18], which enforces the discriminator $D$ to satisfy the 1-Lipschitz constraint through an extra gradient penalty term.
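The gradient penalty of WGAN-GP [18] can be sketched as follows: sample points on straight lines between real and fake inputs, measure the critic's gradient norm there, and penalize its deviation from 1 with a coefficient $\lambda$ (set to 10 in Sec. 5.1). To stay dependency-free, this sketch assumes the critic takes a flat vector (e.g., a concatenated clean-noisy pair) and estimates the gradient by central finite differences; a real implementation would use automatic differentiation instead.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def finite_difference_grad(f: Callable[[Vector], float], v: Vector, eps: float = 1e-5) -> Vector:
    """Central-difference estimate of the gradient of a scalar function f at v."""
    grad = []
    for i in range(len(v)):
        up, down = list(v), list(v)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def gradient_penalty(
    critic: Callable[[Vector], float],
    real: Sequence[Vector],
    fake: Sequence[Vector],
    rng: random.Random,
    lam: float = 10.0,
) -> float:
    """lam * batch mean of (||grad critic(v_interp)||_2 - 1)^2."""
    total = 0.0
    for r, f in zip(real, fake):
        t = rng.random()  # uniform interpolation coefficient in [0, 1)
        interp = [t * a + (1.0 - t) * b for a, b in zip(r, f)]
        g = finite_difference_grad(critic, interp)
        norm = math.sqrt(sum(gi * gi for gi in g))
        total += (norm - 1.0) ** 2
    return lam * total / len(real)
```

A linear critic whose weight vector has unit norm incurs no penalty, while one with gradient norm 5 is charged $\lambda (5 - 1)^2 = 160$ for $\lambda = 10$.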

After training, the generator $G$ is able to simulate more noisy images given any clean images, which can easily be taken from the original training data set or downloaded from the Internet. We can then retrain the denoiser $R$ after adding more synthetic clean-noisy image pairs generated by $G$ to the training data set. As shown in Sec. 5, this strategy further improves the denoising performance.
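A minimal sketch of this augmentation step, assuming `generator(x, z)` is a trained stand-in for $G$ and that images are flat lists of floats; the parameter `copies` (how many noisy versions to draw per extra clean image) is a hypothetical knob, not a setting reported in the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

Image = List[float]
Pair = Tuple[Image, Image]  # (clean x, noisy y)


def augment_training_set(
    real_pairs: Sequence[Pair],
    extra_clean: Sequence[Image],
    generator: Callable[[Image, Image], Image],
    rng: random.Random,
    copies: int = 1,
) -> List[Pair]:
    """Keep every real pair and append synthetic pairs (x, G(x, z))."""
    augmented = list(real_pairs)
    for x in extra_clean:
        for _ in range(copies):
            # A fresh latent z ~ N(0, I) gives a different noise realization each time.
            z = [rng.gauss(0.0, 1.0) for _ in x]
            augmented.append((x, generator(x, z)))
    return augmented
```

The real pairs are kept unchanged, so the retrained denoiser still sees the genuine noise alongside the synthetic samples.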

3.4 Network Architecture

The denoiser $R$, generator $G$ and discriminator $D$ in our framework are all parameterized as deep neural networks owing to their powerful fitting capability. As shown in Fig. 1, the denoiser $R$ takes the noisy image $y$ as input and outputs the denoised image $\hat{x}$, while the generator $G$ takes the concatenation of the clean image $x$ and the latent variable $z$ as input and outputs the simulated noisy image $\hat{y}$. For both $R$ and $G$, we use the UNet [39] architecture as the backbone, and the residual learning strategy [57] is adopted in both of them. The discriminator $D$ contains five strided convolutional layers to reduce the image size and one fully connected layer to fuse all the information. More details about the network architectures are provided in the supplementary material due to the page limit. It should be noted that our proposed method is a general framework that does not depend on a specific architecture; therefore, most of the commonly used network architectures [57, 32, 4] in low-level vision tasks can be substituted.
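Two architectural details can be sanity-checked with a little arithmetic. First, with a hypothetical kernel size 4, stride 2 and padding 1 per layer (common DCGAN-style choices; the actual settings are deferred to the supplementary material), each strided convolution halves the spatial size, so five of them shrink, e.g., a hypothetical 128 × 128 patch to 4 × 4 before the fully connected layer. Second, DnCNN-style residual learning [57] lets the network body estimate the noise, which is then subtracted from the input. Both helpers are illustrative, not the authors' code.

```python
from typing import Callable, List

Signal = List[float]  # toy flattened image


def conv_output_size(size: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def discriminator_sizes(input_size: int, num_layers: int = 5) -> List[int]:
    """Spatial size before and after each strided convolution (hypothetical k=4, s=2, p=1)."""
    sizes = [input_size]
    for _ in range(num_layers):
        sizes.append(conv_output_size(sizes[-1]))
    return sizes


def residual_denoiser(noise_predictor: Callable[[Signal], Signal]) -> Callable[[Signal], Signal]:
    """Residual learning as in DnCNN: the body estimates the noise, R(y) = y - f(y)."""
    def denoise(y: Signal) -> Signal:
        return [a - b for a, b in zip(y, noise_predictor(y))]
    return denoise
```

For instance, `discriminator_sizes(128)` traces the sizes 128, 64, 32, 16, 8, 4 under these assumed layer settings.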

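Require: hyper-parameters $\alpha$, $\tau_1$, $\tau_2$, $n_D$
1:  while $R$ and $G$ have not converged do
2:     for $t = 1, \dots, n_D$ do
3:        Sample a batch of pairs $(x, y)$ from $p(x, y)$
4:        Sample a batch of pairs $(\hat{x}, y)$ from $p_R(x, y)$ and $(x, \hat{y})$ from $p_G(x, y)$
5:        Update discriminator $D$ with fixed $R$ and $G$
6:     end for
7:     Update denoiser $R$ with fixed $G$ and $D$
8:     Update generator $G$ with fixed $R$ and $D$
9:  end while
Algorithm 1 Dual adversarial network.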
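The alternating schedule of Algorithm 1 can be sketched as a loop over caller-supplied update callbacks. The callback names, the `n_d` argument (three discriminator updates per iteration in Sec. 5.1) and the loss-change stopping rule are hypothetical placeholders for illustration; the paper does not specify its convergence test.

```python
from typing import Callable, Tuple


def train_dual_adversarial(
    sample_batches: Callable[[], Tuple[object, object, object]],
    update_d: Callable[[object, object, object], None],
    update_r: Callable[[], float],
    update_g: Callable[[], float],
    n_d: int = 3,
    max_iters: int = 10_000,
    tol: float = 1e-4,
) -> int:
    """Run Algorithm 1-style alternating updates; return the iteration count."""
    previous = None
    for it in range(1, max_iters + 1):
        # Lines 2-6: n_d critic steps with R and G frozen.
        for _ in range(n_d):
            real, fake_r, fake_g = sample_batches()
            update_d(real, fake_r, fake_g)
        # Lines 7-8: one step each for R and G with the critic frozen.
        current = update_r() + update_g()
        # Hypothetical convergence test on the combined R + G loss.
        if previous is not None and abs(current - previous) < tol:
            return it
        previous = current
    return max_iters
```

Keeping the critic several steps ahead of $R$ and $G$ is the usual WGAN practice, since the Wasserstein estimate is only meaningful when $D$ is close to optimal.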

4 Evaluation Metrics

For the noise removal task, PSNR and SSIM [48] can be readily adopted to compare the denoising performance of different methods. However, to the best of our knowledge, no quantitative metric has yet been designed for the noise generation task. To address this issue, we propose two metrics to measure the similarity between the generated and real noisy images:

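  • PGap (PSNR Gap): The main idea of PGap is to compare the synthetic and real noisy images indirectly through the performance of denoisers trained on them. Let $\mathcal{D}^{tr} = \{(x_j, y_j)\}_{j=1}^{N}$ and $\mathcal{D}^{te}$ denote the available training and testing sets, whose noise distributions are the same or similar. Given any noisy image generator $G$, we can synthesize another training set:

    $$\mathcal{D}_G^{tr} = \big\{(x_j, \hat{y}_j) \,\big|\, \hat{y}_j = G(x_j, z_j),\ z_j \sim \mathcal{N}(0, I)\big\}_{j=1}^{N}. \tag{8}$$

    After training two denoisers under the same conditions, $R^{tr}$ on the original data set $\mathcal{D}^{tr}$ and $R_G^{tr}$ on the generated data set $\mathcal{D}_G^{tr}$, we define PGap as

    $$\mathrm{PGap} = \mathrm{PSNR}\big(R^{tr}(\mathcal{D}^{te})\big) - \mathrm{PSNR}\big(R_G^{tr}(\mathcal{D}^{te})\big), \tag{9}$$

    where $\mathrm{PSNR}(R(\mathcal{D}^{te}))$ represents the PSNR result of denoiser $R$ on the testing data set $\mathcal{D}^{te}$. Clearly, if the generated noisy images in $\mathcal{D}_G^{tr}$ are close to the real noisy ones in $\mathcal{D}^{tr}$, the performance of $R_G^{tr}$ will be close to that of $R^{tr}$, and thus the PGap will be small.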

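  • AKLD (Average KL Divergence): The noise generation task aims at synthesizing a fake noisy image $\hat{y}$ from the real clean image $x$ that matches the real noisy image $y$ in distribution. Therefore, the KL divergence between the conditional distribution $p_G(\hat{y}|x)$ of the fake image pair and the conditional distribution $p(y|x)$ of the real image pair can serve as a metric. To make these conditional distributions tractable, we adopt the pixel-wise Gaussian assumption on real noise from the recent work VDN [56], i.e.,

    $$p(y|x) = \mathcal{N}\big(y \,|\, x, \Sigma(x, y)\big), \quad p_G(\hat{y}|x) = \mathcal{N}\big(\hat{y} \,|\, x, \Sigma(x, \hat{y})\big), \tag{10}$$

    where

    $$\Sigma(x, y) = \mathrm{diag}\Big(\mathrm{vec}\big(\mathcal{GF}\big((y - x)^2\big)\big)\Big), \tag{11}$$

    $\mathrm{vec}(\cdot)$ denotes the reshape operation from matrix to vector, $\mathcal{GF}(\cdot)$ denotes the Gaussian filter, and the square of $(y - x)$ is a pixel-wise operation. Based on this explicit distribution assumption, the KL divergence between $p_G(\hat{y}|x)$ and $p(y|x)$ can be regarded as an intuitive metric. To reduce the influence of randomness, we randomly generate $S$ synthetic fake noisy images

    $$\hat{y}_s = G(x, z_s), \quad z_s \sim \mathcal{N}(0, I), \quad s = 1, \dots, S, \tag{12}$$

    for any real clean image $x$, and define the following average KL divergence as our metric:

    $$\mathrm{AKLD} = \frac{1}{S} \sum_{s=1}^{S} \mathrm{KL}\big[p_G(\hat{y}_s|x) \,\big\|\, p(y|x)\big]. \tag{13}$$

    Evidently, the smaller the AKLD, the better the generator $G$. In the following experiments, $S$ is kept at a fixed number of samples.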
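Both metrics are simple to compute once the denoisers and the generator are available. The sketch below, in plain Python on small 2-D grayscale images, assumes pixel values in [0, 1], a hypothetical Gaussian window (`sigma`, `radius`), a small variance floor `eps` to avoid division by zero, and averages the closed-form per-pixel KL divergence over pixels; these choices are illustrative rather than taken from the paper. Because both Gaussians in Eq. (10) share the mean $x$, the per-pixel KL depends only on the variance ratio $r$ and equals $\tfrac{1}{2}(r - 1 - \ln r)$.

```python
import math
import random
from typing import Callable, List

Matrix = List[List[float]]


def psnr(estimate: Matrix, reference: Matrix, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB."""
    n = len(reference) * len(reference[0])
    mse = sum((a - b) ** 2 for ra, rb in zip(estimate, reference) for a, b in zip(ra, rb)) / n
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def pgap(psnr_real_trained: float, psnr_synthetic_trained: float) -> float:
    """Eq. (9): test PSNR of R^tr minus test PSNR of R_G^tr."""
    return psnr_real_trained - psnr_synthetic_trained


def gaussian_filter(img: Matrix, sigma: float = 1.0, radius: int = 2) -> Matrix:
    """Separable, normalized Gaussian blur with edge-replicate padding."""
    w = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in range(-radius, radius + 1)]
    total_w = sum(w)
    k = [v / total_w for v in w]
    h, wd = len(img), len(img[0])

    def clamp(i: int, n: int) -> int:
        return min(max(i, 0), n - 1)

    offs = range(-radius, radius + 1)
    tmp = [[sum(k[d + radius] * img[r][clamp(c + d, wd)] for d in offs)
            for c in range(wd)] for r in range(h)]
    return [[sum(k[d + radius] * tmp[clamp(r + d, h)][c] for d in offs)
             for c in range(wd)] for r in range(h)]


def variance_map(noisy: Matrix, clean: Matrix, sigma: float = 1.0, radius: int = 2) -> Matrix:
    """Diagonal of Eq. (11) before vectorization: GF((y - x)^2), squared pixel-wise."""
    sq = [[(a - b) ** 2 for a, b in zip(rn, rc)] for rn, rc in zip(noisy, clean)]
    return gaussian_filter(sq, sigma, radius)


def mean_kl_same_mean(var_p: Matrix, var_q: Matrix, eps: float = 1e-10) -> float:
    """Pixel-averaged KL(N(mu, var_p) || N(mu, var_q)) for diagonal Gaussians."""
    total, n = 0.0, 0
    for rp, rq in zip(var_p, var_q):
        for p, q in zip(rp, rq):
            r = max(p, eps) / max(q, eps)  # variance ratio; eps avoids division by zero
            total += 0.5 * (r - 1.0 - math.log(r))
            n += 1
    return total / n


def akld(x: Matrix, y: Matrix, generator: Callable[[Matrix, Matrix], Matrix],
         num_samples: int, rng: random.Random, sigma: float = 1.0, radius: int = 2) -> float:
    """Eqs. (12)-(13): average KL over num_samples generated noisy images."""
    var_real = variance_map(y, x, sigma, radius)
    total = 0.0
    for _ in range(num_samples):
        z = [[rng.gauss(0.0, 1.0) for _ in row] for row in x]  # z_s ~ N(0, I)
        y_hat = generator(x, z)
        total += mean_kl_same_mean(variance_map(y_hat, x, sigma, radius), var_real)
    return total / num_samples
```

As a check on the closed form, a generator that reproduces the real noise exactly scores an AKLD of 0, while one that doubles the noise amplitude (variance ratio 4) scores $\tfrac{1}{2}(3 - \ln 4) \approx 0.807$.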

5 Experimental Results

In this section, we conducted a series of experiments on several real-world denoising benchmarks. Specifically, we considered two groups of experiments: the first group (Sec. 5.2) is designed to evaluate the effectiveness of our method on both the noise removal and noise generation tasks, and is implemented on a specific real benchmark containing training, validation and testing sets; the second group (Sec. 5.3) is conducted on two real benchmarks that consist only of noisy test images, aiming at evaluating the performance on general real-world denoising tasks. Due to the page limit, the running time comparisons are listed in the supplementary material.

For brevity, we denote the Dual Adversarial Network jointly trained following Algorithm 1 as DANet. As discussed in Sec. 3.3, the learned generator $G$ in DANet is able to augment the original training set by generating more synthetic clean-noisy image pairs, and the denoiser $R$ retrained on this augmented training data set under the $L_1$ loss is denoted as DANet+.

5.1 Experimental Settings

Parameter settings and network training: In the training stage of DANet, the weights of $R$ and $G$ were both initialized according to [20], and the weights of $D$ were initialized from a zero-centered normal distribution with standard deviation 0.02 as in [38]. All three networks were trained with the Adam optimizer [27], using one setting of the momentum terms for $R$ and another for both $G$ and $D$. Separate learning rates were set for $R$, $G$ and $D$, and each was halved every 10 epochs.

In each epoch, we randomly cropped fixed-size patches from the images for training. During training, we updated $D$ three times for each update of $R$ and $G$. The hyper-parameters $\tau_1$ and $\tau_2$ were fixed throughout the experiments, and their sensitivity analysis can be found in Sec. 5.2. As for $\alpha$, we set it to 0.5, meaning that the denoiser and generator contribute equally in our model. The penalty coefficient in WGAN-GP [18] was set to 10 following its default setting. For DANet+, the denoiser $R$ was retrained with the same settings as in DANet. All the models were trained using PyTorch [35].
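Two small pieces of this training recipe, sketched under stated assumptions: cropping the same random window from a clean-noisy pair so that the two stay pixel-aligned, and a step schedule that halves a learning rate every 10 epochs. The crop size and base learning rate used with these helpers are hypothetical placeholders, not the paper's settings.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def random_crop_pair(clean: Matrix, noisy: Matrix, size: int,
                     rng: random.Random) -> Tuple[Matrix, Matrix]:
    """Crop the same size x size window from both images so they stay aligned."""
    h, w = len(clean), len(clean[0])
    if size > h or size > w:
        raise ValueError("crop size exceeds image size")
    top = rng.randint(0, h - size)   # inclusive bounds
    left = rng.randint(0, w - size)

    def crop(img: Matrix) -> Matrix:
        return [row[left:left + size] for row in img[top:top + size]]

    return crop(clean), crop(noisy)


def step_decay_lr(base_lr: float, epoch: int, step: int = 10, factor: float = 0.5) -> float:
    """Learning rate for a 0-based epoch, multiplied by `factor` every `step` epochs."""
    return base_lr * factor ** (epoch // step)
```

Drawing one window position and applying it to both images is what keeps each cropped noisy patch paired with its own clean counterpart.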

 

Metric   CBDNet   ULRD    GRDN    DANet
PGap     8.30     4.90    2.28    2.06*
AKLD     0.728    0.545   0.443   0.212*

Table 1: The PGap and AKLD performances of different compared methods on the SIDD validation data set. The best results are marked with an asterisk (*).

Figure 2: PSNR results of different methods during training.

5.2 Results on SIDD Benchmark

In this part, the SIDD [1] benchmark is employed to evaluate the denoising performance and generation quality of our proposed method. Most clean-noisy image pairs of the full SIDD data set serve as training data, and the remaining image pairs are held out as the benchmark for testing. For fast training and evaluation, a medium training set (320 image pairs) and a validation set (40 image pairs) are also provided, but the testing results can only be obtained by submission. We trained DANet and DANet+ on the medium training set, and evaluated them on the validation and testing sets.

Figure 3: Illustration of one typical example of the noisy images generated by different methods (1st row), their corresponding noise (2nd row) and the variance maps (3rd row) estimated by Eq. (11). The first column shows the real ones from the SIDD validation set.

Noise Generation: The generator $G$ in DANet is mainly used to synthesize the corresponding noisy image given any clean one. As introduced in Sec. 4, the two metrics PGap and AKLD are designed to assess the generated noisy images. Based on these two metrics, we compared DANet with three recent methods, namely CBDNet [19], ULRD [7] and GRDN [25]. CBDNet and ULRD both attempt to generate noisy images by simulating the in-camera processing pipeline, while GRDN directly learns the noise distribution using GAN [16]. It should be noted that ULRD [7] and GRDN [25] both make use of the image metadata.

Table 1 lists the PGap values of the compared methods on the SIDD validation set. For the calculation of PGap, the SIDD validation set is regarded as the testing set $\mathcal{D}^{te}$ in Eq. (9). Clearly, our proposed DANet achieves the best performance. Figure 2 displays the PSNR curves of denoisers trained on the real training set or only on the synthetic training sets generated by different methods, which gives an intuitive illustration of our defined PGap. It can be seen that all the methods tend to gradually overfit to their own synthetic training sets, especially CBDNet. However, DANet performs not only more stably but also better than the other methods.

 

Dataset     Metric   CBM3D   WNNM    DnCNN   CBDNet   RIDNet   VDN     DANet   DANet+
Testing     PSNR     25.65   25.78   23.66   33.28    -        39.26   39.25   39.43*
Testing     SSIM     0.685   0.809   0.583   0.868    -        0.955   0.955   0.956*
Validation  PSNR     25.29   26.31   38.56   38.68    38.71    39.29   39.30   39.47*
Validation  SSIM     0.412   0.524   0.910   0.909    0.913    0.911   0.916   0.918*

Table 2: The PSNR and SSIM results of different methods on the SIDD validation and testing sets. The best results are marked with an asterisk (*).

The average AKLD results calculated on all the images of the SIDD validation set are also listed in Table 1. The smallest AKLD of DANet indicates that it learns a better implicit distribution $p_G(y|x, z)$ to approximate the true distribution $p(y|x, z)$. Fig. 3 illustrates one typical example of the real and synthetic noisy images generated by different methods, which provides an intuitive visualization of the AKLD metric. In summary, DANet outperforms the other methods both quantitatively and visually, even though some of them make use of additional metadata.

Figure 4: One typical denoising example in the SIDD validation dataset.

 

$\tau_1$   …       …       …       +∞
PSNR       38.66   39.30   39.33   39.39
SSIM       0.901   0.916   0.916   0.917

Table 3: The PSNR and SSIM results of DANet under different $\tau_1$ values on the SIDD validation data set ($\tau_1$ increases from left to right).

$\tau_2$   0       5       10      50      +∞
PGap       5.33    3.10    2.06    4.17    15.14
AKLD       0.386   0.216   0.212   0.177   0.514

Table 4: The PGap and AKLD results of DANet under different $\tau_2$ values on the SIDD validation data set.

Noise Removal: To verify the effectiveness of our proposed method on the real-world denoising task, we compared it with several state-of-the-art methods, including CBM3D [14], WNNM [17], DnCNN [57], CBDNet [19], RIDNet [4] and VDN [56]. Table 2 lists the PSNR and SSIM results of different methods on the SIDD validation and testing sets. Note that the results on the testing set are cited from the official website (https://www.eecs.yorku.ca/~kamel/sidd/benchmark.php), whereas the results on the validation set were calculated by ourselves. For a fair comparison, we retrained DnCNN and CBDNet on the SIDD training set. From Table 2, it is easily observed that: 1) the deep learning methods clearly perform better than the traditional methods CBM3D and WNNM, owing to the powerful fitting capability of DNNs; 2) DANet and DANet+ both outperform the state-of-the-art real-world denoising methods, substantiating their effectiveness; 3) DANet+ surpasses DANet by about 0.18dB PSNR, which indicates that the synthetic data generated by $G$ facilitates the training of the denoiser $R$.

Fig. 4 illustrates the visual denoising results of different methods. It can be seen that CBM3D and WNNM both fail to remove the real-world noise. DnCNN tends to produce over-smoothed edges and textures due to the $L_2$ loss. CBDNet, RIDNet and VDN alleviate this phenomenon to some extent since they adopt more robust loss functions. DANet recovers sharper edges and more details owing to the adversarial loss. After retraining with more generated image pairs, DANet+ obtains denoising results even closer to the groundtruth.

Figure 5: The real or generated noisy images (1st row) by DANet under different $\tau_2$ values and the corresponding noise (2nd row). From left to right: (a) real case, (b) $\tau_2 = 0$, (c) $\tau_2 = 5$, (d) $\tau_2 = 10$, (e) $\tau_2 = 50$, (f) $\tau_2 = +\infty$.

Hyper-parameter Analysis: Our proposed DANet involves two hyper-parameters, $\tau_1$ and $\tau_2$, in Eq. (7). The parameter $\tau_1$ mainly influences the performance of the denoiser $R$, while $\tau_2$ directly affects the generator $G$.

 

Metric   BaseD   DANet
PSNR     39.19   39.30
SSIM     0.907   0.916

Table 5: The comparison results of BaseD and DANet on the SIDD validation set.

Metric   BaseG   DANet
PGap     4.07    2.06
AKLD     0.223   0.212

Table 6: The comparison results of BaseG and DANet on the SIDD validation set.

Table 3 lists the PSNR/SSIM results of DANet under different $\tau_1$ settings, where $\tau_1 = +\infty$ represents the results of the denoiser $R$ trained only with the $L_1$ loss. As expected, a small $\tau_1$ value, meaning that the adversarial loss plays a more important role, decreases the PSNR and SSIM performance to some extent. However, when $\tau_1$ is too large, the $L_1$ regularizer dominates the performance of the denoiser $R$. Therefore, we set $\tau_1$ to a moderate value throughout all the experiments, which makes the denoising results more realistic, as shown in Fig. 4, at the cost of slightly lower PSNR.

The PGap and average AKLD results of DANet under different $\tau_2$ values are shown in Table 4. Note that $\tau_2 = +\infty$ represents the results of the generator $G$ trained only with the regularizer of Eq. (6). Fig. 5 also shows the corresponding visual results of one typical example. As one can see, $G$ fails to simulate the real noise with $\tau_2 = 0$, which demonstrates that the regularizer of Eq. (6) is able to stabilize the training of GAN. However, it is also difficult to train $G$ only with the regularizer of Eq. (6), as shown in Fig. 5 (f). Taking both the quantitative and visual results into consideration, $\tau_2$ is consistently set to 10 in our experiments.

Ablation studies: To verify the marginal benefits brought by our dual adversarial loss, two groups of ablation experiments are designed in this part. In the first group, we train DANet without the generator $G$ and denote the trained model as BaseD. Conversely, in the second group we train DANet without the denoiser $R$ and denote the trained model as BaseG. The comparison results of these two baselines with DANet on the noise removal and noise generation tasks are listed in Table 5 and Table 6, respectively. It can be easily seen that DANet outperforms both baselines on the noise removal and noise generation tasks, especially the latter, which illustrates the mutual guidance and improvement between the denoiser and the generator.

5.3 Results on DND and Nam Benchmarks

To evaluate the performance of our method on general real-world denoising tasks, we test on two real-world benchmarks, i.e., DND [36] and Nam [34]. These two benchmarks do not provide any training data, and are therefore suitable for testing the generalization capability of a denoiser. Following the experimental setting of RIDNet [4], we trained another model using image patches from SIDD [1], Poly [51] and RENOIR [3] for a fair comparison. To distinguish them from the models of Sec. 5.2, the models trained under this setting are denoted as GDANet and GDANet+, aiming at the general denoising task in real applications. For the training of GDANet+, we employed the images of MIR Flickr [21] as clean images to synthesize more training pairs using $G$.

 

Metric   CBM3D    WNNM     DnCNN    CBDNet   RIDNet   VDN      GDANet    GDANet+
PSNR     34.51    34.67    32.43    38.06    39.26    39.38    39.47     39.58*
SSIM     0.8244   0.8646   0.7900   0.9421   0.9528   0.9518   0.9548*   0.9545

Table 7: The PSNR and SSIM results of different methods on the DND benchmark. The best results are marked with an asterisk (*).
Figure 6: Denoising results of different methods on DND benchmark.

 

Metric   CBM3D    WNNM     DnCNN    CBDNet   RIDNet   VDN      GDANet    GDANet+
PSNR     35.36    35.33    35.68    39.20    39.33    38.66    39.91*    39.79
SSIM     0.8708   0.8812   0.8811   0.9676   0.9623   0.9613   0.9693*   0.9689

Table 8: The PSNR and SSIM results of different methods on the Nam benchmark. The best results are marked with an asterisk (*).

DND Benchmark: This benchmark contains 50 real noisy and almost noise-free image pairs. However, the almost noise-free images are not publicly released, so the PSNR/SSIM results can only be obtained through the online submission system. Table 7 lists the PSNR/SSIM results released on the official DND benchmark website (https://noise.visinf.tu-darmstadt.de/benchmark/). From Table 7, we have the following observations: 1) GDANet+ outperforms the state-of-the-art VDN by about 0.2dB PSNR, which is a large improvement in the field of real-world denoising; 2) GDANet obtains the highest SSIM value, which means that it preserves more structural information than the other methods, as can be visually observed in Fig. 6; 3) DnCNN cannot remove most of the real noise because it overfits to the Gaussian noise case; 4) the classical CBM3D and WNNM methods cannot handle the complex real noise.

Figure 7: One typical denoising example of Nam benchmark by different methods.

Nam Benchmark: This benchmark contains noisy images of 11 real static scenes, together with the corresponding noise-free images obtained by averaging 500 noisy images of each scene. We cropped these images into patches and randomly selected 100 of them for evaluation. The quantitative PSNR and SSIM results are given in Table 8, where GDANet achieves the best PSNR and SSIM among all compared methods. Note that VDN does not perform well here (38.66 dB PSNR, below CBDNet and RIDNet), since the noisy images in this benchmark are JPEG-compressed, which VDN does not take into account. Fig. 7 shows a typical example denoised by the different methods, in which the better visual quality of our methods can be observed.
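The cropping and sampling protocol can be sketched as below. This is a minimal illustration under the following assumptions:

- Images are single-channel and stored as nested lists.
- Crops are non-overlapping.
- The patch size is chosen by the caller.

The actual patch size, stride, and random seed are not stated in this section, so the values here are placeholders. Cropping the noisy and clean images at the same locations keeps each evaluation pair aligned.

```python
import random
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # H x W single-channel image (illustrative)


def crop_patches(img: Grid, size: int) -> List[Grid]:
    """Crop non-overlapping size x size patches in raster order; borders that
    do not fit a full patch are dropped."""
    h, w = len(img), len(img[0])
    return [
        [row[left:left + size] for row in img[top:top + size]]
        for top in range(0, h - size + 1, size)
        for left in range(0, w - size + 1, size)
    ]


def sample_eval_patches(
    pairs: Sequence[Tuple[Grid, Grid]], size: int, k: int = 100, seed: int = 0
) -> List[Tuple[Grid, Grid]]:
    """Crop every (noisy, clean) image pair at identical locations, then draw
    k patch pairs uniformly without replacement (ValueError if fewer exist)."""
    patch_pairs: List[Tuple[Grid, Grid]] = []
    for noisy, clean in pairs:
        patch_pairs.extend(zip(crop_patches(noisy, size), crop_patches(clean, size)))
    return random.Random(seed).sample(patch_pairs, k)


# Usage with toy 4x4 "images": each yields four 2x2 patches.
toy = [[float(4 * r + c) for c in range(4)] for r in range(4)]
patches = crop_patches(toy, 2)
eval_set = sample_eval_patches([(toy, toy)] * 3, size=2, k=5)
```

Fixing the seed makes the 100-patch subset reproducible, which matters when comparing methods on the same patches.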

Discussion: Unlike the results in Sec. 5.2, GDANet performs more stably than , as shown in Tables 7 and 8, especially in terms of SSIM. This is because the noise types simulated by the generator , which are mainly determined by the training set, do not match well with those contained in the test sets. Therefore, we suggest using GDANet for general real-world denoising tasks with uncertain noise types. In contrast, is more suitable when the training and test data are similar.

6 Conclusion

We have proposed a new Bayesian framework, namely the dual adversarial network (DANet), for real-world image denoising. Unlike the traditional MAP framework, which relies on subjective prior assumptions about the noise and the image, our method learns the joint distribution directly from data. To estimate the joint distribution, we approximate it with two different factorized forms in a dual adversarial manner, corresponding to two tasks, i.e., noise removal and noise generation. To assess the quality of synthetic noisy images, we have designed two applicable metrics, which, to the best of our knowledge, are the first of their kind. The proposed DANet intrinsically provides a general methodology that can facilitate the study of other low-level vision tasks, such as super-resolution and deblurring. Comprehensive experiments have demonstrated the superiority of DANet over state-of-the-art methods specifically designed for the noise removal and noise generation tasks.
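The two factorizations behind this dual formulation can be written out explicitly. The notation is our own labeling for illustration:

- x is the clean image and y is the noisy image.
- The subscripts R and G mark the conditionals modeled by the denoiser and the generator, respectively.

The block restates only the factorization described in the abstract and omits the adversarial losses.

```latex
\begin{aligned}
p(\mathbf{x}, \mathbf{y}) &= p(\mathbf{x} \mid \mathbf{y})\, p(\mathbf{y}) = p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x}), \\
p_R(\mathbf{x}, \mathbf{y}) &= p_R(\mathbf{x} \mid \mathbf{y})\, p(\mathbf{y}) \quad \text{(denoiser: noisy} \to \text{clean)}, \\
p_G(\mathbf{x}, \mathbf{y}) &= p_G(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x}) \quad \text{(generator: clean} \to \text{noisy)}.
\end{aligned}
```

Both approximate joints are pushed toward the true joint of clean-noisy pairs. Once they match it, the denoiser serves noise removal and the generator serves noise generation, without a hand-designed image prior or noise model.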

Acknowledgements: This research was supported by National Key R&D Program of China (2018YFB1004300) and the China NSFC project under contracts 11690011, 61721002, U1811461, and Hong Kong RGC RIF grant (R5001-18).

References

  • [1] A. Abdelhamed, S. Lin, and M. S. Brown (2018) A high-quality denoising dataset for smartphone cameras. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Cited by: §3, §5.2, §5.3.
  • [2] F. Agostinelli, M. R. Anderson, and H. Lee (2013) Adaptive multi-column deep neural networks with application to robust image denoising. In Advances in Neural Information Processing Systems 26, pp. 1493–1501. Cited by: §2.1.
  • [3] J. Anaya and A. Barbu (2014) RENOIR - a benchmark dataset for real noise reduction evaluation. arXiv preprint arXiv:1409.8230. Cited by: §3, §5.3.
  • [4] S. Anwar and N. Barnes (2019) Real image denoising with feature attention. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3155–3164. Cited by: §1, §2.1, §3.4, §5.2, §5.3.
  • [5] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein GAN. arXiv preprint arXiv:1701.07875. Cited by: §3.2, §3.3.
  • [6] A. Barbu (2009) Training an active random field for real-time image denoising. IEEE Transactions on Image Processing 18 (11), pp. 2451–2462. Cited by: §2.1.
  • [7] T. Brooks, B. Mildenhall, T. Xue, J. Chen, D. Sharlet, and J. T. Barron (2019) Unprocessing images for learned raw denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11036–11045. Cited by: §1, §2.2, §5.2.
  • [8] A. Buades, B. Coll, and J. Morel (2005) A non-local algorithm for image denoising. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), Vol. 2, pp. 60–65. Cited by: §1, §1, §2.1.
  • [9] H. C. Burger, C. J. Schuler, and S. Harmeling (2012) Image denoising: can plain neural networks compete with BM3D? In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2392–2399. Cited by: §2.1.
  • [10] X. Cao, Y. Chen, Q. Zhao, D. Meng, Y. Wang, D. Wang, and Z. Xu (2015) Low-rank matrix factorization under general mixture noise distributions. In The IEEE International Conference on Computer Vision (ICCV). Cited by: §1, §2.1.
  • [11] J. Chen, J. Chen, H. Chao, and M. Yang (2018) Image blind denoising with generative adversarial network based noise modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3155–3164. Cited by: §1, §2.2.
  • [12] Y. Chen and T. Pock (2017) Trainable nonlinear reaction diffusion: a flexible framework for fast and effective image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (6), pp. 1256–1272. Cited by: §2.1.
  • [13] L. Chongxuan, T. Xu, J. Zhu, and B. Zhang (2017) Triple generative adversarial nets. In Advances in neural information processing systems, pp. 4088–4098. Cited by: §3.2, §3.3.
  • [14] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian (2007) Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE Transactions on image processing 16 (8), pp. 2080–2095. Cited by: §1, §2.1, §5.2.
  • [15] W. Dong, G. Shi, and X. Li (2012) Nonlocal image restoration with bilateral variance estimation: a low-rank approach. IEEE transactions on image processing 22 (2), pp. 700–711. Cited by: §1, §2.1.
  • [16] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §1, §2.2, §3.2, §3.3, §5.2.
  • [17] S. Gu, L. Zhang, W. Zuo, and X. Feng (2014) Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2862–2869. Cited by: §1, §2.1, §5.2.
  • [18] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville (2017) Improved training of wasserstein gans. In Advances in neural information processing systems, pp. 5767–5777. Cited by: §3.3, §5.1.
  • [19] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang (2019) Toward convolutional blind denoising of real photographs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1712–1722. Cited by: §1, §2.2, §5.2, §5.2.
  • [20] K. He, X. Zhang, S. Ren, and J. Sun (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE international conference on computer vision, pp. 1026–1034. Cited by: §5.1.
  • [21] M. J. Huiskes, B. Thomee, and M. S. Lew (2010) New trends and ideas in visual concept detection: the MIR Flickr retrieval evaluation initiative. In Proceedings of the international conference on Multimedia information retrieval, pp. 527–536. Cited by: §5.3.
  • [22] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125–1134. Cited by: §3.2.
  • [23] V. Jain and S. Seung (2008) Natural image denoising with convolutional networks. In Advances in Neural Information Processing Systems 21, pp. 769–776. Cited by: §2.1.
  • [24] R. Jaroensri, C. Biscarrat, M. Aittala, and F. Durand (2019) Generating training data for denoising real RGB images via camera pipeline simulation. arXiv preprint arXiv:1904.08825. Cited by: §2.2.
  • [25] D. Kim, J. Ryun Chung, and S. Jung (2019) GRDN: grouped residual dense network for real image denoising and GAN-based real-world noise modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Cited by: §1, §2.2, §5.2.
  • [26] D. P. Kingma and M. Welling (2014) Auto-encoding variational Bayes. In ICLR 2014 : International Conference on Learning Representations (ICLR) 2014. Cited by: §3.1.
  • [27] D. P. Kingma and J. L. Ba (2015) Adam: a method for stochastic optimization. In ICLR 2015 : International Conference on Learning Representations 2015. Cited by: §5.1.
  • [28] M. Lebrun, A. Buades, and J. Morel (2013) A nonlocal Bayesian image denoising algorithm. SIAM Journal on Imaging Sciences 6 (3), pp. 1665–1688. Cited by: §2.1.
  • [29] D. Liu, B. Wen, Y. Fan, C. C. Loy, and T. Huang (2018) Non-local recurrent network for image restoration. In NIPS 2018: The 32nd Annual Conference on Neural Information Processing Systems, pp. 1673–1682. Cited by: §2.1.
  • [30] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman (2009) Non-local sparse models for image restoration. In 2009 IEEE 12th International Conference on Computer Vision, pp. 2272–2279. Cited by: §2.1.
  • [31] J. Mairal, M. Elad, and G. Sapiro (2007) Sparse representation for color image restoration. IEEE Transactions on image processing 17 (1), pp. 53–69. Cited by: §1, §2.1.
  • [32] X. Mao, C. Shen, and Y. Yang (2016) Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In NIPS’16 Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 2810–2818. Cited by: §2.1, §3.4.
  • [33] D. Meng and F. De La Torre (2013) Robust matrix factorization with unknown noise. In The IEEE International Conference on Computer Vision (ICCV). Cited by: §1, §2.1.
  • [34] S. Nam, Y. Hwang, Y. Matsushita, and S. Joo Kim (2016) A holistic approach to cross-channel image noise modeling and its application to image denoising. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1683–1691. Cited by: §5.3.
  • [35] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pp. 8024–8035. Cited by: §5.1.
  • [36] T. Plötz and S. Roth (2017) Benchmarking denoising algorithms with real photographs. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1586–1595. Cited by: §5.3.
  • [37] T. Plötz and S. Roth (2018) Neural nearest neighbors networks. In NIPS 2018: The 32nd Annual Conference on Neural Information Processing Systems, pp. 1087–1098. Cited by: §2.1.
  • [38] A. Radford, L. Metz, and S. Chintala (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR 2016 : International Conference on Learning Representations 2016. Cited by: §5.1.
  • [39] O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Cited by: §3.4.
  • [40] L. I. Rudin, S. Osher, and E. Fatemi (1992) Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena 60 (1-4), pp. 259–268. Cited by: §1, §2.1.
  • [41] K. G. G. Samuel and M. F. Tappen (2009) Learning optimized MAP estimates in continuously-valued MRF models. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 477–484. Cited by: §2.1.
  • [42] U. Schmidt and S. Roth (2014) Shrinkage fields for effective image restoration. In CVPR ’14 Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2774–2781. Cited by: §2.1.
  • [43] U. Schmidt (2017) Half-quadratic inference and learning for natural images. Cited by: §2.1.
  • [44] J. Sun and M. F. Tappen (2011) Learning non-local range Markov random field for image restoration. In CVPR 2011, pp. 2745–2752. Cited by: §2.1.
  • [45] Y. Tai, J. Yang, X. Liu, and C. Xu (2017) MemNet: a persistent memory network for image restoration. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 4549–4557. Cited by: §2.1.
  • [46] Y. Tsin, V. Ramesh, and T. Kanade (2001) Statistical calibration of CCD imaging process. In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, Vol. 1, pp. 480–487. Cited by: §1.
  • [47] R. Wang, B. Chen, D. Meng, and L. Wang (2018) Weakly supervised lesion detection from fundus images. IEEE transactions on medical imaging 38 (6), pp. 1501–1512. Cited by: §1.
  • [48] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13 (4), pp. 600–612. Cited by: §4.
  • [49] J. Xie, L. Xu, and E. Chen (2012) Image denoising and inpainting with deep neural networks. In Advances in Neural Information Processing Systems 25, pp. 341–349. Cited by: §2.1.
  • [50] Q. Xie, Q. Zhao, D. Meng, and Z. Xu (2017) Kronecker-basis-representation based tensor sparsity and its applications to tensor recovery. IEEE transactions on pattern analysis and machine intelligence 40 (8), pp. 1888–1902. Cited by: §2.1.
  • [51] J. Xu, H. Li, Z. Liang, D. Zhang, and L. Zhang (2018) Real-world noisy image denoising: a new benchmark. arXiv preprint arXiv:1804.02603. Cited by: §3, §5.3.
  • [52] J. Xu, L. Zhang, D. Zhang, and X. Feng (2017) Multi-channel weighted nuclear norm minimization for real color image denoising. ICCV. Cited by: §1, §2.1.
  • [53] J. Xu, L. Zhang, and D. Zhang (2018) A trilateral weighted sparse coding scheme for real-world image denoising. In The European Conference on Computer Vision (ECCV). Cited by: §1, §2.1.
  • [54] H. Yong, D. Meng, W. Zuo, and L. Zhang (2017) Robust online matrix factorization for dynamic background subtraction. IEEE transactions on pattern analysis and machine intelligence 40 (7), pp. 1726–1740. Cited by: §2.1.
  • [55] Z. Yue, H. Yong, D. Meng, Q. Zhao, Y. Leung, and L. Zhang (2019) Robust multiview subspace learning with nonindependently and nonidentically distributed complex noise. IEEE transactions on neural networks and learning systems. Cited by: §1, §2.1.
  • [56] Z. Yue, H. Yong, Q. Zhao, D. Meng, and L. Zhang (2019) Variational denoising network: toward blind noise modeling and removal. In Advances in Neural Information Processing Systems, pp. 1688–1699. Cited by: §1, §1, §2.1, 2nd item, §5.2.
  • [57] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang (2017) Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing 26 (7), pp. 3142–3155. Cited by: §1, §1, §2.1, §3.4, §5.2.
  • [58] M. Zhou, H. Chen, L. Ren, G. Sapiro, L. Carin, and J. W. Paisley (2009) Non-parametric Bayesian dictionary learning for sparse image representations. In Advances in neural information processing systems, pp. 2295–2303. Cited by: §1, §2.1.
  • [59] F. Zhu, G. Chen, J. Hao, and P. Heng (2016) Blind image denoising via dependent Dirichlet process tree. IEEE transactions on pattern analysis and machine intelligence 39 (8), pp. 1518–1531. Cited by: §1, §2.1.
  • [60] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pp. 2223–2232. Cited by: §3.2.