Transfer Learning from Synthetic to Real-Noise Denoising with Adaptive Instance Normalization

02/26/2020 ∙ by Yoonsik Kim, et al. ∙ Seoul National University

Real-noise denoising is a challenging task because the statistics of real noise do not follow a normal distribution and also vary spatially and temporally. In order to cope with various and complex real noise, we propose a well-generalized denoising architecture and a transfer learning scheme. Specifically, we adopt adaptive instance normalization to build a denoiser, which can regularize the feature maps and prevent the network from overfitting to the training set. We also introduce a transfer learning scheme that transfers knowledge learned from synthetic-noise data to the real-noise denoiser. With the proposed transfer learning, the synthetic-noise denoiser can learn general features from various synthetic-noise data, and the real-noise denoiser can learn real-noise characteristics from real data. The experiments show that the proposed denoising method has great generalization ability: our network trained with synthetic noise achieves the best performance on the Darmstadt Noise Dataset (DND) among the methods from published papers. We can also see that the proposed transfer learning scheme works robustly for real-noise images, even when trained with a very small number of labeled data.


1 Introduction

Image restoration tasks [15, 25, 27, 26, 51, 42, 23, 32] have achieved noticeable improvement with the development of convolutional neural networks (CNNs). Although most image restoration methods work well on synthetically degraded images [21, 52, 9, 22], they show insufficient performance on real degradations.

Regarding denoising methods, networks trained with synthetic noise (SN) do not work well for real-world images because of the discrepancy between the distributions of SN and real noise (RN). Specifically, CNNs [48, 49, 50] trained with Gaussian noise do not work well for real-world images because the CNNs are overfitted to the Gaussian distribution. The problem of overfitting can also be seen from a toy regression example in Fig. 1. As shown in Fig. 1(a), the severely overfitted regression method (‘w/o Regularizer’) shows worse performance than a regularized method (‘w/ Regularizer’) on the synthetic test data. Moreover, it can be seen in Fig. 1(b) that the generalization ability is much worse when the training and test domains are different.

(a) Synthetic Data
(b) Real Data
(c) Transfer Learning
Figure 1: A toy regression example presenting the effects of regularization and transfer learning. (a) We assume that training and test data are sampled from a 5th-order polynomial, with additive white Gaussian noise (AWGN). The original regression model (without regularizer), denoted as w/o Regularizer, is a 10th-order polynomial model. As is well known, the higher-order model overfits the data. Assuming that a regularization method successfully degenerates the model to a 6th-order one (w/ Regularizer), overfitting is relieved. It can be seen from the mean squared error (MSE) on synthetic test data that regularization can enhance the performance when training and test distributions are the same. (b) We assume another 5th-order polynomial that generates real data with some domain difference from the synthetic one. It can be seen from the MSE on real test data that regularization is essential for processing other distributions. (c) The transfer-learning regression method w/ Regularizer + TF is fine-tuned from w/ Regularizer with a few real data samples. It can be seen from the MSE on real test data that transfer learning can be trained efficiently with few real training samples.

To better address the problem of different data distributions between training and test sets, two kinds of approaches have been developed: (1) obtaining pairs of an RN image and a corresponding near-noise-free image [34, 38, 5, 2, 43], and (2) finding more realistic noise models [17, 7].

The RN datasets enable the quantitative comparison of denoising performance on real-world images and also provide training sets for learning-based methods. CNNs trained with RN datasets work robustly on real-world images, because the domains of the training and test sets almost coincide. However, acquiring the pairs of RN images needs specialized knowledge, and the amount of provided data would not be enough for training a deeper CNN [46, 44]. Furthermore, learning-based methods can easily be overfitted to a specific camera device (dataset), which cannot cover all the devices that have different characteristics such as gamma correction, color correction, and other in-camera pipelines.

Regarding more realistic noise models, CBDNet [17] synthesized near-RN images by considering realistic noise models and simulating the in-camera pipeline. It generates a sufficiently large dataset simulating more than 200 camera response functions. CBDNet shows excellent performance on RN images even though the CNN is trained with SN. Furthermore, the authors showed that additional training with an RN dataset improves performance. Although realistic noise modeling indeed reduces the domain discrepancy between SN and RN, there still remains a domain discrepancy to be handled. Moreover, the CNN can be overfitted to a certain noise model that is not actually ‘real’ noise.

From these observations, we propose a novel denoiser that is well generalized to the various RN from camera devices by employing adaptive instance normalization (AIN) [41, 19, 28, 37]. In recent CNN-based methods for restoring synthetic degradations [29, 52, 22], regularization methods have not been exploited due to the small performance gain (or even performance degradation). This indicates that a CNN is overfitted to the training data to get the best performance when the domains of the training and test sets coincide [13].

On the other hand, the denoiser trained with SN needs regularization in order to be applied to RN denoising. As shown in the example of Fig. 1 (a) and (b) with ‘w/ Regularizer’, the network needs to be generalized through regularization. In this respect, we propose a well-regularized denoiser by adopting the AIN as a regularization method. Specifically, the affine transform parameters for the normalization of features are generated from the pixel-wise noise level. Then, the transform parameters adaptively scale and shift the feature maps according to the noise characteristics, which results in the generalization of the CNN.

Furthermore, we propose a transfer learning scheme from the SN to the RN denoising network to reduce the domain discrepancy between the synthetic and the real. As mentioned above, an RN dataset would not be sufficient to train a CNN, which can also easily be overfitted to a certain RN dataset. Hence, we devise a transfer learning scheme that learns general and invariant denoising information from the SN domain and then transfer-learns the domain-specific information from the RN data. As can be seen in Fig. 1(c), we believe that the SN denoiser can be adapted to an RN denoiser by re-transforming normalized features. Specifically, the parameters of the AIN are updated using an RN dataset. The proposed transfer learning scheme can be applied to any dataset that has a small number of labeled data. That is, a CNN trained with SN is easily transferred to work for RN removal, without the need for training the whole network with RN.

The contribution of this work can be summarized as follows:

  • We propose a novel well-generalized denoiser based on the AIN, which enables the CNN to work for various noise from many camera devices.

  • We introduce a transfer learning for the denoising scheme, which learns the domain-invariant information from SN data and updates affine transform parameters of AIN for the different-domain data.

  • The proposed method achieves state-of-the-art performance on the SN and RN images.

Figure 2: Illustration of the proposed denoiser. The noise level estimator and reconstruction network are U-Net based architectures, so the feature maps are down/up-sampled by average-pooling/transposed convolution. We denote the scale of each feature map by $s$, where $s$ can be 1, 2, and 4. All the represented convolutions in the reconstruction network have 64 feature maps, excluding the last convolution. The feature representation of the noise level estimator is also composed of convolutions with 32 channels, and the noise level maps are obtained from convolutions having 3-channel outputs. The overall number of parameters is 13.7 M.

2 Related Works

The statistics of RN in standard RGB (sRGB) images depend on the properties of camera sensors and in-camera pipelines. Specifically, shot noise and readout noise are generated from the sensor, and the statistics of the generated noise are changed by the in-camera pipeline, such as demosaicing, gamma correction, in-camera denoising, white balancing, color correction, etc. [35]. There have been several works to approximate the RN model, including Gaussian-Poisson [14, 30], heteroscedastic Gaussian [18], Gaussian Mixture Model (GMM) [53], and deep learning based methods [10, 1]. Considering the camera pipeline, CBDNet [17] and Unprocessing [7] also adopted realistic noise models. Specifically, they obtained near-RN images by adding heteroscedastic Gaussian noise to pseudo-raw images and feeding them to the camera pipeline. These methods can simulate more than 200 camera response functions, and thus generate noisy images having different characteristics. Moreover, CBDNet is alternately trained with RN and SN to overcome overfitting to the noise model. We think the alternate training scheme would incur training instability due to the different data distributions, and also cannot effectively train on quite different RN. Thus, we introduce a new transfer learning scheme that can simply but effectively adapt an SN denoiser to RN ones by re-transforming the normalized feature maps.

3 Proposed Method

We aim to train a robust RN denoiser, which reduces the discrepancy between the distributions of the training and test sets, by proposing a novel denoiser and a transfer learning scheme. Precisely, we propose a denoising architecture using the AIN, which can be well generalized to RN images. We also introduce a transfer learning scheme to reduce the remaining data discrepancy, which consists of two stages: (1) training a denoiser with an SN dataset $D_s = \{(\mathbf{y}_s, \mathbf{x}_s)\}$ and (2) transfer learning with an RN dataset $D_r = \{(\mathbf{y}_r, \mathbf{x}_r)\}$, where $\mathbf{y}$ and $\mathbf{x}$ are the noisy image and the noise-free image respectively, and the subscript $s$ is for SN and $r$ for RN. We use the noise model from CBDNet for generating $\mathbf{y}_s$ from $\mathbf{x}_s$ with the noise level $\sigma(\mathbf{y}_s)$. After training the SN denoiser with $D_s$, the RN denoiser is trained with the RN image $\mathbf{y}_r$ and the near noise-free image $\mathbf{x}_r$. In the transfer learning stage, only the domain-specific parameters are updated, to effectively preserve the knowledge learned from the SN data.

3.1 Adaptive Instance Normalization Denoising Network

We present a novel AIN denoising network (AINDNet), where the same architecture is employed for both the SN and RN denoisers. We compose AINDNet of a noise level estimator and a reconstruction network, as presented in Fig. 2. The noise level estimator takes a noisy image $\mathbf{y}$ as input and generates the estimated noise level map $\hat{\sigma}(\mathbf{y}; \theta_E)$, where $\theta_E$ denotes the training parameters of the estimator. The reconstruction network takes $\hat{\sigma}(\mathbf{y})$ and $\mathbf{y}$ as inputs and generates the denoised image $\hat{\mathbf{x}} = R(\mathbf{y}, \hat{\sigma}(\mathbf{y}); \theta_R)$, where $\theta_R$ denotes the training parameters of the reconstruction network. The reconstruction network is a U-Net based architecture with AIN residual blocks (AIN-ResBlocks).

Figure 3: Illustration of the proposed AIN-ResBlock with the corresponding kernel size, feature scale $s$, and number of features. Note that the number of features increases linearly with the scale. Leaky ReLU is employed as the activation function. The Norm (red) block denotes the channel-wise spatial normalization block. Average-pooling scales the size of $\hat{\sigma}(\mathbf{y})$ to be the same as that of the feature map.

Noise Level Estimator

Estimating the noise level is not an easy task due to the complex noise model and in-camera pipeline. In our experiments, we find that the previous simple noise level estimators [17, 7], which consist of five convolutions, could not accurately estimate the noise level. The main reason is that the previous estimators have a small receptive field, so they could not fully capture the complex noise information. From this observation, we design a new noise level estimator with a larger receptive field by employing down/up-sampling and multi-scale estimation. Specifically, the estimator produces a down-scaled estimation map $\hat{\sigma}_{1/2}(\mathbf{y})$ and an original-sized estimation map $\hat{\sigma}_1(\mathbf{y})$. Then, these two outputs are weight-averaged to feed the reconstruction network:

$$\hat{\sigma}(\mathbf{y}) = \alpha \cdot b\big(\hat{\sigma}_{1/2}(\mathbf{y})\big) + (1 - \alpha) \cdot \hat{\sigma}_1(\mathbf{y}),$$ (1)

where $b(\cdot)$ denotes linear interpolation to the height and width of the original image, and $\alpha$ is empirically determined to be 0.8. From the weighted average of multi-scale estimates, we can achieve a region-wisely smoothed $\hat{\sigma}(\mathbf{y})$, which follows the general characteristics of RN.
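As a concrete illustration, the weighted average of Eq. (1) can be sketched in a few lines of NumPy. This is a minimal sketch: the function name is ours, and nearest-neighbor repetition stands in for the linear interpolation $b(\cdot)$ used in the paper.

```python
import numpy as np

def fuse_noise_maps(sigma_half, sigma_full, alpha=0.8):
    """Weighted average of the multi-scale noise-level estimates (Eq. (1)).

    sigma_half : (H//2, W//2) down-scaled estimate.
    sigma_full : (H, W) original-sized estimate.
    The paper upsamples with linear interpolation; this sketch uses
    nearest-neighbor repetition to stay dependency-free.
    """
    up = sigma_half.repeat(2, axis=0).repeat(2, axis=1)  # upsample to (H, W)
    return alpha * up + (1.0 - alpha) * sigma_full

# A flat half-resolution map of 0.5 fused with a full-resolution map of 1.0:
fused = fuse_noise_maps(np.full((2, 2), 0.5), np.full((4, 4), 1.0))
```

With $\alpha = 0.8$, the fused map above is $0.8 \cdot 0.5 + 0.2 \cdot 1.0 = 0.6$ everywhere, showing how the smoother down-scaled estimate dominates the result.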

Adaptive Instance Normalization

The proposed AIN-ResBlock plays two crucial roles in the proposed denoising scheme. One is regularizing the network so that it is not overfitted to SN images, and the other is adapting the SN denoiser to the RN denoiser. For this, we build the AIN-ResBlock with two convolutions and two AIN modules, as presented in Fig. 3. The AIN module affine-transforms the normalized feature map $f \in \mathbb{R}^{H_s \times W_s \times C}$ of a convolution by taking the conditional input $\hat{\sigma}(\mathbf{y})$, where $H_s \times W_s$ denotes the spatial size of the feature map at each scale $s$, and $C$ is the number of channels. Specifically, the AIN module produces affine transform parameters, i.e., scale ($\gamma$) and shift ($\beta$), for each pixel. Thus, every feature map is channel-wisely normalized and pixel-wisely affine-transformed according to the noise level. The update process of the feature map in the AIN module at site $(c, h, w)$ is formally represented as

$$f_{c,h,w} \leftarrow \gamma^{*}_{c,h,w} \cdot \frac{f_{c,h,w} - \mu_c}{\sigma_c} + \beta^{*}_{c,h,w},$$ (2)

where the variables with superscript $*$ are generated from $\hat{\sigma}(\mathbf{y})$, and $\mu_c$ and $\sigma_c$ denote the mean and standard deviation of $f$ in channel $c$, respectively. Precisely,

$$\mu_c = \frac{1}{H_s W_s} \sum_{h,w} f_{c,h,w},$$ (3)
$$\sigma_c = \sqrt{\frac{1}{H_s W_s} \sum_{h,w} \big(f_{c,h,w} - \mu_c\big)^2 + \epsilon},$$ (4)

where $\epsilon$ denotes the stability parameter, which prevents divide-by-zero in Eq. (2), and we set it to a small constant in our implementation. Note that $\gamma^{*}$ and $\beta^{*}$ are generated pixel-wisely, and thus the proposed method can process spatially variant noisy images adaptively.
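The normalize-then-re-transform update of Eqs. (2)-(4) can be sketched as follows. This is a minimal NumPy illustration in which the pixel-wise $\gamma^{*}$ and $\beta^{*}$ maps are supplied directly, rather than generated from the noise level estimator as in the actual network.

```python
import numpy as np

def adaptive_instance_norm(f, gamma, beta, eps=1e-5):
    """AIN update of Eqs. (2)-(4): each channel of f is normalized over
    its spatial dimensions (instance normalization), then re-transformed
    with the pixel-wise scale gamma* and shift beta*.

    f, gamma, beta : arrays of shape (C, H, W). In the network, gamma
    and beta come from convolutions on the estimated noise level map;
    here they are passed in directly for illustration.
    """
    mu = f.mean(axis=(1, 2), keepdims=True)                   # Eq. (3)
    sigma = np.sqrt(f.var(axis=(1, 2), keepdims=True) + eps)  # Eq. (4)
    return gamma * (f - mu) / sigma + beta                    # Eq. (2)

rng = np.random.default_rng(0)
feat = rng.normal(2.0, 3.0, size=(4, 8, 8))
out = adaptive_instance_norm(feat,
                             gamma=np.ones((4, 8, 8)),
                             beta=np.zeros((4, 8, 8)))
```

With identity $\gamma^{*}$ and zero $\beta^{*}$, each output channel has zero mean and (near-)unit standard deviation; a noise-dependent $\gamma^{*}, \beta^{*}$ would re-scale and re-shift each pixel according to the local noise level.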

3.2 Transfer Learning

Figure 4: Illustration of the proposed transfer learning scheme. Only the AIN modules, the noise level estimator, and the last convolution are updated when learning RN data. For better visualization, we omit the noise level estimator in this figure.

We propose a transfer learning scheme that leverages $D_s$ to accelerate the training of the RN denoiser with $D_r$, which has a limited number of elements (RN pairs). We expect that the SN denoiser learns general and invariant feature representations, and the RN denoiser learns noise characteristics that cannot be fully modeled from SN data. The proposed transfer learning scheme can achieve these two merits by adapting the SN denoiser to an RN denoiser. For this, we focus on the normalization parameters to handle the different data distribution, inspired by style transfer and classification tasks [41, 19, 37]. In these methods, transforming the normalization parameters can transfer between style domains, and different-domain classification can be handled by switching the batch normalization parameters. From these observations, we adapt denoisers across domains by transfer-learning the normalization parameters, assuming that the data discrepancy between $D_s$ and $D_r$ can be adapted by re-transforming the normalized feature maps.

Specifically, the AIN parameters of the SN denoiser can be adapted pixel-wisely with the conditional $\hat{\sigma}(\mathbf{y})$. Thus, the AIN modules and the noise level estimator are transfer-learned with RN data. Although the ground-truth noise level is not present in $D_r$, the noise level estimator can still be trained through the reconstruction loss. We consider that the last convolution plays a crucial role in reconstructing the feature maps to an RGB image, hence the last convolution is also updated. The overall proposed transfer learning scheme is presented in Fig. 4.

Since the proposed transfer learning scheme only updates parts of the well-generalized denoiser, it converges faster and achieves better performance with a very small number of elements from $D_r$ than training from scratch. Moreover, the proposed scheme effectively copes with multiple models, which are inevitably required due to severely different noise statistics, saving a lot of memory by switching only the domain-specific parameters.

Training

For training the SN denoiser, we exploit a multi-scale asymmetric loss as the estimation loss, where the asymmetric loss is introduced from CBDNet [17] to prevent under-estimation. Formally, the multi-scale asymmetric loss is defined as

$$\mathcal{L}_{est} = \sum_{s \in \{1, 1/2\}} \sum_{i} \Big| \alpha_{asym} - \mathbb{1}\big[(\hat{\sigma}_s(\mathbf{y})_i - \sigma_s(\mathbf{y})_i) < 0\big] \Big| \cdot \big(\hat{\sigma}_s(\mathbf{y})_i - \sigma_s(\mathbf{y})_i\big)^2,$$ (5)

where $\mathbb{1}[\cdot]$, $\cdot$, and $(\cdot)^2$ denote element-wise operations, namely the indicator function, multiplication, and power, respectively. The hyperparameters are empirically determined. The down-scaled ground truth $\sigma_{1/2}(\mathbf{y})$ is obtained by average-pooling $\sigma(\mathbf{y})$.
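A minimal single-scale sketch may clarify the asymmetric penalty: with a weight below 0.5, under-estimated pixels receive the larger factor $|\alpha_{asym} - 1|$. The value 0.3 is an assumption borrowed from CBDNet, since the hyperparameter values are not stated in this text.

```python
import numpy as np

def asymmetric_loss(sigma_hat, sigma, alpha_asym=0.3):
    """Single-scale asymmetric estimation loss in the spirit of Eq. (5).

    With alpha_asym < 0.5, under-estimation (sigma_hat < sigma) is
    weighted by |alpha_asym - 1| and thus penalized more heavily than
    over-estimation.  alpha_asym=0.3 is an assumed value (from CBDNet).
    """
    under = (sigma_hat - sigma) < 0
    weight = np.abs(alpha_asym - under.astype(np.float64))
    return float(np.mean(weight * (sigma_hat - sigma) ** 2))

sigma_gt = np.full((8, 8), 0.2)
loss_under = asymmetric_loss(np.full((8, 8), 0.1), sigma_gt)  # under-estimate
loss_over = asymmetric_loss(np.full((8, 8), 0.3), sigma_gt)   # over-estimate
```

For the same absolute error of 0.1, the under-estimated map is penalized with weight 0.7 versus 0.3 for the over-estimated one, which is exactly the asymmetry the loss is designed to create.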

Then, the proposed SN denoiser is jointly trained with the estimation loss and a reconstruction loss as

$$\mathcal{L}(\theta_s) = \mathcal{L}_{rec}\big(\hat{\mathbf{x}}_s, \mathbf{x}_s\big) + \lambda \cdot \mathcal{L}_{est},$$ (6)

where $\theta_s$ denotes the SN denoiser training parameters, including the noise level estimator and reconstruction network, and $\lambda$ denotes the weight term of the noise level estimator, which is empirically determined to be 0.05.

For the RN denoiser, it is trained with the reconstruction loss only:

$$\mathcal{L}(\theta_r) = \mathcal{L}_{rec}\big(\hat{\mathbf{x}}_r, \mathbf{x}_r\big),$$ (7)

where $\theta_r$ denotes the RN denoiser training parameters that are transferred from $\theta_s$. Only the previously stated parameters, i.e., the AIN modules, the noise level estimator, and the last convolution, are updated; the other parameters are fixed when training the RN denoiser. We use the Adam optimizer for both the SN denoiser and the RN denoiser.
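The transfer stage amounts to updating only a named subset of the network's parameters while freezing the rest. A small sketch with hypothetical parameter-name prefixes (ours, not from any released code) illustrates the split:

```python
def is_transferred(param_name):
    """Return True if the parameter is updated during transfer learning.

    Only the AIN modules, the noise level estimator, and the last
    convolution are re-trained on RN data; all other SN-learned weights
    stay frozen.  The name prefixes below are hypothetical placeholders.
    """
    return param_name.startswith(("ain.", "estimator.", "last_conv."))

# Hypothetical parameter list of an SN-trained model:
params = ["ain.scale.weight", "backbone.block1.conv.weight",
          "estimator.conv3.bias", "last_conv.weight"]
to_update = [p for p in params if is_transferred(p)]
frozen = [p for p in params if not is_transferred(p)]
```

In a deep learning framework, the same split would typically be realized by disabling gradients for the frozen parameters and passing only the selected subset to the optimizer.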

4 Experiments

We present the results on AWGN and RN images by training a Gaussian denoiser and an RN denoiser.

4.1 Experimental Setup

Training Settings

For the Gaussian denoiser, the training images are obtained from DIV2K [40] and BSD400 [33], and the noisy images are generated by the AWGN model. For the RN denoiser, we train the denoiser in two steps: training an SN denoiser, and then training an RN denoiser by transfer learning. We obtain pairs of SN images and noise-free images from the Waterloo dataset [31] with the heteroscedastic Gaussian noise model and simulated in-camera pipelines. The RN denoiser, which is transferred from the SN denoiser, is trained with the SIDD training set [2]. All the training images are cropped into patches.

Test Set

In the AWGN experiments, we evaluate on Set12 [48] and BSD68 [39], which are widely used for validating AWGN denoisers. Furthermore, we adopt three datasets for real-world noisy images:

  • RNI15 [24] is composed of 15 real-world noisy images. Unfortunately, the ground-truth clean images are unavailable, therefore we only present qualitative results.

  • DND [38] provides 50 noisy images that are captured by mirrorless cameras. Since we cannot access the near noise-free counterparts, the objective results (PSNR/SSIM) can only be obtained by submitting the denoised images to the DND site.

  • SIDD [2] is obtained from smartphone cameras. It provides 320 pairs of noisy images and corresponding near noise-free ones for learning-based methods, where the captured scenes are mostly static. Furthermore, it provides 1280 patches for validation that have scenes similar to the training set. The quantitative results (PSNR/SSIM) can be obtained by uploading the denoised images to the SIDD site.

4.2 Comparison with state-of-the-arts

| (σ_s, σ_c)   | FCN MAE | FCN STD | Ours MAE | Ours STD |
|--------------|---------|---------|----------|----------|
| (0.08, 0.02) | 0.039   | 0.013   | 0.014    | 0.012    |
| (0.08, 0.04) | 0.059   | 0.014   | 0.012    | 0.011    |
| (0.08, 0.06) | 0.076   | 0.013   | 0.020    | 0.010    |
| (0.12, 0.02) | 0.052   | 0.021   | 0.015    | 0.014    |
| (0.12, 0.04) | 0.071   | 0.020   | 0.017    | 0.014    |
| (0.12, 0.06) | 0.087   | 0.020   | 0.030    | 0.014    |
| Average      | 0.064   | 0.017   | 0.018    | 0.013    |
| # params     | 29.5 K  |         | 29.7 K   |          |

Table 1: Average MAE and error STD for the images from Kodak24, where the inputs are corrupted by heteroscedastic Gaussian noise including the in-camera pipeline.

Noise Level Estimation

We evaluate the accuracy of the noise level estimator on images corrupted with the exploited noise model. We compare the proposed noise level estimator with the fully convolutional network (FCN) that is widely used [17, 7]. In order to evaluate the accuracy of the estimator itself, each estimator is trained with a regression loss. The employed quantitative measurements are the mean absolute error (MAE) and the standard deviation (STD) of the error. We report the accuracy of each estimator in Table 1, where the input images are simultaneously corrupted with signal-dependent noise level σ_s and signal-independent noise level σ_c. We can find that the proposed estimator gets more accurate results than the previous estimator with a similar number of parameters. Results for more various noise levels will be presented in the supplementary file. Furthermore, we present the denoising performance when combined with the reconstruction network in the ablation study.

| Method       | Set12 σ=15 | σ=25  | σ=50  | BSD68 σ=15 | σ=25  | σ=50  |
|--------------|------------|-------|-------|------------|-------|-------|
| BM3D [12]    | 32.38      | 29.95 | 26.70 | 31.07      | 28.56 | 25.62 |
| TNRD [11]    | 32.50      | 30.04 | 26.78 | 31.42      | 28.91 | 25.96 |
| DnCNN [48]   | 32.68      | 30.36 | 27.21 | 31.61      | 29.16 | 26.23 |
| UNLNet [26]  | 32.67      | 30.25 | 27.04 | 31.47      | 28.98 | 26.04 |
| FFDNET [50]  | 32.75      | 30.43 | 27.32 | 31.63      | 29.19 | 26.29 |
| RIDNet [6]   | 32.91      | 30.60 | 27.43 | 31.81      | 29.34 | 26.40 |
| AINDNet      | 32.92      | 30.61 | 27.51 | 31.69      | 29.26 | 26.32 |

Table 2: Average PSNR of the denoised images, where the inputs are corrupted by AWGN with σ = 15, 25, and 50, for the images from the Set12 and BSD68 datasets.

AWGN Denoising

We compare the proposed denoiser on noisy grayscale images corrupted by AWGN. For this, we train the Gaussian denoiser as a single network that learns noise levels in [0, 60]. The comparisons between the proposed method and other methods are presented in Table 2. We can see that the proposed denoiser achieves the best performance on Set12, whose composition is independent of the training sets. On the other hand, the proposed method achieves the second-best performance on BSD68, which consists of objects similar to those in BSD400 (the training set). We think these results demonstrate the robust generalization ability of the proposed denoising architecture beyond the training set.

Real Noise Denoising

We also investigate the proposed denoiser and transfer learning scheme on RN datasets. Processing RN images is considered very practical but difficult, because the noise is signal-dependent, spatially variant, and appears diverse according to different in-camera pipelines. Thus, we think RN denoising is an appropriate task for showing the generalization ability of the proposed denoiser and the effects of the proposed transfer learning.

For the precise comparison, we train four different denoisers according to training sets and learning methods:

  • AINDNet(S): AINDNet trained with SN images, which is the proposed SN denoiser.

  • AINDNet(R): AINDNet trained with RN images.

  • AINDNet+RT: All the parameters of AINDNet(S) are re-trained with RN images, which is a common transfer learning scheme.

  • AINDNet+TF: Only the AIN parameters of AINDNet(S) are updated with RN images, which is the proposed RN denoiser.

Moreover, we present geometric self-ensemble results in order to maximize the potential performance of the proposed methods.

Meanwhile, there has been a challenge on real image denoising [3] using the SIDD. Our method shows lower performance than the top-ranked ones in the challenge, but it should be noted that the number of parameters of our network is much smaller than those in the challenge. For example, DHDN [36] and DIDN [47], which appeared in the challenge, require about 160 M and 190 M training parameters respectively, which is about 12-15 times more than ours. Moreover, the challenge methods have been slightly overfitted to the SIDD: the winning denoiser [20] achieves comparably lower performance (38.78 dB) on DND than our method. Therefore, we do not directly compare the proposed method with the challenge methods.

The comparisons, including internal comparisons, are presented in Tables 3 and 4. We can find that the proposed methods achieve the best performance on the DND and SIDD benchmarks. Specifically, the proposed AINDNet(S) achieves the best performance on the DND benchmark, which is impressive in that it outperforms RN-trained denoisers. Moreover, AINDNet(S) gains 1.5 dB and 2.4 dB over CBDNet on DND and SIDD respectively, where the employed noise models are the same. These results indicate that the proposed denoiser is not overfitted to the noise model and can be well generalized to RN images. However, AINDNet(S) shows inferior performance to AINDNet(R) on SIDD by a big margin. The main reason is that AINDNet(R) is solely trained with SIDD training images, and the SIDD test set consists of scenes and objects similar to the training set. In other words, AINDNet(R) can be slightly overfitted to the SIDD benchmark, and this phenomenon can be seen from its insufficient performance on DND.

In contrast, AINDNet+RT and AINDNet+TF achieve satisfying performance on both DND and SIDD. Concretely, AINDNet+RT and AINDNet+TF perform better than the others, including AINDNet(R), on SIDD, which indicates that pre-training on SN images results in better performance. AINDNet+TF preserves the previously learned knowledge from SN data better than AINDNet+RT, so AINDNet+TF achieves the best overall performance among the compared methods.

We present visual comparisons on SIDD and RNI15 in Figs. 5 and 6, which show that the proposed methods remove noise robustly while preserving edges. Thus, characters in the output images are more apparent than in other methods' results. Furthermore, we also present in Fig. 7 the visual enhancement when the proposed transfer learning scheme is applied. Since the RN denoiser transfer-learns the characteristics of RN, AINDNET+TF successfully removes unusual noise that cannot be removed with AINDNET(S). Moreover, the RN denoiser learns the properties of JPEG compression artifacts that are not learned in the SN denoiser, so it can also successfully reduce compression artifacts. Other visual comparisons will be presented in the supplementary file.

| Method                | Blind/Non-blind | Training Env. | PSNR  | SSIM   |
|-----------------------|-----------------|---------------|-------|--------|
| CDnCNN-B [48]         | Blind           | Synthetic     | 32.43 | 0.7900 |
| TNRD [11]             | Non-blind       | Synthetic     | 33.65 | 0.8306 |
| MLP [8]               | Non-blind       | Synthetic     | 34.23 | 0.8331 |
| FFDNet [50]           | Non-blind       | Synthetic     | 34.40 | 0.8474 |
| BM3D [12]             | Non-blind       | -             | 34.51 | 0.8507 |
| WNNM [16]             | Non-blind       | -             | 34.67 | 0.8646 |
| GCBD [10]             | Blind           | Synthetic     | 35.58 | 0.9217 |
| KSVD [4]              | Non-blind       | -             | 36.49 | 0.8978 |
| TWSC [45]             | Blind           | -             | 37.94 | 0.9403 |
| CBDNet [17]           | Blind           | Synthetic     | 37.57 | 0.9360 |
| CBDNet [17]           | Blind           | Real          | 37.72 | 0.9408 |
| CBDNet [17]           | Blind           | All           | 38.06 | 0.9421 |
| RIDNet [6]            | Blind           | Real          | 39.23 | 0.9526 |
| AINDNet(S)            | Blind           | Synthetic     | 39.53 | 0.9561 |
| AINDNet(R)            | Blind           | Real          | 39.16 | 0.9515 |
| AINDNet+RT            | Blind           | All           | 39.21 | 0.9505 |
| AINDNet+TF            | Blind           | All           | 39.37 | 0.9505 |
| AINDNet(S) (ensemble) | Blind           | Synthetic     | 39.77 | 0.9590 |
| AINDNet(R) (ensemble) | Blind           | Real          | 39.34 | 0.9524 |
| AINDNet+RT (ensemble) | Blind           | All           | 39.34 | 0.9522 |
| AINDNet+TF (ensemble) | Blind           | All           | 39.52 | 0.9522 |

Table 3: Average PSNR/SSIM of the denoised images on the DND benchmark. We denote the environment of training, i.e., training with SN data only, RN data only, or both.

| Method                | Blind/Non-blind | Training Env. | PSNR  | SSIM  |
|-----------------------|-----------------|---------------|-------|-------|
| CDnCNN-B [48]         | Blind           | Synthetic     | 23.66 | 0.583 |
| MLP [8]               | Non-blind       | Synthetic     | 24.71 | 0.641 |
| TNRD [11]             | Non-blind       | Synthetic     | 24.73 | 0.643 |
| BM3D [12]             | Non-blind       | -             | 25.65 | 0.685 |
| WNNM [16]             | Non-blind       | -             | 25.78 | 0.809 |
| KSVD [4]              | Non-blind       | -             | 26.88 | 0.842 |
| CBDNet [17]           | Blind           | All           | 33.28 | 0.868 |
| AINDNet(S)            | Blind           | Synthetic     | 35.66 | 0.903 |
| AINDNet(R)            | Blind           | Real          | 38.73 | 0.950 |
| AINDNet+RT            | Blind           | All           | 39.04 | 0.955 |
| AINDNet+TF            | Blind           | All           | 38.95 | 0.952 |
| AINDNet(S) (ensemble) | Blind           | Synthetic     | 35.87 | 0.905 |
| AINDNet(R) (ensemble) | Blind           | Real          | 38.84 | 0.951 |
| AINDNet+RT (ensemble) | Blind           | All           | 39.15 | 0.955 |
| AINDNet+TF (ensemble) | Blind           | All           | 39.08 | 0.953 |

Table 4: Average PSNR/SSIM of the denoised images on the SIDD benchmark. We denote the environment of training, i.e., training with SN data only, RN data only, or both.

| Num. of real images | 0     | 1     | 2     | 4     | 8     | 16    | 32    | 64    | 320 (full) |
|---------------------|-------|-------|-------|-------|-------|-------|-------|-------|------------|
| RIDNet              | -     | -     | -     | -     | -     | -     | -     | -     | 38.71      |
| AINDNet(R)          | -     | 30.36 | 32.19 | 36.94 | 37.70 | 38.14 | 38.66 | 38.70 | 38.81      |
| AINDNet+RT          | 35.21 | 36.23 | 37.16 | 38.02 | 38.40 | 38.63 | 38.82 | 39.00 | 39.01      |
| AINDNet+TF          | 35.21 | 36.19 | 37.14 | 37.93 | 38.27 | 38.52 | 38.75 | 38.83 | 38.90      |

Table 5: Investigation of RN denoising performance according to the amount of RN data. The quantitative results (average PSNR (dB)) are reported on the SIDD validation dataset.
Figure 5: A real noisy image from SIDD, and the comparison of the results: (a) Noisy Image, (b) DnCNN-C, (c) CBDNet, (d) RIDNet, (e) AINDNET(S), (f) AINDNET+TF.
Figure 6: A real noisy image from RNI15, and the comparison of the results: (a) Noisy Image, (b) DnCNN-C, (c) CBDNet, (d) RIDNet, (e) AINDNET(S), (f) AINDNET+TF.
Figure 7: Real noisy images from RNI15, and the comparison of the results showing the effectiveness of the proposed transfer learning scheme: (a, d) Noisy Image, (b, e) AINDNET(S), (c, f) AINDNET+TF.

4.3 Discussions

Effect of Transfer Learning with Limited RN Pairs

We investigate the relation between denoising performance and the amount of RN image pairs in $D_r$, because we consider that the preparation of $D_r$ is quite difficult and the number of elements can be limited. For this, we train each network with a constrained number of image pairs, from one to all (320), from SIDD [2]. The average PSNR of each denoiser is presented in Table 5. It can be seen that the transfer learning schemes achieve significant performance with a small number of real training images. It is notable that AINDNet+TF trained with 32 pairs of real data achieves better performance than RIDNet, which exploits all of them. Thus, we can conclude that transfer learning from the SN denoiser dramatically accelerates the performance with a small number of labeled data from the other domain.

Architecture of Denoiser

We demonstrate the effectiveness of the noise level estimator and reconstruction network for training with SN data. We present the performance of the noise level estimators combined with the reconstruction network in Table 6, with different objective functions. Recall that our estimator generates smoothed outputs, so the smoothness (TV) loss is excluded when using our estimator. We find that the previous training scheme (FCN with the asymmetric and TV losses) shows inferior performance to the proposed training scheme (ours with the multi-scale asymmetric loss only). Moreover, the proposed training scheme also surpasses the internal variant (ours with both losses).

We further investigate the effectiveness of the reconstruction network. For this, we select an adaptive Gaussian denoiser that can process a spatially variant noise map by feeding it to gated residual blocks (Gated-ResBlocks). Since its performance on RN datasets has not been reported, we train an SN denoiser by replacing the AIN-ResBlock with the Gated-ResBlock, where the other settings are the same as AINDNet. Table 7 shows that the proposed AIN-ResBlock achieves better performance on the RN datasets. Thus, we believe that the AIN-ResBlock is an appropriate architecture for generalization.

| Method                      | DND   | SIDD  |
|-----------------------------|-------|-------|
| FCN + asymmetric + TV loss  | 39.51 | 34.90 |
| Ours + asymmetric + TV loss | 39.45 | 35.08 |
| Ours + asymmetric loss      | 39.53 | 35.19 |

Table 6: Investigation of noise level estimator and estimation loss when denoisers are trained with SN data. The quantitative results (in average PSNR (dB)) are reported on DND test dataset and SIDD validation dataset.

| Method              | DND   | SIDD  |
|---------------------|-------|-------|
| Res-Block [22]      | 39.19 | 34.93 |
| AIN-ResBlock (Ours) | 39.53 | 35.19 |

Table 7: Investigation of the proposed reconstruction network when denoisers are trained with SN data. The quantitative results (in average PSNR (dB)) are reported on DND test dataset and SIDD validation dataset.

5 Conclusion

In this paper, we have presented a novel denoiser and a transfer learning scheme for RN denoising. The proposed denoiser employs the AIN to regularize the network and prevent it from overfitting to SN. The transfer learning scheme mainly updates the AIN modules using RN data to adjust to the data distribution. From the experimental results, we find that the proposed denoising scheme generalizes well to RN even when it is trained with SN. Moreover, the transfer learning scheme can effectively adapt an SN denoiser to an RN denoiser with a very small number of real noise pairs. We will make our code publicly available for further research and comparison.

References

  • [1] Abdelrahman Abdelhamed, Marcus A Brubaker, and Michael S Brown. Noise flow: Noise modeling with conditional normalizing flows. In Proceedings of the IEEE International Conference on Computer Vision, pages 3165–3173, 2019.
  • [2] Abdelrahman Abdelhamed, Stephen Lin, and Michael S Brown. A high-quality denoising dataset for smartphone cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
  • [3] Abdelrahman Abdelhamed, Radu Timofte, and Michael S Brown. NTIRE 2019 challenge on real image denoising: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.
  • [4] Michal Aharon, Michael Elad, and Alfred Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311–4322, 2006.
  • [5] Josue Anaya and Adrian Barbu. RENOIR – a dataset for real low-light image noise reduction. Journal of Visual Communication and Image Representation, 51:144–154, 2018.
  • [6] Saeed Anwar and Nick Barnes. Real image denoising with feature attention. In The IEEE International Conference on Computer Vision (ICCV), October 2019.
  • [7] Tim Brooks, Ben Mildenhall, Tianfan Xue, Jiawen Chen, Dillon Sharlet, and Jonathan T Barron. Unprocessing images for learned raw denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.
  • [8] Harold C Burger, Christian J Schuler, and Stefan Harmeling. Image denoising: Can plain neural networks compete with BM3D? In 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
  • [9] Jianrui Cai, Hui Zeng, Hongwei Yong, Zisheng Cao, and Lei Zhang. Toward real-world single image super-resolution: A new benchmark and a new model. arXiv preprint arXiv:1904.00523, 2019.
  • [10] Jingwen Chen, Jiawei Chen, Hongyang Chao, and Ming Yang. Image blind denoising with generative adversarial network based noise modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
  • [11] Yunjin Chen and Thomas Pock. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE transactions on pattern analysis and machine intelligence, 39(6):1256–1272, 2016.
  • [12] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. Color image denoising via sparse 3D collaborative filtering with grouping constraint in luminance-chrominance space. In 2007 IEEE International Conference on Image Processing, 2007.
  • [13] Ruicheng Feng, Jinjin Gu, Yu Qiao, and Chao Dong. Suppressing model overfitting for image super-resolution networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.
  • [14] Alessandro Foi, Mejdi Trimeche, Vladimir Katkovnik, and Karen Egiazarian. Practical poissonian-gaussian noise modeling and fitting for single-image raw-data. IEEE Transactions on Image Processing, 17(10):1737–1754, 2008.
  • [15] Michaël Gharbi, Gaurav Chaurasia, Sylvain Paris, and Frédo Durand. Deep joint demosaicking and denoising. ACM Transactions on Graphics (TOG), 35(6):191, 2016.
  • [16] Shuhang Gu, Lei Zhang, Wangmeng Zuo, and Xiangchu Feng. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2014.
  • [17] Shi Guo, Zifei Yan, Kai Zhang, Wangmeng Zuo, and Lei Zhang. Toward convolutional blind denoising of real photographs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.
  • [18] Samuel W Hasinoff, Frédo Durand, and William T Freeman. Noise-optimal capture for high dynamic range photography. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 553–560. IEEE, 2010.
  • [19] Xun Huang and Serge Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, pages 1501–1510, 2017.
  • [20] Dong-Wook Kim, Jae Ryun Chung, and Seung-Won Jung. GRDN: Grouped residual dense network for real image denoising and GAN-based real-world noise modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.
  • [21] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1646–1654, 2016.
  • [22] Yoonsik Kim, Jae Woong Soh, and Nam Ik Cho. Adaptively tuning a convolutional neural network by gate process for image denoising. IEEE Access, 7:63447–63456, 2019.
  • [23] Filippos Kokkinos and Stamatis Lefkimmiatis. Iterative residual cnns for burst photography applications. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5929–5938, 2019.
  • [24] Marc Lebrun, Miguel Colom, and Jean-Michel Morel. The noise clinic: a blind image denoising algorithm. Image Processing On Line, 5:1–54, 2015.
  • [25] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4681–4690, 2017.
  • [26] Stamatios Lefkimmiatis. Universal denoising networks: a novel cnn architecture for image denoising. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3204–3213, 2018.
  • [27] Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli Laine, Tero Karras, Miika Aittala, and Timo Aila. Noise2Noise: Learning image restoration without clean data. In International Conference on Machine Learning, pages 2971–2980, 2018.
  • [28] Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, and Ming-Hsuan Yang. Universal style transfer via feature transforms. In Advances in neural information processing systems, pages 386–396, 2017.
  • [29] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017.
  • [30] Xinhao Liu, Masayuki Tanaka, and Masatoshi Okutomi. Practical signal-dependent noise parameter estimation from a single noisy image. IEEE Transactions on Image Processing, 23(10):4361–4371, 2014.
  • [31] Kede Ma, Zhengfang Duanmu, Qingbo Wu, Zhou Wang, Hongwei Yong, Hongliang Li, and Lei Zhang. Waterloo exploration database: New challenges for image quality assessment models. IEEE Transactions on Image Processing, 26(2):1004–1016, 2016.
  • [32] Ilja Manakov, Markus Rohm, Christoph Kern, Benedikt Schworm, Karsten Kortuem, and Volker Tresp. Noise as domain shift: Denoising medical images by unpaired image translation. In Domain Adaptation and Representation Transfer and Medical Image Learning with Less Labels and Imperfect Data, pages 3–10. Springer, 2019.
  • [33] David Martin, Charless Fowlkes, Doron Tal, Jitendra Malik, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In The IEEE International Conference on Computer Vision (ICCV), 2001.
  • [34] Seonghyeon Nam, Youngbae Hwang, Yasuyuki Matsushita, and Seon Joo Kim. A holistic approach to cross-channel image noise modeling and its application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1683–1691, 2016.
  • [35] Alberto Ortiz and Gabriel Oliver. Radiometric calibration of CCD sensors: Dark current and fixed pattern noise estimation. In IEEE International Conference on Robotics and Automation (ICRA), volume 5, pages 4730–4735. IEEE, 2004.
  • [36] Bumjun Park, Songhyun Yu, and Jechang Jeong. Densely connected hierarchical network for image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.
  • [37] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2337–2346, 2019.
  • [38] Tobias Plotz and Stefan Roth. Benchmarking denoising algorithms with real photographs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
  • [39] Stefan Roth and Michael J Black. Fields of experts. International Journal of Computer Vision, 82(2):205, 2009.
  • [40] Radu Timofte, Eirikur Agustsson, Luc Van Gool, Ming-Hsuan Yang, and Lei Zhang. NTIRE 2017 challenge on single image super-resolution: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 114–125, 2017.
  • [41] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016.
  • [42] Lei Xiao, Felix Heide, Wolfgang Heidrich, Bernhard Schölkopf, and Michael Hirsch. Discriminative transfer learning for general image restoration. IEEE Transactions on Image Processing, 27(8):4091–4104, 2018.
  • [43] Jun Xu, Hui Li, Zhetong Liang, David Zhang, and Lei Zhang. Real-world noisy image denoising: A new benchmark. arXiv preprint arXiv:1804.02603, 2018.
  • [44] Jun Xu, Lei Zhang, and David Zhang. External prior guided internal prior learning for real-world noisy image denoising. IEEE Transactions on Image Processing, 27(6):2996–3010, 2018.
  • [45] Jun Xu, Lei Zhang, and David Zhang. A trilateral weighted sparse coding scheme for real-world image denoising. In Proceedings of the European Conference on Computer Vision (ECCV), 2018.
  • [46] Jun Xu, Lei Zhang, David Zhang, and Xiangchu Feng. Multi-channel weighted nuclear norm minimization for real color image denoising. In Proceedings of the IEEE International Conference on Computer Vision, pages 1096–1104, 2017.
  • [47] Songhyun Yu, Bumjun Park, and Jechang Jeong. Deep iterative down-up CNN for image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.
  • [48] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.
  • [49] Kai Zhang, Wangmeng Zuo, Shuhang Gu, and Lei Zhang. Learning deep CNN denoiser prior for image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3929–3938, 2017.
  • [50] Kai Zhang, Wangmeng Zuo, and Lei Zhang. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Transactions on Image Processing, 27(9):4608–4622, 2018.
  • [51] Kai Zhang, Wangmeng Zuo, and Lei Zhang. Learning a single convolutional super-resolution network for multiple degradations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3262–3271, 2018.
  • [52] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), 2018.
  • [53] Fengyuan Zhu, Guangyong Chen, and Pheng-Ann Heng. From noise modeling to blind image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 420–429, 2016.