Unsupervised Image Noise Modeling with Self-Consistent GAN

by   Hanshu Yan, et al.
National University of Singapore

Noise modeling lies at the heart of many image processing tasks. However, existing deep learning methods for noise modeling generally require clean and noisy image pairs for model training; these image pairs are difficult to obtain in many realistic scenarios. To ameliorate this problem, we propose a self-consistent GAN (SCGAN) that can directly extract noise maps from noisy images, thus enabling unsupervised noise modeling. In particular, the SCGAN introduces three novel self-consistent constraints that are complementary to one another, viz.: the noise model should produce a zero response over a clean input; the noise model should return the same output when fed with a specific pure noise input; and the noise model should also re-extract a pure noise map if the map is added to a clean image. These three constraints are simple yet effective. They jointly facilitate unsupervised learning of a noise model for various noise types. To demonstrate its wide applicability, we deploy the SCGAN on three image processing tasks: blind image denoising, rain streak removal, and noisy image super-resolution. The results demonstrate the effectiveness and superiority of our method over state-of-the-art methods on a variety of benchmark datasets, even though the noise types vary significantly and paired clean images are not available.


1 Introduction

Image restoration and enhancement [22], which focus on generating high-quality images from their degraded versions, are important image processing tasks and useful for many computer vision applications. Noise modeling is critical to such tasks, including image denoising [3, 25], noisy image super-resolution (SR) [6, 21, 24], and rain streak removal [23, 4].

Deep learning methods have achieved astounding performance on various noisy image enhancement tasks [25, 26, 4]. Most of these methods are supervised and assume that noisy images together with their corresponding clean versions are available. However, in many realistic applications, it is impossible or inefficient to collect a large quantity of paired (clean, noisy) images. For example, for a noisy natural image captured in the wild with a fixed camera, it is not possible to obtain the real clean image because the lighting and the objects in the scene vary. Thus, supervised learning-based deep models are not applicable. In contrast, it is easy to collect unpaired clean images from the Internet, whose contents differ from those of the given noisy images. Even though there are no image pairs, the noisy and clean images with different contents can still together reflect the noise of the domain. Thus, learning a model to generate the noise is feasible.

Figure 1: Our proposed SCGAN model learns to extract the noise map $n$ from a given noisy image $y$ and generates the estimated clean image. Adding the noise map to another clean image $x$, we can obtain $x$'s noisy version $x + n$, which shares similar noise patterns with $y$. With the formed (noisy, clean) image pairs, like $(x + n, x)$, we can train a deep model in an end-to-end manner for a specific noisy image processing task.

Using unpaired images, our two-step method is able to address the noisy image processing problem. In the first and more crucial step, shown in Fig. 1, we learn to model the noise in the given noisy images, so that we can extract the noise maps from noisy images. Image pairs can then be constructed by adding the extracted noise maps to other collected clean images. In the second step, we train a deep model with the constructed paired image set for a certain task. To achieve this, we design an unsupervised deep noise modeling network, named Self-Consistent Generative Adversarial Network (SCGAN). Given a collection of noisy images, SCGAN learns to extract their noise maps and generates their clean versions. The obtained noise model can be applied to process images with similar noise conditions.

Because of the absence of a paired dataset, the training of a GAN model is severely under-constrained. To ameliorate this, SCGAN introduces three novel self-consistent constraints that complement the adversarial learning and facilitate the model training. The three constraints are developed based on the following observations. 1) A good noise model should map a clean image to the zero response. 2) A good noise model should return the same output when fed with a pure noise map. As a result, if we fix the input to be a noise map extracted from a well-trained model, the resulting output should be the same as the input noise map. 3) If we add pure noise to a clean image, a good model should extract the same noise from the resultant noisy image. The above three constraints are complementary to one another and to the adversarial loss. They are easy to deploy in end-to-end deep model training.

We apply the proposed SCGAN model to three noisy image processing tasks, each of which features different challenges and noise types: blind denoising, rain streak removal, and noisy image SR. Our unsupervised model achieves performance that is comparable even with that of fully supervised models. The main contributions of this work are as follows: (1) we propose a new architecture for unsupervised noise modeling; (2) we introduce three self-consistent losses to improve the training procedure and the performance of the model; (3) to the best of our knowledge, this is the first work to perform unsupervised rain streak removal and noisy image SR via noise modeling.

Figure 2: Our model consists of two networks, namely, a generator $G$ for noise map extraction and a discriminator $D$ for distinguishing real clean images from fake ones. To overcome the problem of having unpaired sets, we introduce three self-consistent losses. For a clean image $x$ and a noisy image $y$, these include the clean consistent loss which enforces that $G(x) = 0$, the pure noise consistent loss which enforces that $G(G(y)) = G(y)$, and the reconstruction consistent loss which enforces that $G(x + G(y)) = G(y)$.

2 Related Work

Deep Learning Methods for Image Restoration

Deep learning methods have demonstrated great success in image restoration tasks, including image denoising [25, 26, 10], rain streak removal [23, 4] and image SR [12, 20, 9]. For the image denoising task, DnCNN [25] and IRCNN [26] train deep neural networks with residual blocks to learn mappings from noisy images to their corresponding residual noise maps. For the rain streak removal task, Yang et al. [23] and Fu et al. [4] proposed a multi-task deep architecture that learns the binary rain streak map to help recover clean images. For image SR tasks, deep models are usually trained with high-resolution (HR) images and their corresponding downsampled low-resolution (LR) images. EDSR [12] combines residual blocks and pixel shuffling in its model and achieves excellent performance on both the bicubic down-sampling and unknown degradation tracks. However, the training in all of these works is done in a supervised manner with paired noisy images and their corresponding ground truth images. Hence, they cannot be directly used for noisy image processing tasks when paired data is absent.

Image Blind Denoising

When paired data is absent, blind denoising methods perform noise removal via noise modeling techniques. Classical methods for blind image denoising usually include noise estimation/modeling [28, 13, 27, 16] and adaptive denoising procedures. Liu et al. [13] proposed a patch-based noise level estimation method to estimate the parameters of Gaussian noise. Zhu et al. [28] used mixtures of Gaussians (MoG) to model image noise and recover clean signals with a low-rank MoG filter. These methods were developed based on human knowledge and cannot effectively handle images with unknown noise statistics. To the best of our knowledge, deep learning-based methods for noise modeling [1] are scarce. Chen et al. [1] proposed a smooth patch searching algorithm to obtain noise-dominant data from noisy images. These noise-dominant patches are used to train a GAN, which can model the distribution of the noise and generate noise patches. However, the method requires manually tuned parameters to search for noise blocks, and the performance of noise modeling is sensitive to these parameters. Besides, the noise-dominant patch search method in [1] can only model zero-mean noise. When the mean of the additive noise is positive, as with rain streaks, the search method cannot extract pure rain streak patches to train a GAN for noise modeling. By contrast, our method does not make any specific assumption on the additive noise. It does not involve searching for smooth patches in noisy images, but trains a deep network that directly extracts noise maps from noisy images.

3 Approach

3.1 Problem Setting

We aim to solve image noise modeling problems in an unsupervised manner. In particular, we are only provided with a set of noisy images $\mathcal{Y}$ and another set of clean images $\mathcal{X}$. The degradation of a noisy image is usually modeled as $y = x + n$ [1, 13, 23], where $y$ and $x$ represent a noisy image and its ground truth respectively, and $n$ is a noise map. We aim to obtain a noise model that can generate an accurate noise map for a noisy input image.

3.2 Noise Modeling with Self-Consistent GAN

GANs have been shown to be effective in modeling complex real data distributions from a large number of sampled inputs [17, 18]. A GAN model consists of a generator component $G$ and a discriminator component $D$. In noise modeling problems, we can train a GAN on the unpaired clean and noisy images to obtain a noise model that maps a noisy image $y$ to its noise map $n$, i.e., $G(y) = n$.

In Fig. 2, given a noisy image $y$ as the input, the noise modeling network $G$ will output the estimated noise map $G(y)$. The clean version of the input patch can be estimated as $y - G(y)$. By adding the extracted noise map to some clean patch $x$, we can then generate its noisy version $x + G(y)$, which shares the same noise pattern as the given noisy patch $y$. The estimated clean version of a noisy input will then be sent to a discriminator $D$, together with the real clean image $x$. We use an adversarial training strategy to train $G$ and $D$.

To ensure that the training procedure is stable, the least squares loss [24] is used as the adversarial loss. The objective function of adversarial training is:


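We train the discriminator $D$ to maximize this objective. At the same time, we optimize the generator $G$ to minimize the loss such that $G$ can generate noise maps presenting noise similar to that in the noisy image $y$. Namely, we consider

$$G^{*} = \arg\min_{G} \max_{D} \mathcal{L}_{GAN}(G, D),$$

where $\mathcal{L}_{GAN}(G, D)$ denotes the adversarial objective above. Ideally, this optimization reaches a Nash equilibrium, and the optimized $G^{*}$ can be deployed as a noise model for new inputs directly.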
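As an illustration of the least squares adversarial training described above, the following is a minimal plain-Python sketch. It assumes the standard least-squares GAN formulation with target score 1 for real clean images and 0 for estimated clean images, written as two losses that each player minimizes; the sign convention and scaling used in this paper may differ. The function names and the toy discriminator scores are hypothetical.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def d_loss_lsgan(d_real, d_fake):
    """Discriminator loss: push scores of real clean images towards 1
    and scores of estimated clean images y - G(y) towards 0."""
    real_term = mean((s - 1.0) ** 2 for s in d_real)
    fake_term = mean(s ** 2 for s in d_fake)
    return 0.5 * real_term + 0.5 * fake_term

def g_loss_lsgan(d_fake):
    """Generator loss: make estimated clean images score like real ones."""
    return 0.5 * mean((s - 1.0) ** 2 for s in d_fake)

# Hypothetical discriminator scores for one mini-batch.
d_real = [0.9, 1.0, 0.8]   # D(x) for real clean images x
d_fake = [0.1, 0.3, 0.2]   # D(y - G(y)) for estimated clean images

print(d_loss_lsgan(d_real, d_fake), g_loss_lsgan(d_fake))
```

In this form both players descend their own loss in alternating steps, which is how least-squares GANs are commonly implemented.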

Though the above application of a vanilla GAN to noise modeling is straightforward, we find that if we only use the adversarial loss, the training of noise modeling is severely under-constrained because of the absence of paired image patches. To ameliorate this problem, we carefully study the generator $G$. We derive three implicit constraints, which we term self-consistent losses (Fig. 2), from inherent properties of the noise modeling mapping.

  1. The first intuitive property, shown in Eqn. (2), is that, taking a noise-free clean image $x$ as input, the generator $G$ should output a zero response:

     $$\mathcal{L}_{clean} = \mathbb{E}_{x \in \mathcal{X}} \left\| G(x) \right\| \qquad (2)$$

  2. We term an image that only contains noise sampled from the noise distribution a pure noise image. The second property is that a pure noise image, after passing through the noise model $G$, should produce the same pure noise map. However, since pure noise images are not available in the training set, we propose another constraint (Eqn. (3)) as an alternative to enforce such a desired property. When training the model, given a noisy image $y$, the output $G(y)$ will be fed back to $G$. The generator $G$ is optimized to minimize the difference between $G(G(y))$ and $G(y)$ as below:

     $$\mathcal{L}_{pn} = \mathbb{E}_{y \in \mathcal{Y}} \left\| G(G(y)) - G(y) \right\| \qquad (3)$$

  3. The third property is exactly the definition of noise modeling: for any noisy image which is the addition of a clean image and a pure noise map, the pure noise map should be extracted from the noisy image correctly by $G$. As shown in Eqn. (4), the extracted noise $G(y)$ is added to a clean image $x$ and fed back to $G$ again, and $G$ should be able to reconstruct the noise map. This is dictated by the following constraint:

     $$\mathcal{L}_{rec} = \mathbb{E}_{x \in \mathcal{X},\, y \in \mathcal{Y}} \left\| G(x + G(y)) - G(y) \right\| \qquad (4)$$

Here $\mathcal{X}$ and $\mathcal{Y}$ are the clean and noisy image sets, and $\|\cdot\|$ denotes a norm on images.


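By incorporating the self-consistent implicit constraints in Eqns. (2)–(4), our overall objective is

$$\mathcal{L} = \mathcal{L}_{GAN}(G, D) + \lambda_{clean} \mathcal{L}_{clean} + \lambda_{pn} \mathcal{L}_{pn} + \lambda_{rec} \mathcal{L}_{rec},$$

where $\mathcal{L}_{clean}$, $\mathcal{L}_{pn}$ and $\mathcal{L}_{rec}$ are the clean, pure noise and reconstruction consistent losses, and $\lambda_{clean}$, $\lambda_{pn}$ and $\lambda_{rec}$ are non-negative weights.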
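The three self-consistent terms and their weighted combination can be sketched in plain Python as follows. This is only an illustration under stated assumptions: images are flat lists of pixel values, the discrepancy is measured with a mean absolute (L1) difference, which is our choice and not necessarily the norm used in the paper, and `G_half`, `G_zero` and all numeric values are hypothetical.

```python
def l1(a, b=None):
    """Mean absolute value of a, or mean absolute difference of a and b."""
    if b is None:
        b = [0.0] * len(a)
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def self_consistent_losses(G, x, y):
    """Return (clean, pure-noise, reconstruction) losses for clean x, noisy y."""
    n = G(y)                                     # noise map extracted from y
    loss_clean = l1(G(x))                        # G(x) should be zero
    loss_pn = l1(G(n), n)                        # G(G(y)) should equal G(y)
    x_noisy = [xi + ni for xi, ni in zip(x, n)]  # x + G(y)
    loss_rec = l1(G(x_noisy), n)                 # G(x + G(y)) should equal G(y)
    return loss_clean, loss_pn, loss_rec

def total_objective(loss_gan, losses, weights):
    """Adversarial loss plus the weighted self-consistent losses."""
    return loss_gan + sum(w * l for w, l in zip(weights, losses))

def G_half(img):   # hypothetical poor extractor: calls half of every pixel noise
    return [0.5 * v for v in img]

def G_zero(img):   # hypothetical trivial extractor: never extracts anything
    return [0.0] * len(img)

x, y = [1.0, 2.0], [2.0, 4.0]
print(self_consistent_losses(G_half, x, y))   # (0.75, 0.75, 0.0)
print(self_consistent_losses(G_zero, x, y))   # (0.0, 0.0, 0.0)
```

Note that the trivial all-zero extractor satisfies all three constraints exactly, which illustrates why these terms complement the adversarial loss rather than replace it.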

3.3 Architecture and Training Details of SCGAN


In Fig. 2, our generator $G$ uses the same architecture as the model in [25]. The first layer of $G$ is a convolutional (conv) layer with ReLU activation, and the last layer is merely a conv layer. Each of the remaining 15 layers is a unit consisting of a conv layer with batch normalization and ReLU activation. The conv layers in the middle 15 units share one channel width for their inputs and outputs and one kernel size, as in [25]. To ensure that the output noise maps have the same size as the inputs and to avoid artifacts along the edges, noisy input images are padded using the reflection method. The discriminator $D$ has only conv layers, the leading ones of which are each followed by a LeakyReLU activation function. The number of output channels, the kernel size and the stride are specified per layer, and every layer uses the same padding.
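The reflection padding mentioned above can be illustrated with a small plain-Python sketch. It mirrors the image about its border pixels without repeating the edge value, which is the usual meaning of reflection padding in deep learning libraries; the helper names are hypothetical, and the padding width `p` is left as a parameter because its value is not given here.

```python
def reflect_pad_1d(row, p):
    """Pad a list with p mirrored entries on each side (edge not repeated)."""
    if p >= len(row):
        raise ValueError("padding must be smaller than the sequence length")
    left = row[p:0:-1]            # row[p], ..., row[1]
    right = row[-2:-p - 2:-1]     # row[-2], ..., row[-p-1]
    return left + row + right

def reflect_pad_2d(img, p):
    """Reflection-pad a 2-D image given as a list of equal-length rows."""
    padded_rows = [reflect_pad_1d(list(r), p) for r in img]  # pad columns
    return reflect_pad_1d(padded_rows, p)                    # pad rows

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
for row in reflect_pad_2d(img, 1):
    print(row)
```

Padding the input in this way lets the convolutions near the border see plausible image content instead of zeros, which is one way to keep the output the same size as the input while limiting edge artifacts.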

Training details

To train the noise map extraction model well, we divide the whole training procedure into three phases based on training epochs [ep1, ep2, ep3] and schedule the changes of the weight parameters accordingly.

In the first phase, during epochs 0 to ep1, the weights of the three self-consistent losses, $\lambda_{clean}$, $\lambda_{pn}$ and $\lambda_{rec}$, are initialized as zero, so only the GAN loss is used for optimization. After several epochs of training, the generator $G$ can extract noise maps from the noisy input and generate estimated clean images. However, the recovered clean images usually present distortions and brightness shifts. This is because $G$ wrongly treats the background and textures in images as noise and extracts them into the outputs.

The second training phase, during epochs ep1 to ep2, is dedicated to this problem. The weights of the clean and pure noise consistent losses, $\lambda_{clean}$ and $\lambda_{pn}$, increase to preset values, and the weight of the reconstruction consistent loss, $\lambda_{rec}$, remains zero. Thus, $\mathcal{L}_{clean}$ and $\mathcal{L}_{pn}$ start influencing the optimization. With the help of these two constraints, the generator $G$ tends to extract a zero noise map from the real clean images, and it acts as the identity map on a pure noise image. At the end of this phase, the estimated clean images are free from distortion and retain similar brightness and contrast to the noisy input images. However, the extracted noise maps still contain distinct edges of the noisy input images.

To overcome this problem, in the third phase, during epochs ep2 to ep3, the weight $\lambda_{rec}$ increases to a positive value and the reconstruction consistent loss $\mathcal{L}_{rec}$ is added to the objective function. $\mathcal{L}_{rec}$ enforces that a pure noise map can be reconstructed/re-extracted from the addition of any clean image and the pure noise map. Hence, the noise maps extracted by the generator $G$ will be free from the influence of the edges and textures of a particular image or a particular area in an image. In Section 4.4, we analyze the effectiveness of the proposed self-consistent losses.
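The three-phase schedule can be summarised by a small helper that returns the loss weights for a given epoch. This is a sketch under assumptions: the weights switch on as a step at the phase boundaries (the text only says they increase to preset values, so a gradual ramp is equally possible), and the function name, the epoch boundaries and the preset weights in the example are hypothetical.

```python
def loss_weights(epoch, ep1, ep2, lam_clean, lam_pn, lam_rec):
    """Return (w_clean, w_pn, w_rec) for the given training epoch."""
    if epoch < ep1:
        # Phase 1: adversarial loss only.
        return 0.0, 0.0, 0.0
    if epoch < ep2:
        # Phase 2: clean and pure-noise consistency switched on.
        return lam_clean, lam_pn, 0.0
    # Phase 3: reconstruction consistency added as well.
    return lam_clean, lam_pn, lam_rec

# Hypothetical boundaries and preset weights, for illustration only.
for epoch in (0, 10, 20):
    print(epoch, loss_weights(epoch, ep1=10, ep2=20,
                              lam_clean=1.0, lam_pn=0.5, lam_rec=0.1))
```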

3.4 Application to Noisy Image Restoration Tasks

Figure 3: The proposed two-step method for noisy image processing. In step one, the SCGAN learns to extract noise maps from given noisy images and adds the noise maps to other clean images to construct image pairs. In step two, with the formed image pairs, a deep model for a certain task can be trained in an end-to-end manner.
Figure 4: Comparison of the denoising performance of different methods for the image ‘003’ from BSD68. Panels: (a) Noisy, (b) Ground-truth, (c) BM3D, (d) GCBD-2, (e) SCGAN-2, (f) DnCNN-B (Oracle).
Noise level  BM3D   WNNM   GCBD-2  SCGAN-2  GCBD-1  SCGAN-1  DnCNN-B
15           31.07  31.37  30.59   30.80    31.35   31.48    31.61
25           28.57  28.83  27.66   28.92    29.04   29.02    29.16
Table 1: The average PSNR (in dB) results of Gaussian noise removal on the BSD68 dataset. BM3D and WNNM are two non-blind denoising methods, while the rest are blind denoising methods.

We apply the proposed SCGAN to blind image denoising, rain streak removal and noisy image SR (Fig. 3). For the first two tasks, the available training data only contains a set of noisy images $\mathcal{Y}$ with unknown noise statistics and a set of different clean images $\mathcal{X}$. Using a well-trained generator $G$, we can extract noise maps from $\mathcal{Y}$, obtain the noisy version of $\mathcal{X}$ and construct a paired training dataset of (noisy, clean) images. We trained a DnCNN [25] model for denoising and a Deep Detail Network [4] for rain streak removal.

For the noisy image SR task, we only have a set of noisy LR images $\mathcal{Y}_{LR}$ and a set of clean HR images $\mathcal{X}_{HR}$. We assume that the up-scaling factor is $s$ and that the bicubic down-sampling kernel is used. First, we down-sample $\mathcal{X}_{HR}$ by the factor $s$ to get an LR clean image set $\mathcal{X}_{LR}$. Then, following similar procedures to those used in the denoising task, we generate the noisy versions of the clean LR images and construct a paired training set of (noisy LR, clean HR) images. EDSR shows impressive performance on benchmarks and was the top-performing model in the NTIRE2017 Super-Resolution Challenge [22]. We finally train an EDSR-Baseline [12] model with the constructed set for the noisy image SR task.
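The construction of paired training sets described above can be sketched in plain Python as follows. The sketch makes several assumptions that are not stated in the text: images are flat lists of equal length, each clean image receives one noise map drawn uniformly at random from the extracted maps, and `toy_G` is a hypothetical stand-in for a trained generator (it simply treats the last decimal digit of each pixel as noise).

```python
import random

def build_pairs(G, noisy_images, clean_images, seed=0):
    """Return (synthetic_noisy, clean) pairs built from unpaired image sets."""
    rng = random.Random(seed)
    noise_maps = [G(y) for y in noisy_images]      # step 1: extract noise maps
    pairs = []
    for x in clean_images:
        n = rng.choice(noise_maps)                 # assumed pairing rule
        x_noisy = [xi + ni for xi, ni in zip(x, n)]
        pairs.append((x_noisy, x))
    return pairs

def build_sr_pairs(G, noisy_lr_images, clean_lr_hr_pairs, seed=0):
    """Return (synthetic_noisy_LR, clean_HR) pairs for noisy image SR."""
    clean_lr = [lr for lr, _ in clean_lr_hr_pairs]
    noisy_lr = [noisy for noisy, _ in
                build_pairs(G, noisy_lr_images, clean_lr, seed)]
    return [(noisy, hr) for noisy, (_, hr) in zip(noisy_lr, clean_lr_hr_pairs)]

def toy_G(y):
    """Hypothetical extractor used only to exercise the functions."""
    return [v % 10 for v in y]

noisy_set = [[12, 25], [31, 47]]
clean_set = [[100, 200], [300, 400], [500, 600]]
pairs = build_pairs(toy_G, noisy_set, clean_set)
sr_pairs = build_sr_pairs(toy_G, noisy_set, [([100, 200], [1, 2, 3, 4])])
print(pairs)
print(sr_pairs)
```

The resulting pairs can then be fed to any supervised model, such as a denoiser, a de-raining network or an SR network, in the second step.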

4 Experiments

4.1 Image Blind Denoising


We evaluate the performance of SCGAN for blind image denoising on the BSD68 [19] benchmark dataset. The BSD68 set consists of 68 images that cover a variety of scenes including animals, humans, buildings and natural scenery. We synthesize the noisy testing images by adding Gaussian noise to the BSD68 dataset.
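For readers who want to reproduce this kind of evaluation, the following plain-Python sketch shows how Gaussian noise can be added to an image and how PSNR is computed. It is only an illustration: images are flat lists with pixel values in [0, 255], the noisy values are not clipped or quantised, and the function names are hypothetical.

```python
import math
import random

def add_gaussian_noise(img, sigma, seed=0):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    rng = random.Random(seed)
    return [p + rng.gauss(0.0, sigma) for p in img]

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(peak ** 2 / mse)

clean = [128.0] * 20000
for sigma in (15, 25):
    noisy = add_gaussian_noise(clean, sigma)
    print(sigma, round(psnr(clean, noisy), 2))
```

Since the mean squared error of unclipped noise is close to $\sigma^2$, the PSNR of the noisy input is approximately $20 \log_{10}(255 / \sigma)$, i.e. about 24.6 dB for $\sigma = 15$ and about 20.2 dB for $\sigma = 25$, which gives a reference point for the denoised PSNR values in Table 1.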


We compare the performance of our proposed SCGAN with state-of-the-art blind denoising methods, including DnCNN-B [25] and GCBD [1], and with classical non-blind denoising methods such as BM3D [3] and WNNM [5]. DnCNN-B is trained with real Gaussian noisy images in a fully supervised manner, where the noisy images are synthesized by adding Gaussian noise over a range of noise levels to a set of 400 clean images [2]. We regard DnCNN-B as the performance upper bound.

The GCBD method [1] shares a similar two-step framework as our method to perform blind denoising. In GCBD, noise blocks are extracted from noisy images to train a GAN for modeling the noise distribution. This is to facilitate forming a dataset of paired images to train a DnCNN model. However, the paper [1] does not provide details about the images used for noise modeling and the set of clean images for DnCNN training. Thus, for a fair comparison, we tried our best to reproduce the GCBD method and tested it on the same datasets that are used for SCGAN.

Figure 5: Comparison of the de-rain performance of different methods for the image ‘22’ and the image ‘28’ from the Rain100H test set. The JORDER- is a deep model trained in a fully supervised way. Panels: (a) Rainy, (b) Ground Truth, (c) LP, (d) DSC.


We conducted experiments with two settings. In the first setting, the given noisy training images contain a large number of smooth patches. We therefore collected clean images online that only contain the sky or regions of pure color. We divide these images into two sets: Gaussian noise of certain intensities is added to one set to form the noisy image set $\mathcal{Y}$; the other set of clean images is used as $\mathcal{X}$. We cropped the images of the two sets into patches and subtracted their means.
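The patch preparation step can be illustrated with the plain-Python sketch below. It assumes square patches cropped on a regular grid and a mean computed separately for each patch; the text does not say whether the mean is taken per patch or over the whole set, and the patch size and stride are left as parameters. The function names are hypothetical.

```python
def crop_patches(img, size, stride):
    """Crop size x size patches from a 2-D image (list of rows) on a grid."""
    h, w = len(img), len(img[0])
    patches = []
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            patches.append([row[left:left + size]
                            for row in img[top:top + size]])
    return patches

def subtract_mean(patch):
    """Remove the patch mean so that only deviations from it remain."""
    flat = [v for row in patch for v in row]
    m = sum(flat) / len(flat)
    return [[v - m for v in row] for row in patch]

img = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # values 0..15
patches = crop_patches(img, size=2, stride=2)
print(len(patches))
print(subtract_mean(patches[0]))
```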

In the second setting, the available noisy images for training are constructed by adding Gaussian noise to clean images in the DIV2K [22] dataset. This dataset consists of a diverse set of images, such as people, natural scenes, and handmade objects. We downsampled the images and divided them into two sets, indexed 1-400 and 401-800 respectively. Next, we added Gaussian noise to the first set to form the noisy image set $\mathcal{Y}$. The other set is used as the clean image set $\mathcal{X}$.

In both settings, we first train an SCGAN model to extract noise maps from the noisy image set $\mathcal{Y}$. For fair comparisons, in the second step, we trained our DnCNN denoising model on the same dataset as that used for DnCNN-B; thus we added noise maps extracted from $\mathcal{Y}$ to the 400 clean images [2] to construct the paired training set. The parameter settings for training DnCNN are the same as those in [25].


For the first setting, we conducted experiments with a noisy set $\mathcal{Y}$ and a clean set $\mathcal{X}$ that are formed with the collected 200 images. With the generated paired data, we trained a DnCNN model and tested it on the BSD68 set. The results of SCGAN (SCGAN-1) and GCBD (GCBD-1) are listed in Table 1. Our method is comparable to GCBD. For $\sigma = 15$, the PSNR of SCGAN-1 is 31.48 dB, higher than GCBD-1’s 31.35 dB, and close to the upper bound (using DnCNN-B) of 31.61 dB. For $\sigma = 25$, our method is worse than GCBD-1 by 0.02 dB, which is marginal. However, it is still close to the upper bound and better than the two non-blind methods, BM3D and WNNM.

For the second setting, we conducted experiments with a noisy set $\mathcal{Y}$ and a clean set $\mathcal{X}$ that are constructed with images from the DIV2K set. The visual results are shown in Fig. 4, and the PSNR value of our method (SCGAN-2) is still close to the upper bound. For $\sigma = 25$, SCGAN-2 is better than the two non-blind methods. To compare with GCBD, we reproduced the GCBD method (GCBD-2) using the same parameter settings as in [1] for noise block extraction from the DIV2K dataset. For $\sigma = 15$, the result of GCBD-2 is slightly lower than that of SCGAN-2. However, for $\sigma = 25$, SCGAN outperforms GCBD by almost 1.3 dB. In the GCBD method, the training of the GAN for noise modeling is sensitive to the similarity between the extracted noise blocks and real Gaussian noise. However, we observed that the noise blocks extracted from DIV2K mostly contain textures related to the image contents. The manually set parameters result in extracted noise blocks that may not be similar to the real noise. Comparatively, our proposed self-consistent losses help SCGAN extract noise maps even from non-smooth areas, as analyzed in Section 4.4.

4.2 Rain Streak Removal

To show that the proposed SCGAN can model more complex noise, we apply it to a rain streak removal task. The degradation caused by rain streaks can be modeled as $y = x + r$ [7], where the rainy image $y$ is regarded as the additive combination of its ground truth $x$ with a rain streak map $r$. The mean value of the rain streak map is usually positive. Thus, the smooth patch search method in GCBD [1] cannot be applied, as rain streaks violate the critical zero-mean noise assumption on which GCBD relies.


We evaluate the proposed SCGAN model on the Rain100H dataset [23], which consists of 1800 pairs in the training set and 200 pairs in the test set. The 1800 rainy images in the training set are used to form the given noisy image set $\mathcal{Y}$. We select 900 clean images from the DIV2K dataset, all of which have different contents from the rainy images. The selected 900 clean images form the clean image set $\mathcal{X}$.

Methods  ID [8]  LP [11]  DSC [14]  SCGAN   JORDER- [23]
PSNR     14.02   14.26    15.66     19.865  20.79
Table 2: The average PSNR (in dB) results of rain streak removal on the Rain100H dataset. The JORDER- is a deep model trained in a fully supervised way.


We train an SCGAN model with the noisy set $\mathcal{Y}$ and the clean set $\mathcal{X}$, and finally obtain a model that can efficiently extract rain streak maps from given rainy images. To construct a paired training set, the extracted rain streak maps are added to clean images in $\mathcal{X}$. With the image pairs, we further train a Deep Detail Network (DDN) for rain streak removal in an end-to-end way. The architecture of the de-rain model follows that in [4]. We compare our method with several classical methods, such as an image decomposition method (ID) [8], a layer prior method (LP) [11] and a sparse coding method (DSC) [14]. Besides, we also compare our method with a deep learning method, JORDER- [23], which is trained in a fully supervised way.


The result of our SCGAN method is shown in Table 2 and Fig. 5. As shown in Fig. 5, the SCGAN can efficiently remove the rain streaks. The visual result of SCGAN is much better than three classical methods, i.e., ID, LP and DSC, and close to the fully-supervised method, JORDER-. Quantitatively, the SCGAN distinctly outperforms three classical methods. It is 5.8dB better than the ID [8], 5.6dB better than the LP [11] and 4.2dB better than the DSC [14]. Besides, we also compare our method with the fully supervised JORDER- that is trained with the whole Rain100H training set, and the performance of SCGAN is comparable to that of JORDER- [23]. Our results suggest that the rain streak maps extracted by SCGAN are similar to the real rain streaks contained in the Rain100H set.
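The quoted margins follow directly from the PSNR values in Table 2, as the short check below shows. The dictionary keys are just labels for this example.

```python
# Average PSNR (dB) on Rain100H, copied from Table 2.
psnr_db = {"ID": 14.02, "LP": 14.26, "DSC": 15.66,
           "SCGAN": 19.865, "JORDER-": 20.79}

# Gain of SCGAN over every other method (negative means SCGAN is lower).
gains = {name: round(psnr_db["SCGAN"] - value, 3)
         for name, value in psnr_db.items() if name != "SCGAN"}

for name, gain in gains.items():
    print(f"SCGAN vs {name}: {gain:+.3f} dB")
```

The differences are 5.845 dB, 5.605 dB and 4.205 dB over ID, LP and DSC respectively, and SCGAN is 0.925 dB below the fully supervised JORDER-.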

Figure 6: From top to bottom, SR visual results of an image (0806) from the DIV2K dataset and an image (3096) from the BSD100 dataset. The LR noisy images contain Gaussian noise. Panels: (a) Bicubic, (b) Ground truth, (c) EDSR-b, (d) Dn+SR, (f) EDSR* (Oracle).

4.3 Noisy Image Super-Resolution


The third task on which we evaluate SCGAN noise modeling is the super-resolution of Gaussian noisy images. We evaluated the model on two benchmark datasets, namely the DIV2K [22] validation set and the BSD100 [15] set, with Gaussian noise added to both datasets.

The unpaired image sets used for training SCGAN are formed as follows. The DIV2K training set consists of 800 HR clean images. We downsampled them by applying bicubic interpolation and divided the obtained LR images into two sets by image index. We then added Gaussian noise to the first set and kept the other set unchanged. Hence, a noisy LR image set $\mathcal{Y}_{LR}$ and a clean LR image set $\mathcal{X}_{LR}$ are obtained.

In this experiment, only images in the clean LR set $\mathcal{X}_{LR}$ have corresponding HR images $\mathcal{X}_{HR}$. With a sufficiently well-trained SCGAN, we can synthesize a noisy image set by adding noise extracted from the noisy LR set $\mathcal{Y}_{LR}$ to $\mathcal{X}_{LR}$. Finally, we obtain a paired image set in which each pair consists of a noisy LR image and a clean HR image.


To perform noisy image SR using the generated paired dataset, we trained an EDSR-baseline [12] model. To stabilize the training procedure, we followed the training strategy in [12]. We pre-trained an EDSR-baseline (EDSR-b) model with clean LR and HR pairs. Next, we fine-tuned the pre-trained model with the training data synthesized by SCGAN. The fine-tuned model is our final model.

We regard the performance of the model (EDSR-b) previously trained on clean LR and HR pairs as the lower bound. Besides, we also trained an EDSR-baseline model (EDSR*) on images with real Gaussian noise and regarded its performance as the upper bound. In addition to noise modeling, the noisy image SR task can also be solved using an alternative two-step method, i.e., image denoising followed by SR. As such, we first use the DnCNN-B model to remove noise from the test images, and subsequently use EDSR-b to super-resolve the denoised images. The results of this two-step method (Dn+SR) are shown in Table 3. For a fair comparison, the EDSR models in each method are trained with the same hyper-parameters.

Dataset  Ratio  EDSR-b  Han et al. [6]  Dn+SR   SCGAN   EDSR* (Oracle)
DIV2K    2      23.529  -               29.555  29.974  30.795
DIV2K    3      22.691  -               27.499  27.931  28.298
DIV2K    4      21.889  -               26.212  26.583  26.883
B100     2      24.241  27.29           28.469  28.501  28.772
B100     3      23.317  25.64           26.616  26.846  27.010
B100     4      22.592  24.74           25.544  25.870  26.029
Table 3: The average PSNR (in dB) results of noisy image SR on the DIV2K and B100 datasets.


From the results in Table 3, we observe that SCGAN performs better than the state-of-the-art method [6] and close to the upper bound, which is derived from a model trained in a fully supervised way. Besides, compared with the Dn+SR two-step method, SCGAN not only has a better quantitative PSNR result, but also provides a result with better visual quality. This is because, as shown in Fig. 6, any remaining noise or distortion caused in the denoising step will be amplified during SR.


Figure 7: Given two noisy patches A and B, the outputs of Net-1, Net-2 and Net-3 are shown in panels (a) Net-1, (b) Net-2, (c) Net-3 and (d) Net-1, (e) Net-2, (f) Net-3. In each bounding box, if the noisy input patch is $y$ and the generator is $G$, from left to right, there are the extracted noise map $G(y)$ of the noisy patch $y$, the clean estimation patch $y - G(y)$, and $G(G(y))$, which is the output of $G$ taking the extracted noise map as input again.

4.4 Ablation Study

In this subsection, we conduct image blind denoising experiments to investigate the effects of the three proposed self-consistent losses, namely the clean consistency $\mathcal{L}_{clean}$, the pure noise consistency $\mathcal{L}_{pn}$ and the reconstruction consistency $\mathcal{L}_{rec}$ in Eqns. (2)–(4). We compare the performance of three variants of SCGAN, i.e., Net-1, Net-2, and Net-3, that use different self-consistent constraints. The unpaired image set used for this ablation study is the same as the dataset constructed from DIV2K for the blind denoising experiment (the second setting).

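Model  Losses                                                                              Description
Net-1  $\mathcal{L}_{GAN}$                                                                 Only adversarial loss
Net-2  $\mathcal{L}_{GAN}$, $\mathcal{L}_{pn}$, $\mathcal{L}_{clean}$                      Net-1 + PureNoise loss + Clean loss
Net-3  $\mathcal{L}_{GAN}$, $\mathcal{L}_{pn}$, $\mathcal{L}_{clean}$, $\mathcal{L}_{rec}$  Net-2 + Rec loss, SCGAN model
Table 4: Three noise modeling networks trained with different combinations of the proposed losses.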

Net-1 denotes the model trained only with the adversarial (GAN) loss. Because of the lack of paired data, the training procedure is severely under-constrained. As shown in Figs. 7(a) and 7(d), the noise maps extracted from patches A and B have excessive contrast. This leads to low contrast and even distortion in the “estimated clean images”. The output of Net-1 is highly correlated with the background information of a given noisy patch. To ameliorate this problem, we added the clean consistent loss $\mathcal{L}_{clean}$ and the pure noise consistent loss $\mathcal{L}_{pn}$ to the objective function and trained an SCGAN model, which is denoted as Net-2. The term $\mathcal{L}_{clean}$ prevents the model from wrongly extracting textures and backgrounds as noise from clean images. The introduction of the term $\mathcal{L}_{pn}$ to the objective function ensures that the model extracts content that stays consistent even after being processed by the generator $G$ again. Comparing the noise maps extracted by Net-1 and Net-2 in Figs. 7(a) and 7(b) shows that these two losses make the output noise more uniformly distributed across the whole patch. However, the extracted noise is still affected by the structures that exist in the original noisy images. Net-3 is trained with a combination of all three loss terms. The reconstruction consistent loss $\mathcal{L}_{rec}$ helps the generator remove the influence that stems from the structures in the noisy input. Comparing the noise maps in Figs. 7(b) and 7(c), it is easily seen that the noise in the extracted maps shows no similarity to the structures in the original noisy inputs.

For our proposed two-step pipeline, the noise modeling network in the first step extracts the noise map from a given noisy image and generates an estimated clean image. Natural questions arise from our method: for the blind denoising or rain streak removal tasks, it appears to be more natural to use the estimated clean image as the final de-noised output. Why did we not do so? Besides, for the noisy image SR task, it seems natural to generate the clean HR images by regarding the estimated clean images as the inputs to a well-trained EDSR.

To address the first question, for the blind denoising and rain streak removal tasks, the noise modeling network may miss the noise at some pixels. Hence, the extracted noise map may be imperfect, and the estimated clean image could still contain noise at these pixels. However, by adding the extracted noise maps to clean images, we obtain a paired training set. When training a deep model for denoising or rain streak removal, we randomly crop paired patches from the generated training set in each iteration, and the cropped patches contain noise of different intensities. This ensures that the resulting model can handle noisy images with various noise levels and yields a better de-noised image. To address the second question, for the noisy image SR task, our experiments show that in the Dn+SR method the denoising step may leave residual noise or lose details. Both deficiencies degrade the performance of noisy SR. In our method, by contrast, we train a model for noisy SR in an end-to-end manner. The model directly maps a noisy LR input to its clean HR version, which ameliorates the aforementioned deficiencies.
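The construction of paired training patches described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `random_paired_crop`, the 16-pixel patch size, and the synthetic Gaussian array standing in for an extracted noise map are all hypothetical and chosen only to keep the example self-contained.

```python
import numpy as np

def random_paired_crop(noisy, clean, size, rng):
    """Crop the same random window from a noisy image and its clean counterpart."""
    h, w = clean.shape[:2]
    top = int(rng.integers(0, h - size + 1))    # upper bound is exclusive
    left = int(rng.integers(0, w - size + 1))
    window = (slice(top, top + size), slice(left, left + size))
    return noisy[window], clean[window]

rng = np.random.default_rng(0)
clean = rng.random((64, 64))                         # stand-in for a clean image
noise_map = 0.1 * rng.standard_normal((64, 64))      # stand-in for an extracted noise map
noisy = clean + noise_map                            # synthesized noisy counterpart

# One training pair for a single iteration; repeated calls give different windows.
patch_noisy, patch_clean = random_paired_crop(noisy, clean, 16, rng)
```

Because the window is drawn independently in each iteration, different crops cover regions of the noise map with different local intensities, which is the property the paragraph above relies on.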

5 Conclusion

In this paper, we proposed a new unsupervised noise modeling method, namely SCGAN, to extract noise maps from images with unknown noise statistics. To facilitate model training and improve performance, we introduced three self-consistent losses based on intrinsic properties of a noise model and of noise maps. We also provided an effective training strategy. Through extensive experiments, we demonstrated that the proposed SCGAN effectively extracts noise maps from noisy images containing various noise types. We applied the proposed SCGAN to the blind denoising, rain streak removal, and noisy image SR tasks, showing its broad applicability. For all of these tasks, the SCGAN achieves performance close to that of models trained in a fully supervised way.


  • [1] J. Chen, J. Chen, H. Chao, and M. Yang. Image blind denoising with generative adversarial network based noise modeling. In

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    , pages 3155–3164, 2018.
  • [2] Y. Chen and T. Pock. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE transactions on pattern analysis and machine intelligence, 39(6):1256–1272, 2017.
  • [3] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16, 2007.
  • [4] X. Fu, J. Huang, D. Zeng, Y. Huang, X. Ding, and J. Paisley. Removing rain from single images via a deep detail network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3855–3863, 2017.
  • [5] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2862–2869, 2014.
  • [6] Y. Han, Y. Zhao, and Q. Wang. Dictionary learning based noisy image super-resolution via distance penalty weight model. PloS one, 12(7):e0182165, 2017.
  • [7] D.-A. Huang, L.-W. Kang, M.-C. Yang, C.-W. Lin, and Y.-C. F. Wang. Context-aware single image rain removal. In 2012 IEEE International Conference on Multimedia and Expo, pages 164–169. IEEE, 2012.
  • [8] L.-W. Kang, C.-W. Lin, and Y.-H. Fu. Automatic single-image-based rain streaks removal via image decomposition. IEEE Transactions on Image Processing, 21(4):1742–1755, 2012.
  • [9] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1646–1654, 2016.
  • [10] S. Lefkimmiatis.

    Non-local color image denoising with convolutional neural networks.

    In Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, pages 3587–3596, 2017.
  • [11] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown. Rain streak removal using layer priors. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2736–2744, 2016.
  • [12] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual networks for single image super-resolution. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.
  • [13] X. Liu, T. Masayuki, and O. Masatoshi. Single-image noise level estimation for blind denoising. IEEE Transactions on Image Processing, 22, 2013.
  • [14] Y. Luo, Y. Xu, and H. Ji. Removing rain from a single image via discriminative sparse coding. In Proceedings of the IEEE International Conference on Computer Vision, pages 3397–3405, 2015.
  • [15] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, volume 2, pages 416–423. IEEE, 2001.
  • [16] D. Meng and F. De La Torre. Robust matrix factorization with unknown noise. In Proceedings of the IEEE International Conference on Computer Vision, pages 1337–1344, 2013.
  • [17] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
  • [18] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  • [19] S. Roth and M. J. Black. Fields of experts: A framework for learning image priors. In null, pages 860–867. IEEE, 2005.
  • [20] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1874–1883, 2016.
  • [21] A. Singh, F. Porikli, and N. Ahuja. Super-resolving noisy images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2846–2853, 2014.
  • [22] R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, L. Zhang, B. Lim, S. Son, H. Kim, S. Nah, K. M. Lee, et al. Ntire 2017 challenge on single image super-resolution: Methods and results. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, pages 1110–1121. IEEE, 2017.
  • [23] W. Yang, R. T. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan. Deep joint rain detection and removal from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1357–1366, 2017.
  • [24] Y. Yuan et al. Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks. methods, 30:32, 2018.
  • [25] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.
  • [26] K. Zhang, W. Zuo, S. Gu, and L. Zhang. Learning deep cnn denoiser prior for image restoration. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2, 2017.
  • [27] Q. Zhao, D. Meng, Z. Xu, W. Zuo, and L. Zhang.

    Robust principal component analysis with complex noise.


    International conference on machine learning

    , pages 55–63, 2014.
  • [28] F. Zhu, G. Chen, and P.-A. Heng. From noise modeling to blind image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 420–429, 2016.