1 Introduction
Figure 1: (a) Input (14.16 dB); (b) our denoised result (32.64 dB); (c) IRCNN [45] (32.07 dB); (d) DnCNN [44] (32.05 dB).
Image denoising is an essential building block for various computer vision and image processing algorithms. In the past few years, the research focus in this area has shifted to how to make the best use of image priors. To this end, several approaches attempted to exploit non-local self-similar (NSS) patterns
[3, 10, 11], sparse models [17, 34], gradient models [33, 39, 38], Markov random field (MRF) models [35], external denoising [42, 1, 30] and convolutional neural networks [44, 26, 45]. Non-local means (NLM) and block-matching with 3D collaborative filtering (BM3D) have been two prominent baselines for image denoising for almost a decade now. Due to the popularity of NLM [3] and BM3D [10], a number of their variants [15, 11, 25, 16] were also proposed, which carry out the search for similar patches in various transform domains.
Complementing the above, the use of external priors for denoising has been motivated by the pioneering studies [27, 6], which showed that selecting correct reference patches from a large external dataset of clean images can theoretically suppress additive noise and attain arbitrarily small reconstruction error. However, directly incorporating patches from an external database is computationally prohibitive even for a single image. To overcome this problem, Chan et al. [5] proposed efficient sampling techniques for large databases, but the denoising remains impractical, as searching patches for a single image takes hours, if not days. An alternative to these methods are the dictionary-learning-based approaches [13, 31, 12], which learn an overcomplete dictionary from a set of external clean natural images and then enforce patch self-similarity through sparsity. Similarly, [46] imposed a group residual representation between the sparse representation of the noise-degraded image and that of its pre-filtered version to minimize the reconstruction error.
Towards an efficient use of external datasets, many recent works [47, 14, 40] investigated maximum likelihood frameworks to learn Gaussian Mixture Models (GMM) of natural image patches or patch groups for clean patch estimation. Several studies, including [41, 7], modified the statistical prior of Zoran et al. [47] for the reconstruction of class-specific noisy images by capturing the statistics of noise-free patches from a large database of same-category images through the Expectation-Maximization algorithm. Other related external denoising methods include TID [30], CSID [1] and CID [43]; however, all of these have limited applicability to the denoising of generic images (i.e. images from an unspecified class).
The advent of convolutional neural networks (CNN) has provided a significant performance boost for image denoising, and several CNN-based methods [44, 45, 26, 4, 36] have been proposed very recently. CSF [36] learns a single framework based on the unification of a random-field-based model and half-quadratic optimization. Similarly, TNRD [8] adapts the field-of-experts [35]
prior into a CNN framework by incorporating a preset number of inference steps. Undoubtedly, CSF and TNRD have shown improved results over more classical methods; however, the imposed image priors inherently impede their performance, which relies heavily on the choice of hyper-parameter settings, extensive fine-tuning and stage-wise training.
To overcome the drawbacks of CSF and TNRD, IRCNN [45] and DnCNN [44]
learn the residual present in the contaminated image by using the noise, instead of the clean image, as the ground truth in the loss function. Although both models report favorable results, their performance depends heavily on the accuracy of noise estimation, without knowledge of the underlying structures and textures present in the image. Besides, they are computationally expensive because of the batch normalization operations after every convolutional layer. Another notable work in denoising is NLNET
[26], which exploits non-local self-similarity using deep networks. This model improves on classical methods but lags behind IRCNN and DnCNN, as it inherits the limitations associated with NSS priors: not all patches recur in an image.
Inspiration & Motivation:
Current CNN-based image denoising methods [4, 44, 45, 26] connect weight layers consecutively and learn the mapping by brute force, with little attention paid to the architecture. One problem with such an architecture arises when more weight layers are added to increase the depth of the network: simply stacking new weight layers onto the mentioned CNN-based denoising methods aggravates the vanishing-gradient problem [2]. The ability to increase the size of the network is important and helps to boost performance [28, 20]. Therefore, our goal is to propose a model that overcomes this deficiency.
Another motivation is the lack of true color denoising. Most current denoising systems either address grayscale image denoising only or treat each color channel separately, ignoring the relationship between the color channels. Only a handful of works [9, 1, 44, 26] approached color image denoising in its own context.
To provide a solution, our choice is a convolutional neural network in a discriminative prior setting for image denoising. CNNs offer many advantages, including efficient inference, incorporation of robust priors, integration of local and global receptive fields, regression on non-linear models, and discriminative learning capability. Furthermore, we propose a modular network, where we call each module a mapping module (MM). The mapping modules can be replicated and easily extended to any arbitrary depth for performance enhancement.
Contributions:
The contributions of this work can be summarized as follows:

An effective CNN architecture that consists of a Chain of Identity Mapping modules (CIMM) for image denoising. These modules share a common composition of layers, with residual connections between them to facilitate training stability.

The use of dilated convolutions for learning suitable filters to denoise at different levels of spatial extent.

A single denoising network that can handle various noise levels.
2 Chain of Identity Mapping Modules
This section presents our approach to image denoising by learning a Convolutional Neural Network consisting of a Chain of Identity Mapping Modules (CIMM). Each module is composed of a series of pre-activation units followed by convolution functions, with residual connections between them. Section 2.1 describes the meta-level structure of the CIMM network, and Section 2.2 formulates the learning objective.
2.1 Network Design
Residual learning has recently delivered state-of-the-art results for object classification [18, 21] and detection [29], while offering training stability. Inspired by the Residual Network variant with identity mapping [21], we adopt a modular design for our denoising network. The design consists of a Chain of Identity Mapping modules (CIMM).
Network elements:
Figure 2
depicts the entire architecture, where identity mapping modules are shown as blue blocks, which are in turn composed of basic ReLU (orange) and convolution (green) layers. The output of each module is a summation of the identity function and the residual function. In our experiments, we typically employ 3x3 filters in each convolution layer. The meta-level structure of the network is governed by three parameters: the number of identity modules m, the number of pre-activation-convolution pairs D in each module, and the number of output channels C, which is fixed across all convolution layers.
The high-level structure of the network can be viewed as a chain of identity mapping modules, where the output of each module is fed directly into the subsequent one. The output of this chain is then fed to a final convolution layer to produce a tensor with the same number of channels as the input image. At this point, the final convolution layer directly predicts the noise component from the noisy input. The predicted noise is then subtracted from the input to recover the noise-free image.
The identity mapping modules are the building blocks of the network and share the following structure. Each module consists of two branches: a residual branch and an identity mapping branch. The residual branch of each module contains a series of layer pairs, i.e. a non-linear pre-activation (typically ReLU) layer followed by a convolution layer. Its main responsibility is to learn a set of convolution filters to predict the image noise. In addition, the identity mapping branch in each module allows the loss gradients to propagate in both directions without any bottleneck.
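The two-branch module structure can be illustrated with a toy 1-D sketch in plain Python; the kernels, layer count, and 1-D setting are illustrative stand-ins for the paper's 2-D convolution layers, not the actual trained configuration:

```python
# Minimal sketch of one identity mapping (IM) module on a 1-D signal.
# Kernels and depth are hypothetical; real modules use learned 2-D filters.

def relu(x):
    """Pre-activation: element-wise ReLU."""
    return [max(0.0, v) for v in x]

def conv1d_same(x, kernel):
    """1-D convolution with zero padding so the output length matches the input."""
    k = len(kernel)
    pad = k // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * xp[i + j] for j in range(k)) for i in range(len(x))]

def im_module(x, kernels):
    """Residual branch: a series of pre-activation/convolution pairs;
    the module output adds the identity mapping branch back on."""
    f = x
    for kernel in kernels:
        f = conv1d_same(relu(f), kernel)
    # identity mapping branch: skip connection around the residual branch
    return [a + b for a, b in zip(x, f)]
```

Note how zero residual weights reduce the module to the identity function, which is exactly what lets gradients flow unimpeded through the skip connection.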
Justification of design
For image denoising, several previous works have adopted a fully convolutional network design, without any pooling mechanism [44, 23, 45]. This is necessary in order to preserve the spatial resolution of the input tensor across different layers. We follow this design by using only nonlinear activations and convolution layers across our network.
Furthermore, inspired by non-local denoising methods, we design the convolution layers in such a way that neurons in the last layer of each identity mapping (IM) module observe the full spatial receptive field of the module's input. This design connects input neurons at all spatial locations to the output neurons, in much the same way as well-known non-local means methods such as [10, 3]. Instead of unit dilation within each layer, we also experimented with dilated convolutions to increase the receptive fields of the convolution layers. By this design, we can reduce the depth of each IM module while the final layer's neurons still observe the full input spatial extent.
Pre-activation has been shown to offer the highest performance for classification when used together with identity mapping [21]. In a similar fashion, our design employs ReLU before each convolution layer. This differs from existing neural network architectures for denoising [23, 44, 26]. The pre-activation helps training converge more easily, while the identity function preserves the range of gradient magnitudes. Also, the resulting network generalizes better than the post-activation alternative. This property enhances the denoising ability of our network.
Formulation
Now we formulate the prediction output of this network structure for a given input patch x. Let θ denote the set of all network parameters, which consists of the weights and biases of all constituting convolution layers. Specifically, we let W_{i,k} and b_{i,k} denote the kernel and bias parameters of the k-th convolution layer in the i-th residual branch. Within such a branch, the intermediate output of the k-th ReLU-convolution pair is a composition of two functions,

    f_{i,k}(x) = (c_{i,k} ∘ r)(f_{i,k-1}(x)),    (1)

where c_{i,k} and r denote the convolution and ReLU functions, respectively, and f_{i,k} is the output of the k-th ReLU-convolution pair. By convention, we let f_{i,0}(x) = z_{i-1}, the input to the i-th module.
By composing the series of D ReLU-convolution pairs, we obtain the output of the i-th residual branch as

    f_{i,D}(x) = (c_{i,D} ∘ r) ∘ (c_{i,D-1} ∘ r) ∘ … ∘ (c_{i,1} ∘ r)(z_{i-1}).    (2)

Chaining all m identity mapping modules, where z_i = z_{i-1} + f_{i,D}(x) and z_0 = x, we obtain the intermediate output z_m. Finally, the output of this chain is convolved with a final convolution layer c* with learnable parameters θ* to predict the noise component as n̂(x) = c*(z_m).
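The chaining rule z_i = z_{i-1} + f_{i,D}(z_{i-1}), followed by the final convolution and noise subtraction, can be sketched in a few lines of plain Python; the residual branches and final convolution are passed in as stand-in callables (hypothetical placeholders, not the learned layers):

```python
# Sketch of the CIMM forward pass on a flattened patch:
#   z_0 = x, z_i = z_(i-1) + F_i(z_(i-1)), noise = final_conv(z_m),
#   denoised = x - noise.
# F_i and final_conv are toy stand-ins for the learned residual branches
# and the last convolution layer.

def chain_forward(x, residual_branches, final_conv):
    z = x
    for F in residual_branches:
        z = [a + b for a, b in zip(z, F(z))]   # identity mapping + residual
    noise_hat = final_conv(z)
    return [a - b for a, b in zip(x, noise_hat)]  # subtract predicted noise
```

With zero residual branches and a constant noise predictor, the output is simply the input minus that constant, mirroring the residual-learning formulation.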
2.2 Learning to Denoise
Our convolutional neural network (CNN) is trained on image patches or regions rather than at the image level. This decision is driven by a number of reasons. Firstly, it allows random sampling of a large number of training examples at different locations of various images. Random shuffling of training samples is well known to stabilize the training of deep neural networks; it is therefore preferable to batch training patches with a random, diverse mixture of local structures, patterns, shapes and colors. Secondly, approaches that learn image patch priors from external data have proven successful for image denoising [47].
From a set of noise-free training images, we randomly crop a number of training patches {x_j} as the ground truth. The noisy versions of these patches are obtained by adding (Gaussian) noise to the ground-truth patches; we denote the resulting set of noisy patches as {y_j}. With this setup, our image denoising network (described in Section 2.1) aims to reconstruct the patch x_j from the input patch y_j.
The learning objective is to minimize the following sum of squared ℓ2 norms,

    L(θ) = Σ_{j=1}^{N} ‖ n̂(y_j; θ) − (y_j − x_j) ‖²₂,    (3)

where n̂(y_j; θ) is the noise component predicted by the network for the noisy patch y_j.
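A minimal sketch of this objective in plain Python, assuming the network is represented by a `predict_noise` callable (a hypothetical stand-in for the CIMM forward pass) and patches are flattened lists:

```python
# Sum of squared l2 norms between the predicted noise and the true
# residual y_j - x_j over the training pairs (Eq. 3).

def denoise_loss(noisy_batch, clean_batch, predict_noise):
    total = 0.0
    for y, x in zip(noisy_batch, clean_batch):
        n_hat = predict_noise(y)  # network's noise estimate for patch y
        total += sum((nh - (yv - xv)) ** 2
                     for nh, yv, xv in zip(n_hat, y, x))
    return total
```

A predictor that outputs the exact residual drives the loss to zero, which is the fixed point the training seeks.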
3 Experiments
3.1 Datasets and Baselines
We performed experimental validation on the widely used classical images and the BSD68 dataset. To generate noisy test images, we corrupt the images by additive white Gaussian noise with standard deviations (std) of 15, 25, 50 and 70, following [45, 44, 26]. For evaluation purposes, we use the Peak Signal-to-Noise Ratio (PSNR) index as the error metric. We compare our proposed method against numerous state-of-the-art methods, including BM3D [10], WNNM [17], MLP [4], EPLL [47], TNRD [8], IRCNN [45], DnCNN [44] and NLNET [26]. To ensure a fair comparison, we use the default settings provided by the authors.
3.2 Training Details
The training input to our network consists of noisy and noise-free patch pairs cropped randomly from the BSD400 dataset. Note that there is no overlap between the training (BSD400) and evaluation (BSD68) datasets. We also augment the training data with horizontally and vertically flipped versions of the original patches and with versions rotated by 90°, 180° and 270°. Patches are randomly cropped on the fly from the 400 images during training.
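The eight-fold augmentation (identity, flips, and the three rotations) can be sketched for list-of-lists patches as follows; this is a plain-Python illustration, not the actual Caffe data pipeline:

```python
# Generate the 8 geometric variants (dihedral group) of a 2-D patch
# stored as a list of rows.

def rot90cw(img):
    """Rotate the patch 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def hflip(img):
    """Flip the patch horizontally."""
    return [row[::-1] for row in img]

def augment8(img):
    """Identity, horizontal flip, and the same pair at 90/180/270 degrees."""
    outs = []
    cur = img
    for _ in range(4):
        outs.append(cur)
        outs.append(hflip(cur))
        cur = rot90cw(cur)
    return outs
```

For a patch with no symmetry, all eight variants are distinct, giving an eight-fold increase in effective training data.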
We offer two strategies for handling different noise levels. The first is to train a separate network for each specific noise level. Alternatively, we train a single blind model over a range of noise levels (similar to [44]), which we refer to as Ours-blind. At each training update, we construct a batch by randomly selecting noisy patches with noise levels drawn from this range.
We implement the denoising method in the Caffe framework on Tesla P100 GPUs and employ the Adam optimization algorithm [24] for training. The learning rate is halved at regular intervals of mini-batches during training. We train our network from scratch with a random initialization of the convolution weights according to the method in [19] and with weight-decay regularization.
3.3 Boosting Denoising Performance
To boost the performance of the trained model, we use a late fusion strategy, as adopted by [37]. During evaluation, we apply eight types of geometric augmentation (identity, flips and rotations) to the input noisy image. From these geometrically transformed images, we estimate the corresponding denoised images using our model. To generate the final denoised image, we apply the corresponding inverse geometric transform to each output and then average the results. This self-ensemble is beneficial as it saves training time and requires far fewer parameters than eight individually trained models. We also found empirically that the self-ensemble gives approximately the same performance as models trained individually on each geometric transform.
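A sketch of this self-ensemble for a square list-of-lists image, with `denoise` as a stand-in for the trained model; each of the eight geometric transforms is paired with its exact inverse before averaging:

```python
# Self-ensemble: transform -> denoise -> inverse transform -> average,
# over the 8 flip/rotation augmentations of a square image.

def rot90cw(img):
    return [list(row) for row in zip(*img[::-1])]

def rotk(img, k):
    """Rotate clockwise k quarter turns (k modulo 4)."""
    for _ in range(k % 4):
        img = rot90cw(img)
    return img

def hflip(img):
    return [row[::-1] for row in img]

def self_ensemble(img, denoise):
    # (forward transform, inverse transform) pairs for all 8 augmentations;
    # the inverse of hflip∘rot_k is rot_(4-k)∘hflip.
    transforms = []
    for k in range(4):
        transforms.append((lambda im, k=k: rotk(im, k),
                           lambda im, k=k: rotk(im, 4 - k)))
        transforms.append((lambda im, k=k: hflip(rotk(im, k)),
                           lambda im, k=k: rotk(hflip(im), 4 - k)))
    outs = [inv(denoise(t(img))) for t, inv in transforms]
    h, w = len(img), len(img[0])
    return [[sum(o[i][j] for o in outs) / len(outs) for j in range(w)]
            for i in range(h)]
```

With an identity denoiser, the ensemble returns the input unchanged, confirming that each transform/inverse pair cancels exactly.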
3.4 Ablation Studies



Figure 3: Denoising the Monarch image (noisy input 14.16 dB). PSNR: BM3D 25.82 dB, WNNM 26.32 dB, MLP 26.26 dB, EPLL 25.94 dB, TNRD 26.42 dB, DnCNN-S 26.78 dB, IRCNN 26.61 dB, Ours 27.21 dB.

Figure 4: Denoising the Castle image from BSD68 [32] (noisy input 14.16 dB). PSNR: BM3D 26.21 dB, WNNM 26.51 dB, MLP 26.54 dB, EPLL 26.35 dB, TNRD 26.60 dB, DnCNN-S 26.90 dB, IRCNN 26.88 dB, Ours 27.20 dB.

Figure 5: Color denoising of the Fish and Vase images from BSD68 [32]. Fish: CBM3D 29.65 dB, DnCNN 30.52 dB, IRCNN 30.40 dB, Ours 31.23 dB. Vase: CBM3D 31.68 dB, DnCNN 32.33 dB, IRCNN 32.21 dB, Ours 32.76 dB.

Figure 6: Two further color denoising examples (noisy input 20.18 dB each). First image: CBM3D 38.62 dB, DnCNN 39.90 dB, IRCNN 39.53 dB, Ours 40.13 dB. Second image: CBM3D 29.37 dB, DnCNN 30.89 dB, IRCNN 30.60 dB, Ours 31.04 dB.

Figure 7: Real-world noisy images and the corresponding results denoised by our network.
Table 1: Average PSNR (dB) on BSD68 with respect to the training patch size.
Training patch size  20  30  40  50  60  70
PSNR  29.13  29.30  29.34  29.36  29.37  29.38
3.4.1 Influence of the patch size
In our network, the patch size plays an important role, and here we show its influence on performance. Table 1 shows the average PSNR on BSD68 [32] for σ = 25 with respect to increasing training patch size. Performance clearly improves as the patch size increases. The main reason for this phenomenon is the size of the receptive field: with a larger patch size, the network learns more contextual information and is hence able to predict local details better.
Table 2: Average PSNR (dB) on BSD68 with respect to the number of modules.
Number of modules  2  4  6  8
PSNR  29.28  29.34  29.35  29.36

Table 3: Trade-off between kernel dilation and the number of layers per module (average PSNR in dB).
Kernel dilation  1  2  3
Number of layers  18  9  6
PSNR  29.34  29.34  29.34

Table 4: PSNR (dB) on the classical images for noise standard deviations σ = 15, 25, 50 and 70.

Method  Cman  House  Peppers  Starfish  Monar  Airpl  Parrot  Lena  Barbara  Boat  Man  Couple  Average

σ = 15:
BM3D [10]  31.91  34.93  32.69  31.14  31.85  31.07  31.37  34.26  33.10  32.13  31.92  32.10  32.372
WNNM [17]  32.17  35.13  32.99  31.82  32.71  31.39  31.62  34.27  33.60  32.27  32.11  32.17  32.696
EPLL [47]  31.85  34.17  32.64  31.13  32.10  31.19  31.42  33.92  31.38  31.93  32.00  31.93  32.138
CSF [36]  31.95  34.39  32.85  31.55  32.33  31.33  31.37  34.06  31.92  32.01  32.08  31.98  32.318
TNRD [8]  32.19  34.53  33.04  31.75  32.56  31.46  31.63  34.24  32.13  32.14  32.23  32.11  32.502
DnCNN-S [44]  32.61  34.97  33.30  32.20  33.09  31.70  31.83  34.62  32.64  32.42  32.46  32.47  32.859
DnCNN-B [44]  32.10  34.93  33.15  32.02  32.94  31.56  31.63  34.56  32.09  32.35  32.41  32.41  32.680
IRCNN [45]  32.55  34.89  33.31  32.02  32.82  31.70  31.84  34.53  32.43  32.34  32.40  32.40  32.769
Ours-blind  32.11  35.10  33.28  32.31  33.07  31.58  31.80  34.67  32.48  32.42  32.40  32.50  32.812
Ours  32.61  35.21  33.21  32.35  33.33  31.77  32.01  34.69  32.74  32.44  32.50  32.52  32.950

σ = 25:
BM3D [10]  29.45  32.85  30.16  28.56  29.25  28.42  28.93  32.07  30.71  29.90  29.61  29.71  29.969
WNNM [17]  29.64  33.22  30.42  29.03  29.84  28.69  29.15  32.24  31.24  30.03  29.76  29.82  30.257
EPLL [47]  29.26  32.17  30.17  28.51  29.39  28.61  28.95  31.73  28.61  29.74  29.66  29.53  29.692
MLP [4]  29.61  32.56  30.30  28.82  29.61  28.82  29.25  32.25  29.54  29.97  29.88  29.73  30.027
CSF [36]  29.48  32.39  30.32  28.80  29.62  28.72  28.90  31.79  29.03  29.76  29.71  29.53  29.837
TNRD [8]  29.72  32.53  30.57  29.02  29.85  28.88  29.18  32.00  29.41  29.91  29.87  29.71  30.055
DnCNN-S [44]  30.18  33.06  30.87  29.41  30.28  29.13  29.43  32.44  30.00  30.21  30.10  30.12  30.436
DnCNN-B [44]  29.94  33.05  30.84  29.34  30.25  29.09  29.35  32.42  29.69  30.20  30.09  30.10  30.362
IRCNN [45]  30.08  33.06  30.88  29.27  30.09  29.12  29.47  32.43  29.92  30.17  30.04  30.08  30.384
Ours-blind  29.87  33.34  30.94  29.68  30.39  29.08  29.38  32.65  30.17  30.27  30.08  30.20  30.505
Ours  30.26  33.44  30.87  29.77  30.62  29.23  29.61  32.66  30.29  30.30  30.18  30.24  30.624

σ = 50:
BM3D [10]  26.13  29.69  26.68  25.04  25.82  25.10  25.90  29.05  27.22  26.78  26.81  26.46  26.722
WNNM [17]  26.45  30.33  26.95  25.44  26.32  25.42  26.14  29.25  27.79  26.97  26.94  26.64  27.052
EPLL [47]  26.10  29.12  26.80  25.12  25.94  25.31  25.95  28.68  24.83  26.74  26.79  26.30  26.471
MLP [4]  26.37  29.64  26.68  25.43  26.26  25.56  26.12  29.32  25.24  27.03  27.06  26.67  26.783
TNRD [8]  26.62  29.48  27.10  25.42  26.31  25.59  26.16  28.93  25.70  26.94  26.98  26.50  26.812
DnCNN-S [44]  27.03  30.00  27.32  25.70  26.78  25.87  26.48  29.39  26.22  27.20  27.24  26.90  27.178
DnCNN-B [44]  27.03  30.02  27.39  25.72  26.83  25.89  26.48  29.38  26.38  27.23  27.23  26.91  27.206
IRCNN [45]  26.88  29.96  27.33  25.57  26.61  25.89  26.55  29.40  26.24  27.17  27.17  26.88  27.136
Ours-blind  27.03  30.48  27.57  26.01  27.03  25.84  26.53  29.77  26.89  27.28  27.29  27.06  27.398
Ours  27.25  30.70  27.54  26.05  27.21  26.06  26.53  29.65  26.62  27.36  27.26  27.24  27.457

σ = 70:
BM3D [10]  24.62  27.91  25.07  23.56  24.24  23.75  24.49  27.57  25.47  25.40  25.56  25.00  25.221
WNNM [17]  24.86  28.59  25.25  23.78  24.62  24.00  24.64  27.85  26.17  25.58  25.68  25.18  25.517
EPLL [47]  24.60  27.32  25.03  23.52  24.19  23.72  24.44  27.11  23.20  25.27  25.50  24.80  24.891
DnCNN-S [44]  25.37  28.22  25.50  23.97  25.10  24.34  24.98  27.85  23.97  25.76  25.91  25.31  25.523
Ours  25.83  29.19  25.90  24.28  25.66  24.59  25.12  28.25  25.06  26.00  26.02  25.78  25.974
3.5 Comparisons
3.5.1 Number of modules
We show the effect of the number of modules on the denoising results. As mentioned earlier, each module consists of five convolution layers; by increasing the number of modules, we make our network deeper. In this setting, all parameters are held constant except the number of modules, as shown in Table 2. It is clear from the results that making the network deeper increases the average PSNR. However, since fast restoration is desired, we prefer a small network of five modules, which still achieves better performance than competing methods.
3.5.2 Kernel dilation and number of layers
It has been shown that the performance of some networks can be improved either by increasing the depth of the network or by using larger convolution filters to capture more context [45, 44], which helps the restoration of noisy structures in the image. Traditional (undilated) filters are popular in deeper networks; with dilated filters, however, there is a trade-off between the number of layers and the effective size of the filters. In Table 3, we present the relation between filter dilation and the number of layers using three experimental settings. In the first experiment, shown in the first column of Table 3, we use traditional 3x3 filters and a depth of 18 layers to cover the receptive field of the training patch. In the next experiment, we keep the filter size the same but enlarge the filter using a dilation factor of two. This increases the effective filter extent to 5x5, but since it has only nine non-zero entries it can be interpreted as a sparse filter. The receptive field of the training patch can therefore be covered by nine non-linear mapping layers, contrary to the 18-layer depth per module. Similarly, expanding the filter with a dilation factor of three reduces the depth of each module to six. As shown in Table 3, all three trained models achieve similar denoising performance, with the obvious advantage that the shallowest network is the most efficient.
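The layer-count arithmetic behind Table 3 follows from standard receptive-field bookkeeping: a stack of L convolution layers with kernel size k and dilation d covers a receptive field of 1 + L·d·(k−1) pixels. Assuming 3x3 kernels (consistent with the nine non-zero entries mentioned above), a small helper verifies the trade-off:

```python
# Receptive field of L stacked conv layers with kernel size k and dilation d:
# each layer extends the field by d*(k-1) pixels on top of the starting pixel.

def receptive_field(layers, kernel=3, dilation=1):
    return 1 + layers * dilation * (kernel - 1)
```

Eighteen undilated layers, nine layers at dilation two, and six layers at dilation three all cover the same 37-pixel extent, matching the three equivalent configurations in Table 3.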
Table 5: Average PSNR (dB) on BSD68 for grayscale image denoising.
Noise level  BM3D [10]  WNNM [17]  EPLL [47]  TNRD [8]  DnCNN-S [44]  IRCNN [45]  NLNet [26]  Ours-blind  Ours
15  31.08  31.32  31.19  31.42  31.73  31.63  31.52  31.68  31.81
25  28.57  28.83  28.68  28.92  29.23  29.15  29.03  29.18  29.34
50  25.62  25.83  25.67  26.01  26.23  26.19  26.07  26.31  26.40
70  24.44  -  24.43  -  24.90  -  -  -  25.13
Table 6: Average PSNR (dB) on CBSD68 for color image denoising.
Noise level  CBM3D [9]  MLP [4]  TNRD [8]  DnCNN [44]  IRCNN [45]  CNLNet [26]  Ours-blind  Ours
15  33.50  -  31.37  33.89  33.86  33.69  33.96  34.12
25  30.69  28.92  28.88  31.33  31.16  30.96  31.32  31.42
50  27.37  26.00  25.94  27.97  27.86  27.64  28.05  28.19
In this section, first we demonstrate how our method performs on classical images and then report results on the BSD68 dataset.
3.5.3 Classical Images
For completeness, we compare our algorithm to several stateoftheart denoising methods using grayscale classical images shown in Figure 3 and reported in Table 4.
In Table 4, we present the PSNR scores for the denoised images. Our network is the best performer on almost all classical images except 'Barbara'. The likely reason is the repetitive structure in this image, which makes it easy for BM3D [10] and WNNM [17] to find and exploit patches highly similar to the noisy input, hence providing better results.
Subsequently, we depict an example from the classical images. The visual quality of our recovered image, as shown in Figure 3, is better than that of all others, illustrating that our network restores aesthetically pleasing textures. Small but noticeable features restored by our network include the sharpness and clarity of the subtle textures around the fore and hind wings, mouth, and antennae of the butterfly. Furthermore, a magnified view of the results in Figure 3 for the methods [10, 45, 26] shows artifacts and failures in the smooth areas. Our network also outperforms [45, 26, 44], which are likewise trained as deep neural networks.
3.5.4 BSD68 Dataset
We present the average PSNR scores for the estimated denoised images in Table 5. The IRCNN [45] and DnCNN [44] network structures are similar and hence produce nearly identical results. Our method, on the other hand, reconstructs the images accurately, achieving higher PSNR than the competing methods at all four noise levels. Furthermore, the PSNR gap between our method and the state-of-the-art increases with the noise level.
For a comprehensive evaluation, we show visual results on a selected grayscale image from the BSD68 [32] dataset in Figure 4. In our results, the image details are closer to the ground-truth details, and our quantitative results are numerically higher than the others. Note that the denoising results of the other CNN-based algorithms are comparable to each other, which indicates that the prevalent use of deep networks by other denoising methods does not by itself guarantee the best performance.
3.6 Color Image Denoising
For noisy color images, we train our network with noisy RGB input patches of size 40x40 and the corresponding clean ground-truth patches. We only modify the last convolution layer of the grayscale network to output three channels instead of one, keeping all other parameters the same as in the grayscale network. This is very convenient for hardware implementations in real applications.
We present the quantitative results in Table 6 and qualitative results in Figures 5 to 7, against benchmark methods including the latest CNN-based state-of-the-art color image denoising techniques. Our algorithm attains an improved average PSNR on all three noise levels of the CBSD68 dataset [32]. As shown, our method restores true colors close to their authentic values while others fail and induce false colorization in certain image regions. Furthermore, a close look reveals that our network reproduces local texture with far fewer artifacts and sufficiently sharp details.
3.7 Realworld noisy images
As a last experiment, we demonstrate the performance of our network on real-world noisy images. Figure 7 shows such examples denoised by our blind model (requiring no noise prior). As visible, the details are preserved properly and the noise is removed effectively. When the noise is AWGN, or adequately resembles additive Gaussian noise, our model works accurately. This experiment indicates that our network is well suited for real-world applications.
4 Conclusions
To sum up, we employ residual learning and identity mapping to predict the denoised image using a five-module network with five layers per module (26 weight layers in total), using dilated convolutional filters and no batch normalization. Our choice of network is based on the ablation studies performed in the experimental section of this paper.
This is the first modular framework to predict the denoised output without any dependency on pre- or post-processing. Our proposed network suppresses the authentic image structures in its output while allowing the noise to pass through its layers, thereby learning the noise pattern needed to estimate the clean image. In the future, we aim to generalize our denoising network to other image restoration tasks.
References
 [1] S. Anwar, F. Porikli, and C. P. Huynh. Categoryspecific object image denoising. IEEE Transactions on Image Processing, 26(11):5506–5518, 2017.
 [2] Y. Bengio, P. Simard, and P. Frasconi. Learning longterm dependencies with gradient descent is difficult. IEEE transactions on neural networks, 1994.
 [3] A. Buades, B. Coll, and J.M. Morel. A nonlocal algorithm for image denoising. In CVPR, pages 60–65, 2005.
 [4] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with bm3d? In CVPR, pages 2392–2399, 2012.
 [5] S. H. Chan, T. Zickler, and Y. M. Lu. Monte carlo nonlocal means: Random sampling for largescale image filtering. TIP, pages 3711–3725.
 [6] P. Chatterjee and P. Milanfar. Is denoising dead? Image Processing, IEEE Transactions on, pages 895–911, 2010.
 [7] F. Chen, L. Zhang, and H. Yu. External patch prior guided internal clustering for image denoising. 2015.
 [8] Y. Chen and T. Pock. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE transactions on pattern analysis and machine intelligence, 39(6):1256–1272, 2017.
 [9] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Color image denoising via sparse 3D collaborative filtering with grouping constraint in luminancechrominance space. In ICIP, 2007.
 [10] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3D transformdomain collaborative filtering. Image Processing, IEEE Transactions on, pages 2080–2095, 2007.

 [11] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. BM3D image denoising with shape-adaptive principal component analysis. In Signal Processing with Adaptive Sparse Structured Representations, 2009.
 [12] W. Dong, X. Li, D. Zhang, and G. Shi. Sparsity-based image denoising via dictionary learning and structural clustering. In CVPR, pages 457–464, June 2011.

 [13] M. Elad and D. Datsenko. Example-based regularization deployed to super-resolution reconstruction of a single image. Comput. J., pages 15–30, 2009.
 [14] L. Z. F. Chen and H. Yu. External patch prior guided internal clustering for image denoising. In ICCV, pages 1211–1218, 2015.
 [15] A. Foi, V. Katkovnik, and K. Egiazarian. Pointwise shapeadaptive DCT for highquality denoising and deblocking of grayscale and color images. IEEE transactions on image processing, pages 1395–1411, 2007.
 [16] B. Goossens, H. Luong, A. Pizurica, and W. Philips. An improved nonlocal denoising algorithm. In Local and NonLocal Approximation in Image Processing, International Workshop, Proceedings, page 143, 2008.
 [17] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In CVPR, pages 2862–2869, 2014.
 [18] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
 [19] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing humanlevel performance on imagenet classification. CoRR, abs/1502.01852, 2015.

 [20] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
 [21] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In ECCV, pages 630–645, 2016.
 [22] H. Hirschmüller and D. Scharstein. Evaluation of cost functions for stereo matching. In CVPR, pages 1–8, 2007.
 [23] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image superresolution using very deep convolutional networks. In CVPR, 2016.
 [24] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
 [25] M. Lebrun, A. Buades, and J.M. Morel. A nonlocal bayesian image denoising algorithm. SIAM Journal on Imaging Sciences, pages 1665–1688, 2013.
 [26] S. Lefkimmiatis. Nonlocal color image denoising with convolutional neural networks. CVPR, 2016.
 [27] A. Levin and B. Nadler. Natural image denoising: Optimality and inherent bounds. In CVPR, pages 2833–2840, 2011.
 [28] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual networks for single image superresolution. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2017.
 [29] T. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie. Feature pyramid networks for object detection. CoRR, abs/1612.03144, 2016.
 [30] E. Luo, S. H. Chan, and T. Q. Nguyen. Adaptive image denoising by targeted databases. Image Processing, IEEE Transactions on, pages 2167–2181, 2015.
 [31] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Nonlocal sparse models for image restoration. In ICCV, pages 2272–2279, 2009.
 [32] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int’l Conf. Computer Vision, pages 416–423, 2001.
 [33] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularization method for total variationbased image restoration. Multiscale Modeling & Simulation, pages 460–489, 2005.
 [34] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma. Rasl: Robust alignment by sparse and lowrank decomposition for linearly correlated images. TPAMI, pages 2233–2246, 2012.
 [35] S. Roth and M. J. Black. Fields of experts. International Journal of Computer Vision, 82(2):205–229, 2009.
 [36] U. Schmidt and S. Roth. Shrinkage fields for effective image restoration. In CVPR, pages 2774–2781, 2014.
 [37] R. Timofte, R. Rothe, and L. Van Gool. Seven ways to improve examplebased single image super resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1865–1873, 2016.
 [38] Y. Weiss and W. T. Freeman. What makes a good model of natural images? In Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on, pages 1–8. IEEE, 2007.
 [39] J. Xu and S. Osher. Iterative regularization and nonlinear inverse scale space applied to waveletbased denoising. Image Processing, IEEE Transactions on, pages 534–544, 2007.
 [40] J. Xu, L. Zhang, W. Zuo, D. Zhang, and X. Feng. Patch Group Based Nonlocal SelfSimilarity Prior Learning for Image Denoising. In ICCV, pages 1211–1218, 2015.
 [41] L. Xu, L. Zhang, W. Zuo, D. Zhang, and X. Feng. Patch group based nonlocal selfsimilarity prior learning for image denoising. 2015.
 [42] H. Yue, X. Sun, J. Yang, and F. Wu. Cid: Combined image denoising in spatial and frequency domains using web images. In CVPR, pages 2933–2940, June 2014.
 [43] H. Yue, X. Sun, J. Yang, and F. Wu. Image denoising by exploring external and internal correlations. TIP, pages 1967–1982, 2015.
 [44] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing, 2017.
 [45] K. Zhang, W. Zuo, S. Gu, and L. Zhang. Learning deep cnn denoiser prior for image restoration. CVPR, 2017.
 [46] Z. Zha, X. Zhang, Q. Wang, Y. Bai, and L. Tang. Group sparsity residual constraint for image denoising. arXiv preprint arXiv:1703.00297, 2017.
 [47] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In ICCV, pages 479–486, 2011.