Image denoising is a classical yet still active topic in computer vision, and it has become an indispensable step in many image processing applications. In recent years, various algorithms have been proposed, including nonlocal self-similarity (NSS) models, sparse representation models, and deep learning approaches. Among them, BM3D, WNNM and CSF are considered the state-of-the-art methods among non-deep-learning approaches. NSS models such as BM3D provide high image quality and efficiency, and they are very effective for Gaussian denoising with a known noise level. Recently, many state-of-the-art CNN algorithms such as IRCNN have outperformed the nonlocal and collaborative filtering approaches. These deep CNNs are trained to learn the image prior, which shows that CNN algorithms have a strong ability to fit the structures and patterns inside an image.
Deep learning methods have achieved massive success in classification as well as in other computer vision problems, and many CNN algorithms have since been proposed for image denoising. In image restoration, it is important to use prior information properly. Many prior-based approaches such as WNNM involve a complex optimization problem in the inference stage, which makes it hard to achieve high performance without sacrificing computational efficiency. To overcome this limitation of prior-based methods, several discriminative learning methods have been developed to learn image prior models in the inference procedure. Models of this kind can get rid of the iterative optimization procedure in the test phase. From the Bayesian perspective, image denoising can be seen as a Maximum A Posteriori (MAP) problem, and a deep CNN can be used to learn the prior as a denoiser. The question motivating this work is whether the prior can be strengthened from the viewpoint of the convolution itself. Earlier work shows that multi-scale convolutions help to extract more features from the previous layer. Dilated convolution was introduced to enlarge the receptive field while keeping the amount of computation unchanged. Subsequently, Yu et al. showed that a dilated residual network can perform better than a residual network in image classification. In image denoising, because adjacent pixels differ little, dilated convolution can carry more discrepancy information from earlier layers to later layers. In addition, it can improve the generalization ability of the model at no extra computational cost. The residual module accelerates the whole training process and helps prevent vanishing gradients.
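For reference, the MAP view mentioned above is commonly written as follows. The notation is assumed here rather than taken from the text: $y$ is the noisy observation, $x$ the clean image, $\sigma$ the noise standard deviation, $\Phi$ a regularizer encoding the prior, and $\lambda$ a trade-off weight with $-\log p(x) = \lambda\Phi(x)$ up to a constant.

```latex
\hat{x} = \arg\max_{x} \; \log p(y \mid x) + \log p(x)
        = \arg\min_{x} \; \frac{1}{2\sigma^{2}} \lVert y - x \rVert_{2}^{2} + \lambda \Phi(x)
```

The first term is the data fidelity under additive Gaussian noise; the second is the prior term, which is the part a learned CNN denoiser is meant to stand in for.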
This paper proposes an enhanced denoiser based on a residual dilated convolutional neural network. Inspired by the insight of residual learning, we modified the dilated residual network based on residual learning. We treat image denoising as a plain discriminative learning problem, in which a CNN is used to separate the noise from the noisy image. The reasons for using a CNN can be summarized as follows. First, a CNN with a very deep architecture is effective in exploiting image characteristics and increasing the capacity and flexibility of the model. Second, a lot of progress has been made in CNN training methods, including the parametric rectified linear unit (PReLU), batch normalization and residual architectures. These methods can speed up the training process and improve the denoising performance. Third, there are many tools and libraries that support parallel computation for CNNs on GPU, which can give a significant improvement in runtime performance. In contrast to existing residual networks, we use multi-scale convolution to extract more information from the original image and a hybrid dilated convolution module to avoid the gridding effect. In the experiments, we compare against several state-of-the-art methods, such as BM3D, IRCNN, DnCNN and FFDNet. For Gaussian denoising, our results show that the proposed enhanced denoiser produces better images with only about half the parameters of other CNN methods, and that it has competitive runtime performance.
In summary, this paper has the following two main contributions:
First, we propose a lightweight and effective image denoiser based on a multi-scale convolution group. The experiments show that the proposed model achieves better performance and speed than the current state-of-the-art methods.
Second, we show that the proposed network can handle both gray and color image denoising robustly without increasing the number of parameters.
The rest of this paper is organized as follows. Section 2 reviews recent image denoising approaches. Section 3 formally describes our research problem and method in detail. Section 4 presents the experimental results of the proposed method and a comparison with one baseline model and five other models. Finally, we conclude in Section 5.
2 Related Work
Here, we provide a brief review of image denoising methods. Harmeling et al. were the first to apply a multi-layer perceptron (MLP) to image denoising, using image patches and large image databases to achieve excellent results. Chen and Pock proposed a trainable nonlinear reaction diffusion (TNRD) model in which all the parameters are learned simultaneously from training data through a loss-based approach. It can be expressed as a feed-forward deep network by unfolding a fixed number of gradient inference steps. DeepAM consists of two steps, proximal mapping and end continuation. It is a regularization-based approach for image restoration, which enables the CNN to operate as a prior or regularizer in the alternating minimization (AM) algorithm. IRCNN uses the half quadratic splitting (HQS) framework to show that a CNN denoiser can bring a strong image prior into model-based optimization methods. All the above methods show that decoupling the fidelity term and the regularization term enables a wide variety of existing denoising models to solve the image denoising task.
Residual learning can be realized in several ways. The first approach uses a skip connection from one layer to another during forward and backward propagation. It was first introduced by He et al. to address vanishing gradients when training very deep architectures for image classification. In low-level computer vision problems, a residual module has been implemented within three convolution blocks by a skip connection. Another residual implementation transforms the label data into the difference between the input data and the clean data. Residual learning not only speeds up training but also makes the weights of the network sparser.
Dilated convolution was originally developed for wavelet decomposition (Holschneider et al., 1989). The main idea of dilated convolution is to insert "holes" (zeros) between the elements of the convolutional kernel. This enables dense feature extraction in deep CNNs and enlarges the receptive field of the convolutional kernel. Chen et al. designed an atrous spatial pyramid pooling (ASPP) scheme to capture multi-scale objects and context information by using multiple dilated convolutions. In image denoising, Wang et al. proposed an approach to calculate the receptive field size when dilated convolution is included.
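To make the effect of dilation on the receptive field concrete, the following minimal sketch computes the one-sided receptive field of a stack of stride-1 convolutions. It uses the standard accumulation rule and is not necessarily the procedure of the cited work; the function name and arguments are illustrative.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (side length in pixels) of stacked stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        # A k-tap kernel with dilation d spans (k - 1) * d + 1 pixels,
        # so each layer widens the field by (k - 1) * d.
        rf += (kernel_size - 1) * d
    return rf


print(receptive_field(3, [1, 1, 1]))  # 7
print(receptive_field(3, [1, 2, 5]))  # 17
```

With the same number of 3-tap layers and the same number of weights, the dilated stack sees a 17-pixel window instead of a 7-pixel one.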
3.1 Dilated filter
Dilated filters were introduced to enlarge the receptive field. In image denoising, context information can facilitate the reconstruction of corrupted pixels. In a CNN, there are two basic ways to enlarge the receptive field and capture context information: increasing the filter size or increasing the depth. In existing network designs, using filters with a large depth is a popular choice. In this paper, we use dilated filters and keep the merits of traditional convolution. In Fig. 1, the image filtered with different dilation rates shows the different receptive fields. A dilated filter with dilation factor $s$ can be simply interpreted as a sparse filter of size $(2s+1)\times(2s+1)$. For kernel size $3\times3$ and dilation rate $s$, only 9 entries at fixed positions can be non-zero. However, the use of dilated convolutions may cause gridding artifacts, which occur when a feature map has higher-frequency content than the sampling rate of the dilated convolution. The hybrid dilated convolution (HDC) proposed by Wang et al. addresses this issue theoretically. Suppose $N$ convolutional layers with kernel size $K\times K$ have dilation rates of $[r_1, \ldots, r_i, \ldots, r_n]$; the goal of HDC is to let the final receptive field cover a square region without any holes or missing edges. The maximum distance between two nonzero values can be defined as

$$M_i = \max\left[\,M_{i+1} - 2r_i,\; M_{i+1} - 2(M_{i+1} - r_i),\; r_i\,\right],$$

where the design goal is to let $M_2 \le K$ with $M_n = r_n$. For example, for kernel size $K = 3$, an $r = [1, 2, 5]$ pattern works as $M_2 = 2$; however, an $r = [1, 2, 9]$ pattern does not work as $M_2 = 5$. The benefit of HDC is that it can be naturally integrated with the original layers of the network, without adding extra modules.
HDC makes better use of the receptive field information. A convolution with dilation rate 1 and a convolution with dilation rate 5 extract feature information at different levels. In other words, dilated convolution can extract information at different scales.
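The HDC design rule can be checked mechanically. The sketch below assumes the criterion as stated in the cited HDC work: with dilation rates $r_1, \ldots, r_n$, set $M_n = r_n$, apply $M_i = \max[M_{i+1} - 2r_i,\ M_{i+1} - 2(M_{i+1} - r_i),\ r_i]$ backwards, and require $M_2 \le K$. The function names are illustrative.

```python
def hdc_max_gap(rates):
    """Return M_2, the maximum distance between two nonzero values at layer 2."""
    if len(rates) < 2:
        raise ValueError("need at least two dilation rates")
    m = rates[-1]  # M_n = r_n
    # Walk back from layer n-1 down to layer 2 (zero-based indices n-2 .. 1).
    for r in reversed(rates[1:-1]):
        m = max(m - 2 * r, m - 2 * (m - r), r)
    return m


def satisfies_hdc(rates, kernel_size):
    """True when the dilation pattern leaves no holes for the given kernel size."""
    return hdc_max_gap(rates) <= kernel_size


print(hdc_max_gap([1, 2, 5]), satisfies_hdc([1, 2, 5], 3))  # 2 True
print(hdc_max_gap([1, 2, 9]), satisfies_hdc([1, 2, 9], 3))  # 5 False
```

A check of this kind is cheap to run over candidate dilation patterns before committing to a network layout.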
3.2 Multiscale convolution group
Multiscale feature extraction is a common technique in computer vision, since it makes use of feature maps at different levels. In deep CNNs, the kernel and kernel can extract features at different scales. Adding the multiscale structure increases the width of the network and also improves its generalization. Inspired by the Inception module of Szegedy et al., we propose the multiscale convolution group. The Inception module consists of pooling layers and convolutions. For image denoising, because the output image keeps the same size as the input, we remove the pooling layer. We apply filters at three scales, with 12, 20 and 32 filters respectively. This combination is chosen because the feature maps sum to 64 and it balances feature extraction against the number of parameters. Unlike the Inception module, we concatenate the feature maps directly, which significantly reduces the number of parameters. Considering the computational cost, we use the multiscale module only in the first layer.
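The channel bookkeeping of such a group can be sketched as follows. The filter counts 12, 20 and 32 come from the text; the kernel sizes 1, 3 and 5 and their pairing with the counts are assumptions made only for illustration, and biases are ignored.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in a kernel_size x kernel_size convolution (no bias)."""
    return kernel_size * kernel_size * in_channels * out_channels


def multiscale_group(in_channels, branches):
    """branches: list of (kernel_size, num_filters), one per parallel branch.

    Returns (channels after direct concatenation, total number of weights).
    """
    out_channels = sum(n for _, n in branches)
    weights = sum(conv_weights(k, in_channels, n) for k, n in branches)
    return out_channels, weights


# ASSUMED kernel sizes (1, 3, 5); only the filter counts are from the text.
branches = [(1, 12), (3, 20), (5, 32)]
print(multiscale_group(1, branches))  # (64, 992)
```

Because the branches are concatenated directly, the group always outputs 12 + 20 + 32 = 64 feature maps, matching the width of the later layers.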
Our proposed network structure, illustrated in Fig. 2, is inspired by previous network designs. It consists of eleven layers of convolutions with different dilation rates. The first layer is the multiscale convolution group, and the residual connection starts from the second layer. Skip connections are used for training deep networks because they help alleviate the vanishing gradient problem. Another advantage is that the residual module can make the weights of the network sparse, which can reduce the inference time. Each dilated convolution block is followed by a batch normalization (BN) layer and a parametric rectified linear unit (PReLU). Such design techniques have been widely used in recent CNN architectures. In particular, it has been pointed out that this combination not only enables fast and stable training but also tends to give better results in Gaussian denoising. The PReLU and BN layers accelerate the convergence of the network. Image denoising aims to recover a clean image from a noisy observation. We assume the noisy image can be expressed as $y = x + v$, where $x$ stands for the clean image and $v$ is the unknown Gaussian noise. Because a residual learning strategy is applied, the labels are obtained by calculating the difference between the input image and the clean image. The output of our network is therefore the residual image, which is the prediction of the noise.
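The residual learning setup described above can be illustrated with a few lines of dependency-free Python. The symbols y (noisy input), x (clean image) and v (noise) are notation assumed for this sketch, and the "prediction" is simply the true residual, standing in for the output of a trained network.

```python
import random


def add_gaussian_noise(clean, sigma, rng):
    """Return y = x + v with v ~ N(0, sigma^2), drawn independently per pixel."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in clean]


def residual_label(noisy, clean):
    """Training target under residual learning: the noise v = y - x."""
    return [[y - x for y, x in zip(n_row, c_row)] for n_row, c_row in zip(noisy, clean)]


def restore(noisy, predicted_residual):
    """Denoised estimate: subtract the predicted noise from the input."""
    return [[y - r for y, r in zip(n_row, r_row)]
            for n_row, r_row in zip(noisy, predicted_residual)]


rng = random.Random(0)
clean = [[10.0, 20.0], [30.0, 40.0]]
noisy = add_gaussian_noise(clean, sigma=25.0, rng=rng)
label = residual_label(noisy, clean)
# With a perfect residual prediction, the clean image is recovered up to rounding.
recovered = restore(noisy, label)
```

In training, the network output is compared against `label`; at test time only `restore` is needed.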
It is widely acknowledged that convolutional neural networks generally benefit from very large training datasets. However, due to limited computing resources, we used only 400 images of size for the gray-scale denoising experiments. It has been reported that using more data does not improve the PSNR results on the BSD68 dataset. For the color image denoising experiment, 400 images selected from the validation set of the ImageNet database form the training set; we crop the images into small patches of size and select patches for training.
4.1 Training details
Due to the characteristics of Gaussian convolution, the output image may contain boundary artifacts, so we apply zero padding and use small patches to tackle this problem. The number of feature maps in each layer is 64, and the depth of the network is set to eleven, which makes it a lightweight framework. The patch size of input images is, and we use data augmentation techniques to increase the diversity of the training data. The Adam solver is applied to optimize the network parameters. The learning rate is started from and then fixed to . The learning rate is decreased by a factor of 10 after 60 epochs. The hyperparameters of Adam are left at their default settings. The mini-batch size is 64, which balances memory usage and performance well. The network parameters are initialized using Xavier initialization. Our denoiser models are trained with the MatConvNet package, using MATLAB (R2016b) and an Nvidia 1080Ti GPU. We trained for 100 epochs; it takes about half a day to train a denoising model for a specific noise level. 400 images from the publicly available Berkeley segmentation dataset (BSD500) and the Urban 100 dataset (Huang et al., 2015) are used for training the Gaussian denoising model. In addition, we generated 3200 images by data augmentation via image rotation, cropping and flipping. For testing, Set12 and BSD68 are used. Note that all these images are widely used for the evaluation of denoising models and they are not included in the training datasets. For training and validation, Gaussian noise with is added to verify the effect of our denoising model.
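The reported growth from 400 to 3200 images is consistent with eight variants per image. The sketch below produces the eight rotation and flip combinations of a 2-D image stored as a list of rows; reading the augmentation this way is an assumption, and cropping is left out.

```python
def rot90(img):
    """Rotate a 2-D list-of-rows image by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def flip_lr(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def augment8(img):
    """Return the 8 variants: 4 rotations, each with and without mirroring."""
    variants = []
    current = [list(row) for row in img]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_lr(current))
        current = rot90(current)
    return variants


patch = [[1, 2], [3, 4]]
print(len(augment8(patch)))  # 8
```

Applied to every training image, this multiplies the dataset size by eight without touching the pixel statistics that the noise model depends on.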
4.2 Denoising results
We compared the proposed denoiser with several state-of-the-art denoising methods, including two model-based optimization methods (BM3D and WNNM), one discriminative learning method (TNRD), and three deep learning methods (IRCNN, DnCNN and FFDNet). Fig. 4 shows the visual results of the different methods in detail. It can be seen that both BM3D and WNNM tend to produce over-smoothed textures. TNRD can preserve fine details and sharp edges, but it appears to generate artifacts in smooth regions. The three deep learning methods and the proposed method give pleasing results in smooth regions. The proposed method clearly preserves texture better than the other methods, for example in the region above the balcony fence. A further comparison of the different methods in Fig. 5 shows that our method again achieves a better visual result.
To show the capacity of the proposed model, we carry out quantitative and qualitative evaluations on two widely used test datasets. The average PSNR results of the different methods on BSD68 and Set12 are shown in Table I and Table V. BSD68 consists of 68 diverse gray images. We make the following observations. First, the proposed method achieves the best average PSNR among the competing methods on BSD68. Compared to the benchmark method BM3D on BSD68, WNNM and TNRD have a notable gain of between 0.3 dB and 0.35 dB, and IRCNN has a PSNR gain of nearly 0.55 dB. In contrast, our proposed model outperforms BM3D by nearly 0.7 dB on all three noise levels. Second, the proposed method is better than DnCNN and FFDNet when the noise level is below 75. This result shows that the proposed method has a better trade-off between receptive field size and modeling capacity.
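For readers who want to reproduce such comparisons, PSNR can be computed as in the following sketch. It assumes 8-bit images (peak value 255) stored as equally sized lists of rows; the function name and the `peak` parameter are illustrative.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    squared_errors = [(r - e) ** 2
                      for ref_row, est_row in zip(reference, estimate)
                      for r, e in zip(ref_row, est_row)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Since PSNR is logarithmic in the mean squared error, a gain of 0.7 dB corresponds to roughly a 15% reduction in mean squared error.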
Table V lists the PSNR results of the different methods on the 12 test images. The best two PSNR results for each image at each noise level are highlighted in red and blue. It can be seen that the proposed method achieves one of the top two PSNR values on most of the test images. For the average PSNR values, the proposed method has the best performance among all the methods in noisy. It performs worse than FFDNet in noisy. This is because FFDNet outperforms the other methods on the images "House" and "Barbara", both of which contain a rich amount of repetitive structures.
For color image denoising, we use the same network parameters. The only difference is the input tensor becomes. The visual comparisons are shown in Fig. 6 and Fig. 3. It is obvious that CBM3D generates false color artifacts in some regions, while the proposed model recovers the image with more natural color and texture structure, such as sharper edges. In addition, Table II shows that the proposed model outperforms the benchmark method CBM3D at all three noise levels. Meanwhile, the proposed method is more effective than the three deep CNN methods on the color BSD68 dataset.
We give a brief calculation of the number of parameters. Note that the values differ between gray and color image denoising because of the different network depths. For instance, DnCNN uses 17 convolution layers for gray image denoising and 20 for color image denoising, whereas FFDNet uses 15 for gray and 12 for color. In addition, FFDNet uses 64 channels for gray images and 96 channels for color images. In contrast, the proposed method outperforms the other methods without increasing the depth for color image denoising. This indicates that our model is more robust without requiring additional computing resources.
|Method||Gray||Color|
|DnCNN||$5.6\times10^{5}$||$6.7\times10^{5}$|
|FFDNet||$5.5\times10^{5}$||$8.3\times10^{5}$|
|Proposed||$3.3\times10^{5}$||$3.4\times10^{5}$|
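As a rough cross-check of the leading digits reported for DnCNN, the following sketch counts the weights of a plain convolutional stack. The depths 17 and 20 come from the text; the 3x3 kernels, 64 channels, batch normalization with two learned parameters per channel on the middle layers, and the omission of biases are assumptions, so the totals are estimates rather than the exact figures behind the table.

```python
def plain_cnn_params(depth, channels, in_channels, out_channels, kernel_size=3):
    """Approximate weight count of a plain conv stack (biases ignored).

    Layout: one input layer, (depth - 2) middle layers each followed by
    batch normalization, and one output layer.
    """
    k2 = kernel_size * kernel_size
    first = k2 * in_channels * channels
    # Each middle layer: conv weights plus BN scale and shift per channel.
    middle = (depth - 2) * (k2 * channels * channels + 2 * channels)
    last = k2 * channels * out_channels
    return first + middle + last


print(plain_cnn_params(17, 64, 1, 1))  # 556032, about 5.6e5 (gray)
print(plain_cnn_params(20, 64, 3, 3))  # 669312, about 6.7e5 (color)
```

Almost all of the weights sit in the 64-to-64 middle layers (36,864 each), which is why reducing depth from 17 or 20 layers to eleven lowers the count so sharply.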
We also compare the computation time to check the applicability of the proposed method. BM3D and TNRD are chosen for comparison because of their potential value in practical applications. We use the Nvidia cuDNN-v6 deep learning library to accelerate GPU computation, and we do not count the memory transfer time between CPU and GPU. Since both the proposed denoiser and TNRD support parallel computation on GPU, we also provide the GPU runtime. Table IV lists the runtime of the different methods for denoising images of size , and . For each test, we run the method several times and report the average runtime. The proposed method is very competitive in both CPU and GPU computation. This good performance relative to BM3D is probably attributable to the following reasons. First, the convolution and the PReLU activation function are simple, effective and efficient. Second, batch normalization is adopted, which is beneficial to Gaussian denoising. Third, the residual architecture not only accelerates the inference of the deep network but also gives a larger model capacity.
In this paper, we have designed an effective CNN denoiser for image denoising. Specifically, with the aid of skip connections, we can easily train a deep and complex convolutional network. Several deep learning techniques are integrated to speed up the training process and boost the denoising performance. Model-based and prior-based approaches are popular ways to tackle the denoising problem. Following the idea of prior-based models, we show that features can be enriched by using a multiscale module and a residual HDC module. Extensive experimental results demonstrate that the proposed method not only produces favorable denoising performance quantitatively and qualitatively but also has a promising runtime on GPU. Some work remains for further study. First, it would be a promising direction to train a lightweight denoiser for practical applications. Second, the proposed method could be extended to other image restoration problems, such as image deblurring. Third, it would be interesting to investigate how to remove non-Gaussian noise by exploiting properties of Gaussian denoising models.
-  Chen L C , Papandreou G , Kokkinos I , et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 40(4):834-848.
-  Deng J , Dong W , Socher R , et al. ImageNet: A large-scale hierarchical image database[C]// 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009.
-  Vedaldi A , Lenc K . MatConvNet - Convolutional Neural Networks for MATLAB[J]. 2014.
-  Kingma D , Ba J . Adam: A Method for Stochastic Optimization[J]. Computer Science, 2014.
-  Szegedy C , Liu W , Jia Y , et al. Going deeper with convolutions[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, 2015.
-  Yu F , Koltun V , Funkhouser T . Dilated Residual Networks[J]. 2017.
-  Zhang K , Zuo W , Gu S , et al. Learning Deep CNN Denoiser Prior for Image Restoration[J]. 2017.
-  Wang P , Chen P , Yuan Y , et al. Understanding Convolution for Semantic Segmentation[J]. 2017.
-  Mairal J , Bach F , Ponce J , et al. Non-local sparse models for image restoration[C]// 2009 IEEE 12th International Conference on Computer Vision. IEEE, 2010.
-  Dong W , Zhang L , Shi G . Centralized sparse representation for image restoration[J]. IEEE Transactions on Image Processing, 2011.
-  Zhang K , Zuo W , Chen Y , et al. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising[J]. IEEE Transactions on Image Processing.
-  Kim Y , Jung H , Min D , et al. Deeply Aggregated Alternating Minimization for Image Restoration[C]// Computer Vision and Pattern Recognition. IEEE, 2017.
-  Chen Y , Pock T . Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 39(6):1256-1272.
-  Dabov K , Foi A , Katkovnik V , et al. Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering[J]. IEEE Transactions on Image Processing, 2007, 16(8):2080-2095.
-  Gu S , Zhang L , Zuo W , et al. Weighted Nuclear Norm Minimization with Application to Image Denoising[C]// Computer Vision and Pattern Recognition. IEEE, 2014.
-  Schmidt U , Roth S . Shrinkage Fields for Effective Image Restoration[C]// Computer Vision and Pattern Recognition. IEEE, 2014.
-  Krizhevsky A , Sutskever I , Hinton G . Imagenet classification with deep convolutional neural networks[C]// NIPS. Curran Associates Inc. 2012.
-  Ronneberger O , Fischer P , Brox T . U-Net: Convolutional Networks for Biomedical Image Segmentation[J]. 2015.
-  He K , Zhang X , Ren S , et al. Deep Residual Learning for Image Recognition[J]. 2015.
-  Gatys L A , Ecker A S , Bethge M . Texture synthesis using convolutional neural networks[C]// International Conference on Neural Information Processing Systems. MIT Press, 2015.
-  Harmeling S . Image denoising: Can plain neural networks compete with BM3D?[C]// Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2012.
-  Zhao H , Gallo O , Frosio I , et al. Loss Functions for Image Restoration With Neural Networks[J]. IEEE Transactions on Computational Imaging, 2017, 3(1):47-57.
-  Sajjadi M S M , Schölkopf B , Hirsch M . EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis[J]. 2016.
-  Ioffe S , Szegedy C . Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift[J]. 2015.
-  Mao X J , Shen C , Yang Y B . Image Denoising Using Very Deep Fully Convolutional Encoder-Decoder Networks with Symmetric Skip Connections[J]. 2016.
-  He K , Zhang X , Ren S , et al. Identity Mappings in Deep Residual Networks[J]. 2016.
-  Ulyanov D , Lebedev V , Vedaldi A , et al. Texture Networks: Feed-forward Synthesis of Textures and Stylized Images[J]. 2016.
-  He K , Zhang X , Ren S , et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification[J]. 2015.
-  Burger H C , Schuler C , Harmeling S . Learning how to combine internal and external denoising methods[M]// Pattern Recognition. 2013.
-  Zhang K , Zuo W , Zhang L . FFDNet: Toward a Fast and Flexible Solution for CNN based Image Denoising[J]. IEEE Transactions on Image Processing, 2018:1-1.
-  Wang T , Sun M , Hu K . Dilated Residual Network for Image Denoising[J]. 2017.