I Introduction
Images are always contaminated by noise during acquisition and transmission. Usually, the distribution of the random noise is assumed to be some standard probability distribution, such as a Gaussian, Poisson or Gamma distribution. The additive noise model can be written as

f = u + n,

where f, u and n are the observed image, the true image and the noise, respectively.
Many methods have been proposed [1][2] to obtain a clean image from the observed data. Model based methods are traditional and popular techniques. Among them, filtering is a very classical technique; representative methods such as Gaussian filters [3], Gabor filters [4] and median-type filters [5][6] are still very popular thanks to their simple implementation. Wavelet based approaches [7][8][9] suppress the high frequency coefficients by thresholding, and statistical approaches [10][11] treat noise as realizations of a random variable; both are powerful methods for image denoising. The variational method [12][13] is another useful and efficient tool. This approach minimizes a cost functional which contains a data fidelity term and a regularization term,

E(u) = F(u, f) + λ R(u),

where F(u, f) is the data fidelity measuring the discrepancy between the true and the observed data; it can be derived from the maximum likelihood estimation of the noise. R(u) serves as a regularization formulating the image prior. Variational methods have drawn extensive research since they naturally equip the model with regularization and flexibly integrate the advantages of different methods. Meanwhile, Total Variation (TV) [13] regularization has proven successful for denoising and inverse problems [14][15][16]. However, TV cannot preserve repeated tiny image details such as textures. To better capture such details, nonlocal methods [17][18][19] were proposed. These methods make full use of the self-similarity existing in an image, which can be integrated into a variational method naturally. Nonlocal methods usually perform better than local methods on denoising. However, the weighting function in the nonlocal model is usually difficult to determine. There are many improved nonlocal methods based on the self-similarity among image patches, such as BM3D [20], learned simultaneous sparse coding (LSSC) [21] and weighted nuclear norm minimization (WNNM) [22].
Learning based methods [23][24] have drawn much attention recently for their outstanding denoising performance. Mathematically, a learning based method can be expressed as

u = N(f; θ),

where N is a nonlinear operator applied recursively with a parameter set θ, f is the input data and u represents the output data. Given some data pairs (f_i, u_i), the model can be trained precisely to fit the given samples. Obviously, the learning based model can be used in many fields, as long as enough sample data are properly prepared, such as denoising [25], image classification [26] and other interesting applications [27][28].
Learning based, and especially neural network based, denoising methods have been proposed in many works, most of which establish different kinds of networks as denoisers, such as the commonly used convolutional neural network [29][25], the multilayer perceptron [30] and the stacked sparse denoising autoencoder [31]. Most of these works assume the noise is single white Gaussian noise, which can be removed with an L2-based fidelity term [13]. Under other noise assumptions, the data fidelity can be different, such as an L1-based fidelity for impulse noise [32][14] (including salt-and-pepper noise and random-valued noise) and a pointwise fidelity for Poisson noise [33]. However, the noise model is more complicated in practical applications because of changeable imaging environments. To be more precise and reasonable, the noise should be modeled by mixture distributions such as Gaussian-Gaussian, Gaussian-impulse and so on. Unfortunately, single-type noise removal models are no longer suitable for mixed noise models [34]: there is no unified data fidelity that can be used in a mixed noise removal model, which makes mixed noise removal troublesome.
The key point of mixed noise removal is to precisely determine the type of noise at each pixel. Lopez-Rubio [35] gives a kernel estimation method to remove Gaussian-impulse noise, based on a Bayesian classification of each pixel. Xiao et al. [36] establish a model for Gaussian-impulse noise removal. Liu et al. [34] propose an adaptive mixed noise removal model based on an EM process, which demonstrates good performance.
Though learning based methods show better denoising performance for single-type noise such as Gaussian noise, they need a large amount of labeled samples, which limits their application to and development for mixed noise removal. As for variational methods, most of them need only one image, and they can flexibly integrate the image prior (regularization). However, the prior in a variational model is usually based on low-level image information from a single image, which is often unsatisfactory.
In this paper, we extend our previous EM based mixed noise removal method [37] and integrate a CNN process as regularization to propose a new variational method. In our method, the variational process estimates the noise parameters iteratively, which can be used to classify the noise type and level at each pixel. By a splitting method, we can separate our algorithm into four steps: regularization, synthesis, parameter estimation and noise classification, in which each step is related to a minimization problem and can be optimized efficiently. Meanwhile, we employ a deep learning method (CNN) to learn the natural image prior in our algorithm, which strengthens the regularization prior.
The CNN based regularization can better capture the image prior, the image synthesis corrects the over-smoothing effect of the CNN process, and the noise estimation gives the CNN denoiser a good noise level estimate so that it behaves well; all these steps work together to produce satisfactory restored results. To the best of our knowledge, this is the first attempt to integrate variational mixed noise removal methods and learning based methods.
The rest of this paper is organized as follows: we review related work in Section II, present the proposed model and the details of the algorithm in Section III, and give numerical experiments in Section IV. We conclude and discuss further research in Section V.
II Related Work
In this paper, we consider additive mixed noise removal. Different from most denoising works, here the mixed noise is assumed to obey the mixture distribution

(1)  p(n) = Σ_{k=1}^{K} r_k p_k(n),

where n_k is the k-th noise component with probability density function (PDF) p_k, and r_k is an unknown mixture ratio satisfying Σ_{k=1}^{K} r_k = 1, 0 ≤ r_k ≤ 1. Once the PDFs are given, then by assuming the noise n = f − u is independent and identically distributed, one can get its negative log-likelihood functional:

F(u, Θ) = − Σ_x log ( Σ_{k=1}^{K} r_k p_k(f(x) − u(x)) ).

Here Θ is a statistical parameter set containing noise parameters such as the mixture ratios, means and variances.
Usually, this likelihood functional is chosen as the fidelity term in a variational method. However, different from the single Gaussian noise case, here F is not quadratic and is not easy to optimize efficiently. One alternative is to minimize a simple upper bound functional of F. In [37], such an upper bound functional, denoted by J, was found as

(2)  J(u, w, Θ) = Σ_x Σ_{k=1}^{K} w_k(x) [ log w_k(x) − log r_k − log p_k(f(x) − u(x)) ].

Here Θ is the parameter set, and w = (w_1, …, w_K) is a vector-valued function with k-th component function w_k. Moreover, w must satisfy the segmentation condition

Σ_{k=1}^{K} w_k(x) = 1,  0 ≤ w_k(x) ≤ 1.

In [37], it has been shown that the three-variable functional J is an upper bound of F, i.e. F(u, Θ) ≤ J(u, w, Θ).
Lemma 2 (Commutativity of log-sum, [37]): For w satisfying the segmentation condition, F(u, Θ) = min_w J(u, w, Θ); that is, the minimization over w and the log-sum can be interchanged.
It seems that J is more complicated than F since there is an extra variable w in J. However, minimizing J is easier, since the problem becomes quadratic with respect to u in the Gaussian case, and w always has a closed-form solution in some cases. Moreover, the introduced w is a probability indicating which mixture component the noise at each pixel comes from; thus the noise can be classified by w according to the different statistical parameters.
To optimize J, alternating minimization can be employed, and one gets the following iteration scheme:

(3)  u^{t+1} = argmin_u J(u, w^t, Θ^t),
     Θ^{t+1} = argmin_Θ J(u^{t+1}, w^t, Θ),
     w^{t+1} = argmin_w J(u^{t+1}, w, Θ^{t+1}).
For such a scheme, it has been shown that the objective values are monotonically non-increasing [37]. According to the above lemma, optimizing F can be replaced by optimizing J by adding a variable w and interchanging the log-sum. Based on this fact, the authors in [37] proposed a variational model with dictionary learning to remove a variety of mixed noise, such as Gaussian-Gaussian, impulse and Gaussian-impulse mixtures. However, such dictionary learning is driven by the images themselves, so the learning and denoising procedures are synchronized. It is hard to split them into two separate tasks, and thus the algorithm is very time-consuming. Moreover, dictionary learning is a linear, low-level learning method, and it cannot capture nonlinear, deep image priors in natural images. In this paper, we integrate a deep learning method to improve it.
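The log-sum bound behind Lemma 2 is easy to verify numerically. The sketch below is our illustration (not code from [37]); it checks, at a single pixel with K = 2 components, that J upper-bounds F for any admissible w and that the bound is tight at the closed-form minimizer w_k ∝ r_k p_k:

```python
import numpy as np

def F_val(dens, r):
    """Negative log-likelihood -log(sum_k r_k p_k(n)) at one pixel;
    dens[k] = p_k(n) is the k-th component density at the noise value."""
    return -np.log(np.dot(r, dens))

def J_val(dens, r, w):
    """Upper bound sum_k w_k (log w_k - log r_k - log p_k(n)) at one pixel."""
    return float(np.sum(w * (np.log(w) - np.log(r) - np.log(dens))))

# two mixture components: densities at one pixel and mixture ratios
dens = np.array([0.05, 0.60])
r = np.array([0.30, 0.70])
w_opt = r * dens / np.dot(r, dens)   # closed-form minimizer of J over w
w_any = np.array([0.50, 0.50])       # an arbitrary admissible w
```

By Jensen's inequality, J ≥ F for every w on the simplex, with equality exactly at w_opt; this is why minimizing J over w recovers F.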
III The Proposed Method
In this section, we will build a general variational model with CNN regularization for mixed noise removal.
III-A General Model
The general mixed noise removal model can be written as

(4)  min_{u, w, Θ} J(u, w, Θ) + λ R(u),

where J is defined in (2), R is a learning based regularization term, and λ is a balance parameter which controls the smoothness of the restoration.
By applying the well-known alternating minimization scheme, we can get

(5a)  u^{t+1} = argmin_u J(u, w^t, Θ^t) + λ R(u),
(5b)  Θ^{t+1} = argmin_Θ J(u^{t+1}, w^t, Θ),
(5c)  w^{t+1} = argmin_w J(u^{t+1}, w, Θ^{t+1}).
In order to use a CNN, we must split the optimization problem (5a). Let us introduce an auxiliary function v and reformulate the above problem as

min_{u,v} J(v, w, Θ) + λ R(u),  s.t.  u = v.

Then, by applying the well-known augmented Lagrangian method (ALM) [38], one gets a saddle point problem

max_μ min_{u,v} J(v, w, Θ) + λ R(u) + (β/2) ‖u − v + μ‖²,

where μ is the (scaled) Lagrange multiplier and β > 0 is a penalty parameter. We notice that the above functional with respect to v describes an image synthesis process, and we can add a TV regularizer on v to reduce artificial effects such as the blur caused by averaging. Meanwhile, the introduction of TV can be seen as a generalization of our algorithm, since the TV parameter γ can be set to 0, which recovers the original problem. Thus we get

max_μ min_{u,v} J(v, w, Θ) + λ R(u) + γ TV(v) + (β/2) ‖u − v + μ‖².
It produces the standard ALM iteration scheme

(6a)  u^{t+1} = argmin_u λ R(u) + (β/2) ‖u − v^t + μ^t‖²,
(6b)  v^{t+1} = argmin_v J(v, w^t, Θ^t) + γ TV(v) + (β/2) ‖u^{t+1} − v + μ^t‖²,
(6c)  μ^{t+1} = μ^t + u^{t+1} − v^{t+1},
(6d)  Θ^{t+1} = argmin_Θ J(v^{t+1}, w^t, Θ),
(6e)  w^{t+1} = argmin_w J(v^{t+1}, w, Θ^{t+1}),

where β serves as a step size.
The above 5 subproblems imply that we can split the mixed noise removal problem into 5 steps: Gaussian noise removal (updating u), weighted fidelity synthesis (updating v), noise put-back (Lagrange multiplier updating), noise parameter estimation (Θ updating) and noise classification (w updating). In the following, we show how to solve each subproblem.
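As a structural sketch (not the authors' implementation), the five updates can be organized into an outer loop in which the CNN denoiser, the synthesis solver and the EM updates are swappable components; the four callables below are hypothetical stand-ins for the solvers described in the text:

```python
import numpy as np

def mixed_denoise(f, denoiser, synthesize, estimate_params, classify,
                  outer_iters=5):
    """One possible arrangement of the five-step splitting (6a)-(6e):
    Gaussian denoising (u update), weighted synthesis (v update),
    noise put-back via the Lagrange multiplier, noise parameter
    estimation, and noise classification. The four callables are
    hypothetical stand-ins for the solvers described in the text."""
    u, v = f.copy(), f.copy()
    mu = np.zeros_like(f)                   # Lagrange multiplier
    theta = estimate_params(f - v)          # initial noise statistics
    w = classify(f - v, theta)              # initial per-pixel classification
    for _ in range(outer_iters):
        u = denoiser(v - mu)                # Gaussian denoising step
        v = synthesize(u, f, mu, w)         # weighted synthesis step
        mu = mu + u - v                     # multiplier (noise put-back)
        theta = estimate_params(f - v)      # parameter estimation
        w = classify(f - v, theta)          # noise classification
    return u
```

With trivial stand-ins (identity denoiser, averaging synthesis), the loop reduces to a fixed point, which makes the data flow between the five steps easy to inspect.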
Let us mention that our model (4) can handle many types of mixed noise, such as Gaussian-Gaussian mixed noise with different standard deviations, impulse noise (salt-and-pepper noise and random-valued noise), Gaussian-impulse mixed noise and so on. One just needs to choose different component PDFs p_k for different types of mixed noise removal problems. For a Gaussian-Gaussian mixture, J is quadratic with respect to v and all five subproblems are easily solvable.

u subproblem:
The u subproblem is a standard Gaussian noise removal problem. In order to enforce the image prior, we can employ popular deep learning methods such as a CNN as the regularizer [39]. Suppose R is well-behaved enough, e.g. differentiable; then u must satisfy

(7)  λ D(u) + β (u − v + μ) = 0,

where D is an operator with D = ∇R. Ideally, D(u) should be the noise for an additive noise removal problem. Thus we can use a CNN to learn Gaussian noise of different levels from a variety of natural images. In this sense, we can regard the CNN as the variation of a functional. In PDE denoising methods, a very simple example of D is the negative Laplace operator, i.e. D(u) = −Δu, which can be regarded as a trained single-layer CNN with an isotropic diffusion convolution kernel. In such a case, we can easily get the related functional R(u) = (1/2)‖∇u‖². Though such a simple CNN is not good enough to preserve image edges well, it motivates the use of more complicated CNNs with multiple layers and nonlinear kernels. In the general case, we cannot get the related closed-form functional for a CNN operator D.
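The negative-Laplacian example can be made concrete: the sketch below (our illustration) applies −Δ as a single 3×3 convolution, i.e. the "single-layer CNN" instance of the noise-extraction operator discussed above:

```python
import numpy as np

def neg_laplacian(u):
    """Negative Laplacian -Δu as a single 3x3 convolution with
    replicated (edge) boundary conditions; the simplest single-layer
    'CNN' realization of the noise-extraction operator D(u)."""
    kernel = np.array([[0.0, -1.0, 0.0],
                       [-1.0, 4.0, -1.0],
                       [0.0, -1.0, 0.0]])
    pad = np.pad(u, 1, mode='edge')
    out = np.zeros_like(u, dtype=float)
    h, w = u.shape
    for i in range(3):
        for j in range(3):
            # accumulate the shifted image weighted by the kernel entry
            out += kernel[i, j] * pad[i:i + h, j:j + w]
    return out
```

On a constant image the operator returns zero (no "noise" extracted), while an isolated spike produces the familiar 4/-1 stencil response.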
Therefore, we can simulate white Gaussian noise with different variances and add it to all kinds of natural images to produce plenty of training samples. The powerful learning ability of CNNs ensures that the trained CNN can distinguish different levels of white Gaussian noise. Once the noise in an image is identified, the clean image can be easily recovered.
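A minimal sketch of this training-data preparation (our illustration; it assumes residual learning, i.e. the network regresses the noise as in [39], and a noise-level range of [0, 50] as used later in the experiments):

```python
import numpy as np

def make_training_pairs(clean_patches, sigma_range=(0.0, 50.0), rng=None):
    """Simulate white Gaussian noise at random levels and add it to
    clean patches; the (noisy, noise) pairs are what a residual-learning
    CNN denoiser would be trained on. sigma_range is an assumption."""
    rng = rng or np.random.default_rng()
    noisy, targets = [], []
    for p in clean_patches:
        sigma = rng.uniform(*sigma_range)            # random noise level
        noise = rng.normal(0.0, sigma, size=p.shape) # white Gaussian noise
        noisy.append(p + noise)                      # network input
        targets.append(noise)                        # residual (noise) target
    return np.stack(noisy), np.stack(targets)
```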
Instead of solving a linear or nonlinear PDE as in traditional variational methods, here we employ a CNN to find the noise. Numerical experiments show that this step can greatly improve the quality of the restorations.
There are many learning based methods for removing Gaussian noise, such as [31][39]. In this paper, we choose the recent CNN based denoiser [39].
v subproblem:
This subproblem leads to a TV system and can be efficiently solved by many TV solvers, such as Chambolle's dual method [40], primal-dual methods [41][42][43], the split Bregman method [44][45] and the augmented Lagrangian method [38]. For Gaussian mixture noise, it becomes a weighted ROF model. Here the weight w ensures the model assigns different fidelity strengths to pixels contaminated by noise of different levels or types. This procedure can greatly improve the quality of the restorations.
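For illustration, here is a simple gradient-descent sketch of a weighted ROF model, min_v Σ w (v − f)² + μ TV(v) in our notation, with a smoothed TV term and periodic boundaries in the divergence for brevity; a practical solver would use Chambolle's dual method or split Bregman instead:

```python
import numpy as np

def weighted_rof(f, weight, mu=1.0, tau=0.1, iters=200, eps=1e-6):
    """Gradient descent on the weighted ROF energy with smoothed TV.
    `weight` realizes the per-pixel fidelity strengths assigned to
    pixels with different noise levels; eps smooths |grad v|."""
    v = f.copy()
    for _ in range(iters):
        # forward differences (last row/column padded by replication)
        gx = np.diff(v, axis=1, append=v[:, -1:])
        gy = np.diff(v, axis=0, append=v[-1:, :])
        mag = np.sqrt(gx**2 + gy**2 + eps)
        px, py = gx / mag, gy / mag
        # divergence of the normalized gradient (curvature term)
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        # descend on 2*w*(v - f) - mu * div
        v = v - tau * (2.0 * weight * (v - f) - mu * div)
    return v
```

On pure noise input the TV term visibly reduces the variance while the weighted fidelity keeps v anchored to f.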
Θ and w subproblems:

These two subproblems are exactly an EM process. For a given noise estimate, these two steps give an estimation of the noise variances and classify the noise into different classes according to the estimated noise parameters.
Some detailed noise distributions are discussed below, including Gaussian-Gaussian mixtures, impulse noise and Gaussian-impulse noise.
III-B Gaussian Mixture Model
Assume the p_k are zero-mean Gaussian functions with different variances σ_k², respectively, i.e.

p_k(n) = 1/(√(2π) σ_k) exp(−n²/(2σ_k²));

then, ignoring some constant terms, we can define J as

(8)  J(v, w, Θ) = Σ_x Σ_{k=1}^{K} w_k(x) [ (f(x) − v(x))²/(2σ_k²) + log σ_k − log r_k + log w_k(x) ],

where Θ = {r_k, σ_k : k = 1, …, K}.
Here we give the Gaussian mixed noise removal model, i.e. the v subproblem (6b) with J as in (8):

(9)  min_v J(v, w, Θ) + γ TV(v) + (β/2) ‖u − v + μ‖².
Because of the TV term, we can adopt splitting methods, such as the split Bregman method [44][45] and the augmented Lagrangian method [38]. Here we introduce an auxiliary variable d for ∇v, together with a Bregman variable b, and give the split Bregman iteration

(10a)  v^{j+1} = argmin_v J(v, w, Θ) + (β/2) ‖u − v + μ‖² + (η/2) ‖d^j − ∇v − b^j‖²,
(10b)  d^{j+1} = argmin_d γ ‖d‖_1 + (η/2) ‖d − ∇v^{j+1} − b^j‖²,
(10c)  b^{j+1} = b^j + ∇v^{j+1} − d^{j+1}.

Furthermore, v^{j+1} can be obtained from the first-order optimality condition, which amounts to solving the linear system

(11)  ( Σ_k w_k/σ_k² + β − η Δ ) v = Σ_k (w_k/σ_k²) f + β (u + μ) − η ∇·(d^j − b^j),

and d^{j+1} can be obtained by the shrinkage operator [46]:

(12)  d^{j+1} = shrink(∇v^{j+1} + b^j, γ/η) = (∇v^{j+1} + b^j)/|∇v^{j+1} + b^j| · max(|∇v^{j+1} + b^j| − γ/η, 0).
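The shrinkage (soft-thresholding) operator in (12) acts pointwise on the gradient field; a minimal sketch:

```python
import numpy as np

def shrink(d, thresh):
    """Isotropic soft-thresholding: shrink(d, t) = d/|d| * max(|d| - t, 0),
    applied pointwise to a vector field d whose first axis holds the
    (gx, gy) components. The tiny floor avoids division by zero."""
    mag = np.sqrt((d ** 2).sum(axis=0))
    scale = np.maximum(mag - thresh, 0.0) / np.maximum(mag, 1e-12)
    return d * scale
```

Vectors with magnitude below the threshold are mapped to zero; longer vectors keep their direction and lose exactly `thresh` in length, which is what makes the d-subproblem a one-line update.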
Θ subproblem (6d):
The parameter set Θ consists of r = {r_k} and σ = {σ_k}, and both have closed-form solutions.
The minimization problems with respect to r and σ can be written as

(13a)  min_r − Σ_x Σ_k w_k(x) log r_k,  s.t.  Σ_k r_k = 1,
(13b)  min_σ Σ_x Σ_k w_k(x) [ (f(x) − v(x))²/(2σ_k²) + log σ_k ].

With the weight constraint Σ_k r_k = 1, one can easily get the parameter updates:

(14)  r_k = (1/|Ω|) Σ_x w_k(x),   σ_k² = Σ_x w_k(x) (f(x) − v(x))² / Σ_x w_k(x),

where |Ω| is the number of pixels.
w subproblem (6e):
The w subproblem also has a closed-form solution. The related minimization problem becomes

min_w Σ_x Σ_k w_k(x) [ log w_k(x) − log r_k − log p_k(f(x) − v(x)) ],  s.t.  Σ_k w_k(x) = 1.

It has the closed-form solution

(15)  w_k(x) = r_k p_k(f(x) − v(x)) / Σ_{l=1}^{K} r_l p_l(f(x) − v(x)).
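The closed-form updates (14) and (15) together form one EM sweep on the current noise estimate n = f − v. A minimal sketch for a zero-mean Gaussian mixture (our illustration, not the authors' code):

```python
import numpy as np

def em_updates(n, r, sigma, iters=50):
    """Alternate the E-step (15), computing the responsibilities w, and
    the M-step (14), updating the mixture ratios r_k and standard
    deviations sigma_k, on a flattened noise estimate n = f - v."""
    n = np.asarray(n, float).ravel()
    r = np.asarray(r, float)
    sigma = np.asarray(sigma, float)
    for _ in range(iters):
        # E-step (15): w_k(x) proportional to r_k p_k(n(x))
        dens = np.stack([
            rk * np.exp(-n**2 / (2.0 * sk**2)) / (np.sqrt(2 * np.pi) * sk)
            for rk, sk in zip(r, sigma)
        ])
        w = dens / dens.sum(axis=0, keepdims=True)
        # M-step (14): closed-form ratio and variance updates
        r = w.mean(axis=1)
        sigma = np.sqrt((w * n**2).sum(axis=1) / w.sum(axis=1))
    return r, sigma, w
```

Run on synthetic two-level Gaussian noise, the iteration recovers both the mixture ratio and the two standard deviations, which is exactly the information the u and v steps need.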
The structure of our CNN-EM algorithm is illustrated in Fig. 1.
III-C Gaussian Noise Plus Impulse Noise
In this part, we assume the noise follows a mixed distribution of Gaussian noise n₁ and impulse noise n₂; the noise model [37] can be written as

(16)  n(x) = n₁(x) with probability 1 − r,  n₂(x) with probability r,

where n₁ and n₂ are the Gaussian noise and the impulse noise, respectively; the impulse noise is either uniformly distributed random values in the range [0, 255] (named random-valued noise) or takes only the values 0 or 255 (named salt-and-pepper noise). In such a case, one can get
Proposition 1 ([37])
The PDFs of Gaussian plus random-valued noise and Gaussian plus salt-and-pepper noise have the following expressions, respectively,

(17)

where g is a Gaussian function and h is the PDF of the clean image with intensity range [0, 255], which is usually expressed by the normalized histogram of the clean image.
Since median filters can detect salt-and-pepper noise well, some existing works such as the two-phase method [47][48] can restore the image well even when the noise density is as high as 90%. However, random-valued noise is not easy to detect, and here we pay more attention to random-valued noise.
In fact, the PDF of the random-valued noise can be expressed as

(18)

if we suppose that the clean image has a normalized histogram, namely that h is a uniformly distributed PDF on [0, 255]. As discussed above, one can use this PDF to construct the data fidelity and complete the model. However, the second part of (18) is not differentiable, which makes it hard to optimize. As discussed in [37], this part can be well approximated by a Gaussian function, which means the Gaussian plus impulse noise model can be optimized with the mixed Gaussian noise model.
IV Experimental Results
In this section, we compare our proposed CNN based regularization model with some related models. We use 5 test images, shown in Fig. 2, in our experiments: Lena, Barbara, Boat, House and Peppers. To evaluate the denoising quality of the different methods, we adopt the PSNR value

PSNR = 10 log₁₀ ( 255² |Ω| / ‖u − û‖² )

as the quality index, where u and û are the clean and denoised images, respectively, and |Ω| is the number of pixels.
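For reference, the PSNR index used here can be computed as follows (assuming an 8-bit intensity range with peak 255):

```python
import numpy as np

def psnr(u_clean, u_denoised, peak=255.0):
    """PSNR = 10 log10(peak^2 / MSE) between a clean and a denoised image."""
    mse = np.mean((np.asarray(u_clean, float) - np.asarray(u_denoised, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```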
IV-A Gaussian Mixed Noise
In the first experiment, we give the restored results under mixed Gaussian noise. For comparison, we take the KSVD method [49], the WKSVD model [37] and the CNN denoising method [39] as references.

The test image "Barbara" is corrupted by mixed Gaussian noise with a given mixture ratio and two standard deviations. Though the KSVD model is designed for a single Gaussian distribution, we still list its results as a reference to show the superiority of the EM parameter estimation. The noise variance used in the KSVD method is set according to Proposition 5 in [37]. WKSVD [37] integrates weighted dictionary learning and sparse coding based on an Expectation-Maximization (EM) process for mixed noise removal; for it, we update all the parameters, including the weight parameter w and the variances σ. To compare with learning based methods, we also give the results of the latest work in CVPR 2017 [39], called the "CNN based method" here, which is trained to remove additive Gaussian noise with levels in the range [0, 50]. We show the noisy image and the corresponding restored results in Fig. 3, where the region in the green rectangle is zoomed in and placed at the bottom-left of each image.

Compared with the KSVD method, one can find that some speckles exist in the KSVD results (Fig. 3(b)), since KSVD cannot distinguish different noise levels, while our proposed variational model (Fig. 3(e)), which is based on an EM process with parameter updating, can precisely determine the noise level of each pixel. Compared with the WKSVD method (Fig. 3(c)), our proposed model preserves image details such as texture more precisely, since it carries a higher-level image prior and hence has better denoising performance. The CNN method performs well in the denoising process; however, since the noise is not a single Gaussian distribution and the exact noise variance is unknown, if the initial noise level fed to the CNN process is far from the real one, the restored image will be very bad. In contrast, our model is separated by operator splitting into four steps, including noise level estimation and image synthesis; the noise estimation endows the CNN process with a better noise level estimate, giving better denoising performance (Fig. 3(e)). Meanwhile, the CNN process depends heavily on the labeled samples: if the noise distribution or noise level is not included in the sample database, the restored image is usually undesirable.
The image synthesis process can partly relieve this sample dependence, making the algorithm robust. Moreover, our proposed model is highly efficient. TABLE I shows the CPU run times for denoising images of size 256×256 and 512×512 contaminated by mixed noise, using KSVD [49], WKSVD [37], CNN [39] and the proposed algorithm; the times shown for WKSVD and the proposed model are the total times over 10 outer iterations, i.e. one tenth of the listed value per outer iteration. It can be seen that the proposed CNN-EM algorithm is more than 30 times faster than WKSVD [37].
Size  KSVD [49]  WKSVD [37]  CNN [39]  Proposed
256×256  44.999s  492.28s/10  1.28s  13.44s/10
512×512  128.351s  1245.70s/10  3.68s  38.09s/10
Here we give another two numerical experiments on images corrupted by mixed Gaussian noise: the sample image "House" and the sample image "Peppers" are each contaminated by a mixed Gaussian noise. Fig. 4 and Fig. 5 show the restored images by each method on these artificial test images, from which one can easily draw the same conclusions as in the last experiment. We test our method on every sample image and list all the PSNR values of the results by the different methods in TABLE II. Almost all the restored images by the proposed model have the highest PSNR values, which shows the superiority of our model. Here we pay more attention to the strongest noise mixture: as discussed above, the noise level that would have to be set in the CNN denoiser is out of the range [0, 50] of the denoiser [39], so under this situation the CNN denoiser in fact fails. However, thanks to the image synthesis step related to the v subproblem, our proposed model behaves better and more robustly than the state-of-the-art methods; this can be seen as a modification of the CNN denoiser and also shows the superiority of our proposed model.
Images  Method  σ = 5  σ = 10  σ = 15  
0.3:0.7  0.5:0.5  0.7:0.3  0.3:0.7  0.5:0.5  0.7:0.3  0.3:0.7  0.5:0.5  0.7:0.3  
Lena  KSVD [49]  31.11  31.60  31.30  28.49  28.92  28.43  26.41  26.79  26.05 
WKSVD [37]  31.43  32.69  34.24  29.00  30.43  32.07  27.04  28.57  30.36  
CNN [39]  32.34  32.93  33.72  30.20  30.79  31.56  19.55  27.05  29.83  
Proposed  32.88  34.18  35.55  30.86  32.12  33.50  29.04  30.36  32.12  
Barbara  KSVD [49]  29.37  30.08  30.65  26.40  27.08  26.95  23.70  24.38  24.21 
WKSVD [37]  29.19  30.50  32.69  26.46  28.15  30.07  23.75  26.20  28.22  
CNN [39]  29.80  30.57  31.66  27.15  27.90  28.95  19.22  24.60  26.81  
Proposed  29.76  31.35  32.87  27.67  28.88  30.69  25.23  26.68  28.94  
Boat  KSVD[49]  29.08  29.63  29.78  26.20  27.07  26.81  24.61  25.02  24.60 
WKSVD [37]  28.96  29.85  31.19  26.79  27.82  29.19  25.02  26.31  27.77  
CNN [39]  29.96  30.60  31.50  27.83  28.43  29.22  19.28  25.43  27.52  
Proposed  29.99  31.66  33.14  28.29  29.43  30.92  26.34  27.65  29.39  
House  KSVD [49]  31.81  32.25  31.74  28.83  29.42  28.88  26.13  26.74  27.41 
WKSVD [37]  32.73  33.66  34.83  30.16  31.56  33.10  27.24  29.44  31.42  
CNN [39]  33.07  33.54  34.26  30.89  31.56  32.32  19.33  27.30  30.57  
Proposed  33.59  34.80  36.32  31.56  32.86  34.10  29.68  30.99  32.77  
Peppers  KSVD [49]  29.47  30.10  30.32  26.82  27.40  27.10  24.53  25.16  24.69 
WKSVD [37]  29.66  30.55  31.69  27.37  28.47  29.79  25.21  26.76  28.37  
CNN [39]  30.62  31.38  32.35  28.16  28.86  29.87  19.08  25.34  27.88  
Proposed  31.18  32.42  33.89  28.79  30.10  31.69  26.91  28.22  30.00  
Thanks to the high computational efficiency of our proposed algorithm, we can test it on a whole image dataset; we compare the CNN based method [39] and our proposed algorithm on the BSDS500 dataset [50]. TABLE III gives the comparison on 100 images, 300 images and all 500 images of BSDS500. One can find that our proposed algorithm has higher PSNR values: at least a 0.31 dB improvement, and as much as a 1.61 dB improvement on average, over the original CNN based method [39].
DataSet  Method  σ = 5  σ = 10  σ = 15  

0.3:0.7  0.5:0.5  0.7:0.3  0.3:0.7  0.5:0.5  0.7:0.3  0.3:0.7  0.5:0.5  0.7:0.3  
BSDS500 [50]  CNN [39]  29.27  29.99  30.97  27.08  27.68  28.47  19.17  24.76  26.77 
100 images  Proposed  29.60  30.73  32.20  27.46  28.42  29.81  25.74  26.85  28.27 
BSDS500 [50]  CNN [39]  29.28  30.00  31.00  27.06  27.67  28.49  19.22  24.76  26.75 
300 images  Proposed  29.58  30.68  32.15  27.40  28.35  29.76  25.67  26.76  28.20 
BSDS500 [50]  CNN [39]  29.21  29.92  30.91  27.03  27.62  28.42  19.23  24.76  26.71 
500 images  Proposed  29.51  30.61  32.06  27.37  28.30  29.69  25.68  26.74  28.16 
In the next experiment, we explore the relationship between the PSNR values and the noise level of mixed Gaussian noise on the sample image "Barbara", for the proposed model and the CNN based model [39]. Here, we fix the noise ratio, fix the standard deviation of one noise component, and increase the level σ of the other component from 5 to 50 with step size 5; the results are given in Fig. 6. One can find that the PSNR values of both methods decrease as σ increases, as expected. Meanwhile, the PSNR values of the CNN based method and the proposed method get closer when σ approaches the fixed standard deviation, since in that case the mixed noise degenerates to single Gaussian noise, which the CNN based model can handle efficiently. At high noise levels far from the fixed one, our proposed model behaves more satisfactorily.
In Fig. 7, we give the relationship between the PSNR values and the mixture ratio r of mixed Gaussian noise on the sample image "Barbara", with fixed standard deviations and varying ratio r : 1 − r. The results of the CNN based method [39] serve as a contrast. In fact, as r increases the effective noise level decreases, which leads to increasing PSNR values for both methods. Moreover, when r is close to 0 or close to 1, the noise can essentially be seen as single Gaussian noise, and the PSNR values of the CNN based method [39] and our proposed model are close to each other, which meets our expectation.
IV-B Gaussian Noise Plus Impulse Noise
In fact, our model also works on images with Gaussian noise plus impulse noise. Here we test our model on "Barbara" contaminated by Gaussian plus random-valued noise, with a given density of random-valued noise and a given standard deviation of the Gaussian. To obtain a better restoration of the noisy image "Barbara", we set the initial u as the output of the first phase of the two-phase models [47][48][37]:

(19)

where ũ is the result of the median filter. Here we detect the random-valued impulse noise by the adaptive center-weighted median filter (ACWMF) [6]. Meanwhile, we set the initial variance for the impulse noise as [37]

(20)

where the remaining quantity can be estimated by the following mode [37].
For comparison, we give results from some related models: ACWMF [6] plus KSVD [49] (first filtering the noisy image with ACWMF, then denoising with KSVD) and the two-phase models [48][36], which are two good mixed noise removal approaches. The parameters of the two two-phase models are chosen to achieve the highest PSNR, and the noise variance is set by (20). The restored images by Two-Phase, ACWMF+KSVD and our proposed model are shown in Fig. 8; for better visualization, we zoom in on the region in the green rectangle, shown at the bottom-left of Fig. 8. From the restored results, our proposed model clearly behaves better, especially in the texture regions. Meanwhile, the two methods mentioned above are in fact designed for single Gaussian noise removal; for mixed noise, they cannot distinguish the noise level and type, so some speckles inevitably remain. In contrast, our restored image is visually satisfactory.
In the next experiment, we test our model on more mixed noise combinations; here "Barbara" is contaminated by different levels of Gaussian noise and varying densities of random-valued noise. The results under Gaussian plus random-valued noise by our model and the related models are shown in TABLE IV. From this table, the results by our proposed model have the highest PSNR, which is consistent with the visual results and again shows the superiority of our model.
Method  r=0.1  r=0.2  r=0.3  r=0.1  r=0.2  r=0.3  r=0.1  r=0.2  r=0.3  
(the three groups of columns correspond to three Gaussian noise levels)
Noisy  18.76  15.76  14.04  18.43  15.61  13.95  17.94  15.38  13.81 
Twophase  25.40  24.77  24.13  24.34  23.94  23.45  23.32  23.02  22.67 
ACWMF+KSVD  26.07  25.27  24.51  25.50  24.91  24.31  24.64  24.19  23.77 
[36]  30.45  27.75  25.95  28.45  26.59  25.34  27.33  25.69  24.55 
Proposed  30.54  28.75  26.62  30.89  28.80  25.88  29.66  27.27  24.62 
Moreover, we test our proposed model on some real noisy images; the results of KSVD [49], the CNN based method [39] and the proposed method are shown in Fig. 9, where Fig. 9(a) is a real noisy brain MR image; Fig. 9(b), Fig. 9(c) and Fig. 9(d) are the denoised images by KSVD, the CNN based method and the proposed method, respectively; and the noise removed by KSVD, CNN and the proposed method is shown in Fig. 9(e), Fig. 9(f) and Fig. 9(g), where the regions in the green rectangles are zoomed in and placed at the bottom-left of each subfigure. From the removed noise, we can see that our proposed method gives better restored results than the KSVD method, since less image information is removed by the proposed method. Meanwhile, we find that the restored result of the proposed method is slightly better than that of CNN. In this experiment, the difference between the noise levels is small; as mentioned before, under this situation the difference between the results of CNN and the proposed model is relatively small.
V Conclusion
We have proposed a variational mixed noise removal model integrating CNN deep learning regularization. The variational fidelity originates from an EM process, treated as an estimation of the noise distribution, which measures the discrepancy between the true and the observed data consistently. The CNN based regularization shows better noise removal performance, since a CNN can capture more of the image prior existing in natural images through training on a large amount of labeled samples. To fill the gap between the variational framework and the nonlinear CNN regularization, we employ the well-known operator splitting method to separate our model into four parts: noise removal (based on CNN regularization), synthesis, parameter estimation and noise classification, where each step can be optimized efficiently, including the CNN based denoising, since the corresponding subproblem is a standard additive Gaussian model that can be solved by different kinds of learning based denoisers.
In fact, the parameter estimation and noise classification that come from the EM process play a vital role in the CNN based noise removal step. The EM noise estimation endows the CNN process with a better noise level estimate and hence better denoising behavior. Besides, since the CNN denoiser is data-dependent, if the noise distribution or noise level is not included in the sample database, the restored image is usually undesirable. The image synthesis process can partly relieve this sample dependence, making the algorithm robust.
The key point of our model is integrating the CNN regularization into an EM based variational framework; this idea may be applied to a broader range of image processing tasks, such as CNN regularization based segmentation and registration.
Acknowledgment
Jun Liu and Haiyang Huang were partly supported by The National Key Research and Development Program of China (2017YFA0604903).
References
 [1] G. Aubert and P. Kornprobst, Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. New York: Springer, 2002.
 [2] A. Buades, B. Coll, and J. Morel, “A review of image denoising algorithms, with a new one,” Multiscale Modeling and Simulation, vol. 4, no. 2, pp. 490–530, 2005.
 [3] J. Geusebroek, A. W. M. Smeulders, and J. van de Weijer, "Fast anisotropic gauss filtering," IEEE Transactions on Image Processing, vol. 12, no. 8, pp. 938–943, 2003.
 [4] M. Lindenbaum, M. Fischer, and A. M. Bruckstein, “On gabor’s contribution to image enhancement,” Pattern Recognition, vol. 27, no. 1, pp. 1–8, 1994.
 [5] T. Chen and H. Wu, "Space variant median filters for the restoration of impulse noise corrupted images," IEEE Transactions on Circuits and Systems Part II: Analog and Digital Signal Processing, vol. 48, pp. 784–789, 2001.
 [6] S. Ko and Y. Lee, “Center weighted median filters and their applications to image enhancement,” IEEE Transactions on Circuits and Systems, vol. 38, pp. 984–993, 1991.

 [7] V. Katkovnik, A. Foi, K. Egiazarian, and J. Astola, "From local kernel to nonlocal multiple-model image denoising," International Journal of Computer Vision, vol. 86, pp. 1–32, 2010.
 [8] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "BM3D image denoising with shape-adaptive principal component analysis," in Proc. Workshop on Signal Processing with Adaptive Sparse Structured Representations (SPARS 09), Saint-Malo, France, 2009.
 [9] J. Portilla, V. Strela, M. Wainwright, and E. Simoncelli, "Image denoising using scale mixtures of gaussians in the wavelet domain," IEEE Transactions on Image Processing, vol. 12, no. 11, pp. 1338–1351, 2003.
 [10] J. Biemond, A. M. Tekalp, and R. L. Lagendijk, “Maximum likelihood image and blur identification: a unifying approach,” Optical Engineering, vol. 29, no. 5, pp. 422–435, 1990.
 [11] J. Liu and X. Zheng, “A block nonlocal tv method for image restoration,” SIAM Journal on Imaging Sciences, vol. 10, no. 2, pp. 920–941, 2017.
 [12] Y. Meyer, “Oscillating patterns in image processing and nonlinear evolution equations: The fifteenth dean jacqueline b. lewis memorial lectures,” University Lecture Series, vol. 22, American Mathematical Society, 2001.
 [13] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, pp. 259–268, 1992.
 [14] J. Liu, H. Huang, Z. Huan, and H. Zhang, “Adaptive variational method for restoring color images with high density impulse noise,” International Journal of Computer Vision, vol. 90, no. 2, pp. 131–149, 2010.
 [15] M. Lysaker, A. Lundervold, and X.-C. Tai, “Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time,” IEEE Transactions on Image Processing, vol. 12, no. 12, pp. 1579–1590, 2003.
 [16] X. Bresson and T. Chan, “Fast dual minimization of the vectorial total variation norm and applications to color image processing,” Inverse Problems and Imaging, vol. 2, no. 4, pp. 455–484, 2008.
 [17] A. Buades, B. Coll, and J. Morel, “A nonlocal algorithm for image denoising,” Computer Vision and Pattern Recognition, 2005.
 [18] G. Gilboa and S. Osher, “Nonlocal operators with applications to image processing,” SIAM: Multiscale Modeling and Simulation, vol. 7, no. 3, pp. 1005–1028, 2008.
 [19] A. Buades, B. Coll, and J. Morel, “Nonlocal image and movie denoising,” International Journal of Computer Vision, vol. 76, no. 2, pp. 123–139, 2008.
 [20] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3d transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
 [21] J. Mairal, F. R. Bach, J. Ponce, G. Sapiro, and A. Zisserman, “Non-local sparse models for image restoration,” in Proc. IEEE International Conference on Computer Vision, 2009, pp. 2272–2279.
 [22] S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear norm minimization with application to image denoising,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Jun. 2014, pp. 2862–2869.
 [23] J. Bouvrie, “Notes on convolutional neural networks,” Neural Nets, 2006.
 [24] G. E. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, p. 1527, 2006.
 [25] K. Zhang, Y. Chen, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising,” IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2017.
 [26] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017.
 [27] L. A. Gatys, A. S. Ecker, and M. Bethge, “Image style transfer using convolutional neural networks,” in Computer Vision and Pattern Recognition, 2016, pp. 2414–2423.
 [28] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, and T. N. Sainath, “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
 [29] V. Jain and H. S. Seung, “Natural image denoising with convolutional networks,” in International Conference on Neural Information Processing Systems, 2008, pp. 769–776.
 [30] H. C. Burger, C. J. Schuler, and S. Harmeling, “Image denoising: Can plain neural networks compete with bm3d?” in Computer Vision and Pattern Recognition, 2012, pp. 2392–2399.
 [31] J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with deep neural networks,” in International Conference on Neural Information Processing Systems, 2012, pp. 341–349.
 [32] M. Nikolova, “A variational approach to remove outliers and impulse noise,” Journal of Mathematical Imaging and Vision, vol. 20, no. 1, pp. 99–120, Jan. 2004.
 [33] T. M. Le, R. Chartrand, and T. J. Asaki, “A variational approach to reconstructing images corrupted by poisson noise,” Journal of Mathematical Imaging and Vision, vol. 27, no. 3, pp. 257–263, 2007.
 [34] J. Liu, Z. Huan, H. Huang, and H. Zhang, “An adaptive method for recovering image from mixed noisy data,” International Journal of Computer Vision, vol. 85, no. 2, pp. 182–191, 2009.
 [35] E. LopezRubio, “Restoration of images corrupted by gaussian and uniform impulsive noise,” Pattern Recognition, vol. 43, no. 5, pp. 1835–1846, 2010.
 [36] Y. Xiao, T. Zeng, J. Yu, and M. K. Ng, “Restoration of images corrupted by mixed gaussian-impulse noise via l1-l0 minimization,” Pattern Recognition, vol. 44, no. 8, pp. 1708–1720, 2011.
 [37] J. Liu, X.C. Tai, H. Huang, and Z. Huan, “A weighted dictionary learning model for denoising images corrupted by mixed noise,” IEEE Transactions on Image Processing, vol. 22, no. 3, pp. 1108–1120, Mar. 2013.
 [38] X.C. Tai and C. Wu, “Augmented lagrangian method, dual methods and split bregman iteration for rof model,” UCLA CAM Report, Tech. Rep. 0905, 2009.
 [39] K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep cnn denoiser prior for image restoration,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017.
 [40] A. Chambolle, “An algorithm for total variation minimization and applications,” Journal of Mathematical Imaging and Vision, vol. 20, no. 1, pp. 89–97, Jan. 2004.
 [41] T. F. Chan, G. H. Golub, and P. Mulet, “A nonlinear primal-dual method for total variation-based image restoration,” SIAM Journal on Scientific Computing, vol. 20, no. 6, pp. 1964–1977, 1999.
 [42] E. Esser, X. Zhang, and T. F. Chan, “A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science,” SIAM Journal on Imaging Sciences, vol. 3, no. 4, pp. 1015–1046, 2010.
 [43] X. Zhang, M. Burger, and S. Osher, “A unified primaldual algorithm framework based on bregman iteration,” Journal of Scientific Computing, vol. 46, no. 1, pp. 20–46, 2011.
 [44] T. Goldstein and S. Osher, “The split bregman method for l1 regularized problems,” SIAM Journal on Imaging Sciences, vol. 2, pp. 323–343, 2009.
 [45] P. Getreuer, “Rudin-osher-fatemi total variation denoising using split bregman,” Image Processing on Line, vol. 2, pp. 74–95, 2012.
 [46] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
 [47] J. F. Cai, R. Chan, and M. Nikolova, “Twophase methods for deblurring images corrupted by impulse plus gaussian noise,” Inverse Problems and Imaging, vol. 2, pp. 187–204, 2008.
 [48] ——, “Fast twophase image deblurring under impulse noise,” Journal of Mathematical Imaging and Vision, vol. 36, no. 1, pp. 46–53, 2009.
 [49] M. Aharon, M. Elad, and A. Bruckstein, “The k-svd: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
 [50] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “Contour detection and hierarchical image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 898–916, May 2011. [Online]. Available: http://dx.doi.org/10.1109/TPAMI.2010.161