Image denoising is not only an important problem in various image processing studies, but also an ideal test bed for measuring statistical image modeling methods. It has attracted a lot of research interest in the past few decades [1, 3, 6, 8, 9, 10, 11, 12, 13, 14]. Image denoising aims to estimate the latent clean image X from its noisy observation Y = X + V, where V is usually assumed to be additive white Gaussian noise. Due to the ill-posed nature of image denoising, it is critical to exploit prior knowledge that characterizes the statistical features of images.
Previous models mainly employed pixel-level priors, such as Tikhonov regularization and total variation (TV) regularization. These methods are effective in removing noise artifacts, but they smear out details and tend to over-smooth the image [4, 5].
Recently, patch-based priors have shown promising performance in image denoising. One representative example is the sparse coding scheme, which assumes that each patch of an image can be precisely represented by a sparse coefficient vector whose entries are mostly zero or close to zero [6, 11, 5, 7]. Considering that natural images are non-Gaussian and that image patches can be regarded as samples of a multivariate vector, Gaussian mixture models (GMMs) have emerged as a favored prior for natural image patches in various image restoration studies [8, 9, 10, 11]. However, patch-based models usually suffer from some limitations, such as high computational complexity and neglect of the relationships among similar patches [16, 24].
Inspired by the fact that natural images contain many mutually similar patches at different locations, the so-called nonlocal self-similarity (NSS) prior was initially utilized in nonlocal means denoising. Due to its effectiveness, many further developments [11, 12, 13, 14, 4, 5, 7, 16, 24, 15] have been proposed. For instance, the very popular BM3D method exploits nonlocal similar 2D image patches and 3D transform-domain collaborative filtering. Mairal et al. advanced the idea of NSS through group sparse coding. LPG-PCA uses nonlocal similar patches as data samples to estimate statistical parameters for PCA training. In practice, these methods belong to the category of group sparsity.
Though group sparsity has demonstrated great success in image denoising, all of the above methods exploit only the NSS prior of the noisy input image. With this prior alone, however, it is very challenging to recover the latent clean image from the noisy observation.
With the above considerations, in this work we propose a novel method for image denoising with a group sparsity residual and an external NSS prior. The contributions of this paper are as follows. First, to improve denoising performance, we propose the concept of the group sparsity residual, which translates the problem of image denoising into reducing this residual. Second, since groups contain a large amount of the NSS information of natural images, we reduce the residual by obtaining a good estimate of the group sparse coefficients of the original image from the NSS prior of natural images, learned with a Gaussian mixture model (GMM), and using the group sparse coefficients of the noisy input image to approximate this estimate. Our experimental results demonstrate that the proposed method outperforms many state-of-the-art methods. Moreover, it delivers the best qualitative denoising results, with finer details and fewer ringing artifacts.
2.1 Group-based sparse coding
Recent studies [12, 13, 16, 15] have revealed that structured or group sparsity can offer powerful reconstruction performance for image denoising. To be concrete, given a clean image X, for each image patch $x_i$ of size $\sqrt{d}\times\sqrt{d}$ in X, its $m$ best matched patches are selected from an $L\times L$ search window to form a group $X_i = \{x_{i,1}, x_{i,2}, \dots, x_{i,m}\}$, where $x_{i,j}$ denotes the $j$-th similar patch (column vector) of the $i$-th group. Similar to patch-based sparse coding [6, 5, 7], given a dictionary $D_i$, each group $X_i$ can be sparsely represented by solving the following $\ell_1$-norm minimization problem,

$$B_i = \arg\min_{B_i} \left( \|X_i - D_i B_i\|_F^2 + \lambda \|B_i\|_1 \right), \qquad (1)$$

where $\lambda$ is the regularization parameter and $\|B_i\|_1$ characterizes the sparsity of $B_i$. The whole image X can then be represented by the set of group sparse codes $\{B_i\}$.
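To make the grouping and coding steps concrete, here is a minimal NumPy sketch (not the paper's implementation: the patch size, group size, and random orthonormal dictionary are illustrative assumptions; for an orthonormal dictionary the $\ell_1$ problem has a closed-form soft-thresholding solution):

```python
import numpy as np

def extract_patches(img, d=8):
    """Extract all d x d patches of img as column vectors (d*d, n)."""
    H, W = img.shape
    cols = []
    for r in range(H - d + 1):
        for c in range(W - d + 1):
            cols.append(img[r:r+d, c:c+d].reshape(-1))
    return np.array(cols).T

def form_group(patches, ref_idx, m=10):
    """Select the m patches closest (in Euclidean distance) to the
    reference patch -- a simple stand-in for windowed block matching."""
    ref = patches[:, ref_idx:ref_idx+1]
    dists = np.sum((patches - ref) ** 2, axis=0)
    idx = np.argsort(dists)[:m]          # the reference itself is included
    return patches[:, idx]               # group X_i of shape (d*d, m)

def group_sparse_code(X_i, D, lam=0.1):
    """Solve min ||X_i - D B_i||_F^2 + lam ||B_i||_1 for an orthonormal D:
    the closed-form solution is soft-thresholding of D^T X_i."""
    C = D.T @ X_i
    return np.sign(C) * np.maximum(np.abs(C) - lam / 2, 0.0)

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))                  # toy "image"
P = extract_patches(img)
X_i = form_group(P, ref_idx=0)
D = np.linalg.qr(rng.standard_normal((64, 64)))[0]   # random orthonormal dictionary
B_i = group_sparse_code(X_i, D)
```

Larger values of `lam` drive more coefficients of `B_i` exactly to zero, which is the sparsity behavior Eq. (1) encodes.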
In image denoising, each noisy patch $y_i$ is extracted from the noisy image Y. As above, we search for its similar patches to generate a group $Y_i$, i.e., $Y_i = X_i + V_i$, where $V_i$ denotes the noise in the group. Image denoising is then turned into the problem of recovering $X_i$ from $Y_i$ by group sparse coding,

$$A_i = \arg\min_{A_i} \left( \|Y_i - D_i A_i\|_F^2 + \lambda \|A_i\|_1 \right). \qquad (2)$$

Once all group sparse codes $\{A_i\}$ are obtained, the latent clean image X can be reconstructed as $\hat{X}_i = D_i A_i$ for each group, followed by aggregation of all groups.
3 Image denoising using group sparsity residual with external NSS prior
As discussed above, when only the NSS prior of the noisy input image is considered, it is very challenging to recover the latent clean image from the noisy observation. In this section, to improve denoising performance, we propose the concept of the group sparsity residual, which translates the problem of image denoising into reducing this residual. Since groups possess a large amount of the NSS information of natural images, we reduce the residual by obtaining a good estimate of the group sparse coefficients of the original image from an external NSS prior learned with a GMM, and using the group sparse coefficients of the noisy image to approximate this estimate.
3.1 Group sparsity residual
Although group sparsity has demonstrated its effectiveness in image denoising, the influence of noise makes it very difficult to estimate the true group sparse codes B from the noisy image Y. In other words, the group sparse codes A obtained by solving Eq. (2) are expected to be as close as possible to the true group sparse codes B of the original image X. Consequently, the quality of image denoising largely depends on the group sparsity residual, which is defined as the difference between the group sparse codes A and the true group sparse codes B,

$$R = A - B. \qquad (3)$$

Therefore, to reduce the group sparsity residual R and enhance the accuracy of A, we introduce the residual into the denoising objective, and Eq. (2) can be rewritten as

$$A_i = \arg\min_{A_i} \left( \|Y_i - D_i A_i\|_F^2 + \lambda \|A_i - B_i\|_p \right). \qquad (4)$$
However, the true group sparse codes B and the proper norm $p$ for the residual are unknown, because the original image X is not available. We now discuss how to obtain them.
3.2 How to estimate the true group sparse codes B
Since the original image X is not available, it seems difficult to obtain the true group sparse codes B. Nonetheless, we can compute a good estimate of B; which estimation method is appropriate depends on the prior knowledge of B we have. In recent years, patch-based or group-based priors, i.e., denoising operators learned from natural images, have achieved state-of-the-art denoising results [6, 8, 11]. For instance, a dictionary learning-based method has been introduced for compact patch representation, whereas a GMM has been learned from natural image groups under the NSS scheme and used as a prior for denoising. Since groups contain a rich amount of the NSS information of natural images, we can achieve a good estimate of B through the NSS prior of natural images learned with a GMM.
3.2.1 Learning the NSS prior from natural images by GMM
Like in subsection 2.1, we extract groups from a given clean natural-image dataset, and we denote one group as

$$\bar{X}_i = \{\bar{x}_{i,1}, \bar{x}_{i,2}, \dots, \bar{x}_{i,m}\}, \qquad (5)$$

where $\bar{x}_{i,j} = x_{i,j} - \mu_i$ denotes the mean-subtracted $j$-th similar patch (column vector) of the $i$-th group, with $\mu_i$ the group mean. (The advantage of group mean subtraction is that it further promotes NSS prior learning: the possible number of patterns is reduced, while the training samples for each pattern are increased.) Since GMMs have been successfully used to model image patch or group priors, e.g., in EPLL, PLE and PGPD, we adopt this strategy and learn a finite GMM over natural image groups as a group prior. Under the GMM, the likelihood of a given group $\bar{X}_i$ is

$$P(\bar{X}_i) = \prod_{j=1}^{m} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\bar{x}_{i,j} \mid \mu_k, \Sigma_k), \qquad (6)$$

where $K$ is the total number of mixture components, and the GMM is parameterized by the mean vectors $\mu_k$, covariance matrices $\Sigma_k$, and mixture weights $\pi_k$ of the mixture components. By assuming that all the groups are independent, the overall likelihood is $\prod_i P(\bar{X}_i)$. Applying the logarithm, we maximize the objective function

$$\ln \prod_{i} P(\bar{X}_i) = \sum_{i} \sum_{j=1}^{m} \ln \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\bar{x}_{i,j} \mid \mu_k, \Sigma_k) \right). \qquad (7)$$
We collectively denote the three parameter sets $\{\pi_k\}$, $\{\mu_k\}$ and $\{\Sigma_k\}$ by $\Theta$, which is learned using the Expectation-Maximization (EM) algorithm [8, 11, 17]. For more details about the EM algorithm, please refer to [17].
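As an illustration of the learning step, the following is a minimal, didactic EM implementation for a full-covariance GMM, run here on synthetic stand-ins for mean-subtracted patches (a real implementation trains on natural-image groups and adds convergence checks and covariance regularization):

```python
import numpy as np

def em_gmm(X, K, n_iter=50, seed=0):
    """Minimal EM for a full-covariance GMM; a didactic sketch of the
    group-based GMM learning, not a production implementation."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    mu = X[rng.choice(N, K, replace=False)]               # init means from data
    Sigma = np.stack([np.cov(X.T) + 1e-3 * np.eye(d)] * K)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities from log Gaussian densities.
        logp = np.empty((N, K))
        for k in range(K):
            diff = X - mu[k]
            inv = np.linalg.inv(Sigma[k])
            _, logdet = np.linalg.slogdet(Sigma[k])
            maha = np.sum(diff @ inv * diff, axis=1)
            logp[:, k] = np.log(pi[k]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
        logp -= logp.max(axis=1, keepdims=True)           # for numerical stability
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: update weights, means, covariances.
        Nk = R.sum(axis=0)
        pi = Nk / N
        mu = (R.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (R[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return pi, mu, Sigma

# Two synthetic "pattern" clusters standing in for grouped patches.
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((300, 4)) * 0.5,
               rng.standard_normal((300, 4)) * 1.5 + 4.0])
pi, mu, Sigma = em_gmm(X, K=2)
```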
Thus, for each noisy group $\bar{Y}_i$ of the noisy input image Y (all noisy groups are preprocessed by mean subtraction; the mean of each noisy group is very close to the mean of the original group because the mean of the noise V is nearly zero, so the mean can be added back to the denoised group to obtain the latent clean group), the covariance matrix of the $k$-th Gaussian component turns into $\Sigma_k + \sigma^2 I$, where $\sigma^2$ is the noise variance and $I$ represents the identity matrix. The component that $\bar{Y}_i$ belongs to can then be selected by computing the following posterior probability,

$$P(k \mid \bar{Y}_i) = \frac{\pi_k \prod_{j=1}^{m} \mathcal{N}(\bar{y}_{i,j} \mid \mu_k, \Sigma_k + \sigma^2 I)}{\sum_{l=1}^{K} \pi_l \prod_{j=1}^{m} \mathcal{N}(\bar{y}_{i,j} \mid \mu_l, \Sigma_l + \sigma^2 I)}. \qquad (8)$$

We maximize this posterior; the Gaussian component with the highest probability is selected to process each group $\bar{Y}_i$.
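The selection rule of Eq. (8) can be sketched as follows; the two toy components and the noise level are illustrative, and the comparison is done in the log domain (log-prior plus log-likelihood with noise-adjusted covariances), which leaves the argmax unchanged:

```python
import numpy as np

def log_gauss(Y, mu, Sigma):
    """Sum of log N(y_j | mu, Sigma) over the columns y_j of group Y."""
    d, m = Y.shape
    diff = Y - mu[:, None]
    inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    maha = np.sum(diff * (inv @ diff), axis=0)
    return -0.5 * (maha.sum() + m * (logdet + d * np.log(2 * np.pi)))

def select_component(Y, pi, mu, Sigma, sigma2):
    """Pick the Gaussian component with the highest posterior for the
    noisy group Y, using noise-adjusted covariances Sigma_k + sigma2*I."""
    d = Y.shape[0]
    scores = [np.log(pi[k]) + log_gauss(Y, mu[k], Sigma[k] + sigma2 * np.eye(d))
              for k in range(len(pi))]
    return int(np.argmax(scores))

# Two toy components: low-variance "flat" vs high-variance "textured".
d = 8
pi = np.array([0.5, 0.5])
mu = np.zeros((2, d))
Sigma = np.stack([0.1 * np.eye(d), 4.0 * np.eye(d)])
rng = np.random.default_rng(0)
noisy_flat = rng.standard_normal((d, 6)) * 0.3      # resembles component 0
noisy_texture = rng.standard_normal((d, 6)) * 2.0   # resembles component 1
k_flat = select_component(noisy_flat, pi, mu, Sigma, sigma2=0.05)
k_tex = select_component(noisy_texture, pi, mu, Sigma, sigma2=0.05)
```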
Then, we assume that the $k$-th Gaussian component is selected for the group $\bar{Y}_i$. In fact, the GMM is equivalent to block sparse estimation with a block dictionary whose blocks correspond to the PCA bases of the Gaussian components in the mixture [9, 18]. The covariance matrix of the $k$-th Gaussian component is denoted by $\Sigma_k$. By applying singular value decomposition to $\Sigma_k$, we have

$$\Sigma_k = U_k \Lambda_k U_k^{T}, \qquad (9)$$

where $U_k$ is an orthonormal matrix formed by the eigenvectors of $\Sigma_k$ and $\Lambda_k$ is the diagonal matrix of eigenvalues. With the group-based GMM learning, the statistical structures of NSS variations in natural images are captured by the eigenvectors in $U_k$, which can thus be used to represent the structural variations of the groups in that component. Finally, for each group $\bar{Y}_i$, the true group sparse code $B_i$ can be estimated by $B_i = U_k^{T} \bar{Y}_i$.
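A small sketch of this eigendecomposition step and the resulting projection estimate (the toy covariance matrix and the noisy group are illustrative; for a symmetric positive semi-definite matrix the SVD and the eigendecomposition coincide):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16
# A toy covariance for the selected component k (symmetric PSD).
A = rng.standard_normal((d, d))
Sigma_k = A @ A.T / d

# Eq. (9): eigendecomposition Sigma_k = U_k Lambda_k U_k^T.
eigvals, U_k = np.linalg.eigh(Sigma_k)
order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
U_k, eigvals = U_k[:, order], eigvals[order]

# Estimate the true group sparse codes by projecting the mean-subtracted
# noisy group onto the component's PCA basis.
Y_bar = rng.standard_normal((d, 10))         # stand-in for a noisy group
B_est = U_k.T @ Y_bar

# Sanity: U_k is orthonormal and reconstructs Sigma_k.
recon = U_k @ np.diag(eigvals) @ U_k.T
```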
Similarly, the covariance matrix of each group $\bar{Y}_i$ is defined as $\Sigma_{\bar{Y}_i} = \frac{1}{m}\bar{Y}_i \bar{Y}_i^{T}$, and we have

$$\Sigma_{\bar{Y}_i} = D_i \Lambda_i D_i^{T}, \qquad (10)$$

where $D_i$ is an orthonormal matrix formed by the eigenvectors of $\Sigma_{\bar{Y}_i}$ and $\Lambda_i$ is the diagonal matrix of eigenvalues. $D_i$ then serves as the dictionary under which the group sparse codes $A_i$ are solved.
3.3 How to determine the norm for R
Besides estimating B, we also need to determine the norm used to regularize the residual. Here we perform some experiments to investigate the statistical properties of R, where R denotes the set of residuals $R_i = A_i - B_i$. In these experiments, the images Monarch and Foreman are used as examples, corrupted by additive white Gaussian noise with standard deviation $\sigma = 30$ and $\sigma = 100$, respectively. We plot the histogram of R, together with fitted Gaussian, Laplacian and hyper-Laplacian distributions in the log domain, in Fig. 1(a) and Fig. 1(b), respectively. It can be seen that the histogram of R is well characterized by the Laplacian distribution. Thus, the $\ell_1$-norm is adopted to regularize $R_i$, and Eq. (4) can be rewritten as

$$a_i = \arg\min_{a_i} \left( \|y_i - \tilde{D}_i a_i\|_2^2 + \lambda \|a_i - b_i\|_1 \right), \qquad (11)$$
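The norm-selection argument can be checked numerically: on a heavy-tailed synthetic residual, a maximum-likelihood Laplacian fit attains a higher average log-likelihood than a Gaussian fit (the data here are synthetic; the paper's Fig. 1 uses the real residuals of Monarch and Foreman):

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic stand-in for the residual R: heavy-tailed, zero-centered.
R = rng.laplace(loc=0.0, scale=1.0, size=100_000)

# Maximum-likelihood fits.
mu, sd = R.mean(), R.std()                 # Gaussian: mean and std
b = np.abs(R - np.median(R)).mean()        # Laplacian: median and mean abs. dev.

# Average log-likelihood under each fitted model.
ll_gauss = np.mean(-0.5 * np.log(2 * np.pi * sd**2) - (R - mu) ** 2 / (2 * sd**2))
ll_laplace = np.mean(-np.log(2 * b) - np.abs(R - np.median(R)) / b)
```

A higher `ll_laplace` than `ll_gauss` is the quantitative counterpart of the histogram comparison, and motivates the $\ell_1$ penalty.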
where $y_i$, $a_i$ and $b_i$ denote the vectorizations of the matrices $Y_i$, $A_i$ and $B_i$, respectively, and each column of the matrix $\tilde{D}_i$ denotes the vectorization of a rank-one matrix formed from the dictionary $D_i$.
3.4 How to solve Eq. (11)
For fixed $b_i$ and $\tilde{D}_i$, Eq. (11) is convex and can be solved efficiently by iterative thresholding algorithms. We adopt the surrogate algorithm to solve Eq. (11). In the $t+1$-th iteration, the proposed shrinkage operator can be calculated as

$$a_i^{t+1} = b_i + S_{\lambda/(2c)}\!\left( a_i^{t} + \tilde{D}_i^{T}\big(y_i - \tilde{D}_i a_i^{t}\big)/c - b_i \right), \qquad (12)$$

where $S_{\tau}(\cdot)$ is the soft-thresholding operator with threshold $\tau$, $c$ is a constant that upper-bounds the largest eigenvalue of $\tilde{D}_i^{T}\tilde{D}_i$, and $\tilde{D}_i a_i^{t}$ represents the vectorization of the $i$-th reconstructed group in the $t$-th iteration. The above shrinkage operator follows the standard surrogate algorithm.
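A generic sketch of this iterative shrinkage scheme (the problem instance below is synthetic; the iteration is the standard surrogate/ISTA update for an $\ell_1$-regularized residual, obtained by substituting $z = a - b$ into ordinary $\ell_1$ sparse coding):

```python
import numpy as np

def soft(x, tau):
    """Soft-thresholding operator S_tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def solve_residual_l1(y, D, b, lam, c=None, n_iter=200):
    """Iterative shrinkage for  min_a ||y - D a||_2^2 + lam * ||a - b||_1."""
    if c is None:
        c = np.linalg.norm(D, 2) ** 2 * 1.01   # c > largest eigenvalue of D^T D
    a = b.copy()
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)               # half-gradient of the fidelity term
        a = b + soft(a - grad / c - b, lam / (2 * c))
    return a

rng = np.random.default_rng(4)
n = 32
D = np.linalg.qr(rng.standard_normal((n, n)))[0]   # orthonormal dictionary
b = rng.standard_normal(n)                         # estimated "true" codes
a_true = b + soft(rng.standard_normal(n), 0.5)     # truth deviates sparsely from b
y = D @ a_true + 0.01 * rng.standard_normal(n)     # noisy observation
a_hat = solve_residual_l1(y, D, b, lam=0.02)
```

Because the penalty is on $a - b$, the solution is pulled toward the external estimate $b$ wherever the data do not clearly support a deviation.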
The parameter $\lambda$, which balances the fidelity term and the regularization term, should be adaptively determined for better denoising performance. Inspired by adaptive thresholding, the regularization parameter of each group is set as $\lambda = \frac{2\sqrt{2}\,\sigma^{2}}{\delta_i + \varepsilon}$, where $\delta_i$ denotes the estimated variance of $R_i$, and $\varepsilon$ is a small constant.
After obtaining the solution of Eq. (12), the clean group can be reconstructed as $\hat{X}_i = D_i \hat{A}_i$. The clean image $\hat{X}$ is then reconstructed by aggregating all the groups $\{\hat{X}_i\}$.
Besides, we can execute the above denoising procedure for several iterations to obtain better results. In the $t+1$-th iteration, the iterative regularization strategy is used to update the estimate of the noise variance: the standard deviation of the noise in the $t+1$-th iteration is adjusted as $\sigma^{t+1} = \gamma \sqrt{\sigma^{2} - \frac{1}{N}\|Y - \hat{X}^{t}\|_2^2}$, where $\gamma$ is a constant and $N$ is the number of pixels.
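A sketch of this noise-level update between iterations (assuming the common per-pixel normalization; the constant `gamma`, the toy image, and the crude first-pass estimate are illustrative):

```python
import numpy as np

def update_sigma(Y, X_hat, sigma, gamma=0.6):
    """Re-estimate the residual noise std for the next iteration: the
    variance already removed, mean((Y - X_hat)^2), is subtracted from the
    original noise variance sigma^2 (clipped at zero for safety)."""
    removed = np.mean((Y - X_hat) ** 2)
    return gamma * np.sqrt(max(sigma ** 2 - removed, 0.0))

rng = np.random.default_rng(5)
X = rng.standard_normal((64, 64))              # toy clean image
sigma = 0.5
Y = X + sigma * rng.standard_normal((64, 64))  # noisy observation
X_hat = 0.7 * Y + 0.3 * Y.mean()               # crude first-pass "denoised" image
sigma_next = update_sigma(Y, X_hat, sigma)
```

The updated value is strictly smaller than the initial `sigma`, so each pass assumes progressively less residual noise.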
According to the above analysis, it can be seen that the proposed model employs the group sparsity residual and external NSS prior for image denoising. The proposed denoising procedure is summarized in Algorithm 1.
|Algorithm 1: The proposed denoising algorithm|
|Input: Noisy image Y and the group-based GMM learning model.|
|For t = 1 to T do|
|  Iterative regularization: update the noise standard deviation $\sigma^{t}$;|
|  For each patch y in Y do|
|    Find the group $\bar{Y}_i$ for patch y;|
|    Select the best Gaussian component by Eq. (8);|
|    Construct the dictionary $U_k$ by Eq. (9);|
|    Update $B_i$ by $B_i = U_k^{T}\bar{Y}_i$;|
|    Construct the dictionary $D_i$ by Eq. (10);|
|    Update $b_i$ by vectorizing $B_i$;|
|    Update the regularization parameter $\lambda$;|
|    Update $a_i$ by Eq. (12);|
|    Get the estimate $\hat{X}_i = D_i \hat{A}_i$;|
|  End for|
|  Aggregate all $\hat{X}_i$ to form the recovered image $\hat{X}$;|
|End for|
|Output: The final denoised image $\hat{X}$.|
4 Experimental Results
In this section, we validate the performance of the proposed denoising algorithm and compare it with several state-of-the-art denoising methods, including BM3D, EPLL, NCSR, GID, LINE, PGPD and aGMM. We evaluate the competing methods on 14 typical natural images, whose scenes are displayed in Fig. 2. The training groups used in our experiments were sampled under the NSS scheme from the Kodak PhotoCD dataset (http://r0k.us/graphics/kodak/). The detailed parameter settings are shown in Table 1, including the number of Gaussian components (which need not be large, since group mean subtraction reduces the possible patterns and the variables to learn), the search window size, the number of similar patches in a group, the patch size, and the remaining constants.
|Table 1: Parameter settings for the GMM learning stage and the denoising stage.|
We present the average PSNR results for six noise levels, $\sigma$ = 20, 30, 40, 50, 75 and 100, in Table 2. As can be seen from Table 2, the proposed method outperforms the other competing methods, achieving 0.24 dB, 0.59 dB, 0.29 dB, 1.30 dB, 0.28 dB, 0.14 dB and 0.25 dB average improvements over BM3D, EPLL, NCSR, GID, LINE, PGPD and aGMM, respectively.
Visual comparisons of the competing denoising methods at noise levels 40 and 75 are shown in Fig. 3 and Fig. 4, respectively. It can be seen that BM3D and LINE tend to over-smooth the images, while EPLL, NCSR, GID, PGPD and aGMM are likely to generate undesirable ringing artifacts. By contrast, the proposed method preserves local image structures and suppresses ringing artifacts more effectively than the other competing methods.
5 Conclusion

In this paper, we have proposed a novel method for image denoising using a group sparsity residual and an external NSS prior. We first introduced the concept of the group sparsity residual, which translates the problem of image denoising into reducing this residual. To reduce the residual, we obtain a good estimate of the group sparse coefficients of the original image from the NSS prior of natural images learned with a GMM, and the group sparse coefficients of the noisy input image are used to approximate this estimate. Experimental results demonstrate that the proposed method not only yields visual improvements over many state-of-the-art methods, but also preserves local image structures better and generates far fewer ringing artifacts.
-  Tikhonov A N, Glasko V B. Use of the regularization method in non-linear problems[J]. USSR Computational Mathematics and Mathematical Physics, 1965, 5(3): 93-107.
-  Rudin L I, Osher S, Fatemi E. Nonlinear total variation based noise removal algorithms[J]. Physica D: Nonlinear Phenomena, 1992, 60(1): 259-268.
-  Zhang J, Zhao D, Xiong R, et al. Image restoration using joint statistical modeling in a space-transform domain[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2014, 24(6): 915-928.
-  Dong W, Zhang L, Shi G, et al. Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization[J]. IEEE Transactions on Image Processing, 2011, 20(7): 1838-1857.
-  Elad M, Aharon M. Image denoising via sparse and redundant representations over learned dictionaries[J]. IEEE Transactions on Image Processing, 2006, 15(12): 3736-3745.
-  Dong W, Zhang L, Shi G, et al. Nonlocally centralized sparse representation for image restoration[J]. IEEE Transactions on Image Processing, 2013, 22(4): 1620-1630.
-  Zoran D, Weiss Y. From learning models of natural image patches to whole image restoration[C]//2011 International Conference on Computer Vision. IEEE, 2011: 479-486.
-  Yu G, Sapiro G, Mallat S. Solving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity[J]. IEEE Transactions on Image Processing, 2012, 21(5): 2481-2499.
-  Niknejad M, Rabbani H, Babaie-Zadeh M. Image Restoration Using Gaussian Mixture Models With Spatially Constrained Patch Clustering[J]. IEEE Transactions on Image Processing, 2015, 24(11): 3624-3636.
-  Xu J, Zhang L, Zuo W, et al. Patch group based nonlocal self-similarity prior learning for image denoising[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 244-252.
-  Dabov K, Foi A, Katkovnik V, et al. Image denoising by sparse 3-D transform-domain collaborative filtering[J]. IEEE Transactions on Image Processing, 2007, 16(8): 2080-2095.
-  Mairal J, Bach F, Ponce J, et al. Non-local sparse models for image restoration[C]//2009 IEEE 12th International Conference on Computer Vision. IEEE, 2009: 2272-2279.
-  Zhang L, Dong W, Zhang D, et al. Two-stage image denoising by principal component analysis with local pixel grouping[J]. Pattern Recognition, 2010, 43(4): 1531-1549.
-  Dong W, Shi G, Ma Y, et al. Image Restoration via Simultaneous Sparse Coding: Where Structured Sparsity Meets Gaussian Scale Mixture[J]. International Journal of Computer Vision, 2015, 114(2-3): 217-232.
-  Zhang J, Zhao D, Gao W. Group-based sparse representation for image restoration[J]. IEEE Transactions on Image Processing, 2014, 23(8): 3336-3351.
-  Bishop C M. Pattern Recognition and Machine Learning[M]. Springer, 2006.
-  Sandeep P, Jacob T. Single Image Super-Resolution Using a Joint GMM Method[J]. IEEE Transactions on Image Processing, 2016, 25(9): 4233-4244.
-  Daubechies I, Defrise M, De Mol C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint[J]. Communications on Pure and Applied Mathematics, 2004, 57(11): 1413-1457.
-  Chang S G, Yu B, Vetterli M. Adaptive wavelet thresholding for image denoising and compression[J]. IEEE Transactions on Image Processing, 2000, 9(9): 1532-1546.
-  Osher S, Burger M, Goldfarb D, et al. An iterative regularization method for total variation-based image restoration[J]. Multiscale Modeling & Simulation, 2005, 4(2): 460-489.
-  Talebi H, Milanfar P. Global image denoising[J]. IEEE Transactions on Image Processing, 2014, 23(2): 755-768.
-  Luo E, Chan S H, Nguyen T Q. Adaptive Image Denoising by Mixture Adaptation[J]. IEEE Transactions on Image Processing, 2016, 25(10): 4489-4503.
-  Dong W, Shi G, Li X. Nonlocal image restoration with bilateral variance estimation: a low-rank approach[J]. IEEE Transactions on Image Processing, 2013, 22(2): 700-711.