I Introduction
A hyperspectral image (HSI) is captured by sensors over various bands and contains abundant spatial and spectral knowledge across all these bands beyond traditional grayscale or RGB images. Due to its preservation of full-band information of a real scene, it has been widely used in military and civilian applications such as terrain detection, mineral exploration, pharmaceutical counterfeiting detection, and vegetation and environmental monitoring [40, 16, 56, 10, 52, 42]. In real cases, however, an HSI is always corrupted by noise due to equipment limitations, such as sensor sensitivity, photon effects and calibration error [45]. Besides, due to the limited radiance energy and generally narrow band width, the energy captured by each sensor might be low, and thus shot noise and thermal noise inevitably occur. Such noise severely degrades the quality of the imagery and limits the performance of subsequent processing tasks, e.g., classification [43], unmixing [41, 2] and target detection [46]. Reducing HSI noise [39, 20] is therefore a critical preprocessing step for general HSI processing tasks.
The simplest denoising approach is to apply traditional 1D or 2D denoising methods to reduce noise in the HSI pixel by pixel [17] or band by band [14, 12, 30, 44, 50]. However, this kind of processing ignores the correlations between different spectral bands or adjacent pixels, and often results in a relatively low-quality output. Recently, more efficient methods have been proposed to handle this issue. The main idea is to elaborately encode the prior knowledge on the structure underlying a natural HSI, especially its characteristics across the spatial and spectral dimensions. E.g., Othman and Qian [38] made an initial attempt at this issue by designing a hybrid spatial-spectral derivative-domain wavelet shrinkage model based on the dissimilarity of the signal regularity along the spatial and spectral dimensions of a natural HSI. Chen et al. [5] then proposed another approach to encoding both spatial and spectral knowledge by combining bivariate wavelet thresholding with principal component analysis. To further enhance the denoising capability, Yuan et al. [51] employed a spectral-spatial adaptive total variation. Later, Chen et al. [9] proposed a spatially adaptive weighted prior by combining the smoothness and discontinuity-preserving properties along the spectral domain. By further considering the spatial and spectral dependencies, Zhong and Wang [58] proposed a multiple-spectral-band CRF (MSB-CRF) model in a unified probabilistic framework.
Besides, by explicitly treating HSI data as a tensor, a series of methods expanding wavelet-based techniques from 2D to 3D has been proposed. E.g., Dabov et al. [11] designed the VBM3D method by applying the concepts of grouping and collaborative filtering to video denoising. Then, Letexier et al. [28] proposed a generalized multidimensional Wiener filter for denoising hyperspectral images. Similarly, Chen et al. [6] extended Sendur and Selesnick's bivariate wavelet thresholding from 2D image denoising to 3D data cube denoising. For better denoising results, Chen et al. [7] proposed a new signal denoising algorithm using neighbouring wavelet coefficients, which considered both translation-invariant (TI) and non-TI versions. Later, as an extension of the BM3D method, Maggioni et al. [32] presented the BM4D model. Meanwhile, another type of method, based on tensor decomposition, has appeared. Karami et al. [26] developed a Genetic Kernel Tucker Decomposition (GKTD) algorithm to exploit both spectral and spatial information in HSIs. To address the non-uniqueness issue caused by the multiple ranks of the Tucker decomposition, Liu et al. [31] proposed a PARAFAC method that reduces the number of ranks to be estimated to one. Later, Peng et al. [39] proposed a decomposable non-local tensor dictionary learning (TDL) model, which fully considers the non-local similarity over space and the global correlation across spectrum. Among these methods, BM4D and TDL achieve the state-of-the-art performance in general noisy MSI cases.
Most current HSI denoising methods mainly encode prior spectral and spatial knowledge of the HSI, while using only a simple L2 loss term to rectify the deviation between the reconstruction and the original HSI. From the viewpoint of statistical theory, the utilization of such a loss term implies that the HSI noise follows an i.i.d. Gaussian distribution. However, in real scenarios, HSI noise generally has a more complicated statistical structure. This means that such a simple loss term is too subjective to reflect real cases and inclines to degenerate the performance of these methods in the more complex yet more realistic non-i.i.d. noise cases.
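In statistical terms, this correspondence can be made explicit. Assuming every entry of the clean image is observed under i.i.d. zero-mean Gaussian noise with a common precision (symbols chosen here for illustration), maximizing the likelihood is equivalent to minimizing the Frobenius-norm (L2) loss:

```latex
\max_{X}\ \prod_{i,j}\mathcal{N}\!\left(y_{ij}\mid x_{ij},\,\tau^{-1}\right)
\;\Longleftrightarrow\;
\min_{X}\ \sum_{i,j}\left(y_{ij}-x_{ij}\right)^{2}
\;=\;\min_{X}\ \|Y-X\|_{F}^{2}.
```

Any mismatch between this implicit Gaussian assumption and the actual noise therefore directly biases the estimate of the clean image.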
The idea of considering more complex HSI noise beyond Gaussian in HSI denoising has attracted attention very recently within the framework of low-rank matrix analysis. Since adjacent HSI bands usually exhibit strong correlations, by reshaping each band as a vector and stacking all the vectors into a matrix, the reshaped HSI matrix can be assumed to be of low rank. Various low-rank matrix models have been presented for different noise scenes in recent decades. Along this line, the classical low-rank matrix factorization (LRMF) model was presented by K. Mitra et al. [37] and T. Okatani et al. [36] for Gaussian noise, and its global optimal solution can be directly obtained by using the singular value decomposition (SVD) [18]. To add more robustness, the L1-norm LRMF [13, 15, 25, 27] is generally used, and many algorithms have been designed to solve this model, such as Wiberg [15], CWM [35] and RegL1-ALM [57]. The L1-norm LRMF model actually assumes i.i.d. Laplacian noise embedded in the data. To handle more complex noise cases, Meng and De la Torre [34] modeled the noise with a more adaptable i.i.d. mixture of Gaussians (MoG) distribution. Such a noise modeling idea was further extended to the Bayesian framework by Chen et al. [8], to RPCA by Zhao et al. [55] and to tensor factorization by Chen et al. [49]. Very recently, Cao et al. [4] modeled the noise as a more general i.i.d. mixture of Exponential Power (MoEP) distribution, which achieves competitive performance in HSI denoising under real noise scenarios. Besides, some other attempts have modeled the noise as a combination of sparse and Gaussian noise [1], and Zhang et al. [54] utilized this idea for the HSI denoising task. Later, He et al. [21] further enhanced the capability of this method by adding a TV regularization to the low-rank reconstruction. Besides, based on the mixture noise assumption, He et al. [19] proposed a noise-adjusted low-rank method and Wang et al. [47] proposed the GLRR denoising method for the reconstruction of corrupted HSIs. All these approaches also achieve good performance on HSI denoising in mixed noise cases.
From the Gaussian noise assumption to the MoEP noise assumption, such advancements make the models more adaptive to the various HSI noises encountered in practice. However, all of the aforementioned LRMF models simply assume that the HSI noise is i.i.d., which deviates more or less from practical scenarios, where the noise in an HSI generally has a non-i.i.d. configuration. In this sense, there is still large room to further improve the capability of current HSI denoising techniques, especially under real complicated scenarios.
To make this point clearer, let's analyze some evident noise characteristics possessed by an HSI collected from real cases. Fig. 1 presents a real HSI case for auxiliary understanding. Firstly, the image in a band is generally collected under a similar sensor setting, and thus the i.i.d. assumption is rational for the noise over the elements within that band. This can be evidently observed from the band noise of the HSI shown in Fig. 1. Secondly, due to the difference in sensor sensitivity for collecting images located in various HSI bands, the noise of images in different bands always exhibits evident distinctions [19]. From Fig. 1, it is easy to see that the images located in some bands are very clean, while those in some others are extremely noisy [53, 16]. Thirdly, albeit different, the noise distributions along different bands have certain correlations. E.g., along adjacent bands, the noise tends to be more or less similar, since images on neighboring spectra are generally collected under small deviations of sensor settings and wavelength [58]. From Fig. 1, it is easy to see the noise similarity for the images located in bands 189-191. Accordingly, the real HSI noise distribution is generally non-i.i.d. and has a more complicated configuration than the i.i.d. noise assumption of current HSI denoising techniques admits. Such deviation inclines to degenerate their performance in more practical cases, as will be clearly observed in our experiments.
To address this issue, this paper proposes a new noise modeling framework, carefully designing the noise distribution structure so that it faithfully delivers real HSI noise configurations. Specifically, we model the noise of each HSI band with a different mixture of Gaussians (MoG) distribution (i.e., the parameters of the MoG differ across bands). Besides, the MoG parameters of different bands are encoded under a common top-level prior distribution, representing the correlation between noise distributions across different bands. In this way, the non-i.i.d. noise structure of a practical HSI can be more properly encoded, and an appropriate HSI denoising effect is thus to be expected.
In this study, we embed this noise modeling framework into a low-rank matrix factorization (LRMF) realization, which simply assumes a low-rank structure (both spatial and spectral) of the to-be-recovered HSI. Our experimental results show that, even under this simple setting of HSI structure, such a noise modeling idea can greatly enhance the HSI denoising ability beyond previous techniques and obtain state-of-the-art performance in real HSI denoising tasks.
The rest of the paper is organized as follows: The proposed model and the corresponding variational inference algorithm are presented in Section 2. Experimental results are shown in Section 3. Finally, conclusions are drawn in Section 4. Throughout the paper, we denote scalars, vectors, matrices and tensors by non-bold letters, bold lower-case letters, bold upper-case letters and decorated letters, respectively.
II Non-i.i.d. MoG method for HSI denoising
In this section, we first introduce our non-i.i.d. noise encoding strategy and then propose a non-i.i.d. MoG LRMF (NMoG-LRMF) model by using this strategy in the LRMF model for HSI denoising. Finally, the corresponding variational inference algorithm for the proposed model is designed.
II-A Non-i.i.d. noise encoding
Let's represent a given HSI as a matrix Y ∈ R^{p×b}, where p and b denote the spatial dimension (the number of pixels in each band) and the spectral dimension (the number of bands) of the HSI, respectively. By assuming that the noise is additive, we have the following expression:

Y = X + E,   (1)

where X denotes the clean HSI and E denotes the embedded noise.
In the previous section, we summarized three intrinsic properties possessed by HSI noise: (i) the noise of an image located in a single band is i.i.d.; (ii) the noise of images located in different bands is non-identical; (iii) the noise distributions in different bands are correlated.
Based on properties (i) and (ii) of the HSI noise structure, we model the noise located in each band as an independent MoG distribution, with the MoG parameters differing across bands. The MoG is utilized due to its universal approximation capability for any continuous density [33], and has been extensively used and studied in [34, 55].
Denote by e_{ij} the element located in row i and column j of the noise matrix E. As aforementioned, the noise distribution in the j-th band can then be modeled as:

p(e_{ij}) = \sum_{k=1}^{K} \pi_{jk} \mathcal{N}(e_{ij} \mid \mu_{jk}, \tau_{jk}^{-1}),   (2)

where \pi_{jk} is the mixing proportion with \pi_{jk} \geq 0 and \sum_{k=1}^{K} \pi_{jk} = 1, K is the Gaussian component number, and \mu_{jk} and \tau_{jk} are the mean and precision of the k-th Gaussian component in the j-th band, respectively. Note that the MoG parameters \pi_{jk}, \mu_{jk} and \tau_{jk} depend on j, implying different MoG distributions across different bands.
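As an illustration of this band-wise assumption, the following sketch samples noise that is i.i.d. within each band but distributed differently across bands. All parameter values here are hypothetical, chosen only for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
p, b, K = 10000, 3, 2          # pixels per band, number of bands, MoG components

# Hypothetical band-wise MoG parameters: pi[j], mu[j], tau[j] differ per band j.
pi  = np.array([[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]])          # mixing proportions
mu  = np.zeros((b, K))                                        # zero-mean components
tau = np.array([[100.0, 1.0], [25.0, 4.0], [400.0, 0.25]])    # precisions

E = np.empty((p, b))
for j in range(b):
    # For each pixel, first draw a component label, then the Gaussian value.
    z = rng.choice(K, size=p, p=pi[j])
    E[:, j] = rng.normal(mu[j, z], 1.0 / np.sqrt(tau[j, z]))

# Within a band the samples are i.i.d.; across bands the variances clearly differ.
print(np.round(E.var(axis=0), 3))
```

The empirical variance of band j should approach the mixture variance \sum_k \pi_{jk}(\tau_{jk}^{-1} + \mu_{jk}^2), which is different for each band.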
Considering the noise property (iii), we further hypothesize that the MoG parameters \mu_{jk} and \tau_{jk} of all bands are generated from a two-level prior distribution:

\mu_{jk}, \tau_{jk} \sim \mathcal{N}(\mu_{jk} \mid m_0, (\beta_0 \tau_{jk})^{-1}) \, \mathrm{Gam}(\tau_{jk} \mid c_0, d_0),   (3)

where \mathrm{Gam}(\cdot \mid c_0, d_0) represents the Gamma distribution. In this way, the correlation between noise distributions among different bands is rationally encoded. Through introducing a latent indicator variable z_{ij}, we can equivalently rewrite Eq. (2) in the following form:

e_{ij} \sim \prod_{k=1}^{K} \mathcal{N}(e_{ij} \mid \mu_{jk}, \tau_{jk}^{-1})^{z_{ijk}}, \quad z_{ij} \sim \mathrm{Multi}(z_{ij} \mid \pi_j), \quad \pi_j \sim \mathrm{Dir}(\pi_j \mid \alpha_0),   (4)

where \mathrm{Multi} and \mathrm{Dir} represent the Multinomial and Dirichlet distributions, respectively. Then, Eqs. (3) and (4) together encode the noise structure embedded in an HSI. Fig. 2 shows the graphical model for noise encoding within the red box. All involved parameters can be inferred from data, as introduced in the following Section II-C.
II-B NMoG-LRMF model
For the prior structure of the underlying clean HSI matrix X, we readily employ a low-rank structure to encode it. Specifically, let's consider the following LRMF model:

Y = U V^T + E,   (5)

where X = U V^T, with U ∈ R^{p×R} and V ∈ R^{b×R}, implies the low-rank structure underlying the clean HSI.
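Under an i.i.d. Gaussian noise assumption, this model is solved globally by a truncated SVD [18]. A minimal sketch on synthetic data (sizes, rank and noise level are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)
p, b, r = 500, 30, 4

# Synthetic low-rank "clean" matrix X = U V^T, plus i.i.d. Gaussian noise.
U = rng.normal(size=(p, r))
V = rng.normal(size=(b, r))
X = U @ V.T
Y = X + rng.normal(0.0, 0.5, size=(p, b))

# Rank-r truncated SVD: the global optimum of LRMF under an L2 (Gaussian) loss.
u, s, vt = np.linalg.svd(Y, full_matrices=False)
X_hat = (u[:, :r] * s[:r]) @ vt[:r]

# The low-rank reconstruction suppresses most of the noise energy.
print(np.mean((X_hat - X) ** 2) < np.mean((Y - X) ** 2))
```

This is exactly the baseline whose limitation the paper targets: the truncated SVD is optimal only when the noise really is i.i.d. Gaussian.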
For most deterministic LRMF models, the rank of the factorization is fixed in advance. By formulating the problem as a generative model [1, 55], the rank can instead be adaptively learned from data. Specifically, suppose the columns of U and V are generated from Gaussian distributions. For r = 1, ..., R, where R is a preset value larger than the true rank, we set

u_{\cdot r} \sim \mathcal{N}(u_{\cdot r} \mid 0, \gamma_r^{-1} I_p), \quad v_{\cdot r} \sim \mathcal{N}(v_{\cdot r} \mid 0, \gamma_r^{-1} I_b),   (6)

where u_{\cdot r} and v_{\cdot r} are the r-th columns of U and V, respectively, and I_p denotes the p×p identity matrix. \gamma_r is the common precision of u_{\cdot r} and v_{\cdot r}, with the following prior:

\gamma_r \sim \mathrm{Gam}(\gamma_r \mid a_0, b_0).   (7)

Note that each column pair u_{\cdot r} and v_{\cdot r} of U and V has the same sparsity profile, characterized by the common precision variable \gamma_r. It has been validated that such a modeling leads to large precision values for some \gamma_r's, and hence is capable of automatically conducting a low-rank estimate of X [1].
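The automatic rank determination effect can be seen numerically in a toy sketch (hypothetical precision values, not the paper's inference): column pairs whose shared precision is driven large contribute negligibly to the product, so the effective rank drops to the number of small-precision columns:

```python
import numpy as np

rng = np.random.default_rng(1)
p, b, R = 50, 40, 10            # sizes; R is the preset over-estimated rank

# Shared column-pair precisions: 3 small ones (kept) and 7 huge ones (pruned).
gamma = np.array([1.0, 1.0, 1.0] + [1e8] * 7)

# Draw each column pair from N(0, gamma_r^{-1} I), following the prior above.
U = rng.normal(0.0, 1.0 / np.sqrt(gamma), size=(p, R))
V = rng.normal(0.0, 1.0 / np.sqrt(gamma), size=(b, R))
X = U @ V.T

# Large-precision columns are numerically negligible: X is effectively rank 3.
keep = gamma < 1e4
print(np.linalg.matrix_rank(X, tol=1e-3), keep.sum())
```

In the actual variational inference the precisions are learned rather than fixed, but the pruning mechanism is the same.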
II-C Variational Inference
We use the variational Bayes (VB) method [3] to infer the posterior. Specifically, VB aims to use a variational distribution q(x) to approximate the true posterior p(x \mid Y), where x denotes the set of all parameters and Y denotes the observed data. To achieve this goal, we need to solve the following optimization problem:

\min_{q \in \mathcal{C}} \mathrm{KL}(q(x) \,\|\, p(x \mid Y)),   (9)

where \mathrm{KL}(q \| p) represents the KL divergence between the distributions q and p, and \mathcal{C} denotes the constraint set of probability densities that makes the minimization tractable. Assuming \mathcal{C} is the distribution family which can be factorized with respect to some disjoint groups, q(x) = \prod_j q_j(x_j), the closed-form optimal solution for each factor can be obtained by

q_j^*(x_j) \propto \exp\{ \mathbb{E}_{q(x_{\setminus j})}[\ln p(Y, x)] \},   (10)

where \mathbb{E} denotes the expectation and x_{\setminus j} denotes the set x with x_j removed.
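The coordinate-wise nature of this update is easiest to see on a toy problem. The sketch below (a standard textbook example [3], not the paper's model) runs mean-field VB for a univariate Gaussian with unknown mean and precision, alternately updating q(mu) and q(tau) using expectations under the other factor:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=1.5, scale=0.5, size=2000)   # data: mean 1.5, precision 4
N, xbar, xsq = len(x), x.mean(), (x ** 2).sum()

# Non-informative Normal-Gamma hyperparameters (cf. the setting in this section).
mu0, lam0, a0, b0 = 0.0, 1e-6, 1e-6, 1e-6

E_tau = 1.0                                     # initial guess for E[tau]
for _ in range(50):
    # Update q(mu) = N(mu | mN, 1/lamN) given the current E[tau].
    lamN = (lam0 + N) * E_tau
    mN = (lam0 * mu0 + N * xbar) / (lam0 + N)
    E_mu, E_mu2 = mN, mN ** 2 + 1.0 / lamN
    # Update q(tau) = Gam(aN, bN) given the current moments of q(mu).
    aN = a0 + (N + 1) / 2.0
    bN = b0 + 0.5 * (xsq - 2 * x.sum() * E_mu + N * E_mu2
                     + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0 ** 2))
    E_tau = aN / bN

# The fixed point recovers the sample mean and (approximately) 1 / sample variance.
print(round(mN, 2), round(E_tau, 1))
```

The same alternating pattern, with many more factors, underlies the update equations listed next.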
We can then analytically infer all the factorized distributions involved in Eq. (8). Suppose that the approximate posterior distribution (8) possesses a factorized form as follows:

q(U, V, \gamma, \mu, \tau, \pi, Z) = \prod_{i} q(u_{i \cdot}) \prod_{j} q(v_{j \cdot}) \, q(\gamma) \, q(\mu, \tau) \, q(\pi) \, q(Z),   (11)

where u_{i \cdot} and v_{j \cdot} are the i-th row of U and the j-th row of V, respectively. According to Eq. (10), we can get the closed-form inference equation for each component of (11). (Inference details for these equations can be found at http://dymeng.gr.xjtu.edu.cn.)
Estimation of noise component: We first list the updating equation for the noise components involved in Eq. (11). The posterior distribution of mean and precision for noise in each band is updated by the following equation:
(12) 
where the parameters in the above equation can be calculated from data in the following way:
(13) 
(14) 
(15) 
The update equation of the latent variable is:
(16) 
where the involved parameters are calculated by
(17) 
Similarly, the update equation for the mixing proportion over the band can be written as:
(18) 
where .
The update equation on the hyperparameter is:
(19) 
where and .
Estimation of low-rank component: Having completed the updates of the noise components, we then estimate the posteriors of the low-rank factor matrices as:
(20) 
where
(21) 
where
(22) 
For the precision variable that controls the rank of the factor matrices, we have:
(23) 
where the involved quantities represent the image sizes along the spatial dimensions. As discussed by Babacan et al. [1], some precision values tend to become very large during the inference process, and the corresponding columns are removed from the factor matrices. The low-rank purpose can thus be rationally achieved. In all our experiments, we automatically infer the rank of the reconstructed matrix by discarding the components with comparatively very large precisions, as done in previous literature [55].
The proposed variational inference method for the NMoG-LRMF model is summarized in Algorithm 1.
Setting of hyperparameters: We set all the hyperparameters involved in our model in a non-informative manner so that they affect the inference of the posterior distributions as little as possible [3]. Throughout our experiments, we set the prior mean hyperparameter to 0 and the remaining hyperparameters to a small positive value. Our method performs stably well under such simple settings.
III Experimental Results
In this section, to evaluate the performance of the proposed NMoG-LRMF method, we conducted a series of experiments on both synthetic and real HSI data. The compared methods include LRMR [54] and LRTV [21], which consider deterministic Gaussian plus sparse noise. Meanwhile, five representative low-rank matrix analysis methods considering different kinds of i.i.d. noise distributions were also included for comparison: PMoEP [4] (assuming an i.i.d. mixture of Exponential Power noise), MoG-RPCA [55] (assuming i.i.d. MoG noise), RegL1-ALM [57] and CWM [35] (assuming i.i.d. Laplacian noise), and SVD [18] (assuming i.i.d. Gaussian noise). Besides, the performance of TDL [39] and BM4D [32] was also compared; both represent the state-of-the-art methods for HSI denoising by considering HSI priors. All experiments were implemented in Matlab R2014b on a PC with a 4.0GHz CPU and 31.4GB RAM.
III-A Simulated HSI denoising experiments
In this experiment, we focus on the performance of NMoG-LRMF in HSI denoising with synthetic noise. Two HSIs were employed: Washington DC Mall (http://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html) and RemoteImage (http://peterwonka.net/Publications/code/LRTC_Package_Ji.zip), provided by Liu et al. [29]. After cropping the main part of each HSI and deleting some spectral channels with evident visual contamination, the Washington DC Mall and RemoteImage data were resized accordingly. The gray values of each band were normalized into [0, 1].

Table I. Denoising results of all competing methods on the DC Mall data.

Metric | Noisy HSI | SVD | RegL1-ALM | CWM | MoG-RPCA | PMoEP | LRMR | LRTV | TDL | BM4D | NMoG

i.i.d. Gaussian noise
MPSNR | 26.02 | 44.76 | 41.19 | 42.89 | 45.26 | 44.26 | 42.48 | 43.83 | 46.08 | 39.33 | 45.27
MSSIM | 0.513 | 0.983 | 0.965 | 0.973 | 0.985 | 0.980 | 0.972 | 0.977 | 0.989 | 0.948 | 0.985
Time  | -- | 0.290 | 221.2 | 851.1 | 59.1 | 201.7 | 5367.0 | 512.3 | 44.9 | 965.2 | 77.8

Non-i.i.d. Gaussian noise
MPSNR | 40.96 | 43.34 | 47.15 | 45.00 | 46.15 | 44.09 | 48.26 | 47.19 | 41.80 | 45.50 | 51.11
MSSIM | 0.890 | 0.973 | 0.985 | 0.980 | 0.981 | 0.973 | 0.987 | 0.986 | 0.903 | 0.983 | 0.990
Time  | -- | 0.292 | 212.0 | 800.2 | 121.5 | 1672.3 | 778.6 | 478.9 | 821.2 | 940.8 | 106.7

Gaussian + stripe noise
MPSNR | 44.72 | 46.14 | 48.89 | 48.74 | 48.54 | 48.08 | 51.11 | 49.06 | 44.94 | 46.66 | 54.16
MSSIM | 0.941 | 0.989 | 0.992 | 0.992 | 0.992 | 0.990 | 0.994 | 0.993 | 0.942 | 0.985 | 0.997
Time  | -- | 0.515 | 238.5 | 782.6 | 307.6 | 1505.5 | 414.4 | 683.2 | 843.1 | 369.5 | 334.7

Gaussian + deadline noise
MPSNR | 44.36 | 44.08 | 48.95 | 47.81 | 48.65 | 46.95 | 50.34 | 48.95 | 44.84 | 46.02 | 54.04
MSSIM | 0.938 | 0.980 | 0.993 | 0.990 | 0.991 | 0.986 | 0.991 | 0.993 | 0.943 | 0.979 | 0.997
Time  | -- | 0.291 | 177.6 | 690.4 | 136.2 | 2004.7 | 670.8 | 321.2 | 772.3 | 307.5 | 143.6

Gaussian + impulse noise
MPSNR | 43.05 | 44.05 | 49.16 | 47.51 | 49.89 | 46.79 | 51.05 | 49.11 | 43.65 | 45.77 | 52.96
MSSIM | 0.925 | 0.981 | 0.993 | 0.991 | 0.993 | 0.986 | 0.994 | 0.993 | 0.932 | 0.977 | 0.997
Time  | -- | 0.293 | 181.3 | 567.4 | 146.5 | 1640.9 | 743.6 | 501.3 | 695.3 | 1028.6 | 150.2

Mixture noise
MPSNR | 42.93 | 43.32 | 48.75 | 44.82 | 47.94 | 44.26 | 48.77 | 48.54 | 43.22 | 43.97 | 50.38
MSSIM | 0.913 | 0.977 | 0.991 | 0.982 | 0.989 | 0.978 | 0.986 | 0.991 | 0.917 | 0.968 | 0.996
Time  | -- | 0.289 | 176.5 | 688.7 | 136.3 | 940.3 | 708.2 | 358.9 | 653.8 | 325.4 | 142.0
Table II. Denoising results of all competing methods on the RemoteImage data.

Metric | Noisy HSI | SVD | RegL1-ALM | CWM | MoG-RPCA | PMoEP | LRMR | LRTV | TDL | BM4D | NMoG

i.i.d. Gaussian noise
MPSNR | 26.02 | 38.19 | 36.63 | 36.78 | 38.66 | 36.72 | 37.46 | 37.89 | 39.81 | 38.98 | 38.65
MSSIM | 0.591 | 0.956 | 0.937 | 0.939 | 0.964 | 0.956 | 0.946 | 0.949 | 0.971 | 0.971 | 0.964
Time  | -- | 0.247 | 105.9 | 331.6 | 65.7 | 38.1 | 409.1 | 211.9 | 47.7 | 357.7 | 69.6

Non-i.i.d. Gaussian noise
MPSNR | 25.17 | 34.36 | 35.51 | 35.37 | 36.41 | 35.67 | 35.95 | 35.73 | 29.74 | 36.82 | 38.26
MSSIM | 0.539 | 0.902 | 0.921 | 0.922 | 0.942 | 0.928 | 0.926 | 0.927 | 0.729 | 0.952 | 0.962
Time  | -- | 0.253 | 105.4 | 329.3 | 100.2 | 301.9 | 398.7 | 220.2 | 151.0 | 327.8 | 101.1

Gaussian + stripe noise
MPSNR | 29.08 | 37.48 | 39.31 | 38.64 | 39.70 | 38.62 | 40.13 | 39.36 | 29.76 | 37.03 | 40.68
MSSIM | 0.710 | 0.955 | 0.968 | 0.964 | 0.974 | 0.965 | 0.971 | 0.968 | 0.731 | 0.938 | 0.983
Time  | -- | 0.279 | 108.9 | 375.8 | 88.6 | 299.8 | 606.4 | 238.3 | 1969.9 | 160.4 | 112.3

Gaussian + deadline noise
MPSNR | 27.97 | 33.37 | 39.08 | 37.94 | 39.58 | 38.20 | 38.74 | 38.72 | 29.98 | 34.02 | 40.43
MSSIM | 0.693 | 0.907 | 0.967 | 0.961 | 0.972 | 0.963 | 0.958 | 0.965 | 0.772 | 0.888 | 0.983
Time  | -- | 0.299 | 108.9 | 361.7 | 119.6 | 1205.9 | 725.1 | 378.9 | 273.5 | 683.6 | 127.5

Gaussian + impulse noise
MPSNR | 28.20 | 35.00 | 41.89 | 39.48 | 41.15 | 39.15 | 41.32 | 40.63 | 30.63 | 36.97 | 42.90
MSSIM | 0.660 | 0.917 | 0.983 | 0.971 | 0.981 | 0.969 | 0.977 | 0.977 | 0.745 | 0.951 | 0.990
Time  | -- | 0.236 | 100.0 | 317.2 | 110.1 | 202.8 | 473.1 | 251.6 | 190.7 | 445.3 | 118.1

Mixture noise
MPSNR | 25.84 | 31.55 | 37.93 | 35.72 | 38.68 | 36.09 | 37.76 | 38.31 | 26.93 | 30.29 | 39.32
MSSIM | 0.590 | 0.868 | 0.958 | 0.936 | 0.966 | 0.940 | 0.944 | 0.962 | 0.633 | 0.782 | 0.979
Time  | -- | 0.283 | 122.0 | 416.4 | 69.5 | 650.7 | 696.4 | 260.8 | 312.2 | 86.7 | 105.0
Real-world HSIs are usually contaminated by several different types of noise, including the most common Gaussian noise, impulse noise, dead pixels or lines, and stripes [54]. In order to simulate these real HSI noise scenarios, we added six kinds of noise to the original HSI data.
(1) I.i.d. Gaussian noise: The entries in all bands were corrupted by zero-mean i.i.d. Gaussian noise of a fixed variance.
(2) Non-i.i.d. Gaussian noise: The entries in all bands were corrupted by zero-mean Gaussian noise with different intensities. The signal-to-noise ratio (SNR) of each band was generated from a uniform distribution over a fixed range.
(3) Gaussian + stripe noise: All bands were corrupted by Gaussian noise as in Case (2). Besides, 40 bands in the DC Mall data (30 bands in the RemoteImage data) were randomly chosen to add stripe noise. The number of stripes in each band ranges from 20 to 40.
(4) Gaussian + deadline noise: Each band was contaminated by Gaussian noise as in Case (2). 40 bands in the DC Mall data (30 bands in the RemoteImage data) were randomly chosen to add deadline noise. The number of deadlines ranges from 5 to 15.
(5) Gaussian + impulse noise: All bands were corrupted by Gaussian noise as in Case (2). 40 bands in the DC Mall data (30 bands in the RemoteImage data) were randomly chosen to add impulse noise of different intensities, with a varying percentage of corrupted pixels per band.
(6) Mixture noise: Each band was randomly corrupted by at least one kind of noise mentioned in Cases (2)-(5).
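A sketch of how such mixed noise can be generated (on a toy array; the noise levels, offsets and fractions are illustrative stand-ins where the original values are unspecified):

```python
import numpy as np

def add_mixed_noise(X, rng, n_noisy_bands=40):
    """Corrupt a clean HSI X (pixels x bands) with non-i.i.d. Gaussian noise,
    plus stripes, deadlines and impulse noise on randomly chosen bands."""
    p, b = X.shape
    h = int(np.sqrt(p))                      # assume square bands for stripes
    Y = X.copy()
    # (2) Non-i.i.d. Gaussian: a different std per band (illustrative range).
    sigma = rng.uniform(0.05, 0.25, size=b)
    Y += rng.normal(0.0, sigma, size=(p, b))
    bands = rng.choice(b, size=min(n_noisy_bands, b), replace=False)
    for j in bands:
        img = Y[:, j].reshape(h, -1)
        # (3) Stripes: add a constant offset to 20-40 random columns.
        cols = rng.choice(img.shape[1], size=rng.integers(20, 41), replace=False)
        img[:, cols] += rng.uniform(0.1, 0.3)
        # (4) Deadlines: zero out 5-15 random columns.
        dead = rng.choice(img.shape[1], size=rng.integers(5, 16), replace=False)
        img[:, dead] = 0.0
        # (5) Impulse: flip a random fraction of pixels to 0 or 1.
        mask = rng.random(img.shape) < 0.1
        img[mask] = rng.integers(0, 2, size=mask.sum()).astype(float)
        Y[:, j] = img.ravel()
    return Y

rng = np.random.default_rng(3)
X = np.full((64 * 64, 50), 0.5)              # toy "clean" HSI with 50 bands
Y = add_mixed_noise(X, rng, n_noisy_bands=10)
print(Y.shape)
```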
Three criteria were utilized to measure performance: (1) MPSNR [24]: the mean of the peak signal-to-noise ratio (PSNR) between the clean and recovered HSI over all bands. (2) MSSIM [48]: the mean of the structural similarity (SSIM) between the clean and recovered HSI over all bands. (3) Time: the time cost of each method to complete the denoising process.
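MPSNR, for instance, can be computed as follows (a straightforward implementation assuming intensities normalized to [0, 1], so the peak value is 1):

```python
import numpy as np

def mpsnr(clean, recovered):
    """Mean PSNR over all bands; inputs are (pixels x bands) arrays in [0, 1]."""
    psnrs = []
    for j in range(clean.shape[1]):
        mse = np.mean((clean[:, j] - recovered[:, j]) ** 2)
        psnrs.append(10.0 * np.log10(1.0 / mse))  # PSNR with peak value 1
    return float(np.mean(psnrs))

# Identical inputs give infinite PSNR, so compare against a slightly noisy copy.
rng = np.random.default_rng(4)
clean = rng.random((1000, 5))
noisy = np.clip(clean + rng.normal(0.0, 0.05, clean.shape), 0.0, 1.0)
print(round(mpsnr(clean, noisy), 1))
```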
The parameters of the competing methods were set as follows: the step size in LRMR is 10; for TDL, the step size is 2; for BM4D, the step size is 4. For the NMoG-LRMF method, the component number in each band was fixed as 1 in Cases (1)-(2), and 3 in Cases (3)-(6). The rank of all low-rank based methods was set as 5 in the DC Mall experiment and 4 in the RemoteImage experiment. All other parameters involved in the competing methods were empirically tuned or specified as suggested by the related literature to guarantee their possibly best performance. All competing methods were run with 20 random initializations in each noise case, and the average result is reported.
The results of all competing methods on the DC Mall and RemoteImage HSI data are shown in Tables I and II, respectively. The superiority of the proposed method can be easily observed, except in the i.i.d. Gaussian noise case, which complies with the basic noise assumption of the conventional methods. In the i.i.d. Gaussian case, instead of only the simple low-rank prior used in our method, several competing methods, like TDL and BM4D, utilize more useful HSI priors in their models, and thus tend to have relatively better performance. On the more complex but more practical non-i.i.d. noise cases, however, the advantage of the proposed method is evident. This can be easily explained by its better noise fitting capability: it can more properly extract the noise embedded in the HSI, which naturally leads to better HSI recovery performance.
Furthermore, it can also be seen that the computational cost of the proposed method is of a similar order of magnitude to that of the other competing methods, except for SVD, for which we use the mature Matlab toolkit and which can thus be implemented very efficiently. Considering its capability of fitting a much wider range of noises than current methods, it is rational to say that the proposed method is efficient.
We further show the PSNR and SSIM measurements across all bands of the HSI under the six types of noise settings in the two experiments in Fig. 3 and Fig. 4, respectively. From the figures, it is easy to see that TDL obtains the best PSNR and SSIM values across all bands in the i.i.d. Gaussian noise case. In all other, more complex noise cases, NMoG-LRMF achieves the best PSNR and SSIM values across almost all bands. This verifies the robustness of the proposed method over the entire range of HSI bands.
Figs. 5 and 6 give the restoration results on two typical bands of the DC Mall and RemoteImage HSIs, respectively. The original HSIs are corrupted by a mixture of Gaussian, stripe, deadline and impulse noise. It can be observed that the competing methods SVD, CWM, PMoEP, LRMR, and BM4D can hardly remove this complex mixture noise from the images. We can also see that although RegL1-ALM, MoG-RPCA and LRTV have better denoising performance, the restored HSIs still blur many details as compared with the ground truth, which is also reflected by their PSNR and SSIM values on the band. Comparatively, our NMoG-LRMF method achieves a better reconstruction both visually and quantitatively: it finely removes the unexpected noise and better recovers HSI details like edges and textures.
III-B Real HSI denoising experiments
In this section, we evaluate the performance of the proposed method on two real HSI datasets, Urban dataset^{4}^{4}4http://www.tec.army.mil/hypercube and EO1 Hyperion dataset^{5}^{5}5http://datamirror.csdb.cn/admin/dataEO1Main.jsp.
The same competing methods as in the last section were compared in the real HSI experiments. As in the synthetic experiments, all involved parameters were finely tuned or directly taken from the original references to guarantee the possibly best performance of all competing methods. The component number of each band in the proposed method was fixed throughout all experiments. The gray values of the HSIs were normalized into [0, 1].
In the first Urban dataset, some bands are seriously polluted by the atmosphere and water absorption. We use all of the data without removing any bands in order to more rationally verify the robustness of the proposed method in the presence of such heavy noise.
Figs. 7, 8 and 9 present bands 103, 139 and 207 of the restored images obtained by all competing methods, respectively. From Figs. 7-9 (a), we can see that the original HSI bands are contaminated by complex structural noise, such as stripe noise. It can be easily observed from Figs. 7-9 (b)-(f) that the LRMF methods with different i.i.d. noise distribution assumptions cannot finely restore a clean HSI. This can be easily explained by the fact that such structural noise is obviously non-i.i.d., and the deviation between the real noise configuration and the knowledge encoded in the model naturally degenerates the performance of the corresponding methods. Comparatively, albeit considering less prior knowledge on the to-be-recovered HSI, our method still achieves a visually better recovery, with better restoration of texture and edge details and less residual structural noise, due to its powerful noise modeling ability. This further substantiates the robustness of the proposed method in practical scenarios.
We then give a quantitative comparison by showing the horizontal mean profiles of band 207 in the Urban dataset before and after restoration in Fig. 10. The horizontal axis in the figure represents the row number, and the vertical axis represents the mean digital number value of each row. As shown in Fig. 10(a), due to the existence of mixed noise, there are rapid fluctuations in the curve. After restoration, the fluctuations are more or less suppressed. It is easy to observe that our NMoG-LRMF method provides an evidently smoother restoration curve.
In the second Hyperion dataset, some bands are so seriously polluted that all pixel signatures equal 0. Thus we only use a subset of its bands in this experiment, after removing those zero-signature bands, which are evident outliers without any intrinsically useful information. We further spatially cropped a square area with relatively evident noise from the HSI to specifically test the denoising capability of the utilized methods.
Figs. 11, 12 and 13 show bands 100, 144 and 197 of the HSI images restored by all competing methods. From the figures, it can be easily observed that the SVD and TDL methods have little effect in removing the noise; the RegL1-ALM and CWM methods can only partially eliminate the noise; and the MoG-RPCA and PMoEP methods perform better than SVD, RegL1-ALM and CWM due to the better noise fitting ability brought by the MoG and MoEP models. They, however, still miss lots of details in their HSI recovery due to their implicitly improper i.i.d. assumptions on the noise distribution of the HSI. The other competing methods also fail to achieve satisfactory HSI denoising results even though they consider more HSI prior knowledge in their models. Comparatively, the superiority of the proposed method can be easily observed in both detail preservation and noise removal.
Fig. 14 shows the vertical mean profiles of band 100 before and after restoration. As shown in Fig. 14(a), due to the existence of mixed noise, especially the stripes disorderly located in the image, there are evident fluctuations over various parts of the curve. After restoration, all competing methods show evident non-smoothness in the corresponding curves, except the proposed one. This implies that the HSI obtained by the proposed NMoG-LRMF method better preserves such prior knowledge possessed by real HSIs, as also substantiated by Fig. 10.
III-C Effect of component number on denoising performance
In this section, we examine the sensitivity of the NMoG-LRMF method to the setting of the Gaussian component number. We ran NMoG-LRMF with 20 initializations on the DC Mall data in the Gaussian + impulse noise and mixture noise cases, respectively, with the component number varying over a range of values. The results are shown in Fig. 15. It can be easily observed that once the component number exceeds a small value, the denoising performance of the proposed method tends to be stable and is not very sensitive to the choice of this value. Actually, in all our real experiments, we simply fixed the component number, and our method consistently performed well throughout.
IV Conclusion
In this paper, we first propose a strategy to model HSI noise under a non-i.i.d. noise assumption. We then embed this noise modeling strategy into the low-rank matrix factorization (LRMF) model and propose a non-i.i.d. LRMF model under the Bayesian framework. A variational Bayes algorithm is presented to infer the posterior of the proposed model. Compared with current state-of-the-art techniques, the proposed method is more robust due to its capability of adapting to the various noise shapes encountered in applications, as substantiated by our experiments on synthetic and real noisy HSIs.
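To make the noise-modeling idea concrete, the simplest i.i.d. variant of it fits a zero-mean mixture of Gaussians to the residuals by EM. The sketch below illustrates only that simplified variant, not the paper's algorithm, which further lets the mixture parameters vary across bands (non-i.i.d.) and infers them with variational Bayes inside a low-rank factorization:

```python
import numpy as np

def fit_zero_mean_mog(x, k=3, iters=50, seed=0):
    """EM for a zero-mean k-component Gaussian mixture on residuals x.

    Simplified i.i.d. sketch of MoG noise fitting; NMoG-LRMF instead
    couples a band-varying mixture with low-rank factorization.
    """
    rng = np.random.default_rng(seed)
    w = np.full(k, 1.0 / k)                       # mixing weights
    var = np.var(x) * rng.uniform(0.1, 2.0, k)    # distinct initial variances
    for _ in range(iters):
        # E-step: responsibilities of each component for each residual
        logp = (np.log(w)[:, None]
                - 0.5 * np.log(2.0 * np.pi * var)[:, None]
                - 0.5 * np.square(x)[None, :] / var[:, None])
        logp -= logp.max(axis=0)                  # numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=0)
        # M-step: re-estimate weights and variances
        nk = r.sum(axis=1)
        w = nk / x.size
        var = np.maximum((r * np.square(x)[None, :]).sum(axis=1) / nk, 1e-10)
    return w, var
```

Fitting such a mixture lets a small-variance component absorb dense weak noise while a large-variance component absorbs sparse outliers such as stripes and impulses.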
In our future work, we will try to extend the application of our method to video and face image data. We will also integrate more useful HSI prior terms into our model to further enhance its denoising capability. Besides, the proposed noise modeling strategy can be specifically redesigned for particular application contexts, such as the wind speed prediction problem indicated in [23, 22].
References
 [1] S. D. Babacan, M. Luessi, R. Molina, and A. K. Katsaggelos. Sparse Bayesian methods for low-rank matrix estimation. IEEE Transactions on Signal Processing, 60(8):3964–3977, 2012.
 [2] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot. Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(2):354–379, 2012.
 [3] C. M. Bishop. Pattern recognition and machine learning. Springer, 2006.

 [4] X. Cao, Y. Chen, Q. Zhao, D. Meng, Y. Wang, D. Wang, and Z. Xu. Low-rank matrix factorization under general mixture noise distributions. In Proceedings of the IEEE International Conference on Computer Vision, pages 1493–1501, 2015.
 [5] G. Chen and S.-E. Qian. Simultaneous dimensionality reduction and denoising of hyperspectral imagery using bivariate wavelet shrinking and principal component analysis. Canadian Journal of Remote Sensing, 34(5):447–454, 2008.
 [6] G. Chen and S.-E. Qian. Denoising of hyperspectral imagery using principal component analysis and wavelet shrinkage. IEEE Transactions on Geoscience and Remote Sensing, 49(3):973–980, 2011.
 [7] G. Chen and W.-P. Zhu. Signal denoising using neighbouring dual-tree complex wavelet coefficients. IET Signal Processing, 6(2):143–147, 2012.
 [8] P. Chen, N. Wang, N. L. Zhang, and D.-Y. Yeung. Bayesian adaptive matrix factorization with automatic model selection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
 [9] S.-L. Chen, X.-Y. Hu, and S.-L. Peng. Hyperspectral imagery denoising using a spatial-spectral domain mixing prior. Journal of Computer Science and Technology, 27(4):851–861, 2012.
 [10] Y. Chen, N. M. Nasrabadi, and T. D. Tran. Hyperspectral image classification using dictionary-based sparse representation. IEEE Transactions on Geoscience and Remote Sensing, 49(10):3973–3985, 2011.
 [11] K. Dabov, A. Foi, and K. Egiazarian. Video denoising by sparse 3D transform-domain collaborative filtering. In Proceedings of the IEEE Conference on Signal Processing, pages 145–149. IEEE, 2007.
 [12] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095, 2007.
 [13] C. Ding, D. Zhou, X. He, and H. Zha. R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization. In Proceedings of the 23rd International Conference on Machine Learning, pages 281–288. ACM, 2006.
 [14] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, 2006.
 [15] A. Eriksson and A. Van Den Hengel. Efficient computation of robust low-rank matrix approximations in the presence of missing data using the L1 norm. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 771–778. IEEE, 2010.
 [16] A. F. Goetz. Three decades of hyperspectral remote sensing of the earth: A personal view. Remote Sensing of Environment, 113:S5–S16, 2009.
 [17] A. A. Green, M. Berman, P. Switzer, and M. D. Craig. A transformation for ordering multispectral data in terms of image quality with implications for noise removal. IEEE Transactions on Geoscience and Remote Sensing, 26(1):65–74, 1988.
 [18] M. Haardt and P. Strobach. Method for high-resolution spectral analysis in multichannel observations using a singular value decomposition (SVD) matrix technique, Oct. 1 1996. US Patent 5,560,367.
 [19] W. He, H. Zhang, L. Zhang, and H. Shen. Hyperspectral image denoising via noise-adjusted iterative low-rank matrix approximation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 8(6):3050–3061, 2015.
 [20] W. He, H. Zhang, L. Zhang, and H. Shen. Total-variation-regularized low-rank matrix factorization for hyperspectral image restoration. IEEE Transactions on Geoscience and Remote Sensing, 54(1):178–188, 2016.
 [21] W. He, H. Zhang, L. Zhang, and H. Shen. Total-variation-regularized low-rank matrix factorization for hyperspectral image restoration. IEEE Transactions on Geoscience and Remote Sensing, 54(1):178–188, 2016.
 [22] Q. Hu, Y. Wang, Z. Xie, P. Zhu, and D. Yu. On estimating uncertainty of wind energy with mixture of distributions. Energy, 112:935–962, 2016.
 [23] Q. Hu, S. Zhang, Z. Xie, J. Mi, and J. Wan. Noise model based support vector regression with its application to shortterm wind speed forecasting. Neural Networks, 57:1–11, 2014.
 [24] Q. Huynh-Thu and M. Ghanbari. Scope of validity of PSNR in image/video quality assessment. Electronics Letters, 44(13):800–801, 2008.
 [25] H. Ji, C. Liu, Z. Shen, and Y. Xu. Robust video denoising using low rank matrix completion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1791–1798. IEEE, 2010.
 [26] A. Karami, M. Yazdi, and A. Z. Asli. Noise reduction of hyperspectral images using kernel non-negative Tucker decomposition. IEEE Journal of Selected Topics in Signal Processing, 5(3):487–493, 2011.
 [27] Q. Ke and T. Kanade. Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 739–746. IEEE, 2005.
 [28] D. Letexier and S. Bourennane. Noise removal from hyperspectral images by multidimensional filtering. IEEE Transactions on Geoscience and Remote Sensing, 46(7):2061–2069, 2008.
 [29] J. Liu, P. Musialski, P. Wonka, and J. Ye. Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):208–220, 2013.
 [30] L. Liu, L. Chen, C. P. Chen, Y. Y. Tang, and C. M. Pun. Weighted joint sparse representation for removing mixed noise in image. 2016.
 [31] X. Liu, S. Bourennane, and C. Fossati. Denoising of hyperspectral images using the PARAFAC model and statistical performance analysis. IEEE Transactions on Geoscience and Remote Sensing, 50(10):3717–3724, 2012.
 [32] M. Maggioni, V. Katkovnik, K. Egiazarian, and A. Foi. Nonlocal transform-domain filter for volumetric data denoising and reconstruction. IEEE Transactions on Image Processing, 22(1):119–133, 2013.
 [33] V. Maz’ya and G. Schmidt. On approximate approximations using Gaussian kernels. IMA Journal of Numerical Analysis, 16(1):13–29, 1996.
 [34] D. Meng and F. De la Torre. Robust matrix factorization with unknown noise. In Proceedings of the IEEE International Conference on Computer Vision, pages 1337–1344, 2013.

 [35] D. Meng, Z. Xu, L. Zhang, and J. Zhao. A cyclic weighted median method for L1 low-rank matrix factorization with missing entries. In Proceedings of the Association for the Advancement of Artificial Intelligence, volume 4, page 6, 2013.
 [36] K. Mitra, S. Sheorey, and R. Chellappa. Large-scale matrix factorization with missing data under additional constraints. In Advances in Neural Information Processing Systems, pages 1651–1659, 2010.
 [37] T. Okatani, T. Yoshida, and K. Deguchi. Efficient algorithm for low-rank matrix factorization with missing components and performance comparison of latest algorithms. In Proceedings of the 28th International Conference on Machine Learning, pages 842–849. IEEE, 2011.
 [38] H. Othman and S.-E. Qian. Noise reduction of hyperspectral imagery using hybrid spatial-spectral derivative-domain wavelet shrinkage. IEEE Transactions on Geoscience and Remote Sensing, 44(2):397–408, 2006.
 [39] Y. Peng, D. Meng, Z. Xu, C. Gao, Y. Yang, and B. Zhang. Decomposable nonlocal tensor dictionary learning for multispectral image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2949–2956, 2014.
 [40] A. Plaza, J. A. Benediktsson, J. W. Boardman, J. Brazile, L. Bruzzone, G. Camps-Valls, J. Chanussot, M. Fauvel, P. Gamba, A. Gualtieri, et al. Recent advances in techniques for hyperspectral image processing. Remote Sensing of Environment, 113:S110–S122, 2009.
 [41] Y. Qian, S. Jia, J. Zhou, and A. Robles-Kelly. Hyperspectral unmixing via sparsity-constrained nonnegative matrix factorization. IEEE Transactions on Geoscience and Remote Sensing, 49(11):4282–4297, 2011.
 [42] Y. Qian and M. Ye. Hyperspectral imagery restoration using nonlocal spectral-spatial structured sparse representation with noise estimation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 6(2):499–515, 2013.
 [43] P. Sathya and K. Vani. Hyperspectral image classification. Advances in Natural and Applied Sciences, 9(6 SE):338–344, 2015.

 [44] L. Shao, R. Yan, X. Li, and Y. Liu. From heuristic optimization to dictionary learning: a review and comprehensive comparison of image denoising algorithms. IEEE Transactions on Cybernetics, 44(7):1001–1013, 2014.
 [45] T. Skauli. Sensor noise informed representation of hyperspectral data, with benefits for image storage and processing. Optics Express, 19(14):13031–13046, 2011.
 [46] D. W. Stein, S. G. Beaven, L. E. Hoff, E. M. Winter, A. P. Schaum, and A. D. Stocker. Anomaly detection from hyperspectral imagery. IEEE Signal Processing Magazine, 19(1):58–69, 2002.
 [47] M. Wang, J. Yu, J.-H. Xue, and W. Sun. Denoising of hyperspectral images using group low-rank representation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2016, accepted.
 [48] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
 [49] X. Chen, Z. Han, Y. Wang, Q. Zhao, D. Meng, and Y. Tang. Robust tensor factorization with unknown noise. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
 [50] R. Yan, L. Shao, and Y. Liu. Nonlocal hierarchical dictionary learning using wavelets for image denoising. IEEE Transactions on Image Processing, 22(12):4689–4698, 2013.
 [51] Q. Yuan, L. Zhang, and H. Shen. Hyperspectral image denoising employing a spectral–spatial adaptive total variation model. IEEE Transactions on Geoscience and Remote Sensing, 50(10):3660–3677, 2012.
 [52] P. W. Yuen and M. Richardson. An introduction to hyperspectral imaging and its application for security, surveillance and target acquisition. The Imaging Science Journal, 58(5):241–253, 2010.
 [53] A. C. Zelinski and V. K. Goyal. Denoising hyperspectral imagery and recovering junk bands using wavelets and sparse approximation. In Proceedings of the IEEE International Conference on Geoscience and Remote Sensing Symposium, pages 387–390. IEEE, 2006.
 [54] H. Zhang, W. He, L. Zhang, H. Shen, and Q. Yuan. Hyperspectral image restoration using low-rank matrix recovery. IEEE Transactions on Geoscience and Remote Sensing, 52(8):4729–4743, 2014.
 [55] Q. Zhao, D. Meng, Z. Xu, W. Zuo, and L. Zhang. Robust principal component analysis with complex noise. In Proceedings of the 31st International Conference on Machine Learning, pages 55–63, 2014.
 [56] Y.-Q. Zhao, P. Gong, and Q. Pan. Object detection by spectropolarimetric imagery fusion. IEEE Transactions on Geoscience and Remote Sensing, 46(10):3337–3345, 2008.
 [57] Y. Zheng, G. Liu, S. Sugimoto, S. Yan, and M. Okutomi. Practical low-rank matrix approximation under robust L1-norm. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1410–1417. IEEE, 2012.
 [58] P. Zhong and R. Wang. Multiple-spectral-band CRFs for denoising junk bands of hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing, 51(4):2260–2275, 2013.