I Introduction
An airborne hyperspectral imaging sensor is capable of simultaneously acquiring the same spatial scene in many contiguous and narrow spectral wavelength (color) bands [1, 2, 3]. When all the spectral bands are stacked together, the resulting hyperspectral image (HSI) is a three-dimensional data cube; each pixel in the HSI is a p-dimensional vector, where p designates the total number of spectral bands. With the rich information afforded by the high spectral dimensionality, hyperspectral imagery has found many applications in various fields such as agriculture [4, 5], mineralogy [6], military [7, 8, 9], and in particular, target detection [1, 2, 10, 11, 7, 12, 13]. In many situations of practical interest, we do not have sufficient a priori information to specify the statistics of the target class. More precisely, the target's spectrum is not provided to the user. Such an unknown target is referred to as an "anomaly" [14], having a spectrum very different from that of the background (e.g., a ship at sea). Different Gaussian-based anomaly detectors have been proposed in the literature [15, 16, 17, 18, 19]. The detection performance of these detectors mainly depends on the true unknown covariance matrix (of the background surrounding the test pixel), whose entries have to be carefully estimated, especially in large dimensions. Since in hyperspectral imagery the number of covariance parameters to estimate grows with the square of the spectral dimension, it becomes impractical to use traditional covariance estimators, and the target detection performance can deteriorate significantly. It is often assumed that this large-dimensionality problem can be alleviated by exploiting the sparsity of the true unknown covariance matrix, namely, that many of its entries are zero.
This paper outlines two simple methods, based on pre-existing works, to impose sparsity on the covariance matrix via its Cholesky factor T. The first method imposes sparsity by applying thresholding operators such as Soft and SCAD to the OLS estimate of T. The second method directly estimates a sparse version of T by penalizing the negative normal log-likelihood with ℓ1 and SCAD penalty functions.
Summary of Main Notations: Throughout this paper, we depict vectors in lowercase boldface letters and matrices in uppercase boldface letters. The notation (.)^T stands for the transpose, while |.|, (.)^{-1}, (.)', det(.), and 1{.} denote the absolute value, the inverse, the derivative, the determinant, and the indicator function, respectively. For any u ∈ R, we define sign(u) = 1 if u > 0, sign(u) = -1 if u < 0, and sign(u) = 0 if u = 0.
II Background and System Overview
Suppose that we observe a sample of n independent and identically distributed random vectors, {x_i}_{i=1}^n, each following a multivariate Gaussian distribution with zero mean and unknown covariance matrix Σ. The first traditional estimator we consider in this paper is the Sample Covariance Matrix (SCM), defined as Σ̂_SCM = (1/n) ∑_{i=1}^n x_i x_i^T. In order to address the positive-definiteness constraint on the estimate of Σ, Pourahmadi [20] has modeled the covariance matrix via linear regressions. This is done by letting x = [x_1, ..., x_p]^T be a zero-mean Gaussian vector with covariance Σ, and considering each element x_t, t ∈ {2, ..., p}, as the linear least-squares predictor based on its predecessors x_1, ..., x_{t-1}. In particular, for t ∈ {2, ..., p}, let

x_t = ∑_{j=1}^{t-1} φ_{tj} x_j + ε_t,   (1)

where T is the unit lower triangular matrix with -φ_{tj} in the (t, j)th position for t ∈ {2, ..., p} and j < t, and D is the diagonal matrix with σ_t² = var(ε_t) as its diagonal entries, ε_t being the prediction error for x_t. Note that for t = 1, we let ε_1 = x_1, and hence σ_1² = var(x_1). One then has T Σ T^T = D, that is, Σ = T^{-1} D T^{-T}. Given a sample {x_i}_{i=1}^n, with n > p, natural estimates of T and D, denoted as T̂ and D̂ in this paper, are obtained by simply plugging in the OLS estimates of the regression coefficients and residual variances in (1), respectively. In this paper, we shall denote the second traditional estimator by Σ̂_Chol = T̂^{-1} D̂ T̂^{-T}.

Obviously, when the spectral dimension p is large compared to the number of observations n, both Σ̂_SCM and Σ̂_Chol face difficulties in estimating Σ without an extreme amount of errors. Realizing the challenges brought by the high dimensionality, researchers have circumvented them by proposing various regularization techniques that consistently estimate Σ under the assumption that the covariance matrix is sparse.
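As a concrete illustration, the two traditional estimators above can be sketched in a few lines of Python (a minimal sketch under the zero-mean Gaussian assumption; the function names are ours):

```python
import numpy as np

def scm(X):
    """Sample covariance matrix of an n x p data matrix X (zero mean assumed)."""
    n = X.shape[0]
    return (X.T @ X) / n

def cholesky_ols(X):
    """Modified-Cholesky plug-in estimate: regress each band on its predecessors.

    Returns (T, D) with T unit lower triangular (entries -phi_tj below the
    diagonal) and D diagonal (residual variances), so the covariance estimate
    is inv(T) @ D @ inv(T).T.
    """
    n, p = X.shape
    T = np.eye(p)
    d = np.empty(p)
    d[0] = X[:, 0] @ X[:, 0] / n          # sigma_1^2 = var(x_1)
    for t in range(1, p):
        A, z = X[:, :t], X[:, t]          # predecessors and current band
        phi, *_ = np.linalg.lstsq(A, z, rcond=None)   # OLS coefficients
        T[t, :t] = -phi
        resid = z - A @ phi
        d[t] = resid @ resid / n          # residual variance
    return T, np.diag(d)
```

In this unregularized form, inv(T̂) D̂ inv(T̂)^T reproduces the SCM up to numerical error; the interest of the factorization is that sparsity can be imposed on T̂ while the resulting covariance estimate stays positive definite.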
Recently, Bickel and Levina [21] proposed a banded version of Σ̂_SCM, denoted as Σ̂_B in this paper, with B_k(Σ̂_SCM) = [σ̂_ij 1{|i - j| ≤ k}], where k is the banding parameter. However, this kind of regularization does not always guarantee positive definiteness.
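The banding operator is a one-liner; the example below (our own sketch) also shows how positive definiteness can be lost, since banding a valid covariance matrix may leave a matrix with a negative eigenvalue:

```python
import numpy as np

def band(S, k):
    """Banding operator of [21]: keep sigma_ij when |i - j| <= k, zero otherwise."""
    i, j = np.indices(S.shape)
    return np.where(np.abs(i - j) <= k, S, 0.0)

S = np.array([[1.0, 0.9, 0.8],
              [0.9, 1.0, 0.9],
              [0.8, 0.9, 1.0]])   # a positive definite covariance matrix
B = band(S, k=1)
# B zeroes the corner entries; here B has a negative eigenvalue, illustrating
# that banding does not always preserve positive definiteness
```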
In [22], a class of generalized thresholding operators applied to the off-diagonal entries of Σ̂_SCM has been discussed. These operators combine shrinkage with thresholding and have the advantage of estimating the true zeros as zeros with high probability. These operators (e.g., Soft and SCAD), though simple, do not always guarantee positive definiteness of the thresholded version of Σ̂_SCM. In [23], the covariance matrix is constrained to have an eigendecomposition that can be represented as a sparse matrix transform (SMT), which decomposes the eigendecomposition into a product of very sparse transformations. The resulting estimator, denoted as Σ̂_SMT in this paper, is always guaranteed to be positive definite.

In addition to the above review, some other works have attempted to enforce sparsity on the covariance matrix via its Cholesky factor T, hence yielding sparse covariance estimators that are always guaranteed to be positive definite. For example, in [24], Wu and Pourahmadi proposed to smooth the first few subdiagonals of T and set the remaining subdiagonals to zero. In [25], Huang et al. proposed to directly estimate a sparse version of T by penalizing the negative normal log-likelihood with an ℓ1-norm penalty function, thus allowing the zeros to be irregularly placed in the Cholesky factor. This seems to be an advantage over the work in [24].
We put forth two simple methods for imposing sparsity on the covariance matrix via its Cholesky factor T. The first method is based on the work in [22], but attempts to render Σ̂_Chol sparse by thresholding the Cholesky factor T̂ using operators such as Soft and SCAD. The second method generalizes the work in [25] so that it can be used with various penalty functions. Both methods allow the zeros to be irregularly placed in the Cholesky factor.

Clearly, in real-world hyperspectral imagery, the true covariance model is not known, and hence there is no prior information on its degree of sparsity. Enforcing sparsity on the covariance matrix may seem a strong assumption, but it can be critically important if the true covariance model for a given HSI is indeed sparse. That is, taking advantage of the possible sparsity in the estimation can potentially improve the target detection performance, as can be seen from the experimental results later. On the other hand, when the true covariance model is not sparse (or not highly sparse), the detection results should be no worse than those of the traditional estimators Σ̂_SCM and Σ̂_Chol.
We evaluate our estimators for hyperspectral anomaly detection using the Kelly anomaly detector [26]. More precisely, we first perform a thorough evaluation of our estimators on some Monte Carlo simulations for three true covariance models of different sparsity levels. From our experiments in Subsection IV-A, the detection results show that, for truly sparse models, our estimators improve the detection significantly with respect to the traditional ones, and are competitive with the state of the art [21, 22, 23]. When the true model is not sparse, we find empirically that our estimators still yield detection results no worse than those of Σ̂_SCM and Σ̂_Chol. Next, in Subsection IV-B, our estimators are evaluated on experimental data, where good target detection performance is obtained compared to the traditional estimators and the state of the art. In all the experiments, we observe that Σ̂_Chol achieves higher target detection results than Σ̂_SCM.
III Main Contributions
Before describing the two methods, we recall the definition of Σ̂_Chol. Given a sample {x_i}_{i=1}^n, we have:

x_{it} = ∑_{j=1}^{t-1} φ_{tj} x_{ij} + ε_{it},  i ∈ {1, ..., n},  t ∈ {2, ..., p}.   (2)

By writing (2) in vector-matrix form for any t ∈ {2, ..., p}, one obtains the simple linear regression model:

z_t = A_t φ_t + ε_t,   (3)

where z_t = [x_{1t}, ..., x_{nt}]^T contains the n observations of band t, A_t = [z_1, ..., z_{t-1}] is the n × (t-1) matrix of its predecessors, φ_t = [φ_{t1}, ..., φ_{t,t-1}]^T, and ε_t = [ε_{1t}, ..., ε_{nt}]^T.

When n > p, the OLS estimate φ̂_t = (A_t^T A_t)^{-1} A_t^T z_t and the corresponding residual variance σ̂_t² = ||z_t - A_t φ̂_t||²/n are plugged into T̂ and D̂ for each t ∈ {2, ..., p}, respectively. At the end, one obtains the estimator Σ̂_Chol = T̂^{-1} D̂ T̂^{-T}. Note that T̂ has -φ̂_{tj} in the (t, j)th position for t ∈ {2, ..., p} and j < t.
III-A Generalized thresholding based Cholesky factor
For any λ ≥ 0, we define a matrix thresholding operator Th(.) and denote by Th(T̂) the matrix resulting from applying a specific thresholding rule Th ∈ {Soft, SCAD} to each element φ̂_{tj} of the matrix T̂, for t ∈ {2, ..., p} and j < t.

We consider the following minimization problem:

Th(φ̂_{tj}) = arg min_θ { (1/2)(θ - φ̂_{tj})² + p_λ(|θ|) },   (4)

where p_λ(.) is a penalty function. We have

p_λ^{ℓ1}(|θ|) = λ|θ|,

and

p_λ^{SCAD}(|θ|) = λ|θ| 1{|θ| ≤ λ} + ((2aλ|θ| - θ² - λ²)/(2(a - 1))) 1{λ < |θ| ≤ aλ} + ((a + 1)λ²/2) 1{|θ| > aλ},  with a > 2.

Solving (4) with p_λ^{ℓ1} and p_λ^{SCAD} yields closed-form Soft and SCAD thresholding rules, respectively [22], [27]. The value a = 3.7 was recommended by Fan and Li [27]; although the application here is different from that in [27], for simplicity we use the same value throughout the paper. We shall designate by T̂_Soft and T̂_SCAD the two thresholded matrices obtained by applying Soft and SCAD on T̂, respectively.
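The two thresholding rules can be sketched as follows (a minimal Python sketch; the value a = 3.7 follows [27], and applying the rule only to the strictly lower part of T̂ is our reading of the method):

```python
import numpy as np

def soft(x, lam):
    """Soft thresholding rule: sign(x) * max(|x| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def scad(x, lam, a=3.7):
    """SCAD thresholding rule of [22], [27]: soft near zero, linear
    interpolation in the middle range, identity for large |x|."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 2 * lam,
                    soft(x, lam),
                    np.where(np.abs(x) <= a * lam,
                             ((a - 1) * x - np.sign(x) * a * lam) / (a - 2),
                             x))

def threshold_cholesky(T, lam, rule=soft):
    """Apply a thresholding rule to the strictly lower part of T only,
    keeping the unit diagonal intact."""
    Tt = np.asarray(T, dtype=float).copy()
    low = np.tril_indices_from(Tt, k=-1)
    Tt[low] = rule(Tt[low], lam)
    return Tt
```

Note that large entries pass through SCAD unchanged, which is why SCAD is nearly unbiased for large coefficients, unlike Soft.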
We denote our first two estimators as Σ̂_Soft = T̂_Soft^{-1} D̂ T̂_Soft^{-T} and Σ̂_SCAD = T̂_SCAD^{-1} D̂ T̂_SCAD^{-T}.
Models  

Model 1  0.9541  0.7976  0.8331  0.9480  0.9480  0.9509  0.9509  0.9503  0.9509  0.9509  0.9509 
Model 2  0.9540  0.7977  0.8361  0.9124  0.9124  0.9264  0.9264  0.9184  0.9478  0.9274  0.9270 
Model 3  0.9541  0.7978  0.8259  0.8169  0.8257  0.8236  0.8261  0.7798  0.5321  0.5969  0.5781 
MUSE  Not known  0.6277  0.6575  0.9620  0.9643  0.8844  0.8844  0.7879  0.9277  0.7180  0.7180 
Table 1. Area Under Curve (AUC) values of our estimators when compared to the others. 
Note that in [22], the authors demonstrated that, for a non-sparse true covariance model, covariance estimation does not suffer any degradation when thresholding is applied to the off-diagonal entries of Σ̂_SCM. However, this is not the case for the target detection problem, where the inverse covariance is used: we found that, in contrast to our estimators, the scheme in [22] has a deleterious effect on the detection performance when compared to that of Σ̂_SCM and Σ̂_Chol.
III-B A generalization of the estimator in [25]
We present the same concept as in [25], but modify the procedure by which the entries of T are estimated. Note that T Σ T^T = D, and hence det(Σ) = det(D) and x^T Σ^{-1} x = (T x)^T D^{-1} (T x). It follows that the negative normal log-likelihood of {x_i}_{i=1}^n, ignoring an irrelevant constant, satisfies:

-2 log L(T, D) = n ∑_{t=1}^p log σ_t² + ∑_{t=1}^p (1/σ_t²) ∑_{i=1}^n (x_{it} - ∑_{j=1}^{t-1} φ_{tj} x_{ij})².

By adding a penalty function p_λ ∈ {p_λ^{ℓ1}, p_λ^{SCAD}} (see Subsection III-A) on the entries φ_{tj}, with λ ≥ 0, we have:

Λ(T, D) = -2 log L(T, D) + ∑_{t=2}^p ∑_{j=1}^{t-1} p_λ(|φ_{tj}|).   (5)

Obviously, minimizing (5) with respect to σ_1² and σ_t² gives the solutions σ̂_1² = (1/n) ∑_{i=1}^n x_{i1}² and σ̂_t² = ||z_t - A_t φ̂_t||²/n, respectively.
It remains to estimate the entries of T by minimizing (5) with respect to φ_t. From equations (2) and (3), the minimization problem to solve for each t ∈ {2, ..., p} is:

φ̂_t = arg min_{φ_t} { (1/σ_t²) ||z_t - A_t φ_t||² + ∑_{j=1}^{t-1} p_λ(|φ_{tj}|) }.   (6)

By denoting f(φ_t) = (1/σ_t²) ||z_t - A_t φ_t||² and u^{(k)} = φ_t^{(k)} - ∇f(φ_t^{(k)})/w^{(k)}, we solve (6) iteratively using the General Iterative Shrinkage and Thresholding (GIST) algorithm [28]:

φ_t^{(k+1)} = arg min_{φ_t} { (1/2) ||φ_t - u^{(k)}||² + (1/w^{(k)}) ∑_{j=1}^{t-1} p_λ(|φ_{tj}|) },   (7)

where w^{(k)} is the step size, initialized using the Barzilai-Borwein rule [29].

By decomposing (7) into (t-1) independent univariate optimization problems, we have, for j ∈ {1, ..., t-1}:

φ_{tj}^{(k+1)} = arg min_θ { (1/2)(θ - u_j^{(k)})² + (1/w^{(k)}) p_λ(|θ|) },   (8)

where u_j^{(k)} is the jth entry of u^{(k)}.
By solving (8) with the ℓ1-norm penalty p_λ^{ℓ1}, we have the following closed-form solution:

φ_{tj}^{(k+1)} = sign(u_j^{(k)}) max(0, |u_j^{(k)}| - λ/w^{(k)}).   (9)

For the SCAD penalty function p_λ^{SCAD}, we can observe that it contains three parts for three different conditions (see Subsection III-A). In this case, by recasting problem (8) into three minimization subproblems, one for each condition, and solving them, one obtains the three subsolutions θ_1, θ_2 and θ_3, where:

θ_1 = sign(u_j^{(k)}) min(λ, max(0, |u_j^{(k)}| - λ/w^{(k)})),

θ_2 = sign(u_j^{(k)}) min(aλ, max(λ, (w^{(k)} (a - 1) |u_j^{(k)}| - aλ) / (w^{(k)} (a - 1) - 1))),

θ_3 = sign(u_j^{(k)}) max(aλ, |u_j^{(k)}|).

Hence, we have the following closed-form solution:

φ_{tj}^{(k+1)} = arg min_{θ ∈ {θ_1, θ_2, θ_3}} { (1/2)(θ - u_j^{(k)})² + (1/w^{(k)}) p_λ^{SCAD}(|θ|) }.   (10)
At the end, our last two estimators are obtained as Σ̂ = T̂^{-1} D̂ T̂^{-T}, where T̂ has the ℓ1- or SCAD-penalized estimates -φ̂_{tj} in the (t, j)th position for t ∈ {2, ..., p} and j < t, and D̂ has the estimates σ̂_t² on its diagonal. Note that in [25], the authors used the local quadratic approximation (LQA) of the ℓ1 norm in order to get a closed-form solution of (6). Our algorithm is more general since, after exploiting the GIST algorithm to solve (6), it can easily be extended to other penalties such as SCAD [27], Capped-ℓ1 [30, 31, 32], the Log-Sum Penalty [33], the Minimax Concave Penalty [34], etc., all of which have closed-form solutions [28]. In this paper, we are only interested in the ℓ1 and SCAD penalty functions.
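To make the iteration concrete, here is a sketch of the GIST loop for one regression problem (6) with the ℓ1 penalty: a proximal-gradient step using the soft-threshold solution (9) and a Barzilai-Borwein step size. The monotone line search of [28] is omitted for brevity, and the function names are ours:

```python
import numpy as np

def gist_l1(A, z, lam, sigma2=1.0, iters=200):
    """GIST for min (1/sigma2) ||z - A phi||^2 + lam ||phi||_1."""
    phi = np.zeros(A.shape[1])

    def grad(v):
        # gradient of the smooth part f(phi) = (1/sigma2) ||z - A phi||^2
        return (2.0 / sigma2) * (A.T @ (A @ v - z))

    g = grad(phi)
    w = 1.0                                   # initial step weight w^(k)
    for _ in range(iters):
        u = phi - g / w                       # gradient step u^(k)
        # eq. (9): soft threshold with level lam / w
        phi_new = np.sign(u) * np.maximum(np.abs(u) - lam / w, 0.0)
        g_new = grad(phi_new)
        s, y = phi_new - phi, g_new - g
        if s @ s > 0:
            w = max(abs(s @ y) / (s @ s), 1e-8)   # Barzilai-Borwein rule
        phi, g = phi_new, g_new
    return phi
```

Running this for each t ∈ {2, ..., p} fills the rows of the penalized Cholesky factor; replacing the soft-threshold line with the three-candidate SCAD solution of (10) gives the SCAD variant.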
IV Hyperspectral Anomaly Detection
We first describe the Kelly anomaly detector [26] used for the detection evaluation. Next, we present two subsections that gauge the detection performances of our estimators when compared to the traditional ones (Σ̂_SCM, Σ̂_Chol) and to the state of the art: Σ̂_SMT [23], Σ̂_B [21], and the two estimators of [22] that apply Soft and SCAD thresholding to the off-diagonal entries of Σ̂_SCM. Note that the tuning parameters (in Subsections III-A and III-B) are chosen automatically using a 5-fold cross-validated log-likelihood procedure (see Subsection 4.2 in [25] for details).
Suppose the following binary hypothesis signal model:

H0: x = n,   H1: x = α d + n,   (11)

where the background vectors are i.i.d. p-dimensional vectors, each following a multivariate Normal distribution N(0, Σ); d is an unknown steering vector which denotes the presence of an anomalous signal with unknown amplitude α. After some calculation (refer to [26] and to both Subsection II-B and Remark II.1 in [35] for details), the Kelly anomaly detector is described as follows:

D(Σ̂) = x^T Σ̂^{-1} x  ≷  δ,   (12)

where δ is a prescribed threshold value. In the following two subsections, the detection performances of the estimators, when plugged into (12), are evaluated by Receiver Operating Characteristic (ROC) curves and their corresponding Area Under Curve (AUC) values.
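Assuming the detector statistic in (12) is the quadratic form x^T Σ̂^{-1} x (our reading of [26], [35]), plugging a covariance estimate into the detector can be sketched as:

```python
import numpy as np

def detector_scores(X_test, Sigma_hat):
    """Quadratic anomaly score x^T inv(Sigma) x for each test pixel (rows of X_test)."""
    Sinv = np.linalg.inv(Sigma_hat)
    return np.einsum('ij,jk,ik->i', X_test, Sinv, X_test)

def decide(scores, delta):
    """Declare an anomaly wherever the score exceeds the threshold delta."""
    return scores > delta
```

Sweeping delta over the range of scores and recording the false-alarm and detection rates at each value yields the ROC curve from which the AUC is computed.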
IV-A Monte Carlo simulations
The experiments are conducted on three covariance models:

Model 1: Σ = I, the identity matrix,

Model 3: the triangular covariance model used in [22], whose entries decay linearly away from the diagonal.
Model 1 is very sparse and Model 2 is approximately sparse; Model 3 is considered the least sparse [22] among the three models we consider.
The computations have been made through Monte Carlo trials, and the ROC curves are drawn for a signal-to-noise ratio equal to 15 dB. Covariance estimation is performed under the Gaussian assumption, with p = 60. The artificial anomaly we consider is a vector containing normally distributed pseudorandom numbers (to have fair results, the same vector is used for the three models). The ROC curves for Models 1, 2 and 3 are shown in Fig. 1(a), 1(b) and 1(c), respectively, and their corresponding AUC values are presented in Table 1.
For a clear presentation of the figures, we only exhibit the ROC curves for a subset of the estimators.
The highest AUC values are shown in bold in Table 1. For both Models 1 and 2, our estimators significantly improve the detection performances compared to those of the traditional estimators (Σ̂_SCM, Σ̂_Chol), and are competitive with the state of the art. An important finding is that even for a non-sparse covariance model (that is, Model 3), our estimators do not harm the detection when compared to Σ̂_SCM and Σ̂_Chol. Although some of our estimators have slightly lower AUC values for Model 3, the degradation is negligible, so considering that our estimators achieve detection results no worse than those of the traditional estimators remains acceptable.
IV-B Application on experimental data
Our estimators are now evaluated for galaxy detection on the Multi Unit Spectroscopic Explorer (MUSE) data cube (see [36]). It is a 100 × 100 image that consists of 3600 bands in wavelengths ranging from 465 to 930 nm. We use one band out of every 60, so that 60 bands are used in total.
Figure 2(a) exhibits the mean power in dB over the 60 bands.
The covariance matrix is estimated using a sliding window, the secondary data being taken in the window after excluding only the test pixel. The mean has been removed from the given HSI. Figure 2(b) exhibits the ROC curves [37] of our estimators when compared to some others, and their AUC values are shown in Table 1. Note that some of the curves are not drawn, but their AUC values are shown in Table 1. Two of our estimators achieve higher detection results than all the others, whereas the other two achieve a lower AUC value only with respect to one of the state-of-the-art estimators.
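The sliding-window collection of secondary data described above can be sketched as follows (the 9 × 9 window size is an illustrative assumption, as is the absence of a guard band around the test pixel):

```python
import numpy as np

def local_secondary_data(img, row, col, win=9):
    """Collect secondary data in a win x win window centred on the test
    pixel of a (rows, cols, p) HSI cube, excluding only the test pixel."""
    h = win // 2
    rows, cols, p = img.shape
    r0, r1 = max(0, row - h), min(rows, row + h + 1)
    c0, c1 = max(0, col - h), min(cols, col + h + 1)
    block = img[r0:r1, c0:c1].reshape(-1, p)
    # flat index of the test pixel inside the (row-major) window block
    idx = (row - r0) * (c1 - c0) + (col - c0)
    return np.delete(block, idx, axis=0)
```

The returned matrix plays the role of the sample {x_i} from which any of the covariance estimators above can be computed for the test pixel at hand.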
V Conclusion
Two methods are outlined to impose sparsity on the covariance matrix via its Cholesky factor T. Both Monte Carlo simulations and experimental data demonstrate the effectiveness (in terms of anomaly detection) of the two proposed methods using the Kelly anomaly detector.
References
 [1] G. Shaw and D. Manolakis, “Signal processing for hyperspectral image exploitation,” IEEE Signal Processing Magazine, vol. 19, no. 1, pp. 12–16, Jan 2002.
 [2] D. Manolakis, D. Marden, and G. Shaw, “Hyperspectral image processing for automatic target detection applications,” Lincoln Laboratory Journal, vol. 14, no. 1, pp. 79–116, 2003.
 [3] D. G. Manolakis, R. B. Lockwood, and T. W. Cooley, Hyperspectral Imaging Remote Sensing: Physics, Sensors, and Algorithms. Cambridge University Press, 2016.
 [4] N. K. Patel, C. Patnaik, S. Dutta, A. M. Shekh, and A. J. Dave, “Study of crop growth parameters using airborne imaging spectrometer data,” International Journal of Remote Sensing, vol. 22, no. 12, pp. 2401–2411, 2001.
 [5] B. Datt, T. R. McVicar, T. G. van Niel, D. L. B. Jupp, and J. S. Pearlman, “Preprocessing eo1 hyperion hyperspectral data to support the application of agricultural indexes,” IEEE Transactions on Geoscience and Remote Sensing, vol. 41, pp. 1246–1259, June 2003.
 [6] B. Hörig, F. Kühn, F. Oschütz, and F. Lehmann, “HyMap hyperspectral remote sensing to detect hydrocarbons,” International Journal of Remote Sensing, vol. 22, pp. 1413–1422, May 2001.
 [7] D. Manolakis and G. Shaw, “Detection algorithms for hyperspectral imaging applications,” Signal Processing Magazine, IEEE, vol. 19, no. 1, pp. 29–43, 2002.
 [8] D. W. J. Stein, S. G. Beaven, L. E. Hoff, E. M. Winter, A. P. Schaum, and A. D. Stocker, “Anomaly detection from hyperspectral imagery,” IEEE Signal Processing Magazine, vol. 19, no. 1, pp. 58–69, Jan 2002.
 [9] M. T. Eismann, A. D. Stocker, and N. M. Nasrabadi, “Automated hyperspectral cueing for civilian search and rescue,” Proceedings of the IEEE, vol. 97, no. 6, pp. 1031–1055, June 2009.
 [10] D. Manolakis, E. Truslow, M. Pieper, T. Cooley, and M. Brueggeman, “Detection algorithms in hyperspectral imaging systems: An overview of practical algorithms,” IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 24–33, Jan 2014.
 [11] D. Manolakis, R. Lockwood, T. Cooley, and J. Jacobson, “Is there a best hyperspectral detection algorithm?” Proc. SPIE 7334, p. 733402, 2009.
 [12] J. FronteraPons, F. Pascal, and J. P. Ovarlez, “Falsealarm regulation for target detection in hyperspectral imaging,” in Computational Advances in MultiSensor Adaptive Processing (CAMSAP), 2013 IEEE 5th International Workshop on, Dec 2013, pp. 161–164.
 [13] J. FronteraPons, M. A. Veganzones, S. VelascoForero, F. Pascal, J. P. Ovarlez, and J. Chanussot, “Robust anomaly detection in hyperspectral imaging,” in 2014 IEEE Geoscience and Remote Sensing Symposium, July 2014, pp. 4604–4607.
 [14] S. Matteoli, M. Diani, and G. Corsini, “A tutorial overview of anomaly detection in hyperspectral images,” Aerospace and Electronic Systems Magazine, IEEE, vol. 25, no. 7, pp. 5–28, 2010.
 [15] I. S. Reed and X. Yu, “Adaptive multipleband CFAR detection of an optical pattern with unknown spectral distribution,” IEEE Transactions on Acoustics Speech and Signal Processing, vol. 38, pp. 1760–1770, Oct. 1990.
 [16] E. J. Kelly, “An adaptive detection algorithm,” IEEE Transactions on Aerospace and Electronic Systems, vol. AES22, no. 2, pp. 115–127, March 1986.
 [17] C.I. Chang and S.S. Chiang, “Anomaly detection and classification for hyperspectral imagery,” IEEE Transactions on Geoscience and Remote Sensing, vol. 40, no. 6, pp. 1314–1325, Jun 2002.
 [18] J. C. Harsanyi, Detection and classification of subpixel spectral signatures in hyperspectral image sequences. Ph.D. dissertation, Dept. Electr. Eng., Univ. Maryland, Baltimore, MD, USA, 1993.
 [19] J. FronteraPons, F. Pascal, and J. P. Ovarlez, “Adaptive nonzeromean gaussian detection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 2, pp. 1117–1124, Feb 2017.
 [20] M. Pourahmadi, “Joint meancovariance models with applications to longitudinal data: unconstrained parameterisation,” Biometrika, vol. 86, no. 3, pp. 677–690, 1999.
 [21] P. J. Bickel and E. Levina, “Regularized estimation of large covariance matrices,” The Annals of Statistics, vol. 36, no. 1, pp. 199–227, 2008.
 [22] A. J. Rothman, E. Levina, and J. Zhu, “Generalized thresholding of large covariance matrices,” Journal of the American Statistical Association, vol. 104, no. 485, pp. 177–186, 2009.

 [23] G. Cao and C. Bouman, “Covariance estimation for high dimensional data vectors using the sparse matrix transform,” in Advances in Neural Information Processing Systems 21. Curran Associates, Inc., 2009, pp. 225–232.
 [24] W. B. Wu and M. Pourahmadi, “Nonparametric estimation of large covariance matrices of longitudinal data,” Biometrika, vol. 90, no. 4, p. 831, 2003. [Online]. Available: http://dx.doi.org/10.1093/biomet/90.4.831
 [25] J. Z. Huang, N. Liu, M. Pourahmadi, and L. Liu, “Covariance matrix selection and estimation via penalised normal likelihood,” Biometrika, vol. 93, no. 1, pp. 85–98, 2006.
 [26] E. J. Kelly, “An adaptive detection algorithm,” Aerospace and Electronic Systems, IEEE Transactions on, vol. 23, no. 1, pp. 115–127, November 1986.
 [27] J. Fan and R. Li, “Variable selection via nonconcave penalized likelihood and its oracle properties,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348–1360, 2001.
 [28] P. Gong, C. Zhang, Z. Lu, J. Huang, and J. Ye, “Gist: General iterative shrinkage and thresholding for nonconvex sparse learning,” 2013.
 [29] J. Barzilai and J. M. Borwein, “Twopoint step size gradient methods,” IMA Journal of Numerical Analysis, vol. 8, no. 1, pp. 141–148, 1988.
 [30] T. Zhang, “Analysis of multistage convex relaxation for sparse regularization,” J. Mach. Learn. Res., vol. 11, pp. 1081–1107, Mar. 2010. [Online]. Available: http://dl.acm.org/citation.cfm?id=1756006.1756041

 [31] ——, “Multi-stage convex relaxation for feature selection,” Bernoulli, vol. 19, no. 5B, pp. 2277–2293, Nov. 2013. [Online]. Available: http://dx.doi.org/10.3150/12-BEJ452
 [32] P. Gong, J. Ye, and C. Zhang, “Multi-stage multi-task feature learning,” in Advances in Neural Information Processing Systems 25. Curran Associates, Inc., 2012, pp. 1988–1996.
 [33] E. Candès, M. B. Wakin, and S. P. Boyd, “Enhancing sparsity by reweighted l1 minimization,” Journal of Fourier Analysis and Applications, vol. 14, no. 5, pp. 877–905, 2008.
 [34] C.H. Zhang, “Nearly unbiased variable selection under minimax concave penalty,” Ann. Statist., vol. 38, no. 2, pp. 894–942, 04 2010. [Online]. Available: http://dx.doi.org/10.1214/09AOS729
 [35] J. FronteraPons, M. A. Veganzones, F. Pascal, and J. P. Ovarlez, “Hyperspectral anomaly detectors using robust estimators,” Selected Topics in Applied Earth Observations and Remote Sensing, IEEE Journal of, vol. 9, no. 2, pp. 720–731, Feb 2016.
 [36] Official website of the MUSE Project, “http://muse.univ-lyon1.fr/.”
 [37] Y. Zhang, B. Du, and L. Zhang, “A sparse representationbased binary hypothesis model for target detection in hyperspectral images,” IEEE TGRS, vol. 53, no. 3, pp. 1346–1354, March 2015.