I. Introduction
Hyperspectral (HS) imaging is an imaging modality that senses the intensities of the reflected electromagnetic waves at different wavelengths of the electromagnetic spectrum, often invisible to the human eye. As the spectral response associated with an object/material depends on its composition, HS imaging is well suited to identifying target objects/materials via their characteristic spectra or signature responses, often referred to as "endmembers" in the literature. Typical applications of HS imaging range from monitoring agricultural land use, catchment areas of rivers and water bodies, food processing, and surveillance, to detecting various minerals, chemicals, and even the presence of life-sustaining compounds on distant planets. However, these spectral signatures are often highly correlated, making it difficult to detect regions of interest based on these endmembers. The analysis of these characteristic responses to localize a target material/object serves as the primary motivation of this work.
I-A. Model
We begin by formalizing the problem. Each HS image $\mathcal{M} \in \mathbb{R}^{n_1 \times n_2 \times f}$ is a stack of $f$ 2-D images, each of size $n_1 \times n_2$, as shown in Fig. 1(a). Here, $f$ is determined by the number of frequencies, or frequency bands, across which we measure the reflectances. Overall, each volumetric element, or voxel, of a HS image is a vector of length $f$, and represents the response of the material at a pixel of the 2-D grid to the $f$ frequencies. As a particular scene is composed of only a small number of objects/materials, the corresponding characteristic responses of the voxels are low-rank [1]. For example, while imaging an agricultural area, we would expect to record responses from materials like biomass, farm vehicles, roads, houses, water bodies, and so on. Further, the spectra of complex materials can be assumed to be a linear mixture of those of the constituent materials [1, 2], i.e., HS image voxels can be viewed as being generated by a linear mixture model [3]. Since a particular scene does not contain a large number of distinct materials, we can decompose the responses into a low-rank part and a component that is sparse in a known dictionary, i.e., a dictionary-sparse part.
Formally, let $\mathbf{M} \in \mathbb{R}^{f \times n}$, with $n = n_1 n_2$, be formed by unfolding the HS image $\mathcal{M}$, such that each column of $\mathbf{M}$ corresponds to a voxel. Now, let $\mathbf{M}$ arise as a superposition of a low-rank component $\mathbf{L}$ with rank $r$, and a dictionary-sparse component expressed as $\mathbf{D}\mathbf{S}$, i.e.,
$$\mathbf{M} = \mathbf{L} + \mathbf{D}\mathbf{S}. \qquad (1)$$
Here, $\mathbf{D} \in \mathbb{R}^{f \times d}$ is an a priori known dictionary composed of the normalized characteristic responses of the material/object, or of the constituents of the material, we wish to localize, and $\mathbf{S} \in \mathbb{R}^{d \times n}$ is the sparse coefficient matrix (also referred to as the "abundances" in the literature) with at most $s$ nonzero elements globally, i.e., $\|\mathbf{S}\|_0 \le s$. Note that $\mathbf{D}$ can also be constructed by learning a dictionary based on the known spectral signatures of a target [4, 5, 6].
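As a quick illustration of model (1), the following sketch generates synthetic data obeying the decomposition. All sizes, and the variable names for the data, low-rank part, dictionary, and coefficients, are hypothetical choices for this example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
f, n, d, r, s = 50, 400, 5, 3, 60  # bands, voxels, atoms, rank, sparsity (all hypothetical)

# Thin dictionary with unit-norm columns (f > d).
D = rng.standard_normal((f, d))
D /= np.linalg.norm(D, axis=0)

# Low-rank component of rank r.
L = rng.standard_normal((f, r)) @ rng.standard_normal((r, n))

# Sparse coefficient matrix with s nonzero entries overall.
S = np.zeros((d, n))
S.flat[rng.choice(d * n, size=s, replace=False)] = rng.standard_normal(s)

# The observed data is the superposition in (1).
M = L + D @ S
```

Each column of `M` then plays the role of one voxel's spectral response.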
I-B. Prior Art
The model shown in (1) is a close counterpart of a number of well-known problems. To start, in the absence of the dictionary-sparse part $\mathbf{D}\mathbf{S}$, (1) reduces to the popular problem of principal component analysis (PCA) [7]. The problem also shares its structure with variants of PCA such as robust PCA [8, 9], where the dictionary $\mathbf{D}$ is absent, outlier pursuit [10], where $\mathbf{D} = \mathbf{I}$ and $\mathbf{S}$ is column-sparse, and others [11, 12, 13, 14, 15, 16, 17, 18, 19]. On the other hand, in the absence of the low-rank part $\mathbf{L}$, the problem can be identified as that of sparse recovery [20, 21, 22]; accordingly, sparse recovery methods for the analysis of HS images have been explored in [23, 24, 25, 26]. In addition, in a recent work [27], the authors further impose a low-rank constraint on the coefficient matrix $\mathbf{S}$. Further, applications of compressive sampling have been explored in [28], while [3] analyzes the case where HS images are noisy and incomplete. The model described in (1) was introduced in [29] as a means to detect traffic anomalies in a network, wherein the authors focus on the case where the dictionary is overcomplete, i.e., fat, and the rows of $\mathbf{D}$ are orthogonal, e.g., $\mathbf{D}\mathbf{D}^\top = \mathbf{I}$. Further, the coefficient matrix is assumed to possess at most $k$ nonzero elements per row and column, and $s$ nonzero elements overall.
In a recent work [30], we analyze the extension of [29] to the case where the dictionary $\mathbf{D}$ has more rows than columns, i.e., is thin, while removing the orthogonality constraint for both the thin and the fat dictionary cases, when the number of dictionary elements is small. The thin dictionary case is amenable to a number of applications, as often we are looking for only a small number of signatures. For example, in our motivating application of HS imaging, we are interested in detecting the presence of a small number of spectral signatures corresponding to the target of interest. To this end, we focus our attention on the thin case, although a similar analysis applies to the fat case [30].
Note that, when the known dictionary $\mathbf{D}$ is thin, we can alternatively multiply (1) on the left by the pseudo-inverse of $\mathbf{D}$, say $\mathbf{D}^\dagger$, in which case the model shown in (1) reduces to that of robust PCA, i.e.,

$$\widetilde{\mathbf{M}} = \widetilde{\mathbf{L}} + \mathbf{S}, \qquad (2)$$

where $\widetilde{\mathbf{M}} = \mathbf{D}^\dagger\mathbf{M}$ and $\widetilde{\mathbf{L}} = \mathbf{D}^\dagger\mathbf{L}$. Therefore, in this case, we can adopt a two-step procedure: (1) recover the sparse matrix $\mathbf{S}$ by robust PCA and, (2) estimate the low-rank part $\mathbf{L}$ using the estimate of
$\mathbf{S}$. (We denote this method of estimation as RPCA in our experiments.) The models considered in (1) and (2) are two alternative ways of recovering the components for thin dictionaries, and they inherently solve different optimization problems. The proposed method is a one-shot procedure with guarantees to solve the problem, while the conditions under which the pseudo-inverse based procedure succeeds depend on the interaction between $\mathbf{D}$ and the low-rank part. Further, our experiments indicate that solving for the model in (1) is more robust than RPCA for the classification problem at hand, across different choices of dictionaries.

I-C. Our Contribution
In this work, we present a one-step method for target detection in a hyperspectral image under the model of (1), building upon the results of [30] for the underlying application. In particular, we show the conditions under which target detection is possible with thin dictionaries. In general, the method is also applicable to fat dictionaries and to other applications that follow the underlying model. The rest of the paper is organized as follows. We formulate the problem, introduce the notation, and present the theoretical result for the case at hand, i.e., the thin case, in Section II. Next, we describe the dataset along with the experimental evaluations in Section III. Finally, we conclude this discussion in Section IV.
II. Problem Formulation
Our aim is to recover the low-rank component $\mathbf{L}$ and the sparse coefficient matrix $\mathbf{S}$, given the dictionary $\mathbf{D}$ and samples $\mathbf{M}$ generated according to the model described in (1). We utilize the structure of the components $\mathbf{L}$ and $\mathbf{D}\mathbf{S}$ to arrive at the following convex problem for $\lambda > 0$,
$$\min_{\mathbf{L},\, \mathbf{S}} \ \|\mathbf{L}\|_* + \lambda \|\mathbf{S}\|_1 \quad \text{s.t.} \quad \mathbf{M} = \mathbf{L} + \mathbf{D}\mathbf{S}, \qquad (3)$$
where $\|\cdot\|_*$ denotes the nuclear norm and $\|\cdot\|_1$ refers to the entry-wise $\ell_1$-norm, which serve as convex relaxations of rank and sparsity (i.e., the $\ell_0$-norm), respectively. For the application at hand, we present the theoretical result for the case when the number of dictionary elements $d \le f$. The analysis of a more general case can be found in [30]. For this case, we assume the dictionary $\mathbf{D}$ to be a frame, i.e., for any vector $\mathbf{v}$, we have
$$\alpha \|\mathbf{v}\|_2^2 \;\le\; \|\mathbf{D}\mathbf{v}\|_2^2 \;\le\; \beta \|\mathbf{v}\|_2^2, \qquad (4)$$
where $\alpha$ and $\beta$ are the lower and upper frame bounds, respectively, with $0 < \alpha \le \beta$.
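For a thin dictionary with full column rank, the frame bounds in (4) are the extreme eigenvalues of the Gram matrix. A minimal numerical check, with an arbitrary random dictionary standing in for the spectral signatures:

```python
import numpy as np

rng = np.random.default_rng(1)
f, d = 30, 4
D = rng.standard_normal((f, d))
D /= np.linalg.norm(D, axis=0)  # unit-norm columns, as in the model

# Frame bounds: squared smallest / largest singular values of D.
sv = np.linalg.svd(D, compute_uv=False)
alpha, beta = sv[-1] ** 2, sv[0] ** 2

# The frame condition (4) then holds for any vector v.
v = rng.standard_normal(d)
e = np.linalg.norm(D @ v) ** 2
```

Full column rank guarantees `alpha > 0`, which is what makes the pseudo-inverse reduction in (2) well defined.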
We begin by defining a few relevant subspaces, similar to those used in [29].



Let the pair $(\mathbf{L}, \mathbf{S})$ be the solution to the problem shown in (3). We define $\mathcal{L}$ as the space of matrices spanning either the row or the column space of the low-rank component $\mathbf{L}$. Specifically, let $\mathbf{L} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$ denote the singular value decomposition of $\mathbf{L}$; then the space $\mathcal{L}$ is defined as
$$\mathcal{L} := \{\mathbf{U}\mathbf{X}^\top + \mathbf{Y}\mathbf{V}^\top :\ \mathbf{X} \in \mathbb{R}^{n \times r},\ \mathbf{Y} \in \mathbb{R}^{f \times r}\}.$$
Next, let $\mathcal{S}$ be the space spanned by the matrices that have the same support (locations of the nonzero elements) as $\mathbf{S}$, and let $\mathcal{D}$ be defined as the image of this space under the dictionary, $\mathcal{D} := \{\mathbf{D}\mathbf{Z} :\ \mathbf{Z} \in \mathcal{S}\}$.
Next, let $\mathcal{P}_{\mathcal{L}}$, $\mathcal{P}_{\mathcal{S}}$, and $\mathcal{P}_{\mathcal{D}}$ be the orthogonal projection operator(s) onto the spaces of matrices defined above. In addition, we will use $\mathbf{P}_U$ and $\mathbf{P}_V$ to denote the projection matrices for the column and row spaces of $\mathbf{L}$, respectively, implying the following for any matrix $\mathbf{Z}$:
$$\mathcal{P}_{\mathcal{L}}(\mathbf{Z}) = \mathbf{P}_U\mathbf{Z} + \mathbf{Z}\mathbf{P}_V - \mathbf{P}_U\mathbf{Z}\mathbf{P}_V.$$
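The orthogonal projections described above can be sketched numerically as follows; the sizes and helper names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
f, n, r = 20, 30, 3
L = rng.standard_normal((f, r)) @ rng.standard_normal((r, n))  # rank-r matrix
U, _, Vt = np.linalg.svd(L, full_matrices=False)
PU = U[:, :r] @ U[:, :r].T   # projection onto the column space of L
PV = Vt[:r].T @ Vt[:r]       # projection onto the row space of L

def proj_L(Z):
    """Project Z onto the space spanned by matrices sharing L's column or row space."""
    return PU @ Z + Z @ PV - PU @ Z @ PV

def proj_S(Z, support):
    """Project Z onto matrices with the same support (boolean mask) as the sparse part."""
    return np.where(support, Z, 0.0)

Z = rng.standard_normal((f, n))
```

The key sanity checks are that `proj_L` is idempotent and that `L` itself lies in its range.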
Indeed, there are situations under which we cannot hope to recover the matrices $\mathbf{L}$ and $\mathbf{S}$. To characterize these scenarios, we employ various notions of incoherence, the first of which is the incoherence $\mu$ between the low-rank part $\mathbf{L}$ and the dictionary-sparse part $\mathbf{D}\mathbf{S}$; $\mu$ is small when these components are incoherent (which is good for recovery). The next two measures of incoherence can be interpreted as a way to avoid the cases where (a) the low-rank part resembles the dictionary $\mathbf{D}$, and (b) the low-rank part resembles the sparse coefficient matrix $\mathbf{S}$. To this end, we define, respectively, the measures $\gamma_{\mathbf{D}}$ and $\gamma_{\mathbf{S}}$ of these properties, following [30]. Finally, for a matrix $\mathbf{X}$, we use $\|\mathbf{X}\|$ for the spectral norm, i.e., the maximum singular value $\sigma_{\max}(\mathbf{X})$ of the matrix, and $\|\mathbf{X}\|_\infty := \max_{i,j}|X_{ij}|$ for the largest entry in magnitude.
II-A. Theoretical Result
We now present our main theoretical result for the thin case, i.e., when $d \le f$, which was developed in [30]. We begin by introducing the following definitions and assumptions.
Definition D.1.
, where is defined as
.
Definition D.2.
.
Assumption A.1.
Assumption A.2.
Let , then
The following theorem establishes the conditions under which we can recover the pair $(\mathbf{L}, \mathbf{S})$ successfully.
Theorem 1.
Consider a superposition of a low-rank matrix $\mathbf{L}$ of rank $r$ and a dictionary-sparse component $\mathbf{D}\mathbf{S}$, wherein the dictionary $\mathbf{D} \in \mathbb{R}^{f \times d}$, with $d \le f$, obeys the frame condition (4) with frame bounds $\alpha$ and $\beta$, and the sparse coefficient matrix $\mathbf{S}$ has at most $s$ nonzeros, i.e., $\|\mathbf{S}\|_0 \le s$, with the incoherence parameters defined above. Then, solving the formulation shown in (3) will recover the matrices $\mathbf{L}$ and $\mathbf{S}$ if assumptions A.1 and A.2 hold for any $\lambda$ as defined in D.1 and D.2, respectively.
In other words, Theorem 1 establishes sufficient conditions on $\lambda$ to guarantee recovery of the pair $(\mathbf{L}, \mathbf{S})$. We see that these conditions are closely related to the various incoherence measures between $\mathbf{L}$, $\mathbf{D}$, and $\mathbf{S}$. Note that we have here an upper bound on the global sparsity $s$; however, our simulation results in [30] on synthetic data reveal that recovery is indeed possible even for higher sparsity levels; see [30] for the details of the analysis. To this end, we propose a randomized analysis of the problem in (3) to improve these results as future work.
III. Experimental Evaluation
In this section, we evaluate the performance of the proposed technique on real HS data. We begin by describing the dataset used for the simulations, following which we describe the experimental setup and conclude by presenting the results.
III-A. Data
The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) [31] sensor is one of the popular choices for collecting HS images for various remote sensing applications. In the present exposition, we consider the "Indian Pines" dataset [32], which was collected over the Indian Pines test site in northwestern Indiana in June of 1992. This dataset consists of spectral reflectances across 220 bands in the wavelength range 400–2500 nm, from a scene composed mostly of agricultural land, along with two major dual-lane highways, a rail line, and some built structures. This dataset is further processed by removing the bands corresponding to water absorption, and the resulting HS datacube, with dimensions $145 \times 145 \times 200$, is visualized in Fig. 1. This modified dataset is available as the "corrected Indian Pines" dataset [32], with the ground truth containing 16 classes; henceforth, we refer to it as the "Dataset".
We will analyze the performance of the proposed technique for the identification of the stone-steel towers (class 16 in the Dataset), shown as the ground truth in Fig. 2(b), which constitutes about 93 voxels in the Dataset. We consider two cases for choosing the known dictionary $\mathbf{D}$: (1) when a (thin) dictionary is learned [4, 5, 6] based on the voxels of this class, and (2) when the dictionary is formed by randomly sampling voxels from this class. This is to emulate the two ways in which we can arrive at the characteristics of a target: (1) the case where the exact signatures are not available and/or there is noise, and (2) the case where we have access to the exact signatures we are looking for. Next, we form the data matrix $\mathbf{M}$ by stacking each voxel of the image side by side, which results in a data matrix $\mathbf{M} \in \mathbb{R}^{200 \times 21025}$.
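The unfolding of the HS cube into the data matrix can be sketched as follows; the cube here is random, and only the shapes mirror the Dataset.

```python
import numpy as np

rng = np.random.default_rng(3)
cube = rng.uniform(size=(145, 145, 200))  # rows x cols x spectral bands

# Stack voxels side by side: each column of M is one voxel's spectrum.
M = cube.reshape(-1, 200).T               # M has shape (200, 21025)
```

With C-order reshaping, the column for pixel (i, j) sits at index `i * 145 + j`, so the detection map can later be folded back into the 2-D grid with the inverse reshape.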
Normalization: We divide the data matrix $\mathbf{M}$ and the dictionary $\mathbf{D}$ by a single common constant to preserve the inter-voxel scaling. Next, we normalize the columns of $\mathbf{D}$ to be unit-norm. In practice, we can store the unnormalized dictionary, consisting of the actual signatures of the target material, and normalize it as described above after the HS image has been acquired.
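The normalization step might look as follows; dividing by the largest entry of the data matrix is an assumption of this sketch, not necessarily the paper's exact scaling constant.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.uniform(0.0, 4000.0, size=(200, 21025))  # unfolded HS data (reflectance counts)
D = np.abs(rng.standard_normal((200, 4)))        # unnormalized target signatures

# Divide data and dictionary by one common constant so that the
# relative (inter-voxel) scaling of the data is preserved.
scale = np.abs(M).max()                          # assumed choice of constant
M_n, D_n = M / scale, D / scale

# Then make the dictionary columns unit-norm.
D_n = D_n / np.linalg.norm(D_n, axis=0)
```

Using one shared constant keeps the relative magnitudes of the voxels intact, which matters for the column-norm thresholding used later.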
III-B. Experimental Setup
We evaluate and compare the performance of the proposed method (XpRA) with RPCA using the transformed data (as described in Section I-B), a matched filtering (MF) based approach on the data $\mathbf{M}$ (denoted by MF), and an MF-based approach on the transformed data $\mathbf{D}^\dagger\mathbf{M}$, via receiver operating characteristics (ROC), shown in Table I(a)-(d).
For the proposed technique, we employ the accelerated proximal gradient (APG) algorithm presented in [29] to solve the optimization problem shown in (3). Similarly, for RPCA we employ APG on the transformed data matrix $\mathbf{D}^\dagger\mathbf{M}$, while setting $\mathbf{D} = \mathbf{I}$; see (2). With reference to the selection of the tuning parameters for the APG solvers for XpRA and RPCA, we fix the solver parameters and scan through a range of values of the regularization parameter $\lambda$ to generate the ROCs. The output of XpRA and RPCA is a datacube corresponding to the low-rank part and the dictionary-sparse part (which contains the region of interest). We threshold the resulting estimate of the sparse part based on its column norms, and choose the threshold such that the area under the curve (AUC) is maximized for both cases (XpRA and RPCA).
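The two proximal maps at the heart of an APG iteration for (3), singular-value thresholding for the nuclear norm and entrywise soft-thresholding for the $\ell_1$ norm, can be sketched as below. This is a generic sketch of those operators, not the solver of [29] itself.

```python
import numpy as np

def soft_threshold(X, tau):
    """Prox of tau * ||.||_1: entrywise shrinkage toward zero."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Prox of tau * ||.||_*: shrink the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(5)
X = rng.standard_normal((30, 40))
X_low = svt(X, np.linalg.svd(X, compute_uv=False)[-1] + 1e-6)  # removes smallest singular value
X_sparse = soft_threshold(X, 2.0)                              # zeros out small entries
```

APG alternates gradient steps on the data-fit term with these two proximal maps on the low-rank and sparse iterates, respectively.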
In the case of MF, we form the inner products of the column-normalized data matrix $\mathbf{M}$, say $\bar{\mathbf{M}}$, with the dictionary $\mathbf{D}$, i.e., $\mathbf{D}^\top\bar{\mathbf{M}}$, and select the maximum absolute inner product per column. For the MF variant on the transformed data, matched filtering corresponds to finding the maximum absolute entry in each column of the column-normalized $\mathbf{D}^\dagger\mathbf{M}$. Next, in both cases, we scan through a range of threshold values to generate the ROC results shown in Table I.
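A matched-filtering baseline along these lines can be sketched as follows; the planted-target construction and all sizes are synthetic stand-ins for the Dataset.

```python
import numpy as np

rng = np.random.default_rng(6)
f, n, d = 200, 500, 4
D = rng.standard_normal((f, d))
D /= np.linalg.norm(D, axis=0)
M = rng.standard_normal((f, n))
M[:, :20] = 2.0 * D[:, [0]] + 0.1 * rng.standard_normal((f, 20))  # 20 planted target voxels

# Column-normalize the data and take the largest absolute inner
# product with any dictionary atom, per voxel.
Mb = M / np.linalg.norm(M, axis=0)
score = np.abs(D.T @ Mb).max(axis=0)

# Sweep a detection threshold over the scores to trace out an ROC.
thresholds = np.linspace(0.0, 1.0, 11)
detections = [score > t for t in thresholds]
```

Voxels aligned with a dictionary atom score near 1, while unrelated voxels score near 0, which is what the threshold sweep exploits.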
Table I(a)-(b) shows the ROC characteristics of the aforementioned methods for a dictionary learned using all voxels of the class. We evaluate the performance of these techniques for two dictionary sizes and for three values of the regularization parameter used in the initial dictionary learning step. (This parameter controls the sparsity during the initial dictionary learning step.)
Table I(c) shows the ROC results for the case where we randomly select voxels from the class to form the known dictionary. Further, we show the overall performance in terms of the mean and standard deviation of the ROC metrics (TPR, FPR, and AUC) for each algorithm in Table I(d), i.e., across the different dictionaries considered in Table I(a)-(c).

III-C. Analysis
As described above, Table I shows the ROCs for the classification performance of the proposed method as compared to RPCA and the two MF-based approaches. We note that when we use a dictionary $\mathbf{D}$ learned from samples of the class, the matched filtering based approaches underperform, with some exceptions. RPCA performs well for a few choices of the dictionary, while for all other choices the proposed method performs best; even in the cases where it is not the best, it remains comparable to the best performer. Following this, we conclude that the performance of XpRA is the most reliable across different choices of dictionaries. This is also evident from Table I(d), which shows the mean and standard deviation of the ROC metrics (TPR, FPR, and AUC); we observe that the mean and standard deviation for the proposed method are superior to those of the other methods across the different dictionary choices.
[Table I: ROC metrics (TPR, FPR, AUC) for the compared methods across the dictionary choices (a)-(d), with the best results marked; a flipped-ROC marker denotes the case where the classifier output was inverted to improve performance. Fig. 2, panels (a)-(f): the data, the ground truth, and the recovered low-rank and dictionary-sparse components.]
There are other interesting recovery results that warrant our attention. For example, Fig. 2 shows the low-rank component and the dictionary-sparse component recovered by the proposed method for two different values of $\lambda$, for the case where we form the dictionary by randomly sampling voxels (Table I(c)). Interestingly, we recover the rail tracks/roads running diagonally in the top-right corner, along with some low-density housing; see Fig. 2(f). This is because the signatures we seek (stone-steel towers) are similar to those of the materials used in these structures. This further corroborates the applicability of the proposed approach for detecting the presence of a particular spectral signature in a HS image. However, it also highlights a potential drawback of the technique: as XpRA is based on identifying materials with similar composition, it may not be effective in distinguishing between very closely related classes, say two agricultural crops.
IV. Conclusions
We present a generalized robust-PCA-based technique to localize a target in a HS image, based on the a priori known spectral signature of the material we wish to localize. We model the data as being composed of a low-rank component and a dictionary-sparse component, where the dictionary contains the a priori known spectral signatures of the target. In this work, we adapt the theoretical results of [30] and present the conditions under which such a decomposition recovers the two components for the HS demixing task. Further, we evaluate and compare the performance of the proposed method on real HS image data for a classification task.
V. Acknowledgement
The authors graciously acknowledge support from the DARPA YFA, Grant N660011414047.
References
 [1] N. Keshava and J. F. Mustard, “Spectral unmixing,” IEEE Signal Processing Magazine, vol. 19, no. 1, pp. 44–57, Jan 2002.
 [2] J. B. Greer, “Sparse demixing of hyperspectral images,” IEEE Transactions on Image Processing, vol. 21, no. 1, pp. 219–228, Jan 2012.
 [3] Z. Xing, M. Zhou, A. Castrodad, G. Sapiro, and L. Carin, “Dictionary learning for noisy and incomplete hyperspectral images,” SIAM Journal on Imaging Sciences, vol. 5, no. 1, pp. 33–56, 2012.

 [4] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, "Online Learning for Matrix Factorization and Sparse Coding," Journal of Machine Learning Research, vol. 11, pp. 19–60, 2010.
 [5] B. A. Olshausen and D. J. Field, "Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1?," Vision Research, vol. 37, no. 23, pp. 3311–3325, 1997.
 [6] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: Design of Dictionaries for Sparse Representation," in Proceedings of SPARS, pp. 9–12, 2005.
 [7] I. Jolliffe, Principal component analysis, Wiley Online Library, 2002.
 [8] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?,” Journal of the ACM (JACM), vol. 58, no. 3, pp. 11, 2011.
 [9] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, "Rank-sparsity incoherence for matrix decomposition," SIAM Journal on Optimization, vol. 21, no. 2, pp. 572–596, 2011.
 [10] H. Xu, C. Caramanis, and S. Sanghavi, "Robust PCA via outlier pursuit," in Advances in Neural Information Processing Systems, 2010, pp. 2496–2504.
 [11] Z. Zhou, X. Li, J. Wright, E. J. Candès, and Y. Ma, “Stable principal component pursuit,” in Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on. IEEE, 2010, pp. 1518–1522.
 [12] X. Ding, L. He, and L. Carin, “Bayesian robust principal component analysis,” IEEE Transactions on Image Processing, vol. 20, no. 12, pp. 3419–3430, 2011.
 [13] J. Wright, A. Ganesh, K. Min, and Y. Ma, “Compressive principal component pursuit,” Information and Inference, vol. 2, no. 1, pp. 32–68, 2013.
 [14] Y. Chen, A. Jalali, S. Sanghavi, and C. Caramanis, "Low-rank matrix recovery from errors and erasures," IEEE Transactions on Information Theory, vol. 59, no. 7, pp. 4324–4337, 2013.
 [15] X. Li and J. Haupt, “Identifying outliers in large matrices via randomized adaptive compressive sampling,” Trans. Signal Processing, vol. 63, no. 7, pp. 1792–1807, 2015.
 [16] X. Li and J. Haupt, "Locating salient group-structured image features via adaptive compressive sensing," in IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2015.
 [17] X. Li and J. Haupt, “Outlier identification via randomized adaptive compressive sampling,” in IEEE International Conference on Acoustic, Speech and Signal Processing, 2015.
 [18] X. Li and J. Haupt, “A refined analysis for the sample complexity of adaptive compressive outlier sensing,” in IEEE Workshop on Statistical Signal Processing, 2016.
 [19] X. Li, J. Ren, Y. Xu, and J. Haupt, "An efficient dictionary-based robust PCA via sketching," Technical Report, 2016.
 [20] B. K. Natarajan, “Sparse approximate solutions to linear systems,” SIAM Journal on Computing, vol. 24, no. 2, pp. 227–234, 1995.
 [21] D. L. Donoho and X. Huo, “Uncertainty principles and ideal atomic decomposition,” IEEE Transactions on Information Theory, vol. 47, no. 7, pp. 2845–2862, 2001.

 [22] E. J. Candès and T. Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
 [23] Y. Moudden, J. Bobin, J. L. Starck, and J. M. Fadili, "Dictionary learning with spatio-spectral sparsity constraints," in Signal Processing with Adaptive Sparse Structured Representations (SPARS), 2009.
 [24] J. Bobin, Y. Moudden, J. L. Starck, and J. Fadili, “Sparsity constraints for hyperspectral data analysis: Linear mixture model and beyond,” 2009.

[25]
R. Kawakami, Y. Matsushita, J. Wright, M. BenEzra, Y. W. Tai, and K. Ikeuchi,
“Highresolution hyperspectral imaging via matrix factorization,”
in
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
, June 2011, pp. 2329–2336.  [26] A. S. Charles, B. A. Olshausen, and C. J. Rozell, “Learning sparse codes for hyperspectral imagery,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 5, pp. 963–978, Sept 2011.
 [27] P. V. Giampouras, K. E. Themelis, A. A. Rontogiannis, and K. D. Koutroumbas, “Simultaneously sparse and lowrank abundance matrix estimation for hyperspectral image unmixing,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, pp. 4775–4789, Aug 2016.
 [28] M. Golbabaee, S. Arberet, and P. Vandergheynst, “Distributed compressed sensing of hyperspectral images via blind source separation,” in Forty Fourth Asilomar Conference on Signals, Systems and Computers, Nov 2010, pp. 196–198.
 [29] M. Mardani, G. Mateos, and G. B. Giannakis, “Recovery of lowrank plus compressed sparse matrices with application to unveiling traffic anomalies,” IEEE Transactions on Information Theory, vol. 59, no. 8, pp. 5186–5205, 2013.
 [30] S. Rambhatla, X. Li, and J. Haupt, “A dictionary based generalization of robust PCA,” in IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2016.
 [31] Jet Propulsion Laboratory, NASA and California Institute of Technology, “Airborne Visible/Infrared Imaging Spectrometer,” 1987, Available at http://aviris.jpl.nasa.gov/.
 [32] M. F. Baumgardner, L. L. Biehl, and D. A. Landgrebe, “220 Band AVIRIS Hyperspectral Image Data Set: June 12, 1992 Indian Pine Test Site 3, dataset available via http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes,” Sept 2015.