I Introduction
Hyperspectral (HS) imaging is an imaging modality which senses the intensities of the reflected electromagnetic waves (responses) corresponding to different wavelengths of the electromagnetic spectrum, often invisible to the human eye. As the spectral response associated with an object/material depends on its composition, HS imaging is very useful for identifying target objects/materials via their characteristic spectra or signature responses, also referred to as endmembers in the literature. Typical applications of HS imaging range from monitoring agricultural land use, catchment areas of rivers and water bodies, food processing, and surveillance, to detecting various minerals, chemicals, and even the presence of life-sustaining compounds on distant planets; see [1, 2] and references therein for details. However, these spectral signatures are often highly correlated, making it difficult to detect regions of interest based on the endmembers alone. In this work, we present two techniques to localize target materials/objects in a given HS image based on some structural assumptions on the data, using the a priori known signatures of the target of interest.
The primary property that enables us to localize a target is the approximate low-rankness of HS images when represented as a matrix, owing to the fact that a particular scene is composed of only a limited number of types of objects/materials [3]. For instance, while imaging an agricultural area, one would expect to record responses from materials like biomass, farm vehicles, roads, houses, water bodies, and so on. Moreover, the spectra of complex materials can be assumed to be a linear mixture of those of the constituent materials [3, 4], i.e., the received HS responses can be viewed as being generated by a linear mixture model [5]. For the target localization task at hand, this approximate low-rank structure is used to decompose a given HS image into a low-rank part and a component that is sparse in a known dictionary (a dictionary-sparse part), wherein the dictionary is composed of the spectral signatures of the target of interest. We begin by formalizing the specific model of interest in the next section.
I-A Model
A HS sensor records the response of a region, which corresponds to a pixel in the HS image as shown in Fig. 1, to different frequencies of the electromagnetic spectrum. As a result, each HS image can be viewed as a data-cube of size m x n x f, formed by stacking f matrices of size m x n, as shown in Fig. 1. Therefore, each volumetric element or voxel of a HS image is a vector of length f, and represents the response of the material to the f measurement channels. Here, f is determined by the number of channels or frequency bands across which the reflectance measurements are made. Formally, let M be the f x mn matrix formed by unfolding the HS image, such that each column of M corresponds to a voxel of the data-cube. We then model M as arising from a superposition of a low-rank component L with rank r, and a dictionary-sparse component D S, i.e.,
M = L + D S. (1)
Here, D denotes an a priori known f x d dictionary composed of appropriately normalized characteristic responses of the material/object (or of its constituents) we wish to localize, and S denotes the d x mn sparse coefficient matrix (also referred to as abundances in the literature). Note that D can also be constructed by learning a dictionary based on the known spectral signatures of a target; see [6, 7, 8, 9].
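To make the decomposition concrete, the following sketch builds a data matrix according to the superposition model. All dimensions, seeds, and variable names here are our own illustrative choices, not values from any real HS scene:

```python
import numpy as np

rng = np.random.default_rng(0)
f, m, n, d, r = 50, 20, 20, 5, 3   # bands, spatial dims, dictionary atoms, rank

# Low-rank background L (rank r) of the unfolded image (f x m*n).
L = rng.standard_normal((f, r)) @ rng.standard_normal((r, m * n))

# Dictionary D with unit-norm columns (spectral signatures of the target).
D = rng.standard_normal((f, d))
D /= np.linalg.norm(D, axis=0)

# Sparse abundance matrix S: the target is present only in a few pixels.
S = np.zeros((d, m * n))
target_pixels = rng.choice(m * n, size=10, replace=False)
S[:, target_pixels] = rng.standard_normal((d, 10))

M = L + D @ S                      # the superposition model (1)
print(np.linalg.matrix_rank(L))    # -> 3
```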
I-B Our Contributions
In this work, we present two techniques^1 for target detection in a HS image, depending upon different sparsity assumptions on the coefficient matrix S, by modeling the data as shown in (1). Building on the theoretical results of [10, 11, 12], our techniques operate by forming the dictionary D using the a priori known spectral signatures of the target of interest, and leveraging the approximate low-rank structure of the data matrix M [13]. Here, the dictionary D can be formed from the a priori known signatures directly, or by learning an appropriate dictionary based on target data; see [6, 7, 8, 9].

^1 The code is made available at github.com/srambhatla/DictionarybasedRobustPCA.
We consider two types of sparsity structures for the coefficient matrix S, namely, a) global or entry-wise sparsity, wherein we let the matrix S have a limited number of non-zero entries globally, and b) column-wise sparsity, where only a limited number of columns of the matrix S have non-zero elements. The choice of a particular sparsity model depends on the properties of the dictionary matrix D. In particular, if the target signature admits a sparse representation in the dictionary, the entry-wise sparsity structure is preferred; this is likely to be the case when the dictionary is overcomplete (d > f), or fat. On the other hand, the column-wise sparsity structure is amenable to cases where the representation can use all columns of the dictionary, which potentially arises when the dictionary is undercomplete (d < f), or thin. Note that, in the column-wise sparse case, the non-zero columns need not be sparse themselves. The applicability of these two modalities is also exhibited in our experimental analysis; see Section V for further details. Further, we specialize the theoretical results of [12] to present the conditions under which such a demixing task will succeed under the two sparsity models discussed above; see also [10] and [11].
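The two sparsity patterns can be sketched as follows (dimensions, seed, and variable names are our own, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d, p = 5, 12   # dictionary atoms, number of pixels (columns)

# a) Entry-wise sparsity: at most s non-zero entries anywhere in S.
s = 8
S_entry = np.zeros((d, p))
idx = rng.choice(d * p, size=s, replace=False)
S_entry.ravel()[idx] = rng.standard_normal(s)

# b) Column-wise sparsity: at most k non-zero columns; those columns
#    may be dense, since they need not be sparse themselves.
k = 3
S_col = np.zeros((d, p))
cols = rng.choice(p, size=k, replace=False)
S_col[:, cols] = rng.standard_normal((d, k))

print(np.count_nonzero(S_entry))              # -> 8
print(np.count_nonzero(S_col.any(axis=0)))    # -> 3
```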
Next, we analyze the performance of the proposed techniques via extensive experimental evaluations on real-world demixing tasks over different datasets and dictionary choices, and compare the performance of the proposed techniques with that of related works. This demixing task is particularly challenging since the spectral signatures of even distinct classes are highly correlated, as shown in Fig. 2, where the shaded regions show the upper and lower ranges of the different classes. For instance, in Fig. 2 we observe that the spectral signature of the "Stone-Steel" class is similar to that of the class "Wheat". This correlation between the spectral signatures of different classes results in an approximate low-rank structure of the data, captured by the low-rank component L, while the dictionary-sparse component D S is used to identify the target of interest. We specifically show that such a decomposition successfully localizes the target despite the high correlation between spectral signatures of distinct classes.
I-C Prior Art
The model shown in (1) is closely related to a number of well-known problems. To start, in the absence of the dictionary-sparse part D S, (1) reduces to the popular problem of principal component analysis (PCA) [15, 16]. The problem considered here also shares its structure with variants of PCA, such as robust PCA [17, 18] (with D the identity matrix and S entry-wise sparse), outlier pursuit [19] (with D the identity matrix and S column-wise sparse), and others [20, 21, 22, 23, 24, 25, 26, 27, 28].
On the other hand, in the absence of the low-rank part L, the problem reduces to that of sparse recovery [29, 30, 31, 32]. Accordingly, sparse recovery methods for the analysis of HS images have been explored in [33, 34, 35, 36]. In addition, in a recent work [37], the authors further impose a low-rank constraint on the coefficient matrix for the demixing task. Further, applications of compressive sampling have been explored in [38], while [5] analyzes the case where HS images are noisy and incomplete. The techniques discussed above focus on identifying all materials in a given HS image. However, for target localization tasks, it is of interest to identify only specific target(s) in a given HS image. As a result, there is a need for techniques which localize targets based on their a priori known spectral signatures.
The model described in (1) was introduced in [39] as a means to detect traffic anomalies in a network, wherein the authors focus on a case where the dictionary D is overcomplete, i.e., fat, and the rows of D are orthogonal. There, the coefficient matrix S is assumed to possess a limited number of non-zero elements per row and column, and a limited number of non-zero elements globally. In a recent work [10] and the accompanying theoretical work [12], we analyze the extension of [39] to the case where the dictionary has more rows than columns, i.e., is thin, while removing the orthogonality constraint for both the thin and fat dictionary cases when the number of dictionary elements is small. This case is particularly amenable to the target localization task at hand, since we often aim to localize targets based on a few a priori known spectral signatures. To this end, we focus our attention on the thin case, although a similar analysis applies to the fat case [10]; see also [12].
I-D Related Techniques
To study the properties of our techniques, we compare and contrast their performance with related works. First, as a sanity check, we compare the performance of the proposed techniques with matched filtering-based methods (detailed in Section V). In addition, we compare the performance of our techniques to other closely related methods, chosen based on the sparsity assumptions on the coefficient matrix S, as described below.
For entry-wise sparse structure: The first method we compare to is based on the observation that, in cases where the known dictionary D is thin, we can multiply (1) on the left by the pseudo-inverse D^+ of D, in which case the model shown in (1) reduces to that of robust PCA, i.e.,
D^+ M = D^+ L + S =: L~ + S, (RPCA)
where we have used D^+ D = I, and L~ := D^+ L. Therefore, in this case, we can recover the sparse matrix S by robust PCA [17, 18], and estimate the low-rank part using the estimate of S. Note that this reduction is not applicable in the fat case, due to the non-trivial null space of D. Although at first glance this seems like a reasonable technique, somewhat surprisingly, it does not succeed for all thin dictionaries. Specifically, in cases where r, the rank of L, is greater than the number of dictionary elements d, the pseudo-inversed component L~ is no longer "low-rank." In fact, since the notion of low-rankness is relative to the maximum possible rank of the component, L~ can be close to full rank. As a result, the robust PCA model shown in (RPCA) is no longer applicable and the demixing task may not succeed; see our corresponding theoretical paper [12] for details.
Moreover, even in cases where (RPCA) succeeds (r not exceeding d), our proposed one-shot procedure guarantees the recovery of the two components under some mild conditions, while the pseudo-inverse based procedure (RPCA) requires a two-step procedure (one step to recover the sparse coefficient matrix, and another to recover the low-rank component), in addition to a non-trivial analysis of the interaction between the pseudo-inverse and the low-rank part L. This is also apparent from our experiments shown in Section V, which indicate that optimization based on the model in (1) is more robust than (RPCA) for the classification problem at hand, across different choices of the dictionary.
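The failure mode described above is easy to reproduce numerically. In the sketch below (random data and illustrative sizes of our choosing), the rank of the true low-rank part exceeds the number of dictionary columns, so the pseudo-inversed component saturates at full rank:

```python
import numpy as np

rng = np.random.default_rng(2)
f, d, n, r = 60, 4, 100, 10    # thin dictionary (f > d), rank r > d

D = rng.standard_normal((f, d))
L = rng.standard_normal((f, r)) @ rng.standard_normal((r, n))

L_tilde = np.linalg.pinv(D) @ L   # pseudo-inversed component (d x n)
# rank(L_tilde) <= min(d, r) = d = 4, out of a possible min(d, n) = 4:
# L_tilde is full rank, so the robust PCA reduction no longer applies.
print(np.linalg.matrix_rank(L_tilde))   # -> 4
```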
For column-wise sparse structure: The column-wise sparse structure of the coefficient matrix S results in a column-wise sparse structure of the dictionary-sparse component D S. As a result, the model at hand is similar to that studied in outlier pursuit (OP) [19]. Specifically, the OP technique is aimed at identifying the outlier columns in a given matrix. However, it fails in cases where the target of interest is not an outlier, as is the case for HS data. On the other hand, since the proposed technique uses the dictionary D corresponding to the spectral signatures of the target of interest to guide the demixing procedure, it results in a spectral signature-driven technique for target localization. This distinction between the two procedures is also discussed in our corresponding theoretical work [12, Section V], and is exemplified by our experimental results shown in Section V.
Further, as in the entry-wise case, one can also envision a pseudo-inverse based procedure to identify the target of interest via OP [19] on the pseudo-inversed data (referred to as (OP) in our discussion), i.e.,
D^+ M = D^+ L + S =: L~ + S, (OP)
where L~ := D^+ L, with S admitting a column-wise sparse structure. However, this variant of OP does not succeed when the rank of the low-rank component is greater than the number of dictionary elements, i.e., r > d, as in the previous case; see our related theoretical work [12, Section V] for details.
The rest of the paper is organized as follows. We formulate the problem and introduce relevant theoretical quantities in Section II, followed by specializing the theoretical results for the current application in Section III. Next, in Section IV, we present the specifics of the algorithms for the two cases. In Section V, we describe the experimental setup and demonstrate the applicability of the proposed approaches via extensive numerical simulations on real HS datasets for a classification task. Finally, we conclude this discussion in Section VI.
Notation: Given a matrix X, X_i denotes its i-th column and X_{ij} denotes its (i, j)-th element. We use ||X|| for the spectral norm, i.e., the maximum singular value of the matrix; ||X||_inf := max_{ij} |X_{ij}| denotes the entry-wise maximum, and ||X||_{inf,2} the maximum column l2 norm; e_i denotes the canonical basis vector with 1 at the i-th location and 0 elsewhere. Further, we denote the l_p norm of a vector v as ||v||_p. In addition, ||X||_*, ||X||_1, and ||X||_{1,2} refer to the nuclear norm (sum of singular values), the entry-wise l1 norm, and the l_{1,2} norm (sum of the l2 norms of the columns) of the matrix, respectively, which serve as convex relaxations of rank, sparsity, and column-wise sparsity, respectively.
II Problem Formulation
In this section, we introduce the optimization problem of interest and different theoretical quantities pertinent to our analysis.
II-A Optimization Problems
Our aim is to recover the low-rank component L and the sparse coefficient matrix S, given the dictionary D and the data M generated according to the model shown in (1). Here the coefficient matrix S can either have an entry-wise sparse structure or a column-wise sparse structure. We now crystallize our model assumptions to formulate appropriate convex optimization problems for the two sparsity structures. Specifically, depending upon the priors on the sparsity structure of S and the low-rank property of the component L, we aim to solve the following convex optimization problems:
min_{L, S} ||L||_* + lambda_E ||S||_1 subject to M = L + D S (DRPCA(E))
for the entry-wise sparsity case, and
min_{L, S} ||L||_* + lambda_C ||S||_{1,2} subject to M = L + D S (DRPCA(C))
for the column-wise sparse case, to recover L and S with regularization parameters lambda_E and lambda_C, respectively, given the data M and the dictionary D. Here, the a priori known dictionary D is assumed to be undercomplete (thin, i.e., f > d) for the application at hand; analysis of a more general case can be found in [12]. Further, "DRPCA" refers to "dictionary-based robust principal component analysis", while the qualifiers "E" and "C" indicate the entry-wise and column-wise sparsity patterns, respectively.
Note that, in the column-wise sparse case, there is an inherent ambiguity regarding the recovery of the true component pair (L, S) corresponding to the low-rank part and the dictionary-sparse component, respectively. Specifically, any pair (L', S') satisfying M = L' + D S', where L and L' have the same column space, and S and S' have identical column support, is a solution of DRPCA(C). To this end, we define the following oracle model to characterize the optimality of any solution pair (L', S').
Definition D.1 (Oracle Model for Column-wise Sparse Case).
Let the pair (L, S) be the matrices forming the data M as per (1), with U denoting the column space of L and C the column support of S, and define the corresponding oracle model accordingly. Then, a pair (L', S') is in the oracle model if M = L' + D S', P_U(L') = L', and P_C(D S') = D S' hold simultaneously, where P_U and P_C are the projections onto the column space U of L and the column support C of S, respectively.
For this case, we first establish the sufficient conditions for the existence of a solution based on certain incoherence conditions. Following this, our main result for the column-wise case states the sufficient conditions under which solving a convex optimization problem recovers a solution pair (L', S') in the oracle model.
II-B Conditions on the Dictionary
For our analysis, we require that the dictionary D satisfy the generalized frame property (GFP), defined as follows.
Definition D.2.
A matrix D satisfies the generalized frame property (GFP) on a set of vectors V if, for any fixed non-zero vector v in V, we have
alpha_l ||v||_2^2 <= ||D v||_2^2 <= alpha_u ||v||_2^2,
where alpha_l and alpha_u are the lower and upper generalized frame bounds with 0 < alpha_l <= alpha_u < infinity.
The GFP is met as long as the vector v is not in the null space of the matrix D, and ||D v|| is bounded. Therefore, for the thin dictionary setting considered here, for both the entry-wise and column-wise sparsity cases, this condition is satisfied as long as D has full column rank, in which case the set V can be the entire space. For example, D being a frame [40] suffices; see [41] for a brief overview of frames.
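For a thin, full-column-rank dictionary, the frame bounds can be taken as the extreme squared singular values. A small numerical check (random matrix, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
f, d = 40, 6
D = rng.standard_normal((f, d))

# For a thin D with full column rank, the GFP holds on the entire space,
# with bounds given by the extreme squared singular values of D.
svals = np.linalg.svd(D, compute_uv=False)
alpha_lo, alpha_up = svals[-1] ** 2, svals[0] ** 2

v = rng.standard_normal(d)
ratio = np.linalg.norm(D @ v) ** 2 / np.linalg.norm(v) ** 2
assert alpha_lo - 1e-9 <= ratio <= alpha_up + 1e-9
print(np.linalg.matrix_rank(D) == d)   # full column rank -> True
```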
II-C Relevant Subspaces
Before defining the subspaces relevant to this discussion, we set up a few preliminaries. First, let the pair (L, S) be the solution to DRPCA(E) for the entry-wise sparse case, and, for the column-wise sparse case, let the pair (L, S) be in the oracle model; see Definition D.1. Next, for the low-rank matrix L, let the compact singular value decomposition (SVD) be represented as L = U Sigma V^T, where U and V contain the left and right singular vectors of L, respectively, and Sigma is a diagonal matrix with the singular values arranged in descending order on the diagonal. Here, the matrices U and V each have orthonormal columns. Further, let T be the linear subspace consisting of matrices spanning the same row or column space as L, i.e.,
T := { U X1^T + X2 V^T : X1, X2 arbitrary }.
Next, let Omega be the space spanned by matrices with the same non-zero support (entry-wise case) or the same column support (column-wise case, denoted csupp) as S, and let C := csupp(S) denote the index set containing the non-zero column indices of S for the column-wise sparsity case. In addition, we denote the complements of the spaces described above by appending 'perp'.
We use a calligraphic 'P' to denote the projection operator onto the subspace indicated by its subscript, and 'P' to denote the corresponding projection matrix with the appropriate subscript. With these definitions, the projection operators onto and orthogonal to the subspace T are given by
P_T(X) = P_U X + X P_V - P_U X P_V and P_Tperp(X) = (I - P_U) X (I - P_V),
respectively, where P_U = U U^T and P_V = V V^T.
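These projection operators can be implemented directly. The sketch below (random data, our own variable names) verifies idempotence and that the low-rank matrix lies in its own subspace:

```python
import numpy as np

rng = np.random.default_rng(4)
f, n, r = 30, 40, 4
Lmat = rng.standard_normal((f, r)) @ rng.standard_normal((r, n))

# Compact SVD of the low-rank matrix.
U, _, Vt = np.linalg.svd(Lmat, full_matrices=False)
U, V = U[:, :r], Vt[:r, :].T
P_U, P_V = U @ U.T, V @ V.T                      # projection matrices

def proj_T(X):
    """Project onto matrices sharing the row/column space of Lmat."""
    return P_U @ X + X @ P_V - P_U @ X @ P_V

def proj_T_perp(X):
    """Project onto the orthogonal complement of that subspace."""
    return X - proj_T(X)

X = rng.standard_normal((f, n))
assert np.allclose(proj_T(proj_T(X)), proj_T(X))  # idempotent
assert np.allclose(proj_T(Lmat), Lmat)            # Lmat lies in the subspace
```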
II-D Incoherence Measures
We also employ various notions of incoherence to identify the conditions under which our procedures succeed. To this end, we first define the incoherence parameter gamma, which characterizes the relationship between the low-rank part L and the dictionary-sparse part D S, as
(2)
The parameter gamma measures the degree of similarity between the low-rank part and the dictionary-sparse component; a larger gamma implies that the dictionary-sparse component is close to the low-rank part. In addition, we also define the parameter xi as
(3)
which measures the similarity between the orthogonal complement of the column space of L and the dictionary D.
The next two measures of incoherence can be interpreted as a way to identify the cases where, for L with SVD L = U Sigma V^T, (a) U resembles the dictionary D, and (b) V resembles the sparse coefficient matrix S; in these cases, the low-rank part may resemble the dictionary-sparse component. To this end, similar to [39], we define the following measures to identify these cases:
(4)
Here, the first measure achieves its upper bound when a dictionary element is exactly aligned with the column space of L, and its lower bound when all of the dictionary elements are orthogonal to that column space. The second measure achieves its upper bound when the row space of L is "spiky", i.e., a certain row of V is maximally sparse, meaning that a column of L is supported by (can be expressed as a linear combination of) a single column of U. Its lower bound is attained when the row space is "spread out", i.e., each column of L is a linear combination of all columns of U. In general, recovery of the two components is easier when the incoherence parameters are closer to their lower bounds. In addition, for notational convenience, we define the constants
(5)
Here, the first constant is the maximum absolute entry of the corresponding matrix, which measures how close the columns of D are to the singular vectors of L. Similarly, for the column-wise case, the second constant measures the closeness of the columns of D to the singular vectors of L under a different metric (the maximum column-wise l2 norm).
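Since the exact formulas are given in (4), we only sketch plausible proxies here. The two quantities below are our own illustrative instantiations (not necessarily the precise definitions in (4)): one captures the alignment of dictionary atoms with the column space, the other the spikiness of the row space:

```python
import numpy as np

rng = np.random.default_rng(5)
f, n, d, r = 30, 40, 5, 3
U = np.linalg.qr(rng.standard_normal((f, r)))[0]     # column space basis
Vt = np.linalg.qr(rng.standard_normal((n, r)))[0].T  # row space basis (rows)
D = rng.standard_normal((f, d))
D /= np.linalg.norm(D, axis=0)                       # unit-norm atoms

# (a) Alignment of dictionary atoms with the column space:
#     0 when all atoms are orthogonal to it, 1 when an atom lies in it.
mu_U = max(np.linalg.norm(U.T @ D[:, i]) for i in range(d))

# (b) "Spikiness" of the row space: reaches 1 when a row of V is 1-sparse.
mu_V = np.abs(Vt).max()

print(0.0 <= mu_U <= 1.0, 0.0 < mu_V <= 1.0)   # -> True True
```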
III Theoretical Results
In this section, we specialize our theoretical results [12] to the HS demixing task. Specifically, we provide the main results corresponding to each sparsity structure of the coefficient matrix S for the thin dictionary case considered here. We start with the theoretical results for the entry-wise sparsity case, and then present the corresponding theoretical guarantees for the column-wise sparsity structure; see [12] for detailed proofs.
III-A Exact Recovery for the Entry-wise Sparsity Case
For the entry-wise case, our main result establishes the existence of a regularization parameter lambda_E for which solving the optimization problem DRPCA(E) recovers the components L and S exactly. To this end, we show that such a lambda_E belongs to a non-empty interval [lambda_min, lambda_max], where lambda_min and lambda_max are defined as
(6)
Here, lambda_min involves a constant that captures the relationship between the different model parameters. Given these definitions, we have the following result for the entry-wise sparsity structure.
Theorem 1.
Suppose M = L + D S, where L has rank r, the coefficient matrix S has a limited number of non-zero entries, and the dictionary D obeys the generalized frame property of Definition D.2 with frame bounds alpha_l and alpha_u. Then, solving DRPCA(E) with any regularization parameter lambda_E in the interval [lambda_min, lambda_max] exactly recovers the pair (L, S), provided the rank, sparsity, and incoherence parameters satisfy
(7)
We observe that the conditions for the recovery of (L, S) are closely related to the incoherence measures between the low-rank part L, the dictionary D, and the sparse component S. In general, smaller sparsity, rank, and incoherence parameters suffice to ensure recovery of the components for a particular problem. This is in line with our intuition that the more distinct the two components are, the easier it should be to tease them apart. For our HS demixing problem, this indicates that a target of interest can be localized as long as its spectral signature is appropriately different from those of the other materials in the scene.
III-B Recovery for the Column-wise Sparsity Case
For the column-wise sparsity model, recall that any pair in the oracle model described in Definition D.1 is considered optimal. To this end, we first establish the sufficient conditions for the existence of such an optimal pair (L', S') via the following lemma.
Lemma 2.
Given the data M generated as per (1) with dictionary D, an optimal pair (L', S') in the oracle model of Definition D.1 exists if the incoherence parameter defined in (3) is strictly smaller than 1.
In essence, we need the incoherence parameter in (3) to be strictly smaller than 1. Next, analogous to the entry-wise case, we show that lambda_C belongs to a non-empty interval, using which solving DRPCA(C) recovers an optimal pair in the oracle model of Definition D.1 in accordance with Lemma 2. Here, for an appropriate constant, the endpoints of this interval are defined as
(8)
This leads us to the following result for the column-wise case.
Theorem 3.
Theorem 3 outlines the sufficient conditions under which the solution of the optimization problem DRPCA(C) lies in the oracle model defined in D.1. Here, for the case where the frame bounds are equal, which can be easily met by a tight frame, the admissible range of lambda_C is of the same order as that in outlier pursuit (OP) [19]. Moreover, our numerical results in [12] show that DRPCA(C) can be much more robust than OP, and may recover the optimal pair even when the rank of L is high and the number of outlier columns is a constant proportion of the total number of columns. This implies that DRPCA(C) will succeed as long as the dictionary D can successfully represent the target of interest while rejecting the columns of the data matrix M corresponding to materials other than the target.
IV Algorithmic Considerations
The optimization problems of interest, DRPCA(E) and DRPCA(C), for the entry-wise and column-wise cases, respectively, are convex but non-smooth. To recover the components of interest, we adopt the accelerated proximal gradient (APG) algorithm, as shown in Algorithm 1. Note that [39] also applied the APG algorithm to DRPCA(E); we present a unified algorithm for both sparsity cases for completeness.
IV-A Background
The APG algorithm is motivated by a long line of work starting with [42], which showed the existence of a first-order algorithm with a convergence rate of O(1/k^2) for smooth convex objectives, where k denotes the iteration count. Following this, [43] developed the popular fast iterative shrinkage-thresholding algorithm (FISTA), which achieves this convergence rate for convex non-smooth objectives by accelerating the proximal gradient descent algorithm with a momentum term (the extrapolation step in Algorithm 1), as prescribed by [42]. As a result, it became a staple for a wide range of convex non-smooth tasks, including matrix completion [44] and robust PCA [45] and its variants [39, 19]. Recently, [46] has shown further improvements in the rate of convergence.
In addition to the momentum term, the APG procedure evaluates the gradient at a point extrapolated along the momentum direction. Along with faster convergence, this choice of evaluation point dampens oscillations around the optimum; see [43] and references therein.
IV-B Discussion of Algorithm 1
For the optimization problems of interest, we solve unconstrained formulations obtained by replacing the equality constraint with a least-squares term that penalizes the fit. In particular, the problems we solve via the APG algorithm are given by
min_{L, S} mu (||L||_* + lambda_E ||S||_1) + (1/2) ||M - L - D S||_F^2 (9)
for the entry-wise sparsity case, and
min_{L, S} mu (||L||_* + lambda_C ||S||_{1,2}) + (1/2) ||M - L - D S||_F^2 (10)
for the column-wise sparsity case. We note that although for the application at hand the thin dictionary case (f > d) might be more useful in practice, Algorithm 1 also allows for the use of fat dictionaries (f < d).
[Table note: a flag in the results tables denotes the case where the ROC curve was "flipped", i.e., the classifier output was inverted to achieve the best performance.]
Algorithm 1 also employs a continuation technique [45], which can be viewed as a "warm start" procedure. Here, we initialize the parameter mu at some large value and geometrically reduce it until it reaches a floor value mu_bar. A smaller choice of mu_bar results in a solution closer to the optimal solution of the constrained problem, and as mu_bar approaches zero, (9) and (10) recover the optimal solutions of DRPCA(E) and DRPCA(C), respectively. Moreover, Algorithm 1 utilizes the smoothness constant (the Lipschitz constant of the gradient) to set the step-size parameter.
Specifically, the APG algorithm requires that the gradient of the smooth part of the convex objectives in (9) and (10), namely g(L, S) := (1/2) ||M - L - D S||_F^2, be Lipschitz continuous with a (minimum) Lipschitz constant L_g. Since the gradient with respect to the pair (L, S) is given by
grad_L g = -(M - L - D S) and grad_S g = -D^T (M - L - D S),
the gradient is Lipschitz continuous with constant L_g = 1 + ||D||^2, as shown in Algorithm 1.
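The Lipschitz constant can be checked numerically by viewing the smooth part as a least-squares term in the stacked variable (an illustrative sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(6)
f, d = 30, 5
D = rng.standard_normal((f, d))

# g(L, S) = 0.5 * ||M - [I  D] [L; S]||_F^2, so the gradient Lipschitz
# constant is the largest eigenvalue of [I D]^T [I D]:
A = np.hstack([np.eye(f), D])
L_g = np.linalg.norm(A, 2) ** 2
assert np.isclose(L_g, 1.0 + np.linalg.norm(D, 2) ** 2)
print(L_g >= 1.0)   # -> True
```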
The updates of the low-rank component and the sparse matrix S for the entry-wise case both involve a soft-thresholding step, where for a matrix X the entry-wise soft-thresholding operator with threshold tau is defined as
S_tau(X)_{ij} := sign(X_{ij}) max(|X_{ij}| - tau, 0).
For the low-rank part we apply this function to the singular values (therefore referred to as singular value thresholding) [44], while for the update of the dictionary-sparse component we apply it to the sparse coefficient matrix S.
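Both proximal steps reduce to the following operators (a minimal sketch; the function names are our own):

```python
import numpy as np

def soft_threshold(X, tau):
    """Entry-wise soft thresholding S_tau(X)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

X = np.array([[3.0, -0.5], [0.2, -2.0]])
print(soft_threshold(X, 1.0))
# svt of a rank-1 matrix shrinks its single singular value (here 3 -> 2):
Y = np.outer([1.0, 0.0], [3.0, 0.0])
assert np.isclose(np.linalg.svd(svt(Y, 1.0), compute_uv=False)[0], 2.0)
```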
The low-rank update step for the column-wise case remains the same as in the entry-wise case. However, for the coefficient update in the column-wise case we threshold the columns of S based on their norms, i.e., for a column x of a matrix X, the column-norm based soft-thresholding function is defined as
S~_tau(x) := max(1 - tau / ||x||_2, 0) x.
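Putting the pieces together, a minimal APG loop for the column-wise problem (10), with column-norm thresholding and a geometric continuation schedule, might look as follows. This is a sketch under our own step-size and schedule choices, not a reproduction of the paper's Algorithm 1:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding (prox of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def col_threshold(X, tau):
    """Column-norm soft thresholding: scale each column toward zero."""
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale

def apg_drpca_c(M, D, lam, mu_bar, iters=200, eta=0.9):
    """Illustrative APG sketch for the column-wise penalized problem."""
    f, n = M.shape
    d = D.shape[1]
    L = Lp = np.zeros((f, n))
    S = Sp = np.zeros((d, n))
    t = tp = 1.0
    L_g = 1.0 + np.linalg.norm(D, 2) ** 2    # gradient Lipschitz constant
    mu = 0.99 * np.linalg.norm(M, 2)         # continuation start
    for _ in range(iters):
        # Momentum (extrapolation) step.
        YL = L + (tp - 1.0) / t * (L - Lp)
        YS = S + (tp - 1.0) / t * (S - Sp)
        R = YL + D @ YS - M                  # gradient of the smooth part
        Lp, Sp = L, S
        L = svt(YL - R / L_g, mu / L_g)
        S = col_threshold(YS - D.T @ R / L_g, mu * lam / L_g)
        tp, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        mu = max(eta * mu, mu_bar)           # geometric continuation
    return L, S
```

For the entry-wise case, the coefficient update would instead use entry-wise soft thresholding with the same step size.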
IV-C Parameter Selection
Since the regularization parameters prescribed by our main theoretical results involve quantities (such as the incoherence parameters) that cannot be evaluated in practice, we employ a grid-search strategy over the range of admissible values for the low-rank and dictionary-sparse components to find the best values of the regularization parameters. We now discuss the specifics of the grid search for each sparsity case.
IV-C1 Selecting Parameters for the Entry-wise Case
The choice of the parameters in Algorithm 1 is based on the optimality conditions of the optimization problem shown in (9). As presented in [39], the effective weights mu and mu lambda_E associated with the low-rank part L and the sparse coefficient matrix S, respectively, lie in the ranges (0, ||M||] and (0, ||D^T M||_inf].
These ranges are derived using the optimization problem shown in (9). Specifically, we find the largest values of the regularization weights which yield an all-zero solution for the pair (L, S) by analyzing the optimality conditions of (9); these values then define the upper ends of the ranges. For instance, let L = 0 and S = 0. Then the optimality condition with respect to L is
0 in mu ∂||L||_* - (M - L - D S), evaluated at (L, S) = (0, 0),
where the subdifferential set of the nuclear norm at zero is
∂||0||_* = { Z : ||Z|| <= 1 }.
Therefore, for a zero solution pair (L, S) = (0, 0), we need M in mu ∂||0||_*, which yields the condition mu >= ||M||. Hence, the maximum value of mu which drives the low-rank part to an all-zero solution is ||M||.
Similarly, for the dictionary-sparse component, the optimality condition for choosing mu lambda_E is given by
0 in mu lambda_E ∂||S||_1 - D^T (M - L - D S), evaluated at (L, S) = (0, 0),
where the subdifferential set of the entry-wise l1 norm at zero is
∂||0||_1 = { Z : max_{ij} |Z_{ij}| <= 1 }.
Again, for a zero solution pair we need D^T M in mu lambda_E ∂||0||_1, which implies mu lambda_E >= ||D^T M||_inf. That is, the maximum value of mu lambda_E that drives the dictionary-sparse part to zero is ||D^T M||_inf.
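These upper ends of the grid ranges are cheap to compute in practice (sketch with random data; variable names are our own):

```python
import numpy as np

rng = np.random.default_rng(7)
f, d, n = 30, 4, 50
D = rng.standard_normal((f, d))
M = rng.standard_normal((f, n))

# Largest useful weights: anything larger drives the corresponding
# component to an all-zero solution of the penalized problems.
mu_max        = np.linalg.norm(M, 2)                    # spectral norm ||M||
lam_max_entry = np.abs(D.T @ M).max()                   # ||D^T M||_inf (entry-wise)
lam_max_col   = np.linalg.norm(D.T @ M, axis=0).max()   # max column l2 norm (column-wise)

# A grid search can then sweep (0, upper] on a log scale:
grid = np.geomspace(1e-3 * mu_max, mu_max, num=10)
print(lam_max_entry <= lam_max_col)   # an entry never exceeds its column norm -> True
```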
IV-C2 Selecting Parameters for the Column-wise Case
Again, the choice of parameters is derived from the optimization problem shown in (10). In this case, the effective weights mu and mu lambda_C associated with the low-rank part L and the sparse coefficient matrix S, respectively, lie in the ranges (0, ||M||] and (0, ||D^T M||_{inf,2}], where ||.||_{inf,2} denotes the maximum column l2 norm. These ranges are evaluated using an analysis similar to that of the entry-wise case, via the optimality conditions of (10) instead of (9).
V Experimental Evaluation
We now evaluate the performance of the proposed technique on real HS data. We begin by introducing the dataset used for the simulations, following which we describe the experimental setup and present the results.
V-A Data
Indian Pines Dataset: We first consider the "Indian Pines" dataset [14], which was collected over the Indian Pines test site in Northwestern Indiana in June of 1992 using the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) [47] sensor, a popular choice for collecting HS images for various remote sensing applications. This dataset consists of spectral reflectances across 224 bands in the wavelength range 400-2500 nm, from a scene composed mostly of agricultural land along with two major dual-lane highways, a rail line, and some built structures, as shown in Fig. 3(a). The dataset is further processed by removing the bands corresponding to water absorption, which results in a HS data-cube of dimensions 145 x 145 x 200, as visualized in Fig. 1; here, m = n = 145 and f = 200. This modified dataset is available as the "corrected Indian Pines" dataset [14], with the ground truth containing 16 classes; it is henceforth referred to as the "Indian Pines Dataset". We form the data matrix M by stacking the voxels of the image side by side, which results in a matrix of size 200 x 21025.
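The unfolding step can be sketched as follows (a zero-filled stand-in cube with the corrected Indian Pines dimensions; the real dataset must be downloaded separately):

```python
import numpy as np

# Stand-in cube with the (corrected) Indian Pines dimensions.
m, n, f = 145, 145, 200
cube = np.zeros((m, n, f), dtype=np.float32)

# Unfold: each voxel (a length-f spectral vector) becomes one column.
M = cube.reshape(m * n, f).T
print(M.shape)   # -> (200, 21025)
```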