Hyperspectral Image Classification and Clutter Detection via Multiple Structural Embeddings and Dimension Reductions

06/03/2015 ∙ by Alexandros-Stavros Iliopoulos, et al. ∙ 0

We present a new and effective approach for Hyperspectral Image (HSI) classification and clutter detection, overcoming a few long-standing challenges presented by HSI data characteristics. Residing in a high-dimensional spectral attribute space, HSI data samples are known to be strongly correlated in their spectral signatures, exhibit nonlinear structure due to several physical laws, and contain uncertainty and noise from multiple sources. In the presented approach, we generate an adaptive, structurally enriched representation environment, and employ the locally linear embedding (LLE) in it. There are two structure layers external to LLE. One is feature space embedding: the HSI data attributes are embedded into a discriminatory feature space where spatio-spectral coherence and distinctive structures are distilled and exploited to mitigate various difficulties encountered in the native hyperspectral attribute space. The other structure layer encloses the ranges of algorithmic parameters for LLE and feature embedding, and supports a multiplexing and integrating scheme for contending with multi-source uncertainty. Experiments on two commonly used HSI datasets with a small number of learning samples have rendered remarkably high-accuracy classification results, as well as distinctive maps of detected clutter regions.




1 Introduction

We are concerned in this paper with analysis of hyperspectral imaging (HSI) data; in particular, we address the task of high-accuracy multi-class labeling, as well as clutter detection as a necessary complement.

Enabled by advanced sensing systems, such as the NASA/JPL AVIRIS [11], NASA Hyperion [23], and DLR ROSIS [13] sensors, hyperspectral imaging, also known as imaging spectroscopy, pertains to the acquisition of high-resolution spectral information over a broad range, providing substantially richer data than multi-spectral or color imaging. HSI combines spectral with spatial information, as samples are collected over large areas, at increasingly fine spatial resolution. With its rich information provision and non-invasive nature, HSI has become an invaluable tool for detection, identification, and classification of materials and objects with complex compositions. Relevant application fields include material science, agriculture, environmental and urban monitoring, resource discovery and monitoring, food safety and security, and medicine [4, 21, 14]. As sensing technologies continue to advance, HSI is providing larger collections of data to facilitate and enable scientific and engineering inquiries that were previously unfeasible. At the same time, it challenges many existing data analysis methods to render high-quality results commensurate with the richness of available information in HSI data.

Among the key challenging factors for HSI data analysis are: the curse of dimensionality of the spectral feature space, which hampers class discrimination (Hughes effect [15]) and exacerbates the computational complexity of the analysis process; strong and nonlinear spatio-spectral correlations and mixing across spectral bands, as well as cross-mixing between spatial pixels and spectral bands [3]; and multiple sources of noise and uncertainty with regard to the imaged scene and acquisition process [1].

A host of data analysis approaches have been investigated for use in HSI classification [3]. We may roughly categorize them according to the feature space where classification takes place and whether or not the corresponding models are linear. For example, band selection and linear combination techniques for classification [16, 19, 26] reduce the dimensionality of the spectral attribute space based on linear signal and image models. Kernel-based classifiers, such as SVMs [20, 6], respect nonlinearity, applying a nonlinear transform to the data attributes and embedding them in a high-dimensional classification space. While such methods can be effective with certain data, they can be sensitive to the chosen embedding kernel and the number or distribution of available training samples. A different approach is that of manifold learning methods [24, 28, 22, 21, 1], where high-dimensional embedding of the data samples is followed by dimensionality reduction. Such methods assume that a (principal) manifold structure underlies the collected data samples; subsequent analyses are then based on the isometric principles associated with manifolds. Another important assumption is that the features lie in a well-defined metric space; manifold learning methods are sensitive to the choice of metric for neighborhood definition, as well as to the density and distribution of data samples. Indeed, a naive application of such an approach to HSI data may suffer from the high correlation and various uncertainty sources in the hyperspectral attribute space. It should be noted, additionally, that some algorithms incur too high a computational cost for them to be practical for HSI data analysis, even more so as the spatial coverage and resolution of hyperspectral sensors is increasing.

We address here the aforementioned standing issues in HSI classification: (i) nonlinear correlation and irregular singularities, (ii) multiple-source uncertainties with respect to the HSI data structure, and (iii) high data dimensionality. The first problem is in part responsible for an existing gap between HSI data collection and analysis: while spectral and spatial information is coupled in HSI scenes, it is typically processed in a decoupled manner. From an alternative perspective, the strong correlation in HSI data can be exploited to help overcome the other two challenges. In our approach, we start with exploring and utilizing the spatial and spectral coherence of HSI data in tandem. There are various methods that attempt to incorporate spatial coherence [10, 27, 17, 21] in the analysis process; these approaches can be seen as special or extreme cases in the framework we introduce in this paper.

There are three key components in our framework for HSI classification and clutter detection: (i) The Locally Linear Embedding (LLE) method of Roweis and Saul [24] provides the basic computational procedure for deriving a manifold representation of the data; we review LLE in section 2 and comment on our interpretation, our rationale for its selection, and its specific form within our method. (ii) Prior to the LLE computations, we embed the HSI samples to a structural feature space using efficient, local filters to highlight their spatio-spectral structure, thus exposing potential discriminatory singularities and contending with noise in the data, while avoiding de-correlation; we describe the feature embedding concept and its connection to the LLE processing in section 3. (iii) We consider an ensemble of structural embeddings and representations, defined by multiple parameter instances for the other two components, to counteract the effect of multiple uncertainties; we describe in section 4 the relevant ensemble parameters, as well as our scheme for multiplexing and integrating the results over all instances.

Experimental results with our approach are presented in section 5. They demonstrate evidently high-accuracy classification and clutter detection. Indeed, the estimated clutter maps we extract appear to be the first of their kind in the context of HSI classification. Clutter areas shape boundaries and delineate coherent, labeled regions; they may also contain objects of interest or new classes to be analyzed, and may be of higher value to various data analysis applications. We consider clutter maps, such as the ones presented in this paper, as critical information that complements classification in the traditional sense. The joint provision of classification and clutter detection estimates serves to make HSI data analysis independent of artificial or impractical conditions, and impacts the rendering of higher quality, interpretable analysis results.

2 The LLE method for classification

The core processing module for HSI structure encoding and classification in our approach is the Locally Linear Embedding (LLE) method of Roweis and Saul [24]. The basic assumption behind it is that a set of data samples in a high-dimensional space of observable attributes is distributed over an underlying low-dimensional manifold; LLE may then be used to map the data samples to the principal manifold coordinate space, or parameter space. This assumption conforms well to HSI data, owing to their non-linear, correlated structure, as per the physical laws of radiative transfer and sensor properties and calibration [1, 3], whereas direct use of linear dimension reduction models is ill-suited for HSI data analysis.

LLE has rendered surprisingly good results in classification or clustering of synthetic data samples on low-dimensional manifolds (e.g. Swiss roll) and certain image data (such as handwritten digits and facial pose or illumination) [24, 8]. Several theoretical interpretations and algorithmic extensions have been proposed for LLE [9, 2], and it is increasingly applied to domain-specific data analysis tasks. HSI classification ranks among such tasks [21, 18], albeit scarcely.

In this work, we adopt LLE as a core procedure for HSI classification due to three of its remarkable properties: (i) the natural connection between a globally connected embedding of local geometric structures and sparse coding; (ii) the translation invariance of local geometry encoding and its preservation by dimensionality reduction; and (iii) the strikingly simple and computationally efficient algorithmic structure. We briefly describe the LLE processing steps and remark on certain aspects based on our interpretation.

Let $X = [x_1, \ldots, x_N]$, where $x_i \in \mathbb{R}^D$, be a set of $N$ samples in a $D$-dimensional feature space. First, a set of neighboring samples, denoted by $\mathcal{N}_i$, is located for every sample, $x_i$. We employ the $k$-nearest neighbors ($k$NN) scheme because of its relative insensitivity to sample density; our measure for neighborhood definition is based on angular (cosine) similarity.

The local geometry around each sample point, $x_i$, is then encoded by a vector of local coefficients (weights). These coefficients place $x_i$ at the neighborhood barycenter and the corresponding vector is numerically orthogonal to the tangent plane spanned by its neighbors about the center. Specifically, the local weights, $w_{ji}$, are determined by the following local least squares problem, subject to the affine combination condition:

$$\min_{w_{ji}} \Big\| x_i - \sum_{j \in \mathcal{N}_i} w_{ji} x_j \Big\|_2^2 \quad \text{s.t.} \quad \sum_{j \in \mathcal{N}_i} w_{ji} = 1, \tag{1}$$

for all $i \in \{1, \ldots, N\}$. The affine combination not only makes the sample point the neighborhood barycenter, but also means that the local encoding is translation invariant.

Equation 1 may be rewritten in matrix form as

$$\min_{W} \| X (I - W) \|_F^2 \quad \text{s.t.} \quad W^T e = e, \tag{2}$$

where $\|\cdot\|_F$ is the Frobenius norm, $I$ is the identity matrix, $e$ is the constant-1 vector, and $W$ is an $N \times N$ matrix, $W = [w_{ji}]$ (with $w_{ji} = 0$ for $j \notin \mathcal{N}_i$).

Once $W$ is computed, the left singular vectors, $Y$, corresponding to the ($d+1$) smallest singular values of $I - W$ are obtained:

$$Y = \arg\min_{Y^T Y = I} \| Y^T (I - W) \|_F^2, \quad Y \in \mathbb{R}^{N \times (d+1)}, \tag{3}$$

where $d$ is the reduced dimensionality. The low-dimensional representation, $Y$, of the data samples preserves local geometry and global connectivity as encoded in $W$.

Finally, a classifier is employed to label the data in the low-dimensional manifold parameter space. We use a simple nearest-neighbor classifier to investigate the efficacy of the embedding and dimension reduction process with respect to classification.
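The processing steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names `lle_embed` and `nn_classify`, the dense similarity matrix, the dense SVD, and the regularization term `reg` are all choices made here for brevity. Samples are stored as columns, so that column $i$ of the weight matrix holds the local weights of sample $i$.

```python
import numpy as np

def lle_embed(X, k=10, d=2, reg=1e-3):
    """LLE sketch for samples stored as the columns of X (D x N).

    Returns the d x N low-dimensional coordinates and the N x N weight matrix.
    """
    _, N = X.shape
    # Neighborhoods by cosine similarity (dense; only sensible for small N).
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)
    S = Xn.T @ Xn
    np.fill_diagonal(S, -np.inf)            # a sample is not its own neighbor
    nbrs = np.argsort(-S, axis=1)[:, :k]    # k most similar samples per point

    W = np.zeros((N, N))
    for i in range(N):
        Z = X[:, nbrs[i]] - X[:, [i]]       # neighbors centered on x_i
        G = Z.T @ Z                         # local Gram matrix (k x k)
        # Regularization is an assumption of this sketch; it keeps G
        # invertible when k exceeds the feature dimension.
        G = G + reg * max(np.trace(G), 1e-12) * np.eye(k)
        w = np.linalg.solve(G, np.ones(k))
        W[nbrs[i], i] = w / w.sum()         # affine condition: column sums to 1

    # Left singular vectors of I - W; singular values come sorted descending.
    U, _, _ = np.linalg.svd(np.eye(N) - W)
    # Keep the d vectors next to the smallest singular value and drop the last
    # one, which is the constant vector when the zero singular value is simple.
    return U[:, -(d + 1):-1].T, W

def nn_classify(Y, ref_idx, ref_labels):
    """Label every column of Y with the label of its nearest reference column."""
    R = Y[:, ref_idx]
    d2 = ((Y[:, :, None] - R[:, None, :]) ** 2).sum(axis=0)   # N x (#references)
    return ref_labels[np.argmin(d2, axis=1)]
```

Because every column of `W` sums to one, the constant vector is annihilated by `I - W` from the left, which is why one singular value is zero. For image-sized problems a sparse eigensolver would replace the dense SVD.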

A few additional remarks: The sparsity pattern of the weight matrix, $W$, is determined by the $k$NN search in the first step, while the corresponding numerical values of $W$ are determined via eq. 1 in a local, column-wise independent fashion. More importantly, $W$, as per eq. 2, encodes the global interconnection of local hyperplanes via the transitive property of neighborhood connections, without entailing the explicit, computationally expensive calculation of all pairwise shortest connection paths. The matrix can also be seen as a simple kernel-based embedding. The low-dimensional space spanned by $Y$ includes constant-valued vectors, corresponding to the zero singular value, whose geometric multiplicity may be greater than $1$. The discriminatory information lies in the $d$-dimensional subspace that is orthogonal to the constant vector, $e$.

3 Feature space embedding

HSI data samples are known to be strongly correlated in their spectral signatures [4, 19, 1]. Strong correlation between features complicates the choice of a discriminatory distance or similarity metric, particularly so in a high-dimensional setting. Furthermore, nonlinearity and high dimensionality render de-correlation attempts ineffective. Increasing the learning sample density is impractical and may yield limited improvements; learning from sparse reference sample subsets is desired, instead.

We take a novel approach, namely structural feature embedding, to alleviate these fundamental issues. We explore the spatio-spectral coherence structure of HSI data, and embed the spectral attribute space in a structure-rich space, where data-specific features may be made more salient. Then, a conventional distance metric in the embedding feature space may be seen as an ad hoc discriminatory one in the original attribute space. Moreover, the computational complexity for structural feature embedding scales linearly with the dataset size, which is much more efficient than that of even linear de-correlation.

Specifically, we explore spatial and spectral coherence by using a bank of filters. Formally, the filters define a set of basis (or transform) functions, $\{\phi_k\}_{k=1}^{K}$, such that the embedded data become

$$\tilde{X} = [\phi_1(X); \phi_2(X); \ldots; \phi_K(X)], \tag{4}$$

where each basis, $\phi_k$, is local with respect to the spatial and/or spectral domain of the HSI dataset, $X$. Thus, the embedded feature space may be efficiently computed, removing certain noise components while preserving the underlying manifold structure. The distance or similarity between any two samples is then measured in the embedded feature space.

Feature transformation and embedding directly impact the metric for neighborhood definition and subsequent encoding of local geometry. A closely related notion with respect to the spatial properties of the HSI is the spatially coherent distance function introduced by Mohan et al. [21], where it is proposed that distance calculations be performed using all features in a local, ordered patch around each pixel. Here, we introduce the notion of feature embedding as a basic mechanism for effecting a data-specific geometric metric by means of a conventional metric, thus circumventing the explicit definition of new, complicated metrics. Note, for example, that employing the patch-based spatially coherent distance of Mohan et al. is equivalent to applying a box filter to each HSI band prior to distance calculations—except that the latter is insensitive to the particular ordering of pixels within the patch, making similarity discovery more robust with respect to local composition variations and object boundaries.

In general, the feature transform basis functions, or simply filters, can be divided into two groups: generic ones that may be useful to any HSI analysis task, and data- or analysis-specific filters, depending on one’s objective. The filters can be also grouped according to their geometric and statistical features. We consider two particular types of spectral filters: differential and integral. Differential filters elucidate local characteristics of the spectral signature of each sample, and generally down-weigh spurious similarity contributions induced by correlation between consecutive spectral bands. Integral filters, on the other hand, may be used to extract statistical, noise-insensitive properties of spectral signatures.

This embedding mechanism allows us to probe the HSI data at different scales, depending on the support and order of the spatial or spectral filters; hence, the hyperspectral data are embedded in a feature space that captures their structure at the relevant scale. In the experiments carried out in this paper, we use spatial box filtering, and extend the spectral features with their numerical gradient and first two statistical moments (mean and standard deviation).
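As a rough illustration of this embedding, the sketch below appends the spectral gradient and the first two spectral moments to each pixel's attributes and then applies a spatial box filter to all features. The function names, the `H x W x B` array layout, the edge-replicating padding, and the choice to compute the moments over the whole spectrum are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def box_filter(cube, s):
    """Mean over an s x s spatial window (s odd), applied to every feature."""
    r = s // 2
    P = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="edge")  # replicate borders
    H, Wd, _ = cube.shape
    out = np.zeros_like(cube, dtype=float)
    for dy in range(s):
        for dx in range(s):
            out += P[dy:dy + H, dx:dx + Wd, :]   # accumulate shifted copies
    return out / (s * s)

def embed_features(cube, s=3):
    """Map an H x W x B cube to an H x W x (2B + 2) feature cube."""
    cube = cube.astype(float)
    grad = np.gradient(cube, axis=2)             # differential spectral filter
    mean = cube.mean(axis=2, keepdims=True)      # integral filters: mean ...
    std = cube.std(axis=2, keepdims=True)        # ... and standard deviation
    feats = np.concatenate([cube, grad, mean, std], axis=2)
    return box_filter(feats, s)                  # spatial coherence
```

Every operation touches each sample a fixed number of times, so the cost grows linearly with the number of pixels for a fixed filter support.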

A few remarks are in order on the computation of local neighborhoods. Obtaining the local neighborhoods, $\mathcal{N}_i$, which directly affect the estimated manifold structure and parameters, amounts to computation of all $k$-nearest neighbors sets among the hyperspectral samples. This starts to become problematic as the size of the HSI increases, due to the high computational cost of $k$NN searching in the high-dimensional embedded (or original) feature space. Based on the spatial coherence of HSIs, and given that the size of each local neighborhood should be relatively small for the approximately linear structure assumption to hold in its vicinity, we circumvent this issue by bounding the search for spectral neighbors within an ample spatial window centered around each pixel. [Owing to the sparsity of the LLE matrix, we may still generate the full, connected system of local hyperplanes and the consequent low-dimensional representation of the dataset, without needing to resort to tiling and local-coordinate transformations [1].]
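The spatially bounded neighbor search can be sketched as follows. The function name `windowed_knn`, the `half` parameter (half-width of the square window), and the row-major flat pixel indexing are assumptions of this example; the window must contain at least $k + 1$ pixels, including at image corners.

```python
import numpy as np

def windowed_knn(F, k, half):
    """Cosine-similarity kNN restricted to a spatial window.

    F is an H x W x D feature cube. Returns an (H*W) x k array with the flat
    (row-major) indices of each pixel's neighbors.
    """
    H, Wd, D = F.shape
    Fn = F / np.linalg.norm(F, axis=2, keepdims=True)
    nbrs = np.empty((H * Wd, k), dtype=int)
    for y in range(H):
        y0, y1 = max(0, y - half), min(H, y + half + 1)
        for x in range(Wd):
            x0, x1 = max(0, x - half), min(Wd, x + half + 1)
            cand = Fn[y0:y1, x0:x1].reshape(-1, D)   # candidates in the window
            sim = cand @ Fn[y, x]
            # Convert window-local positions to flat image indices.
            yy, xx = np.divmod(np.arange(cand.shape[0]), x1 - x0)
            flat = (yy + y0) * Wd + (xx + x0)
            sim[flat == y * Wd + x] = -np.inf        # exclude the pixel itself
            nbrs[y * Wd + x] = flat[np.argsort(-sim)[:k]]
    return nbrs
```

The work per pixel depends only on the window size, so the total search cost grows linearly with the image size rather than quadratically.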

| Dataset | Sensor | Spectral range (nm) | Spectral resolution (nm) | #bands | #pixels | Spatial resolution | #classes | Labeled area coverage |
|---|---|---|---|---|---|---|---|---|
| Indian Pines | AVIRIS [11] | 410–2450 | 10 | 220 | 145 × 145 | 200 | 16 | 49.4% |
| Univ. of Pavia | ROSIS [13] | 430–860 | 4 | 103 | 610 × 340 | 1.7 | 9 | 20.6% |

Table 1: HSI dataset summary.

4 Structural algorithm ensemble

As has already been mentioned, there are multiple sources that introduce variations and uncertainty to the underlying HSI manifold structure. To name a few, such variations may stem from scattering, atmospheric conditions, spectral mixing of material constituents, etc [1, 3]. Another related issue is that HSI samples pertaining to different compounds may be distributed inhomogeneously along the manifold surface. The introduction of uncertainty from a diverse set of sources to the observed HSI attributes means that the sample manifold will tend to exhibit multi-scale structure. These considerations motivate us to probe the HSI data at different scales in order to uncover the underlying structure.

The derived HSI representation depends on several parameters in all stages of the embedding and dimension reduction procedure, each capturing different properties of the HSI manifold: (i) the choice of spatial and spectral filter parameters determines the type and scale of features that define similarity between samples; (ii) the size of local neighborhoods, relative to the sample distribution density around each sample, defines the coarseness and connectivity of the manifold encoding in the embedded feature space; and (iii) the dimensionality of the parametrized manifold representation affects the type of manifold features that are used for classification.

We define a relevant search space for the set of these algorithmic parameters and obtain an ensemble of structural embeddings and low-dimensional manifold representations of the HSI data. For all HSI samples, we find the label of their nearest reference sample in each representation instance. This set of proximity labels is then used to obtain the classification results, together with a clutter map estimate.

4.1 Classification entropy and clutter estimation

Hyperspectral image scene classification methods typically assign each pixel in the imaged scene to one of the classes for which labeled reference samples in the scene (also known as ground truth) are available. Oftentimes, however, a large portion of the HSI may consist of pixels that belong to none of the labeled classes; these pixels constitute clutter with respect to the specified label-set. Clutter pixels are likely diverse in terms of their spectral features, and cannot generally be considered to correspond to a single, new class. A related but somewhat different approach is taken in the context of anomaly detection. There, identification of the “clutter” (anomalous) region typically depends on the collection and utilization of statistical properties of relevant scenes, obtained from a large set of learning examples [7]. Here, we do not require additional data beyond those in a single HSI data cube, and restrict the reference/learning samples, used for classification, to a sparse subset of available data samples.

For classification and clutter detection, we first obtain a classification entropy score for every non-reference pixel, as follows. Each non-reference pixel is matched to its nearest (in the low-dimensional classification space) reference pixel, for all instances, or trials, that make up our ensemble. Hence, given a total of $T$ trials, each pixel is associated with a vector of $T$ proximity labels. This vector is converted to a frequency vector of length $C$, where $C$ is the number of labeled classes. Let $n_i(c)$ be the count of the $c$-th label, $c \in \{1, \ldots, C\}$, in the proximity-label vector of the $i$-th pixel, and $p_i(c) = n_i(c)/T$ be the corresponding relative frequency. Taking an information-theoretic approach, we define the classification entropy for the $i$-th pixel as

$$H_i = -\sum_{c=1}^{C} p_i(c) \log_C p_i(c). \tag{5}$$

The classification entropy score lies in $[0, 1]$. At one extreme ($H_i = 0$), the labeling frequency vector of the $i$-th pixel has only one non-zero element, meaning that all of its proximity labels are the same. At the other extreme ($H_i = 1$), the frequency vector is constant, meaning that all proximity labels for the pixel are equally frequent among the instances or trials. Empirically, $H_i$ measures the classification ambiguity of the $i$-th pixel. A pixel with a high classification entropy score is most likely a clutter pixel, whereas a pixel with a low score is likely to belong to one of the available classes. The scores for all pixels can be displayed as a grayscale image, providing a classification entropy map for a given experimental ensemble (see section 5.4).
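A small sketch of the entropy score follows. It assumes Shannon entropy with logarithms taken in base $C$ (the number of classes), which places the score in $[0, 1]$ as described, and treats empty classes as contributing zero. The function name and the trial-by-pixel layout of `labels` are choices made for this example.

```python
import numpy as np

def classification_entropy(labels, C):
    """Normalized label entropy per pixel.

    labels is a T x P integer array (trial x pixel) with values in 0..C-1.
    Returns the entropy of each pixel and the C x P table of label counts.
    """
    T, P = labels.shape
    counts = np.zeros((C, P))
    for c in range(C):
        counts[c] = (labels == c).sum(axis=0)   # votes for class c, per pixel
    p = counts / T                              # relative label frequencies
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log(p), 0.0)   # 0 * log(0) taken as 0
    H = -terms.sum(axis=0) / np.log(C)          # division by log(C): base-C log
    return H, counts
```

For instance, with three trials and three classes, a pixel labeled identically in every trial scores 0, while a pixel that receives three different labels scores 1.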

Using a threshold, $\tau$, we make use of the classification entropy map to split the HSI scene into two complementary parts: clutter regions ($H_i > \tau$), where no label is given to the corresponding pixels, and labeled regions ($H_i \le \tau$), where each pixel is matched to the available classes. While a diverse set of methods has been proposed for combining results in multiple classifier systems [12, 5, 29, 30], most rely on the availability of enough training data or knowledge of certain statistical properties of the dataset and/or classifiers, which may not be the case in many practical applications. Here, we assign each pixel to the most frequently returned class for it among the set of results for each classifier instance. This simple rule provides us with a baseline regarding the performance of our methodology; moreover, it does not entail additional assumptions or abundance of labeled data, and we have found it to generally improve upon any single classifier instance throughout our experiments.
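The thresholding and consensus rule may be sketched as below. The function name, the use of `-1` as the clutter marker, the strict inequality at the threshold, and the tie-breaking behavior (the lowest class index wins) are assumptions of this example rather than details given in the text.

```python
import numpy as np

def consensus_with_clutter(labels, C, tau):
    """Majority-vote labels with high-entropy pixels marked as clutter (-1).

    labels is a T x P integer array (trial x pixel) with values in 0..C-1.
    Returns the fused label vector and the per-pixel entropy scores.
    """
    T, _ = labels.shape
    counts = np.stack([(labels == c).sum(axis=0) for c in range(C)])  # C x P
    p = counts / T
    safe = np.where(p > 0, p, 1.0)          # log(1) = 0 silences empty classes
    H = -(p * np.log(safe)).sum(axis=0) / np.log(C)
    vote = counts.argmax(axis=0)            # most frequent label (ties: lowest)
    return np.where(H > tau, -1, vote), H
```

With four trials, three classes and a threshold of 0.6, a pixel voted `[1, 1, 1, 1]` keeps label 1, a pixel voted `[0, 1, 2, 0]` is marked as clutter, and a pixel voted `[2, 2, 2, 0]` keeps label 2.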

Figure 1: Classification and clutter detection results for the Indian Pines scene. (a) and (e) RGB composite [25] and manual classification labeling and mask. (b)–(d) 10% labeled data sampling: masked classification; classification entropy map; classification and clutter removal with . (f)–(h) 5% labeled data sampling: same as (b)–(d) with .

5 Experiments

5.1 Datasets

Two publicly available HSI datasets have been used to appraise the effectiveness of our approach. One is the Indian Pines scene (https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html), recorded by the AVIRIS sensor [11] in Northwestern Indiana, USA. It consists mostly of agricultural plots (alfalfa, corn, oats, soybean, wheat), and forested regions (woods, and different sub-classes of grass), while a few buildings may also be seen. Several classes exhibit significant spectral overlap, as they correspond to the same basic class under different conditions.

The other is the University of Pavia scene (http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes), recorded by the ROSIS sensor [13] in Pavia, Italy. It covers an urban environment, with various solid structures (asphalt, gravel, metal sheets, bitumen, bricks), natural objects (trees, meadows, soil), and shadows. Objects whose compositions differ from the labeled ones are considered as clutter.

Both datasets are available with a manually labeled mask, where each pixel is assigned a class (color) or is discarded as clutter (black). An RGB composite image and the labeled mask for the two datasets are shown in figs. 1(a), 1(e), 2(a) and 2(e). A summary of relevant parameters for the two datasets may be found in table 1, and the corresponding reference label maps are shown in appendix A.

Figure 2: Classification and clutter detection results for the University of Pavia scene. (a) and (e) RGB composite [17] and manual classification labeling and mask. (b)–(d) 5% labeled data sampling: masked classification; classification entropy map; classification and clutter removal with . (f)–(h) 2% labeled data sampling: same as (b)–(d) with .

5.2 Experimental set-up

Prior to any other processing, noisy bands were removed from the Indian Pines dataset; these correspond to water absorption bands [26] and to further bands that were dominated by noise. All bands were kept for the University of Pavia dataset (although some have already been removed from the data in the public repository).

For all experiments presented here, the algorithmic ensemble parameters were as follows: (i) The spectral bank consisted of the identity (i.e. the original attributes were used), numerical gradient, mean, and standard deviation filters; spectral features were extracted at two scales using the {whole, odd, even} spectrum. (ii) A spatial box filter was applied to all features, using a neighborhood, where . (iii) The size of local manifold neighborhoods was . (iv) The dimension of the manifold-representation classification space was .

The resulting ensemble comprised a single instance for each element in the Cartesian product of algorithmic parameter sets, for a total of 54 embeddings and low-dimensional representations of the HSI data. Nearest-neighbor searching was bounded using a sliding window. Labels were acquired via pixel-wise nearest-neighbor classification for each instance and non-weighted consensus for the ensemble.

Reference labeled data for classification were sampled uniformly at random, using the reference labeled mask to extract samples at 5% or 10% density per class for the Indian Pines dataset, and at 2% or 5% density per class for the University of Pavia dataset. The label-set size ranged from approximately to just pixel per class, depending on the relative coverage of the HSI scene.
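Per-class uniform sampling of reference pixels can be sketched as follows. The function name, the `-1` marker for unlabeled pixels, and the rule of keeping at least one sample for very small classes are assumptions made for this example.

```python
import numpy as np

def sample_references(mask, density, rng, unlabeled=-1):
    """Draw reference pixels uniformly at random at a fixed density per class.

    mask is an integer label map (any shape); density is a fraction in (0, 1].
    Returns the sorted flat indices of the sampled reference pixels.
    """
    flat = np.asarray(mask).ravel()
    picks = []
    for c in np.unique(flat[flat != unlabeled]):
        idx = np.flatnonzero(flat == c)              # all pixels of class c
        # Keeping at least one sample per class is an assumption of this sketch.
        n = max(1, int(round(density * idx.size)))
        picks.append(rng.choice(idx, size=n, replace=False))
    return np.sort(np.concatenate(picks))
```

A call such as `sample_references(mask, 0.05, np.random.default_rng(0))` would return roughly 5% of the pixels of every labeled class, leaving unlabeled pixels untouched.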

5.3 Rendering schemes

We render experimental results in three ways. First, we follow the conventional scheme, where only pixels that belong to a class in the reference label map are considered—the rest are discarded, regardless of the corresponding classification results. Quantitative results are provided using the standard overall accuracy (OA; percentage of correctly classified pixels) and average accuracy (AA; average of class-wise classification accuracy percentages) metrics.
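The two metrics can be computed as in the sketch below, which ignores pixels outside the reference label map. The function name and the `-1` marker for unlabeled pixels are assumptions of this example.

```python
import numpy as np

def oa_aa(y_true, y_pred, unlabeled=-1):
    """Overall and average accuracy (in percent) over labeled pixels only."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    keep = y_true != unlabeled                   # discard unlabeled pixels
    t, p = y_true[keep], y_pred[keep]
    oa = 100.0 * np.mean(t == p)                 # fraction of correct pixels
    per_class = [np.mean(p[t == c] == c) for c in np.unique(t)]
    aa = 100.0 * np.mean(per_class)              # mean of class-wise accuracies
    return oa, aa
```

For example, with reference labels `[0, 0, 0, 0, 1, 1, -1]` and predictions `[0, 0, 0, 1, 1, 0, 1]`, four of the six labeled pixels are correct, so OA is about 66.7%, while the class-wise accuracies of 75% and 50% give an AA of 62.5%. AA weighs small classes as heavily as large ones, which is why the two values can differ.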

While the OA and AA metrics allow comparisons with a reference (manual) classification result, they cannot capture other aspects of the classification problem, and provide no information as to the separation of clutter and labeled samples. Hence, in the absence of available reference data for the whole scene, we resort to visual appraisal of the classification and clutter detection results using the other two rendering schemes.

One is a gray-scale rendering of the classification entropy (clutter estimate) map; ideally, it should be dark for labeled regions and bright for clutter. Last, we render the final classification results with our approach, by merging the ensemble consensus labeling with a clutter mask, obtained by thresholding the clutter estimate image. Good results should have the following qualities: each region is classified correctly, region boundaries are respected by the classification map, and clutter is accurately identified.

| Dataset | Labeled samples | Instances OA (%) [mean ± std (max)] | Instances AA (%) [mean ± std (max)] | Ensemble OA (%) | Ensemble AA (%) |
|---|---|---|---|---|---|
| Indian Pines | 5% | 85.79 ± 4.12 (92.88) | 82.06 ± 5.68 (91.66) | 95.39 | 94.85 |
| Indian Pines | 10% | 90.00 ± 3.57 (96.07) | 87.68 ± 4.87 (95.45) | 97.34 | 97.13 |
| University of Pavia | 2% | 94.87 ± 2.38 (97.86) | 92.29 ± 3.68 (97.13) | 98.84 | 98.42 |
| University of Pavia | 5% | 96.92 ± 1.76 (98.99) | 95.41 ± 2.47 (98.43) | 99.60 | 99.32 |

Table 2: Classification accuracy for the embedding ensemble and instances.

5.4 Results

A summary of the classification accuracy metrics for both HSI datasets, measured with respect to the corresponding manually labeled mask, is shown in table 2, for the embedding instances as well as the ensemble. We can see that the ensemble outperforms all instances, by a significant margin over most of them. This is especially true for the Indian Pines dataset, which proves to be more difficult than the University of Pavia one, due to the spectral overlap between different classes and very low spatial resolution, which means that there may be substantial variability among pixels of the same class. For both datasets, very high classification accuracy is attained. Note, however, that these metrics only take a portion of the image into account.

Results for the Indian Pines dataset are displayed in fig. 1. It can be readily seen in figs. 1(b) and 1(f) that classification errors are mostly localized around a couple of difficult regions. Nevertheless, the clutter estimate maps clearly capture the outline structure of the scene, and many of the mis-classified regions are acknowledged as somewhat ambiguous. Looking at the fused classification-clutter images in figs. 1(d) and 1(g), we can already see the efficacy of the proposed methodology: the overall structure of the manual label-mask is recovered nicely, albeit without particularly sharp features. In addition, we are able to recover regions that were not labeled, although they rather clearly extend beyond the manually drawn boundaries: for example, notice the woods area (red) towards the bottom-right corner, highlighted with a superimposed rectangle.

Corresponding results for the University of Pavia dataset are shown in fig. 2. Here, we attain near-perfect classification results when compared to the manual labeling. More importantly, however, we seem to be able to recover a very high-fidelity profile of the whole scene, without any prior assumptions about the distribution of clutter pixels. Indeed, objects belonging to labeled classes are identified inside unlabeled regions, and figs. 2(c) and 2(h) appear to provide a much more accurate view of the scene than even the manually labeled mask. For example, two such regions are highlighted, where a stretch of road and a set of trees are identified in the unlabeled regions, reflecting the view of the composite color image with high fidelity. While it can be seen that fig. 2(d) does perform better than fig. 2(g), it is noteworthy that the vast majority of the scene structure is recovered using reference samples with 2% density.

6 Discussion

We have presented a new approach for HSI classification and clutter detection by employing an algorithmic ensemble of structural feature embeddings, nonlinear dimension reduction with the LLE method, and a classifier to be used in the low-dimensional manifold parameter space. For feature embedding, we have used only a few simple types of feature transform functions to explore and exploit the spatial and spectral coherence structure in the HSI data. These simple steps, following the isometric principles of manifold structures, have rendered remarkable results for the two datasets studied in this paper, while each step may be easily modified or customized to suit a particular application context, if necessary. Presently, the parameter ranges for manifold dimension estimation and the number of neighbors are prescribed. A desirable extension is to have such ranges determined automatically and adaptively for each dataset.

We have given our rationale for utilizing LLE at the core of our approach. The LLE method can be connected to multiple methods for classification, segmentation, or clustering. While various extensions to LLE and alternative, related approaches to manifold derivation exist, we have found LLE to be as good as or superior to them, while offering a particularly simple computational structure. There is still more to be understood regarding the behavior of these methods and their connections to one another.
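To make the "particularly simple computational structure" concrete, the two steps of the standard LLE algorithm [24] (local reconstruction weights, followed by an eigenvector problem) can be sketched as below. This is a minimal dense NumPy illustration, not the implementation used in the experiments; the function name `lle`, the regularization constant `reg`, and the brute-force neighbor search are assumptions made for the example, and the dense matrices limit it to small sample sets.

```python
import numpy as np

def lle(X, n_neighbors, n_components, reg=1e-3):
    """Standard LLE on the rows of X (n_samples x n_features).

    Returns the embedding Y (n_samples x n_components) and the weight matrix W.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Brute-force nearest neighbors from pairwise squared distances;
    # the diagonal is set to infinity so a point is never its own neighbor.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Step 1: reconstruction weights, one small linear system per point.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]              # neighbors centered on x_i
        G = Z @ Z.T                        # local Gram matrix (k x k)
        # Regularize, since G is singular when k exceeds the local dimension
        # (the scaling by trace(G) is an assumed, commonly used choice).
        G += reg * max(np.trace(G), 1e-12) * np.eye(n_neighbors)
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()        # weights sum to one

    # Step 2: bottom eigenvectors of M = (I - W)^T (I - W).
    # The eigenvector for the smallest eigenvalue is skipped; it is the
    # constant vector when the neighborhood graph is connected.
    I_W = np.eye(n) - W
    M = I_W.T @ I_W
    _, eigvecs = np.linalg.eigh(M)         # eigenvalues in ascending order
    Y = eigvecs[:, 1:n_components + 1]
    return Y, W
```

A classifier (for instance nearest-neighbor) would then be trained on the rows of `Y` that correspond to labeled samples; the number of neighbors and the embedding dimension are the two parameters whose ranges are discussed above.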

Appendix A Reference labeling for the HSI datasets

The reference classification data (typically used as ground truth) for the Indian Pines (https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html) and University of Pavia (http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes) scenes are shown in figs. 3 and 4. The share of the image domain taken up by unlabeled regions in each scene is given in table 1.

Figure 3: Available labeling information for the Indian Pines scene. (a) Reference labeling map (colored classes and black unlabeled regions). (b) Class color legend.
Figure 4: Available labeling information for the University of Pavia scene. (a) Reference labeling map (colored classes and black unlabeled regions). (b) Class color legend.

Appendix B Experiments without feature embedding

We have presented a comparison of classification results between the embedding instances and the ensemble in table 2. In addition to the superior classification accuracy, the ensemble scheme also enables the provision of the clutter map. Here, we provide experimental results that factor out and highlight the effect of feature space embedding prior to employment of the LLE method.
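As a rough illustration of how an ensemble can yield both a fused label map and a clutter map, the sketch below combines the per-instance labels of each pixel by majority vote and uses the entropy of the vote distribution as an ambiguity score. The voting rule, the base-2 entropy, and the names `fuse_ensemble` and `entropy_threshold` are assumptions made for the example; the fusion rule and threshold used in the experiments may differ.

```python
import numpy as np

def fuse_ensemble(labels, n_classes, entropy_threshold):
    """Fuse ensemble predictions for each pixel.

    labels: (n_instances, n_pixels) integer class labels in [0, n_classes).
    Returns the fused labels, per-pixel entropy (bits), and a clutter mask.
    """
    labels = np.asarray(labels)
    n_instances, n_pixels = labels.shape

    # Vote histogram: how many ensemble instances chose each class per pixel.
    votes = np.zeros((n_pixels, n_classes))
    for c in range(n_classes):
        votes[:, c] = np.sum(labels == c, axis=0)
    p = votes / n_instances

    # Fused label: the class with the most votes (hypothetical fusion rule).
    fused = np.argmax(p, axis=1)

    # Shannon entropy of the vote distribution; 0 * log(0) is taken as 0.
    logp = np.zeros_like(p)
    np.log2(p, out=logp, where=p > 0)
    entropy = -np.sum(p * logp, axis=1)

    # Pixels on which the ensemble disagrees strongly are flagged as clutter.
    clutter = entropy > entropy_threshold
    return fused, entropy, clutter
```

Pixels on which all instances agree have zero entropy, whereas pixels with widely scattered votes have high entropy and are removed from the fused classification once they exceed the threshold.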

In particular, we carry out a set of experiments parallel to those of section 5, without application of the spatial-spectral filters; the ensemble size is reduced accordingly. Results for the two datasets are shown in figs. 5 and 6, respectively; these are analogous to figs. 1 and 2. Table 3 summarizes the attained classification accuracy, in the same format as the accuracy table of section 5. Evidently, the experiments with feature embedding yield higher classification accuracy, as well as sharper clutter maps and labeled region boundaries.

We remark also on the extent of improvement that may be gained by feature space embedding. From the class legends provided in appendix A, one may expect a significant difference between the two datasets with regard to inter-class similarities. Indeed, spectral signatures in the Indian Pines scene are very similar between certain classes (such as different corn fields, soybean areas, or grass patches), whereas classes in the University of Pavia scene feature more distinctive signatures in comparison. This difference between the datasets means that the former presents a greater challenge to conventional discrimination metrics, and thereby benefits more from feature space embedding, which effectively amounts to an adaptive transformation of the distance metric in the original feature space. Such benefits are confirmed by our experimental results.
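The view of feature embedding as a change of distance metric can be illustrated with a toy example. The sketch below uses a plain spatial box filter as a stand-in feature transform (an assumption; it is not one of the spatial-spectral filters used in the experiments) and measures how far apart two classes are relative to their within-class spread, before and after filtering. The helper names `box_filter` and `separation` are hypothetical.

```python
import numpy as np

def box_filter(cube, radius):
    """Average every band over a (2r+1) x (2r+1) spatial window.

    cube: (height, width, bands) array; borders are handled by edge padding.
    """
    h, w, _ = cube.shape
    pad = ((radius, radius), (radius, radius), (0, 0))
    padded = np.pad(cube, pad, mode="edge")
    out = np.zeros_like(cube, dtype=float)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy:dy + h, dx:dx + w, :]
    return out / (2 * radius + 1) ** 2

def separation(features, labels):
    """Distance between two class means divided by the mean within-class spread.

    features: (n_samples, n_features); labels: (n_samples,) with values 0 or 1.
    """
    a, b = features[labels == 0], features[labels == 1]
    between = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    within = 0.5 * (np.linalg.norm(a - a.mean(axis=0), axis=1).mean()
                    + np.linalg.norm(b - b.mean(axis=0), axis=1).mean())
    return between / within
```

For two spatially coherent classes with similar, noisy spectra, Euclidean distances computed on the filtered cube separate the classes better than distances on the raw spectra, which is the sense in which a feature transform reshapes the metric seen by the subsequent dimension reduction.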

Figure 5: Classification and clutter detection results for the Indian Pines scene without feature embedding. (a) and (e) RGB composite [25] and manual classification labeling and mask. (b)–(d) 10% labeled data sampling: masked classification; classification entropy map; classification and clutter removal. (f)–(h) 5% labeled data sampling: same as (b)–(d).
Figure 6: Classification and clutter detection results for the University of Pavia scene without feature embedding. (a) and (e) RGB composite [17] and manual classification labeling and mask. (b)–(d) 5% labeled data sampling: masked classification; classification entropy map; classification and clutter removal. (f)–(h) 2% labeled data sampling: same as (b)–(d).
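
| Dataset | Labeled samples | Instances OA (%), mean ± std (max) | Instances AA (%), mean ± std (max) | Ensemble OA (%) | Ensemble AA (%) |
|---|---|---|---|---|---|
| Indian Pines | 5% | 76.30 ± 3.11 (73.31) | 74.37 ± 2.66 (77.42) | 82.09 | 78.67 |
| Indian Pines | 10% | 80.18 ± 3.09 (83.14) | 79.63 ± 2.66 (83.00) | 85.83 | 83.77 |
| University of Pavia | 2% | 95.18 ± 1.55 (96.86) | 94.44 ± 1.78 (96.27) | 97.53 | 96.89 |
| University of Pavia | 5% | 96.82 ± 1.17 (97.84) | 96.11 ± 1.29 (97.27) | 98.60 | 98.13 |
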
Table 3: Classification accuracy for the embedding ensemble and instances without feature embedding.
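
For reference, the two accuracy measures in the table can be computed as sketched below, assuming the conventional definitions: overall accuracy (OA) is the fraction of correctly labeled test samples, and average accuracy (AA) is the unweighted mean of the per-class accuracies. The function name `overall_and_average_accuracy` is hypothetical.

```python
import numpy as np

def overall_and_average_accuracy(y_true, y_pred):
    """Return (OA, AA) in percent for integer label arrays of equal length."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    correct = (y_true == y_pred)

    # OA: every sample counts equally, so large classes dominate.
    oa = correct.mean()

    # AA: every class counts equally, regardless of its size.
    classes = np.unique(y_true)
    per_class = np.array([correct[y_true == c].mean() for c in classes])
    aa = per_class.mean()
    return 100.0 * oa, 100.0 * aa
```

Under these definitions, AA falls below OA when the errors concentrate in small classes, which is one reason both measures are reported.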


References

  • [1] C. M. Bachmann, T. L. Ainsworth, and R. A. Fusina. Exploiting manifold geometry in hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing, 43(3):441–454, Mar. 2005.
  • [2] Y. Bengio, J.-F. Paiement, P. Vincent, O. Delalleau, N. Le Roux, and M. Ouimet. Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. In Advances in Neural Information Processing Systems, volume 16 of NIPS ’03, pages 177–184, 2003.
  • [3] J. M. Bioucas-Dias, A. Plaza, G. Camps-Valls, P. Scheunders, N. Nasrabadi, and J. Chanussot. Hyperspectral remote sensing data analysis and future challenges. IEEE Geoscience and Remote Sensing Magazine, 1(2):6–36, June 2013.
  • [4] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot. Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(2):354–379, Apr. 2012.
  • [5] G. J. Briem, J. A. Benediktsson, and J. R. Sveinsson. Multiple classifiers applied to multisource remote sensing data. IEEE Transactions on Geoscience and Remote Sensing, 40(10):2291–2299, Jan. 2002.
  • [6] G. Camps-Valls and L. Bruzzone. Kernel-based methods for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 43(6):1351–1362, June 2005.
  • [7] V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys, 41(3):1–58, July 2009.
  • [8] H. Chang and D.-Y. Yeung. Robust locally linear embedding. Pattern Recognition, 39(6):1053–1065, 2006.
  • [9] D. L. Donoho and C. Grimes. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10):5591–5596, 2003.
  • [10] M. Fauvel, Y. Tarabalka, J. A. Benediktsson, J. Chanussot, and J. C. Tilton. Advances in spectral-spatial classification of Hyperspectral Images. Proceedings of the IEEE, 101(3):652–675, Mar. 2013.
  • [11] R. O. Green, M. L. Eastwood, C. M. Sarture, T. G. Chrien, M. Aronsson, B. J. Chippendale, J. A. Faust, B. E. Pavri, C. J. Chovit, M. Solis, M. R. Olah, and O. Williams. Imaging spectroscopy and the airborne visible/infrared imaging spectrometer (AVIRIS). Remote Sensing of Environment, 65(3):227–248, Sept. 1998.
  • [12] T. K. Ho, J. J. Hull, and S. N. Srihari. Decision combination in multiple classifier systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(1):66–75, Jan. 1994.
  • [13] S. Holzwarth, A. Müller, M. Habermeyer, R. Richter, A. Hausold, S. Thiemann, and P. Strobl. HySens - DAIS 7915/ROSIS Imaging Spectrometers at DLR. In Proceedings of the 3rd EARSeL Workshop on Imaging Spectroscopy, pages 3–14, Herrsching, Germany, May 2003.
  • [14] H. Huang, L. Liu, and M. Ngadi. Recent developments in hyperspectral imaging for assessment of food quality and safety. Sensors, 14(4):7248–7276, Apr. 2014.
  • [15] G. F. Hughes. On the mean accuracy of statistical pattern recognizers. IEEE Transactions on Information Theory, 14(1):55–63, Jan. 1968.
  • [16] L. O. Jimenez-Rodriguez, E. Arzuaga-Cruz, and M. Velez-Reyes. Unsupervised linear feature-extraction methods and their effects in the classification of high-dimensional data. IEEE Transactions on Geoscience and Remote Sensing, 45(2):469–483, Feb. 2007.
  • [17] X. Kang, S. Li, and J. A. Benediktsson. Spectral–spatial Hyperspectral Image classification with edge-preserving filtering. IEEE Transactions on Geoscience and Remote Sensing, 52(5):2666–2677, May 2014.
  • [18] D. H. Kim and L. H. Finkel. Hyperspectral image processing using locally linear embedding. In Proceedings of the 1st International IEEE EMBS Conference on Neural Engineering, pages 316–319, Capri Island, Italy, Mar. 2003.
  • [19] S. Kumar, J. Ghosh, and M. M. Crawford. Best-bases feature extraction algorithms for classification of hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 39(7):1368–1379, July 2001.
  • [20] J. Li, X. Huang, P. Gamba, J. M. B. Bioucas-Dias, L. Zhang, J. A. Benediktsson, and A. Plaza. Multiple feature learning for Hyperspectral Image classification. IEEE Transactions on Geoscience and Remote Sensing, 53(3):1592–1606, Aug. 2014.
  • [21] A. Mohan, G. Sapiro, and E. Bosch. Spatially coherent nonlinear dimensionality reduction and segmentation of hyperspectral images. IEEE Geoscience and Remote Sensing Letters, 4(2):206–210, Apr. 2007.
  • [22] D. Ni and H. Ma. Classification of Hyperspectral Image based on sparse representation in tangent space. IEEE Geoscience and Remote Sensing Letters, 12(4):786–790, Oct. 2014.
  • [23] J. Pearlman, S. Carman, C. Segal, P. Jarecke, P. Clancy, and W. Browne. Overview of the Hyperion imaging spectrometer for the NASA EO-1 mission. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, volume 7 of IGARSS ’01, pages 3036–3038, Sydney, NSW, Australia, 2001.
  • [24] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, Dec. 2000.
  • [25] J. M. Sotoca and F. Pla. Hyperspectral data selection from mutual information between image bands. In Structural, Syntactic, and Statistical Pattern Recognition, volume 4109 of Lecture Notes in Computer Science, pages 853–861. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006.
  • [26] S. Tadjudin and D. A. Landgrebe. Covariance estimation for limited training samples. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, IGARSS ’98, pages 2688–2690 vol.5, Seattle, WA, USA, 1998.
  • [27] Y. Tarabalka, J. A. Benediktsson, J. Chanussot, and J. C. Tilton. Multiple spectral–spatial classification approach for hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, Nov. 2010.
  • [28] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, Dec. 2000.
  • [29] Y.-C. Tzeng. Remote sensing images classification/data fusion using distance weighted multiple classifiers systems. In Proceedings of the 7th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT ’06, pages 56–60, Taipei, Taiwan, 2006.
  • [30] M. Woźniak, M. Graña, and E. Corchado. A survey of multiple classifier systems as hybrid systems. Information Fusion, 16:3–17, Mar. 2014.