Superpixel Contracted Graph-Based Learning for Hyperspectral Image Classification

03/14/2019 ∙ by Philip Sellars, et al. ∙ University of Cambridge

A central problem in hyperspectral image classification is obtaining high classification accuracy when using a limited amount of labelled data. In this paper we present a novel graph-based framework, which aims to tackle this problem in the presence of large scale data input. Our approach utilises a novel superpixel method, specifically designed for hyperspectral data, to define meaningful local regions in an image, which with high probability share the same classification label. We then extract spectral and spatial features from these regions and use these to produce a contracted weighted graph-representation, where each node represents a region rather than a pixel. Our graph is then fed into a graph-based semi-supervised classifier which gives the final classification. We show that using superpixels in a graph representation is an effective tool for speeding up graphical classifiers applied to hyperspectral images. We demonstrate through exhaustive quantitative and qualitative results that our proposed method produces accurate classifications when an incredibly small amount of labelled data is used. We show that our approach mitigates the major drawbacks of existing approaches, resulting in our approach outperforming several comparative state-of-the-art techniques.




I Introduction

In modern applications, hyperspectral images (HSI) capture a detailed light distribution over several hundred spectral bands. This detailed spectral and spatial information increases the discriminative ability of HSIs compared to conventional colour images or multi-spectral images. As a result, hyperspectral imaging has been used in a wide range of applications including classification [1, 2, 3], object tracking [4, 5, 6], environmental monitoring [7, 8] and object detection [9, 10, 11].

In recent years, the classification of hyperspectral data has been an active topic of research. Classifying HSIs requires assigning a class label to each pixel within the image. There are several large hurdles to overcome during the classification process: the high dimensionality of the spectral information, the large spatial variability of the data and the limited number of training samples available due to the cost of data labelling. There have been numerous different attempts to deal with these problems when classifying HSIs, in which the majority of solutions rely on supervised learning (SL).

Kernel-based classifiers such as support vector machines (SVM) [1, 12] are commonly used in the field. Whilst initial kernel methods only used spectral features, many later kernel methods included spatial features. One example is the multiple kernel learning (MKL) approach of Fang et al [13], which combines spatial feature vectors with spectral features.

Fig. 1: Data visualisation using a weighted undirected graph. The different node colours represent the minimum path length between each node and the central node of the graph, which is coloured in green. Graphs are useful tools for capturing and visualising the detailed information present in data, and are particularly useful for visualising high dimensional data such as that present in hyperspectral images.

To deal with the high dimensionality of the data, many different feature extraction (FE) methods have been investigated. These methods aim to find a lower dimensional subspace in which the separability among samples is maximised. Kang et al [14] used image fusion and recursive filtering to extract meaningful features, Li et al [15] exploited local binary patterns to extract local features and textural information, and Fang et al [16] used a local covariance matrix representation to characterise the correlation between the spectral and spatial information in HSI data.

Motivated by the remarkable success of Deep Learning (DL), different works have used DL for HSI classification. Convolutional neural networks (CNN) are commonly used to extract high level spectral and spatial features [17, 18]. Makantasis et al [17] used a CNN to extract spatial and spectral features and passed these into a multi-layer perceptron. In recent work, generative adversarial networks (GAN), which simultaneously train a generator and discriminator, have also been explored for HSI classification.


Although SL based classifiers have shown good results on HSI data, their performance is heavily reliant on having a large, high-quality training set, which is a costly investment. As an alternative to SL, we could use unsupervised learning (UL), in which the key idea is to learn a set of classes from data that has not been labelled [20]. Although works such as [21] reported promising results on using UL for HSI classification, the major problem with UL is that the classification task becomes a massively ill-posed problem that needs specific assumptions to mitigate the lack of correspondence between the produced clusters and the known classes.

The aforementioned constraints associated with SL and UL make semi-supervised learning (SSL)[22] a clear alternative for obtaining an improved classification performance. The idea of SSL is to exploit both labelled and unlabelled data in the training process to produce a higher classification accuracy than solely using the labelled data. The advantages of SSL when using HSI data are two-fold: we decrease the need for large amounts of labelled data and we gain further understanding of the relationships present in the data.

In this paper, we introduce for the first time a superpixel contracted graph-based learning framework for semi-supervised HSI classification, which we name Superpixel Graph Learning (SGL). It produces state-of-the-art results, especially when the amount of labelled data is extremely small. Our framework is composed of three main parts. Firstly, we use a novel superpixel algorithm, specially designed for HSIs, to accurately partition our images into adaptive regions termed superpixels. Secondly, we perform feature extraction on each superpixel to extract discriminative features. Finally, we use the superpixels and features to produce a weighted graphical representation of our image, which is then classified using a graphical-learning method (LGC [23]). Our main contributions are:

  • We propose a novel computationally tractable framework for HSI classification, in which our novelty largely relies on:

    • A hyperspectral superpixel approach. To the best of our knowledge, this is the first time that a superpixel approach has been designed specifically for HSI data, that is, an approach which considers both spatial and spectral information. We propose a novel clustering distance, which combines a Euclidean spectral distance with the Log-Euclidean distance of a covariance matrix representation. This allows us to define meaningful local regions to boost the overall classification performance.

    • Superpixel graph classification. We show that combining superpixels with a graphical representation and a purely graphical classifier brings two major advantages: firstly, it vastly decreases the size of the node set which allows for classification in computationally feasible times without the need for matrix approximation methods. Secondly, it allows for the intelligent regularisation of the final classification map by using superpixels as adaptive local regions.

  • We extensively validate our proposed approach by using three benchmarking datasets and provide a range of experimental results.

  • We demonstrate that the combination of our novel hyperspectral superpixel approach embedded in a graphical setting leads to state of the art results for HSI classification.

The remainder of this paper is organised as follows. Section II explores the related work on semi-supervised learning in the context of HSI classification. Section III is devoted to describing the proposed SGL method, including superpixel generation, feature extraction and graph-based semi-supervised classification. Section IV contains the experimental results for testing upon three real HSIs and a comparison to other state-of-the-art classification methods. Finally, Section V presents the conclusions as well as a discussion of further work.

Fig. 2: The proposed framework for the method. A HSI is read in and dimensionally reduced before superpixel segmentation occurs. Features are then extracted from each superpixel and, when combined with the initial labelling, are used to create a superpixel based graph. A graph classifier is used to propagate information across the graph. The final labels are then combined with the superpixel map to give the classification of the HSI.

II Related Work

The problem of semi-supervised classification of HSIs has been previously investigated by the remote sensing community. In this section, we review the existing techniques in turn. The literature regarding Semi-Supervised Learning (SSL) algorithms can be roughly divided into three categories: generative models, low-density separation and graph-based methods.

Several previous methods have utilised graph-based learning, and our paper is closely related to these. Graph-based methods rely upon constructing a graph representation, where the data points are represented by nodes and the similarity between these data points is shown by edges and weights (see Fig 1). The first graph-based learning method was proposed by Camps-Valls in [24]. This paper used different spectral and spatial kernels alongside the Nyström extension, as a matrix approximation tool, to classify HSIs in computationally reasonable times. However, the produced accuracy was poor compared to other methods at the time. Gao et al [25] used a bilayer graph-based learning algorithm to improve classification performance. The two layers were composed of a pixel-based graph, similar to [24], and a hypergraph built from grouping relations estimated using unsupervised learning. Cui et al [26] used an extended random walker (ERW) on a superpixel-based graph to optimise a classification map produced from an SVM, showing that the accuracy of the SVM could be greatly improved by using the information present in the graph.

Another group of semi-supervised methods seek to directly implement the low density separation assumption [22] by moving the decision boundary away from unlabeled points. The first paper published in this area was by Bruzzone et al [27], which used a novel transductive SVM (TSVM) for HSI classification. A TSVM differs from the standard SVM as it seeks to maximise the margin on a combination of labeled and unlabeled data. Building upon these ideas came semi-supervised self-learning algorithms such as the work by Dópido et al [28], in which they sought to adapt active learning, in which a user actively selects unlabeled samples, to a self-learning framework in which the computer automatically selects the most informative unlabeled samples for classification purposes. Ratle et al took a different path and tackled low density separation using a semi-supervised neural network architecture. An embedding regularizer was added to the loss function to inject the unlabeled information, and this approach produced higher classification accuracy than TSVMs.

The rise of deep learning methods has led to an increase in the popularity of generative methods for semi-supervised learning. However, these methods are still in their infancy. One of the most popular approaches, by Zhan et al [30], uses a generative adversarial network (GAN) to simultaneously train a discriminator and generator. However, this paper uses a 1D-GAN and can only exploit spectral features, and the produced accuracy suffers as a result. Zhu et al [19] developed a 3D-GAN which used convolutional neural networks for the discriminator and generator. This architecture allows the approach to exploit the spectral-spatial information present in the HSI. Therefore, the produced accuracy was much higher than that of [30].

Although works based on generative models and low-density separation have shown encouraging results, in this work we concentrate on producing a graph-based method, the motivation for which is three-fold. Firstly, data can be naturally represented on graphs. Secondly, a graph representation is motivated by its mathematical background and properties, including sparseness. Thirdly, data can be represented in a uniform space even if the data is highly heterogeneous. We seek to produce a graph-based method that is based on superpixel representations similar to that of [26]. However, unlike [26], we seek to produce a fully graph-based learning method rather than a graph-based optimisation of a non-graph based method.

III Proposed Method

This section is devoted to explaining our proposed framework, which we call SGL. It contains three main parts which are shown in Fig 2. Firstly we describe our hyperspectral superpixel algorithm, subsequently we give a description of the feature extraction process and finally we describe the construction and classification of our graph representation.

Problem Statement. In this work, we seek to find an accurate classification prediction for a large amount of unlabelled data given an extremely small amount of labelled data. We consider the following problem definition for the classification task under the SSL paradigm.

Definition 1

Semi-supervised Classification Task. Given a set of points , , and a label set where , then, we seek to find a function , which utilises the unlabelled data , such that allows for a good prediction for .

III-A Superpixel Segmentation

Superpixels are perceptually meaningful connected regions which group pixels similar in colour or other features and were initially introduced by Ren and Malik [31]. In subsequent years, many different algorithmic approaches have been proposed (e.g.[32, 33, 34]). For a detailed survey on superpixel algorithms see [35]. Fig 3 shows the application of a superpixel algorithm to a HSI. Superpixel maps such as the one shown in Fig 3 have many desirable properties: they are computationally and representationally efficient, the individual superpixels are perceptually meaningful and as superpixels are the result of an over-segmentation they are very good at conserving image structures.

Why use superpixels as a tool for HSIs? In order to extract spatial features for use in spectral-spatial models, it is important to be able to define good local regions. Whilst setting a fixed size window (e.g. [36]) has shown good results, a fixed size does not allow for the full exploitation of spatial context. Using superpixels as adaptive regions [13] has been shown to produce discriminative information. Cui et al. [26] demonstrated this by using a superpixel based random walker to optimise an SVM probability map to great effect. Furthermore, Cui et al. additionally demonstrated that a superpixel spectrum is more stable and less affected by noise than an individual pixel spectrum. Therefore, by using superpixels we become more resistant to noise present in the data.

The most common algorithm used in clustering based superpixel methods is Lloyd’s algorithm [37], a modified version of the popular k-means clustering algorithm. In the context of Lloyd’s algorithm, let us first formalise the definition of a superpixel segmentation.

Definition 2

Superpixel Segmentation. Given an image , where , a superpixel over-segmentation is a partition of such that for each we have , where is a metric, is a feature function and is an individual superpixel.

In this work, we build on this definition to propose our algorithmic approach. Denote a HSI as with dimensions representing the width, height and number of bands respectively. Firstly, for computational efficiency we use PCA [38] to reduce the dimensionality and produce a reduced image where . Denoting an individual pixel as , we then seek to partition our reduced HSI into superpixels. This corresponds to splitting into a family of disjoint sets, , , where corresponds to an individual superpixel and is the number of superpixels. Each superpixel is made up of a set of connected pixels, .
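The PCA reduction step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `pca_reduce`, the `(H, W, B)` array layout and the default `variance_to_keep=0.99` are all assumptions made for the example (the threshold actually used is set in the evaluation protocol).

```python
import numpy as np

def pca_reduce(cube, variance_to_keep=0.99):
    """Project an (H, W, B) cube onto its leading principal components.

    variance_to_keep is a hypothetical threshold on the cumulative
    explained variance ratio, used to choose the number of components.
    """
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b).astype(float)
    centred = pixels - pixels.mean(axis=0)

    # Eigendecomposition of the B x B band covariance matrix.
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # eigh returns ascending eigenvalues; sort them in descending order.
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]

    # Smallest number of components reaching the requested variance ratio.
    ratio = np.cumsum(eigvals) / eigvals.sum()
    n_components = min(int(np.searchsorted(ratio, variance_to_keep)) + 1, b)

    reduced = centred @ eigvecs[:, :n_components]
    return reduced.reshape(h, w, n_components)
```

Working on the band covariance matrix keeps the decomposition cheap, since its size depends on the number of bands and not on the number of pixels.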

Fig. 3: The Salinas HSI segmented using our proposed HMS algorithm. Fig (a) shows an RGB version of the image and Figs (b)-(d) show the image segmented using 280, 569 and 1034 superpixels respectively. Note that due to the content-sensitive nature of the HMS extension, there are a larger number of smaller superpixels in content dense regions.

Hyperspectral superpixel construction. When constructing our hyperspectral superpixels, we need to ensure that our algorithm extracts effective information from hyperspectral data. Whilst other works, including [13], feed the first three principal components of HSIs into RGB based superpixel algorithms, we seek to design an algorithm specifically built for hyperspectral data to ensure good performance.

As the base for our algorithm, we began with Manifold SLIC (MSLIC) [33]. MSLIC has two features that make it highly useful for our purpose. Firstly, it produces content sensitive superpixels by mapping the image to a two dimensional manifold and measuring the area of Voronoi cells on that manifold. Secondly, the number of superpixels will change from the initial selection to fit the content structure in the image, thereby lowering the chance that a poor initial choice greatly reduces the final accuracy.

Our proposed method is a novel extension of MSLIC to hyperspectral data. We name this extension Hyper Manifold SLIC (HMS). HMS involves three major changes over MSLIC.

III-A1 High dimensional adaption

We alter the MSLIC algorithm to take image data with any number of bands . This involves changing several steps such as mapping the image to a 2-dimensional manifold rather than the standard .

III-A2 Hyperspectral clustering distance

Based on our previous work [39], we design a more effective clustering distance as a combination of the Euclidean spectral distance [34] and Log-Euclidean (LED) distance [40] of a covariance matrix representation [41]. This combination effectively combines the spatial and spectral data present in the image. For each pixel we construct a covariance matrix using the same methodology as Fang et al [16] and use the LED metric to calculate the distances between these matrices. The distance between two pixels , is given by:


From (1), the parameter controls the compactness of superpixels whilst scales the spatial distance and, for a image with pixels, as in the MSLIC algorithm.
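To illustrate the ingredients of such a distance, the sketch below computes a Log-Euclidean distance between two covariance matrices and combines it with Euclidean spectral and spatial terms. The Log-Euclidean part follows the standard definition (Frobenius norm of the difference of matrix logarithms). The way the three terms are combined in `combined_distance`, together with the parameter names `m`, `s` and `lam`, is a SLIC-style assumption made for illustration and should not be read as the paper's equation (1).

```python
import numpy as np

def spd_log(mat, eps=1e-6):
    """Matrix logarithm of a symmetric positive semi-definite matrix."""
    # A small ridge keeps all eigenvalues strictly positive.
    vals, vecs = np.linalg.eigh(mat + eps * np.eye(mat.shape[0]))
    return (vecs * np.log(vals)) @ vecs.T

def log_euclidean_distance(cov_a, cov_b):
    """Frobenius norm of the difference between the two matrix logarithms."""
    return np.linalg.norm(spd_log(cov_a) - spd_log(cov_b), ord="fro")

def local_covariance(neighbourhood):
    """Covariance matrix of an (n_pixels, n_bands) block of spectra."""
    return np.cov(neighbourhood, rowvar=False)

def combined_distance(spec_a, spec_b, cov_a, cov_b, pos_a, pos_b,
                      m=10.0, s=5.0, lam=1.0):
    """Hypothetical combination of spectral, covariance and spatial terms.

    m (compactness), s (spatial scale) and lam (covariance weighting) are
    illustrative parameters, not values or symbols taken from the paper.
    """
    d_spec = np.linalg.norm(spec_a - spec_b)
    d_cov = log_euclidean_distance(cov_a, cov_b)
    d_pos = np.linalg.norm(pos_a - pos_b)
    return np.sqrt(d_spec ** 2 + lam * d_cov ** 2 + (m / s) ** 2 * d_pos ** 2)
```

Taking the matrix logarithm first maps covariance matrices into a space where an ordinary Euclidean-type distance is meaningful, which is why the eigendecomposition appears before the norm.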

III-A3 Spectral Merging

In the original MSLIC algorithm when the area of a seed is below a threshold it is randomly merged with a neighbouring seed . However, in our implementation we instead choose the neighbouring seed which satisfies:


where is the average spectral information of the seed and is the set of neighbouring seeds. We choose to merge the superpixels that are most similar in their spectral properties, as this yields a better form of adaptation to the hyperspectral data.
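The merging rule can be sketched as a nearest-neighbour search over mean spectra. The sketch assumes that similarity is measured with the Euclidean distance between average spectra; the helper name `pick_merge_target` and the dictionary layout are hypothetical.

```python
import numpy as np

def pick_merge_target(seed_spectrum, neighbour_spectra):
    """Return the key of the neighbouring seed with the closest mean spectrum.

    neighbour_spectra maps a neighbour identifier to its average spectrum.
    Euclidean distance is an assumption made for this illustration.
    """
    return min(
        neighbour_spectra,
        key=lambda k: np.linalg.norm(neighbour_spectra[k] - seed_spectrum),
    )
```

Compared with a random choice of neighbour, this keeps spectrally distinct regions apart when a small seed is absorbed.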

These proposed changes produce accurate superpixels for HSIs. For further details on our approach, refer to Section II of the supplementary material.

III-B Feature Extraction

Now we seek to extract meaningful features from the extracted superpixels ready for graph construction. In this paper, we use the same features as we did in our previous work on superpixels [39]. From each superpixel we extract three different features. To extract localised spatial information we apply a mean filter to each superpixel to produce a mean feature vector which is defined as .


Using a weighted combination of the mean feature vectors of a superpixel’s adjacent neighbours, we can obtain a measure of the spatial information between superpixels. Note that adjacency is defined based on 4-connectivity on the image grid. For each superpixel , we define the set which contains the indexes of its adjacent superpixels. From this, we construct the weighted feature vector which reads:


where the weight between adjacent superpixels is defined as:


with as a predefined scalar parameter. Finally, we propose to extract the centroidal location of each superpixel which we calculate as:
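A minimal sketch of the three features described in this subsection is given below. The mean feature and the centroid follow directly from the text. The neighbour weighting is assumed here to be a normalised Gaussian of the squared distance between mean vectors with scale `h`; that form, the function name and the array layout are assumptions for illustration only.

```python
import numpy as np

def superpixel_features(reduced, labels, h=15.0):
    """Compute mean, neighbour-weighted and centroid features.

    reduced: (H, W, b) reduced image; labels: (H, W) ints covering 0..K-1.
    h is a hypothetical kernel scale for the neighbour weights.
    """
    n_sp = int(labels.max()) + 1
    b = reduced.shape[2]
    flat = labels.ravel()
    counts = np.bincount(flat, minlength=n_sp).astype(float)

    # Mean feature vector of each superpixel.
    means = np.zeros((n_sp, b))
    np.add.at(means, flat, reduced.reshape(-1, b))
    means /= counts[:, None]

    # Centroid (row, column) of each superpixel.
    rows, cols = np.indices(labels.shape)
    centroids = np.stack([
        np.bincount(flat, weights=rows.ravel(), minlength=n_sp),
        np.bincount(flat, weights=cols.ravel(), minlength=n_sp),
    ], axis=1) / counts[:, None]

    # Adjacency from 4-connectivity: compare horizontal and vertical pairs.
    adjacent = np.zeros((n_sp, n_sp), dtype=bool)
    pairs = ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :]))
    for first, second in pairs:
        differ = first != second
        adjacent[first[differ], second[differ]] = True
        adjacent[second[differ], first[differ]] = True

    # Gaussian-weighted average of the neighbours' mean vectors.
    sq = ((means[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    w = np.exp(-sq / h) * adjacent
    norm = w.sum(axis=1, keepdims=True)
    # Superpixels without usable neighbours fall back to their own mean.
    weighted = np.divide(w @ means, norm, out=means.copy(), where=norm > 0)
    return means, weighted, centroids
```

The sketch assumes every label from 0 to K-1 occurs at least once in `labels`, so that no superpixel has zero pixels.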


III-C Graph based Classification

After defining how to get our superpixel set and extracted features, we now turn to explain how we create our weighted graph-representation. However, we first give some background into challenges associated with the computational implementation of graph-based methods and how superpixels can be used to overcome some of these.

As noted by Camps-Valls et al [42], many graphical algorithms rely on calculating and manipulating large kernel matrices formed by the labelled and unlabelled data. As an example, for an image with N pixels the associated graph Laplacian is a matrix of size N × N. If we seek to invert the graph Laplacian via singular value decomposition, then the computational complexity would be O(N^3), ruining the scaling that we seek.
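To make this scaling concrete, the short calculation below compares the memory needed for a dense matrix on a pixel-level graph with that of a much smaller node set, together with the ratio of cubic-cost work. The image size and the smaller node count are hypothetical values chosen for illustration and are not taken from the benchmark datasets.

```python
def dense_matrix_gib(n_nodes, bytes_per_entry=8):
    """Memory in GiB for a dense n_nodes x n_nodes float64 matrix."""
    return n_nodes ** 2 * bytes_per_entry / 2 ** 30

def cubic_cost_ratio(n_large, n_small):
    """How many times more work an O(n^3) step needs at n_large than n_small."""
    return (n_large / n_small) ** 3

n_pixels = 500 * 500   # hypothetical image size
n_regions = 1000       # hypothetical number of nodes after grouping pixels

print(f"pixel-level graph:  {dense_matrix_gib(n_pixels):.1f} GiB")
print(f"region-level graph: {dense_matrix_gib(n_regions):.4f} GiB")
print(f"cubic cost ratio:   {cubic_cost_ratio(n_pixels, n_regions):.3g}")
```

Even for this modest hypothetical image, a dense pixel-level matrix would not fit in the memory of a typical workstation, which is the motivation for either approximating the matrix or shrinking the node set.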

Approximation methods do exist to speed up such matrix inversions. One commonly used technique is the Nyström extension [43] and it is regularly used to speed up matrix calculations [42] [44]. However, the Nyström extension has several drawbacks. It is unsuitable for sparse applications as the Nyström extension acts as an approximation for complete matrices.

In this paper, we implement a novel solution to increase the speed and reduce the complexity of graphical classifiers applied to HSIs. Instead of having a graphical representation where each node represents a pixel, we instead use our segmented superpixels as the node set. This greatly reduces the size of our node set as and allows us to perform matrix inversion and other calculations without approximations such as the Nyström extension. Furthermore, a superpixel representation should help to boost the classification accuracy as we are defining strong local regions in our data. Therefore, from these previously discussed features and our superpixel node set, a weighted, undirected graph can be created. The weight between two connected superpixels and is constructed based on two Gaussian kernels and is given as




where balances the influence between the mean and weighted features and determine the width of the Gaussian kernels. Note that weights are limited in value between with implying most similar. The edge set is constructed using -nearest neighbours. Therefore, the edge weights are defined as:


In the training stage of the algorithm, a set of labelled spectral pixels is randomly selected from the original HSI. The initial label of each superpixel is taken as the average initial label of its corresponding set of pixels. If no pixel within a superpixel is initially labelled then the superpixel is initially unlabelled. The labelling information for the superpixels is specified using a matrix , where is the number of classes present and is the number of superpixels. specifies the value of the seed label for node . The weight matrix and the initial labelling are then passed into the Local and Global Consistency (LGC) algorithm [23].
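The two inputs to the classifier, the weight matrix and the initial labelling, can be sketched as follows. Several details are assumptions made for illustration: the product of a feature kernel and a centroid kernel, the parameter names `beta`, `sigma_f` and `sigma_c`, the symmetrisation of the k-nearest-neighbour edge set, the interpretation of the "average initial label" as the fraction of labelled pixels per class, and the (superpixels × classes) orientation of the label matrix.

```python
import numpy as np

def knn_gaussian_weights(means, weighted, centroids,
                         k=8, beta=0.9, sigma_f=0.2, sigma_c=20.0):
    """Symmetric k-NN weight matrix built from two Gaussian kernels."""
    def sq_dists(x):
        return ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)

    # Feature kernel mixes mean and neighbour-weighted features via beta.
    feat = beta * sq_dists(means) + (1.0 - beta) * sq_dists(weighted)
    sim = np.exp(-feat / sigma_f ** 2) * np.exp(-sq_dists(centroids) / sigma_c ** 2)
    np.fill_diagonal(sim, 0.0)

    # Keep each node's k strongest links, then symmetrise the edge set.
    n = sim.shape[0]
    k = min(k, n - 1)
    keep = np.zeros((n, n), dtype=bool)
    nearest = np.argsort(-sim, axis=1)[:, :k]
    keep[np.arange(n)[:, None], nearest] = True
    keep |= keep.T
    return np.where(keep, sim, 0.0)

def initial_label_matrix(sp_labels, pixel_classes, n_classes):
    """Average the one-hot labels of the labelled pixels in each superpixel.

    pixel_classes holds a class index per pixel, or -1 where unlabelled.
    Rows of the result are all zero for unlabelled superpixels.
    """
    n_sp = int(sp_labels.max()) + 1
    y = np.zeros((n_sp, n_classes))
    mask = pixel_classes >= 0
    np.add.at(y, (sp_labels[mask], pixel_classes[mask]), 1.0)
    totals = y.sum(axis=1, keepdims=True)
    return np.divide(y, totals, out=np.zeros_like(y), where=totals > 0)
```

Because each kernel lies in (0, 1], every retained weight also lies in that interval, with larger values indicating more similar superpixels.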

LGC is a graph based SSL approach that formalises the smoothness and clustering assumptions of semi-supervised learning by designing a classification function which is smooth upon the graphical structure generated by all the data. The final labelling is specified using a matrix . The cost function associated with the matrix is given by


where is a regularisation parameter. denotes the set of matrices with non-negative entries. The labelling matrix is given by .

The first term in the cost function is the smoothness constraint, which encourages connected nodes to have similar labelling, whilst the second term fits the final labelling to the initially labelled data. Balance between these constraints is set by the parameter . The above cost function has a closed form solution which reads: , where and . The final labelling of the nodes is then computed as: . The superpixel labels and the superpixel segmentation are used to construct the final pixel classification map.
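For reference, the closed form solution of the standard LGC formulation can be sketched in a few lines. The sketch assumes the usual symmetric normalisation S = D^(-1/2) W D^(-1/2) and the relation alpha = 1 / (1 + mu) between the propagation weight and the regularisation parameter; the exact parameterisation used in this paper may differ, and the function names are hypothetical. A constant positive scaling of the solution is omitted because it does not change the arg max.

```python
import numpy as np

def lgc_classify(W, Y, mu=0.1):
    """Closed-form LGC propagation on a weighted graph.

    W: (K, K) symmetric non-negative weights; Y: (K, c) initial labels.
    mu is a hypothetical regularisation value; alpha = 1 / (1 + mu).
    """
    alpha = 1.0 / (1.0 + mu)
    degree = W.sum(axis=1)
    inv_sqrt = np.zeros_like(degree, dtype=float)
    inv_sqrt[degree > 0] = 1.0 / np.sqrt(degree[degree > 0])

    # Symmetrically normalised weight matrix.
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]

    # Solve (I - alpha * S) F = Y instead of forming an explicit inverse.
    F = np.linalg.solve(np.eye(W.shape[0]) - alpha * S, Y)
    return F.argmax(axis=1), F

def superpixels_to_pixel_map(sp_labels, node_classes):
    """Assign every pixel the class predicted for its superpixel."""
    return node_classes[sp_labels]
```

On a four-node chain with the two end nodes labelled as different classes, the propagation assigns each interior node the class of its nearer end, which matches the intended smoothness behaviour. Since the node set consists of superpixels rather than pixels, the linear system is small enough to solve directly.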

Fig. 4: Sensitivity analysis of the number of superpixels for (a) Indian Pines, (b) Pavia University and (c) Salinas. Each data point is the average accuracy of ten repetitions, whilst the error bars reflect one standard deviation. For all three data sets the accuracy increases with the number of superpixels. However, once the number of superpixels is high enough to accurately over-segment the image, there are diminishing returns for increasing the number of superpixels as the accuracy flattens out.

IV Experimental Results

In this section, we detail the experiments conducted to validate the proposed approach.

IV-A Data Description

We use three benchmark HSI datasets to evaluate our approach, which have the following characteristics.

  • Indian Pines Dataset. The dataset was collected by an airborne visible/infrared imaging spectrometer (AVIRIS) sensor over an agricultural site in Indiana and has 16 classes. The data set consists of pixels, spectral channels, a spectral range of to m and a spatial resolution of m.

  • Salinas. This image was also collected by the AVIRIS sensor over Salinas Valley, California, and contains 16 classes. The data set size is pixels and, identical to Indian Pines, has spectral channels over to m. The data set is characterised by a high spatial resolution of m per pixel.

  • University of Pavia. This dataset was acquired by the reflective optics system imaging spectrometer (ROSIS). The image ( pixels) covers the Engineering School at the University of Pavia and has 9 classes. The image contains 115 spectral channels from to m and has a spatial resolution of m.

In section III of the supplementary material, we further describe the datasets used and the mathematical background of the evaluation criteria.

IV-B Evaluation Protocol

For all experiments carried out in this paper, each one is repeated ten times and the average and standard deviation are provided for each measurement. The number of principal components used was set by demanding that the total explained variance ratio was . To evaluate the performance of each HSI classifier, we use three commonly implemented evaluation criteria: Overall Accuracy (OA), Average Accuracy (AA) and the Kappa Coefficient.
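The three criteria can be computed from a confusion matrix using their standard definitions, as in the sketch below. The function name is hypothetical and the sketch assumes every class appears at least once in the ground truth.

```python
import numpy as np

def classification_scores(y_true, y_pred, n_classes):
    """Return overall accuracy, average accuracy and Cohen's kappa."""
    confusion = np.zeros((n_classes, n_classes))
    np.add.at(confusion, (np.asarray(y_true), np.asarray(y_pred)), 1.0)
    total = confusion.sum()

    # OA: fraction of all samples that are correctly classified.
    oa = np.trace(confusion) / total

    # AA: mean of the per-class accuracies (rows are true classes).
    aa = (np.diag(confusion) / confusion.sum(axis=1)).mean()

    # Kappa: agreement corrected for the agreement expected by chance.
    expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total ** 2
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa
```

OA is dominated by large classes, AA weights every class equally, and kappa discounts agreement that would occur by chance, so reporting all three gives a more balanced picture on imbalanced HSI datasets.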

To validate the performance of our proposed classification framework SGL, several state-of-the-art HSI classification methods have been implemented to act as comparisons. These are local co-variance matrix representation (LCMR) [16], superpixel-based classification via multiple kernels (SC-MK) [13], the edge preserving filter based method (EPF) [45], local binary patterns (LBP) [15], an SVM method [1] and image fusion and recursive filtering (IFRF) [14].

Fig. 5: Comparison of the classification accuracy of different methods with varying number of training samples. The methods used are LCMR [16], SC-MK [13], EPF [45], LBP [15], IFRF [14], SVM [1] and the proposed SGL method. The solid lines represent the average of the different methods whilst the shaded area covers one standard deviation from the mean.
Fixed parameters
Parameter description | Value
Controls the compactness of superpixels | 10.0
Weighted filtering kernel | 15.0
Kernel parameter for constructing | 0.20
k-NN construction | 8
Weighting in the LGC classifier | {0.1, 0.15}

Data-based parameters
Indian Pines | Salinas | Pavia University
0.9 | 0.9 | 0.1
{0.4,0.5} | {} | {17,20}
TABLE I: The parameter values used for all experiments in this paper. Note that a pair of values in braces signifies a random uniform distribution between the two values.

IV-C Parameter Selection

In our proposed framework, there are eight hyperparameters that come from the four tasks of our framework.

  • Superpixel construction: and .

  • Feature Extraction: .

  • Graph construction: , , and .

  • LGC classification: .

Salinas
Samples per Class | OURS | LCMR [16] | SC-MK [13] | EPF [45] | LBP [15] | IFRF [14] | SVM [1]
3 | 98.0 ± 0.8% | 83.7 ± 4.0% | 82.8 ± 2.9% | 78.2 ± 4.8% | 78.5 ± 3.4% | 87.7 ± 5.3% | 76.6 ± 2.3%
5 | 99.1 ± 0.5% | 89.9 ± 2.1% | 85.0 ± 3.0% | 80.9 ± 4.4% | 84.8 ± 2.8% | 93.1 ± 1.8% | 79.6 ± 1.9%
7 | 99.1 ± 0.3% | 92.3 ± 1.6% | 88.2 ± 2.2% | 84.9 ± 3.2% | 88.4 ± 2.0% | 93.8 ± 1.5% | 81.3 ± 1.9%
10 | 99.0 ± 0.4% | 93.8 ± 1.1% | 90.1 ± 2.2% | 86.4 ± 4.3% | 91.4 ± 1.2% | 95.4 ± 1.3% | 82.4 ± 1.2%
15 | 99.1 ± 0.3% | 94.7 ± 1.0% | 93.1 ± 1.1% | 89.2 ± 2.4% | 93.0 ± 1.2% | 97.1 ± 1.2% | 84.3 ± 1.4%
20 | 99.3 ± 0.2% | 96.1 ± 0.8% | 93.3 ± 1.1% | 89.8 ± 3.3% | 95.1 ± 1.1% | 97.3 ± 1.0% | 84.5 ± 1.5%

Indian Pines
Samples per Class | OURS | LCMR [16] | SC-MK [13] | EPF [45] | LBP [15] | IFRF [14] | SVM [1]
3 | 78.7 ± 5.3% | 66.1 ± 3.9% | 59.8 ± 4.1% | 44.6 ± 5.0% | 58.9 ± 3.7% | 57.4 ± 3.9% | 37.7 ± 5.0%
5 | 82.6 ± 3.9% | 74.1 ± 3.3% | 67.8 ± 3.8% | 49.7 ± 9.4% | 67.3 ± 3.9% | 67.2 ± 6.3% | 42.4 ± 5.3%
7 | 87.8 ± 2.1% | 78.5 ± 3.0% | 73.6 ± 5.1% | 57.6 ± 5.4% | 75.6 ± 2.9% | 75.7 ± 3.8% | 48.1 ± 2.2%
10 | 90.7 ± 2.2% | 82.7 ± 3.1% | 80.7 ± 2.5% | 67.3 ± 3.2% | 78.9 ± 2.7% | 80.3 ± 1.8% | 53.0 ± 3.3%
15 | 92.9 ± 0.9% | 86.9 ± 2.0% | 86.2 ± 2.2% | 74.5 ± 3.6% | 85.9 ± 1.8% | 87.9 ± 1.2% | 59.5 ± 1.6%
20 | 94.4 ± 1.4% | 90.0 ± 2.0% | 89.7 ± 1.6% | 80.8 ± 2.3% | 88.6 ± 1.4% | 89.9 ± 1.9% | 63.3 ± 1.4%

University of Pavia
Samples per Class | OURS | LCMR [16] | SC-MK [13] | EPF [45] | LBP [15] | IFRF [14] | SVM [1]
3 | 84.4 ± 4.9% | 70.3 ± 7.3% | 63.6 ± 6.4% | 56.1 ± 7.1% | 55.1 ± 6.4% | 57.6 ± 5.8% | 57.1 ± 8.3%
5 | 88.1 ± 4.6% | 78.8 ± 5.1% | 71.4 ± 4.5% | 64.0 ± 7.4% | 65.4 ± 4.2% | 67.4 ± 4.3% | 62.4 ± 4.4%
7 | 92.1 ± 1.9% | 83.0 ± 4.9% | 77.9 ± 3.9% | 67.0 ± 7.6% | 71.1 ± 3.6% | 71.7 ± 4.9% | 62.2 ± 7.0%
10 | 93.7 ± 1.4% | 87.4 ± 3.5% | 81.6 ± 4.7% | 72.7 ± 9.1% | 75.4 ± 3.1% | 77.3 ± 5.8% | 67.4 ± 4.7%
15 | 94.5 ± 1.8% | 90.1 ± 2.6% | 87.3 ± 2.4% | 79.2 ± 6.6% | 79.2 ± 2.0% | 83.1 ± 3.5% | 73.0 ± 3.8%
20 | 95.4 ± 0.9% | 92.3 ± 2.1% | 88.3 ± 2.1% | 85.7 ± 3.4% | 83.4 ± 1.9% | 88.5 ± 2.1% | 74.1 ± 4.0%

TABLE II: OA (%) of ten repeated experiments with differing numbers of training samples per class (mean ± standard deviation)

For the superpixel construction step, we require that the ratio of the number of pixels to the number of superpixels be at least . The parameters , , , and have the same value for all datasets used. These values were found using empirical testing in a coarse to fine search method. The other three parameters, , and , change value depending on the HSI used. The parameter values used in the experiments are given in Table I.

We leave a discussion of the parameters and to section III of the supplementary material and focus here on the superpixel number . Given that we are using a superpixel based classifier, it is critically important to understand how the superpixel number affects the accuracy. This is especially true when it is unclear what value of to pick for a given image. To investigate the effect of changing the parameter , we classified all three HSIs using a varying number of superpixels and randomly selected samples from each class and plotted the classification accuracy against the superpixel number. The results for this analysis are given in Fig. 4. In general, the classification accuracy increases with the number of superpixels, due to the underlying over-segmentation being more accurate. However, once the image is accurately over-segmented, there are diminishing returns for further increasing the superpixel number. Combined with the fact that increasing the number of superpixels increases the size of the graph and thus the running time, we used the smallest number of superpixels that reliably gave a good classification accuracy for each HSI.

For the compared methods the parameters were set using the default values provided in the demo code or referenced in the papers themselves. The SVM method was implemented using the LIBSVM [46] library and uses a Gaussian kernel and five-fold cross validation.

IV-D Experimental Results

Our experiments are organised into two parts. Firstly, we compare the classification accuracy of our proposed framework with the comparison classifiers detailed above. Due to the semi-supervised nature of our method, we test the classification performance using very limited amounts of training data. Secondly, we use visual classification maps to understand and explain the performance of our classifier in relation to the other classifiers.

Salinas
Technique | OURS | LCMR [16] | SC-MK [13] | EPF [45] | LBP [15] | IFRF [14] | SVM [1]
OA | 99.24 ± 0.16% | 93.90 ± 1.29% | 90.38 ± 2.42% | 86.53 ± 1.99% | 90.68 ± 1.35% | 95.87 ± 1.62% | 82.42 ± 1.15%
AA | 98.90 ± 1.51% | 96.26 ± 1.02% | 94.16 ± 1.11% | 93.51 ± 0.91% | 93.00 ± 1.03% | 96.24 ± 1.43% | 88.55 ± 0.99%
Kappa | 99.15 ± 0.17% | 93.22 ± 1.44% | 89.33 ± 2.67% | 85.10 ± 2.15% | 90.68 ± 1.36% | 95.41 ± 1.80% | 80.53 ± 1.25%
Indian Pines
Technique | OURS | LCMR [16] | SC-MK [13] | EPF [45] | LBP [15] | IFRF [14] | SVM [1]
OA | 90.89 ± 2.98% | 82.74 ± 2.32% | 79.91 ± 2.60% | 68.95 ± 2.01% | 80.52 ± 2.03% | 80.86 ± 3.76% | 51.20 ± 3.92%
AA | 92.16 ± 6.77% | 90.48 ± 1.56% | 87.86 ± 1.53% | 71.39 ± 3.49% | 88.46 ± 1.29% | 74.99 ± 3.16% | 51.19 ± 3.22%
Kappa | 87.50 ± 3.33% | 80.51 ± 2.59% | 77.31 ± 2.93% | 65.02 ± 2.25% | 78.09 ± 2.23% | 78.45 ± 4.16% | 45.41 ± 4.11%
University of Pavia
Technique | OURS | LCMR [16] | SC-MK [13] | EPF [45] | LBP [15] | IFRF [14] | SVM [1]
OA | 93.70 ± 1.35% | 88.29 ± 4.01% | 80.23 ± 4.06% | 73.92 ± 7.06% | 72.66 ± 4.29% | 76.36 ± 3.81% | 67.40 ± 4.66%
AA | 93.25 ± 5.03% | 90.72 ± 1.67% | 83.99 ± 2.15% | 76.10 ± 5.06% | 75.99 ± 2.71% | 70.39 ± 3.24% | 70.08 ± 2.48%
Kappa | 91.71 ± 1.73% | 84.91 ± 4.89% | 74.63 ± 4.60% | 67.40 ± 8.23% | 72.66 ± 4.25% | 69.70 ± 4.55% | 59.38 ± 4.88%
TABLE III: OA (%), AA (%) and Kappa (%) of ten consecutive experiments with ten training samples per class
Fig. 6: Salinas data set. (a) Colour composite image. (b) Ground truth. (c)-(i) are classification maps produced using labelled samples for each class. The methods used were: (c) the proposed SGL, (d) LCMR [16], (e) SVM [1], (f) SC-MK [13], (g) EPF [45], (h) LBP [15] and (i) IFRF [14].

(E1) In our first experiment, we evaluate the overall accuracy (OA) of our method against the state of the art when using a reduced amount of labelled data for training ( randomly selected samples per class). The accuracies of the different classifiers on the three benchmark datasets are given in Table II, and a graphical representation of the results is shown in Fig. 5.

We see that the accuracy produced by the SGL framework is, by a significant margin, the best of any classifier considered in this paper. The SGL framework produces the best accuracy on all three benchmark images for every amount of labelled data tested. In particular, the average difference in OA between SGL and its nearest competitor LCMR [16], across the three datasets, was when using samples per class and was when using samples per class. This highlights the semi-supervised nature of the SGL framework, which allows it to exploit information present in the unlabelled data to overcome the limited number of labelled samples.

Fig. 7: Pavia University data set. (a) Colour composite image. (b) Ground truth. (c)-(i) are classification maps produced using labelled samples for each class. The methods used were: (c) the proposed SGL, (d) LCMR [16], (e) SVM [1], (f) SC-MK [13], (g) EPF [45], (h) LBP [15] and (i) IFRF [14].
Fig. 8: Indian Pines data set. (a) Colour composite image. (b) Ground truth. (c)-(i) are classification maps produced using labelled samples for each class. The methods used were: (c) the proposed SGL, (d) LCMR [16], (e) SVM [1], (f) SC-MK [13], (g) EPF [45], (h) LBP [15] and (i) IFRF [14].

(E2) To understand how each classifier was performing, and to explain the large increase in classification accuracy obtained by SGL, we produced visual classification maps. For each HSI we used ten labelled samples per class and calculated the overall accuracy (OA), the average accuracy (AA), the Kappa coefficient and the full classification map. The results for this experiment are reported in Table III. Furthermore, Figs. 6-8 give a colour composite image, the ground truth image and the final classification maps for the seven considered methods. In section III of the supplementary material we provide a class-by-class accuracy breakdown.

Examining the OA, AA and Kappa coefficient of the differing methods, we observe that SGL is again the best performing method with an average improvement of OA , AA and Kappa in the Indian pines scene, OA , AA and Kappa in the Pavia University scene and OA , AA and Kappa in the Salinas scene compared to the other classifiers (excluding the SVM).

To explain the strong performance of SGL relative to the other methods, we examine the classification maps. The poorest performing classifier was the SVM, which uses only spectral information and as a result produces very noisy classification maps. The EPF method post-processes the SVM classification map with an edge-preserving filter to smooth out some of this noise, and the results show that it does so successfully. However, the poor performance of the underlying SVM classification prevents EPF from achieving good accuracy. The LBP and IFRF methods produce over-smoothed classification results when only a limited amount of labelled data is available, which causes poor performance on the more complicated Indian Pines and Pavia University images. The LCMR and SC-MK methods are the closest competitors to SGL, with LCMR slightly outperforming SC-MK due to a slightly higher amount of smoothing. Both methods manage to preserve edges and boundaries whilst producing smooth classification maps, owing to the inclusion of spatial information via local neighbouring pixel construction and superpixel-based kernels respectively.

What sets SGL apart from the other methods considered is that the classification map has been intelligently smoothed with near complete preservation of edges and boundaries. Primarily, this is due to the use of superpixels as the node set in our graph. The superpixels produced by our novel superpixel algorithm accurately preserve the edges and boundaries in the image. Therefore, when we assign labels to each superpixel, rather than to each pixel, we smooth our classification map across the homogeneous superpixels whilst retaining boundaries.
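The effect of labelling superpixels rather than pixels can be illustrated with a short sketch. It assumes a hypothetical segmentation map that stores one superpixel id per pixel and a dictionary of predicted classes per superpixel; the names and the toy data are illustrative and are not taken from the paper's implementation.

```python
def labels_to_pixel_map(segmentation, superpixel_labels):
    """Broadcast one class label per superpixel to every pixel it contains.

    segmentation: 2-D list of superpixel ids, one per pixel.
    superpixel_labels: dict mapping superpixel id -> predicted class.
    """
    return [[superpixel_labels[sp] for sp in row] for row in segmentation]


# Toy 3x4 image split into three superpixels (ids 0, 1 and 2).
segmentation = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 1],
]
predicted = {0: "corn", 1: "soybean", 2: "grass"}
pixel_map = labels_to_pixel_map(segmentation, predicted)
```

Every pixel inside a superpixel receives the same class, so the map is piecewise constant, and class changes can only occur along superpixel boundaries.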

V Conclusion

In this paper, we have developed a novel semi-supervised graph-based approach, SGL, for the classification of hyperspectral images. The proposed method can be split into three main stages: over-segmentation of HSIs with a novel superpixel algorithm specially designed for hyperspectral data, extraction of discriminative features, and graph construction and classification. Our experiments with real benchmark HSIs demonstrate that our proposed method greatly outperforms other state-of-the-art classifiers in terms of qualitative and quantitative results, especially when using a very small amount of labelled data.

The semi-supervised nature of our solution exploits information present in the unlabelled data and can overcome the issue of having a highly limited training set, a common problem in the field of remote sensing. Furthermore, for the first time we propose using superpixels as the nodes of a pure graphical classifier, which has two large benefits. Firstly, the superpixel graph is much smaller than a pixel-based graph, allowing for computationally reasonable run times without the need for matrix approximations. Secondly, applying labels to superpixels intelligently smooths our classification maps with near perfect preservation of edges and boundaries.

In future work, we intend to apply deep learning to automate the extraction of deep features. Furthermore, we seek to apply recent work on heterogeneous graphs to investigate a combined superpixel/pixel representation.

VI Acknowledgment

The authors would like to thank Prof. D. Landgrebe from Purdue University and the NASA Jet Propulsion Laboratory for providing the hyperspectral data sets. We would also like to thank Prof. David Coomes from the Department of Plant Sciences, University of Cambridge, for his advice and support. This work was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) and the National Physical Laboratory (NPL). Support from the Centre for Mathematical Imaging in Healthcare (CMIH), University of Cambridge, and the Maths in Healthcare Centre is gratefully acknowledged. CBS acknowledges support from the Leverhulme Trust project on Breaking the non-convexity barrier, the Philip Leverhulme Prize, the EPSRC grant Nr. EP/M00483X/1, the EPSRC Centre Nr. EP/N014588/1, the European Union Horizon 2020 research and innovation programmes under the Marie Skłodowska-Curie grant agreement No 777826 NoMADS and No 691070 CHiPS, the Cantab Capital Institute for the Mathematics of Information and the Alan Turing Institute. We gratefully acknowledge the support of NVIDIA Corporation with the donation of a Quadro P6000 GPU used for this research.


  • [1] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 8, pp. 1778–1790, 2004.
  • [2] L. Fang, S. Li, X. Kang, and J. A. Benediktsson, “Spectral-spatial classification of hyperspectral images with a superpixel-based discriminative sparse model,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 8, pp. 4186–4201, 2015.
  • [3] L. Fang, N. He, S. Li, A. J. Plaza, and J. Plaza, “A new spatial–spectral feature extraction method for hyperspectral images using local covariance matrix representation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 6, pp. 3534–3546, 2018.
  • [4] T. Wang, Z. Zhu, and E. Blasch, “Bio-inspired adaptive hyperspectral imaging for real-time target tracking,” IEEE Sensors Journal, vol. 10, no. 3, pp. 647–654, 2010.
  • [5] B. Uzkent, M. J. Hoffman, and A. Vodacek, “Real-time vehicle tracking in aerial video using hyperspectral features,” IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 36–44, 2016.
  • [6] B. Uzkent, A. Rangnekar, and M. J. Hoffman, “Aerial vehicle tracking by adaptive fusion of hyperspectral likelihood maps,” IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 233–242, 2017.
  • [7] R. J. Ellis and P. W. Scott, “Evaluation of hyperspectral remote sensing as a means of environmental monitoring in the st. austell china clay (kaolin) region, cornwall, uk,” Remote sensing of environment, vol. 93, no. 1-2, pp. 118–130, 2004.
  • [8] S. Manfreda, M. McCabe, P. Miller et al., “On the use of unmanned aerial systems for environmental monitoring,” Remote Sensing, vol. 10, no. 4, p. 641, 2018.
  • [9] Z. Pan, G. Healey, M. Prasad, and B. Tromberg, “Face recognition in hyperspectral images,” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 25, no. 12, pp. 1552–1560, 2003.
  • [10] Y. Liu, G. Gao, and Y. Gu, “Tensor matched subspace detector for hyperspectral target detection,” IEEE Transactions on Geoscience and Remote Sensing (TGRS), vol. 55, no. 4, pp. 1967–1974, 2017.
  • [11] Y. Zhang, B. Du, L. Zhang, and T. Liu, “Joint sparse representation and multitask learning for hyperspectral target detection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 2, pp. 894–906, 2017.
  • [12] G. Mercier and M. Lennon, “Support vector machines for hyperspectral image classification with spectral-based kernels,” Proceedings of the International IEEE Geoscience and Remote Sensing Symposium, vol. 1, pp. 288–290, 2003.
  • [13] L. Fang, S. Li, W. Duan, J. Ren, and J. A. Benediktsson, “Classification of hyperspectral images by exploiting spectral–spatial information of superpixel via multiple kernels,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 12, pp. 6663–6674, 2015.
  • [14] X. Kang, S. Li, and J. A. Benediktsson, “Feature extraction of hyperspectral images with image fusion and recursive filtering,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 6, pp. 3742–3752, 2014.
  • [15] W. Li, C. Chen, H. Su, and Q. Du, “Local binary patterns and extreme learning machine for hyperspectral imagery classification,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 7, pp. 3681–3693, 2015.
  • [16] L. Fang, N. He, S. Li, and J. Plaza, “A new spatial-spectral feature extraction method for hyperspectral images using local covariance matrix representation,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 6, pp. 3534–3546, 2018.
  • [17] K. Makantasis, K. Karantzalos, A. Doulamis, and N. Doulamis, “Deep supervised learning for hyperspectral data classification through convolutional neural networks,” IEEE Int. Geosci. Remote Sens. Symp. Italy, pp. 4959–4962, 2015.
  • [18] W. Zhao and S. Du, “Spectral–spatial feature extraction for hyperspectral image classification: A dimension reduction and deep learning approach,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 8, pp. 4544–4554, 2016.
  • [19] L. Zhu, Y. Chen, P. Ghamisi, and J. A. Benediktsson, “Generative adversarial networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 9, pp. 5046–5063, 2018.
  • [20] H. B. Barlow, “Unsupervised learning,” Neural Computation, vol. 1, pp. 295–311, 1989.
  • [21] Z. Zhu et al., “Unsupervised classification in hyperspectral imagery with non-local total variation and primal-dual hybrid gradient algorithm,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 5, pp. 2786–2798, 2017.
  • [22] O. Chapelle, A. Zien, and B. Schölkopf, Semisupervised learning.   MIT Press, 2006.
  • [23] D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf, “Learning with local and global consistency,” NIPS, pp. 595–602, 2004.
  • [24] G. Camps-Valls, T. Marsheva, and D. Zhou, “Semi-supervised graph-based hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 10, pp. 3044–3054, 2007.
  • [25] Y. Gao, R. M. Ji, P. Cui, Q. Dai, and G. Hua, “Hyperspectral image classification through bilayer graph-based learning,” IEEE Transactions on Image Processing, vol. 23, no. 7, pp. 2769–2778, 2011.
  • [26] B. Cui, X. Xie, M. Xiudan, G. Ren, and Y. Ma, “Superpixel-based extended random walker for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 6, pp. 3233–3243, 2018.
  • [27] L. Bruzzone, M. Chi, and M. Marconcini, “Transductive svms for semisupervised classification of hyperspectral data,” International Geoscience and Remote Sensing Symposium, 2005.
  • [28] I. Dópido, J. Li, P. Marpu, A. Plaza, J. Dias, and J. Benediktsson, “Semisupervised self-learning for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 7, pp. 4032–4044, 2013.
  • [29] F. Ratle, G. Camps-Valls, and J. Weston, “Semisupervised self-learning for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 5, pp. 2271–2282, 2013.
  • [30] Y. Zhan, D. Hu, Y. Wang, and X. Yu, “Semisupervised hyperspectral image classification based on generative adversarial networks,” IEEE Trans. Geosci. Remote Sens. Letters, vol. 15, no. 2, pp. 212–216, 2018.
  • [31] X. Ren and J. Malik, “Learning a classification model for segmentation,” International Conference on Computer Vision, pp. 10–17, 2003.
  • [32] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “Slic superpixels compared to state-of-the-art superpixel methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), pp. 2274–2282, 2012.
  • [33] Y. J. Liu, C. Yu, M. Yu, and Y. He, “Manifold slic: A fast method to compute content-sensitive superpixels,” Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 651–659, 2016.
  • [34] G. Maierhofer, D. Heydecker, A. I. Aviles-Rivero, S. M. Alsaleh, and C. Schönlieb, “Peekaboo-where are the objects? structure adjusting superpixels,” IEEE International Conference on Image Processing (ICIP), 2018.
  • [35] D. Stutz, A. Hermans, and B. Leibe, “Superpixels: an evaluation of the state-of-the-art,” Computer Vision and Image Understanding.
  • [36] Y. Chen, N. M. Nasrabadi, and T. D. Tran, “Hyperspectral image classification using dictionary-based sparse representation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 10, pp. 3973–3985, 2011.
  • [37] S. Lloyd, “Least squares quantization in pcm,” IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–137, 1982.
  • [38] I. Jolliffe, Principal Component Analysis, 2005.
  • [39] P. Sellars, A. Aviles-Rivero, N. Papadakis, D. Coomes, A. Faul, and C.-B. Schönlieb, “Semi-supervised Learning with Graphs: Covariance Based Superpixels For Hyperspectral Image Classification,” arXiv e-prints, p. arXiv:1901.04240, Jan. 2019.
  • [40] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, “Geometric means in a novel vector space structure on symmetric positive-definite matrices,” SIAM J. Matrix Anal. Appl., vol. 29, no. 1, pp. 328–347, 2006.
  • [41] O. Tuzel, F. Porikli, and P. Meer, “Region covariance: A fast descriptor for detection and classification,” Proc. Eur. Conf. Comput. Vis., pp. 589–600, 2006.
  • [42] G. Camps-Valls, B. Marsheva, and D. Zhou, “Semi-supervised graph-based hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 10, pp. 3044–3054, 2007.
  • [43] C. K. I. Williams and M. Seeger, “Using the nyström method to speed up kernel machines,” Proc. Neural Information Processing Systems, 2001.
  • [44] A. Bertozzi and A. Flenner, “Diffuse interface models on graphs for classification of high dimensional data,” Multiscale Modeling and Simulation, vol. 10, no. 3, pp. 1090–1118, 2012.
  • [45] X. Kang, S. Li, and J. A. Benediktsson, “Spectral-spatial hyperspectral image classification with edge-preserving filtering,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 5, pp. 2666–2677, 2014.
  • [46] C. C. Chang and C. J. Lin, “Libsvm: A library for support vector machines,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 27:1–27:27, 2011.

I Outline

The purpose of this supplementary material is to provide further details of the methodology used in the main paper as well as provide additional experimental results to validate the performance of our proposed framework. The supplementary material is divided into two sections:

  • Section II. In this section we give more detail on the hyperspectral extension of MSLIC [33], named Hyperspectral Manifold SLIC (HMS), which allows it to accurately over-segment hyperspectral images.

  • Section III. In this section we describe the benchmarking datasets in additional detail, further expand the parameter analysis, and provide additional classification results of the SGL method.

II Superpixel Construction

As our starting point, we used the Manifold SLIC (MSLIC) algorithm developed by Liu et al. [33], and we refer readers to their paper for a detailed explanation of the manifold extension. In the main paper, we list the major changes we made to the MSLIC algorithm to produce our hyperspectral extension, which we call Hyperspectral Manifold SLIC (HMS). In this section, we list some additional changes that we made to MSLIC that were not stated in the main paper. We also include additional visual examples of the application of HMS to real HSIs.

II-A Parameter Changes

From the original paper we made a small number of parameter changes. In this section, we give the reasoning behind these changes.

Convergence Conditions. In our implementation we stop the iteration loop when either the residual energy increases or the percentage decrease in the residual energy is less than . This prevents the superpixel algorithm from running for extended periods of time.
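The two stopping conditions can be sketched as a small predicate. This is a minimal illustration rather than the authors' code: the function name is hypothetical, and the threshold `min_rel_decrease` is a placeholder argument because the value used in the paper is not reproduced here.

```python
def should_stop(prev_energy, curr_energy, min_rel_decrease):
    """Return True when the superpixel iteration loop should terminate.

    Stops if the residual energy increased, or if its relative decrease
    fell below `min_rel_decrease` (a fraction, e.g. 0.01 for 1%).
    The threshold is a placeholder, not the paper's setting.
    """
    if curr_energy > prev_energy:
        return True
    if prev_energy == 0:
        return True  # nothing left to reduce
    rel_decrease = (prev_energy - curr_energy) / prev_energy
    return rel_decrease < min_rel_decrease


# Made-up residual energies from successive iterations.
energies = [100.0, 60.0, 45.0, 44.9, 44.8]
stop_at = next(
    i for i in range(1, len(energies))
    if should_stop(energies[i - 1], energies[i], min_rel_decrease=0.01)
)
```

With these made-up energies the loop would halt at the first iteration whose improvement drops below one percent.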

Enforced Connectivity. The final superpixels generated by the algorithm must be 4-connected on the image grid. This is ensured by an enforced connectivity algorithm which has two thresholds for the minimum and maximum superpixel size. In our algorithm, we use a minimum superpixel size of 8 and a maximum superpixel size of , where is the number of pixels and is the initial number of superpixels. The reason for choosing such a small minimum cluster size was that certain classes in the Indian Pines dataset were tiny in size and we needed to be able to capture these small areas.
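A simplified version of the minimum-size part of such a connectivity pass is sketched below. It is an assumption-laden illustration, not the algorithm used in MSLIC or HMS: it finds 4-connected components with a breadth-first search and merges any component smaller than `min_size` into an adjacent one, and it omits the maximum-size threshold entirely. All function names are hypothetical.

```python
from collections import deque


def connected_components(labels):
    """Split a 2-D label map into 4-connected components.

    Returns (comp, sizes): comp holds one component id per pixel and
    sizes[c] is the number of pixels in component c.
    """
    h, w = len(labels), len(labels[0])
    comp = [[-1] * w for _ in range(h)]
    sizes = []
    for r in range(h):
        for c in range(w):
            if comp[r][c] != -1:
                continue
            cid = len(sizes)
            comp[r][c] = cid
            queue = deque([(r, c)])
            count = 0
            while queue:
                y, x = queue.popleft()
                count += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and comp[ny][nx] == -1
                            and labels[ny][nx] == labels[y][x]):
                        comp[ny][nx] = cid
                        queue.append((ny, nx))
            sizes.append(count)
    return comp, sizes


def enforce_min_size(labels, min_size):
    """Merge each component smaller than `min_size` into a 4-neighbour."""
    comp, sizes = connected_components(labels)
    h, w = len(comp), len(comp[0])
    parent = list(range(len(sizes)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for r in range(h):
        for c in range(w):
            root = find(comp[r][c])
            if sizes[root] >= min_size:
                continue
            for ny, nx in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= ny < h and 0 <= nx < w:
                    other = find(comp[ny][nx])
                    if other != root:
                        parent[root] = other
                        sizes[other] += sizes[root]
                        break
    return [[find(comp[r][c]) for c in range(w)] for r in range(h)]


# A single stray pixel (label 2) surrounded by superpixel 0.
labels = [
    [0, 0, 0, 1],
    [0, 2, 0, 1],
    [0, 0, 0, 1],
]
merged = enforce_min_size(labels, min_size=2)
```

Because only adjacent components are merged, every region in the output remains 4-connected; a small threshold such as the 8 pixels quoted above lets genuinely small classes survive as their own superpixels.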

II-B Supplementary Visual Results for HMS

In Figs. 9-11 we provide more examples of the application of HMS to the three different HSIs considered in the main paper. We provide over-segmentations with differing numbers of superpixels, highlighting the content sensitivity of the algorithm. In particular, note that Indian Pines and Salinas are easily over-segmented using a small number of superpixels, whilst the more complex structure of Pavia University requires more superpixels to achieve an accurate over-segmentation.

III Supplementary Results

In this section, we expand the details regarding the experimental methodology used. Additionally, we provide further experimental results that validate the performance of our proposed framework SGL.

III-A Further Description of the Data Sets

The three labelled datasets used in the main paper are “AVIRIS Indian Pines”, “AVIRIS Salinas” and “Reflective Optics System Imaging Spectrometer (ROSIS-03) University of Pavia”. Whilst the main paper describes the format of the three datasets, in this section we give further details on the data sets and preprocessing. A class-by-class breakdown of each data set, listing the different classes and the number of samples, is given in Table IV.

Indian Pines consists of mainly agricultural classes with a small amount of non-organic land cover. The different classes vary greatly in size, with the smallest classes in the tens of pixels whilst the largest classes have several thousand pixels. Bands [104-108], [150-163] and 220 were removed prior to classification due to water absorption effects.

Salinas is made up entirely of 16 different vegetation classes. The classes are larger, with the smallest class comprising several hundred pixels. The scene has two large classes, “grapes untrained” and “vineyard untrained”, which dominate a large area of land cover. Bands [108-112], [154-167] and 224 were removed due to water absorption effects.
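The band removal for the two AVIRIS scenes can be expressed as a small helper. This sketch makes two assumptions that are not stated explicitly in the text: that the band numbers are one-based and inclusive, and that the scenes have 220 and 224 bands respectively (inferred from the highest band number listed for each). The function name is hypothetical.

```python
def kept_band_indices(n_bands, removed_ranges):
    """Return zero-based indices of the bands that survive removal.

    removed_ranges: iterable of (first, last) inclusive band numbers,
    assumed here to be one-based as written in the text.
    """
    removed = set()
    for first, last in removed_ranges:
        removed.update(range(first, last + 1))
    return [b - 1 for b in range(1, n_bands + 1) if b not in removed]


# Assumed band counts: 220 for Indian Pines and 224 for Salinas.
indian_pines_kept = kept_band_indices(220, [(104, 108), (150, 163), (220, 220)])
salinas_kept = kept_band_indices(224, [(108, 112), (154, 167), (224, 224)])
```

Under these assumptions each scene loses 20 bands, and the returned indices can be used to slice the spectral axis of the data cube before any further processing.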

University of Pavia differs from the other benchmarks in that it contains a significant amount of non-organic land cover such as asphalt and bricks. This scene contains a small number of classes and a more complex geometry, which should make it harder for a superpixel-based classifier to classify.

III-B Description of the performance metrics

In the main paper, we use three commonly used evaluation criteria to evaluate the performance of each classifier. In this section, we give the explicit description of each of these criteria.

Overall Accuracy (OA). This measure is the ratio of the number of correctly classified pixels to the total number of pixels.

Average Accuracy (AA). This measure is the mean of the per-class classification accuracies over all classes in an image.

Kappa Coefficient. This metric gives the agreement between the final classification and the ground truth. It gives the percentage agreement corrected for the agreement expected by chance alone, and is thought to be more robust than simple percentage agreement.
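The three criteria can be computed from a confusion matrix as in the following sketch. It uses the standard definition of Cohen's kappa, which is assumed here to be the form intended; the function names and the toy labels are illustrative only.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts pixels of true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """Fraction of all pixels that were classified correctly."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def average_accuracy(cm):
    """Mean of the per-class accuracies, ignoring empty classes."""
    per_class = [cm[i][i] / sum(cm[i]) for i in range(len(cm)) if sum(cm[i]) > 0]
    return sum(per_class) / len(per_class)


def kappa(cm):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)
    # Chance agreement from the row (true) and column (predicted) totals.
    p_e = sum(
        sum(cm[i]) * sum(cm[j][i] for j in range(n)) for i in range(n)
    ) / total ** 2
    if p_e == 1.0:
        return 1.0  # degenerate case: a single class everywhere
    return (p_o - p_e) / (1.0 - p_e)


# Toy example with three classes and ten labelled pixels.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
oa, aa, k = overall_accuracy(cm), average_accuracy(cm), kappa(cm)
```

In this toy example OA is 0.8 while AA is 0.75, because the smallest class is classified worst; this is why AA is reported alongside OA for scenes with very unbalanced class sizes.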

III-C Parameter Analysis

In the main paper, we explained how the classification accuracy changed with the parameter . In this section we explain how the parameters and change value depending on the HSI used.

is the deviation of the location-based Gaussian kernel . Consider increasing the image size whilst keeping constant. The width of the Gaussian distribution would become narrower and narrower with respect to the size of the image. This would reduce the weight of the edges connecting superpixels that are further apart compared to superpixel pairs which are close. Eventually the decreasing width of the kernel would lead to the removal of all non-local connections in the graph, preventing information from properly propagating across the graph. Therefore, to balance the width of the spatial kernel to the spectral kernel , which does not change with image size, the value of should increase.
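The scaling argument can be checked numerically with a short sketch. The kernel form exp(-d^2 / (2 sigma^2)) is an assumption made for illustration, since the exact kernel is defined in the main paper and not reproduced here; the function name, the value of `sigma` and the image widths are all hypothetical.

```python
import math


def spatial_weight(distance, sigma):
    """Gaussian weight exp(-d^2 / (2 sigma^2)); an assumed kernel form."""
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2))


sigma = 20.0  # hypothetical kernel width, held fixed across image sizes

# Weight between two superpixels a quarter of the image width apart.
fixed_sigma = {w: spatial_weight(0.25 * w, sigma) for w in (100, 200, 400)}

# Scaling sigma in proportion to the image width keeps the weight constant.
scaled_sigma = {
    w: spatial_weight(0.25 * w, sigma * w / 100.0) for w in (100, 200, 400)
}
```

With a fixed width the weight between superpixels at the same relative separation collapses towards zero as the image grows, whereas scaling the width with the image keeps it unchanged, which matches the reasoning above.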

weights between the two different spatial-spectral features and , with a lower value favouring the mean filter whilst a higher value emphasises the weighted filter. It was found that mean filtering was more effective for classifying Pavia University whilst weighted filtering was more effective when classifying Indian Pines and Salinas. An initial explanation for this is that the more complex land cover structure of the Pavia University scene means that the spatial information between superpixels is less informative than the spatial information within a superpixel.

III-D Further Experimental Results

(E3) As an additional experiment, we classified each HSI using the same parameters as the main paper with labelled samples per class. From this, we produced a class-by-class accuracy breakdown. The results for this experiment are contained in Table V. From this table, we see that the SGL method produced the highest classification accuracy for the majority of individual classes in each HSI, with particular dominance in the Salinas and Indian Pines images. For the Indian Pines scene, SGL produced clear accuracy improvements for classes 3 (Corn-mintill) and 10 (Soybean-mintill) in particular. Similarly, in the Salinas scene, SGL produced large improvements in the classification of classes 8 (Grapes untrained) and 15 (Vineyard untrained), as can be seen from the visual classification maps in the main paper.

Fig. 9: Superpixel over-segmentations on the Indian Pines scene generated by the HMS extension. From left to right: (a) the composite RGB image, (b)-(e) superpixel segmentations with , , and superpixels respectively.
Fig. 10: Superpixel over-segmentations on the University of Pavia scene generated by the HMS extension. From left to right: (a) the composite RGB image, (b)-(e) superpixel segmentations with , , and superpixels respectively.
Fig. 11: Superpixel over-segmentations on the Salinas scene generated by the HMS extension. From left to right: (a) the composite RGB image, (b)-(e) superpixel segmentations with , , and superpixels respectively.
Indian Pines
Class | SGL | LCMR [16] | SC-MK [13] | EPF [45] | LBP [15] | IFRF [14] | SVM [1]
1 | 98.04 ± 0.65% | 99.44 ± 1.17% | 98.33 ± 1.43% | 53.78 ± 28.30% | 100.0 ± 0.00% | 53.51 ± 25.90% | 20.53 ± 5.10%
2 | 76.06 ± 9.19% | 75.99 ± 10.18% | 78.82 ± 6.80% | 55.88 ± 11.32% | 70.94 ± 7.05% | 70.35 ± 10.65% | 43.14 ± 8.51%
3 | 84.27 ± 6.04% | 70.78 ± 8.08% | 77.65 ± 9.18% | 60.19 ± 17.99% | 70.28 ± 12.13% | 67.99 ± 7.80% | 39.15 ± 8.39%
4 | 96.67 ± 2.73% | 93.88 ± 10.78% | 86.39 ± 12.62% | 34.57 ± 12.83% | 97.93 ± 3.23% | 79.37 ± 12.24% | 21.25 ± 3.97%
5 | 92.30 ± 7.48% | 90.32 ± 10.14% | 82.33 ± 10.56% | 93.39 ± 5.30% | 82.92 ± 7.94% | 79.40 ± 13.64% | 59.34 ± 11.59%
6 | 98.71 ± 0.60% | 91.40 ± 4.18% | 89.54 ± 7.43% | 86.32 ± 10.34% | 90.36 ± 5.49% | 93.63 ± 4.96% | 83.29 ± 3.83%
7 | 100.0 ± 0.00% | 100.0 ± 0.00% | 100.0 ± 0.00% | 70.92 ± 39.13% | 100.0 ± 0.00% | 39.65 ± 23.77% | 24.85 ± 11.06%
8 | 100.0 ± 0.00% | 99.68 ± 0.23% | 97.09 ± 9.19% | 98.41 ± 3.83% | 100.0 ± 0.00% | 99.97 ± 0.07% | 93.12 ± 4.02%
9 | 100.0 ± 0.00% | 100.0 ± 0.00% | 100.0 ± 0.00% | 59.44 ± 26.70% | 100.0 ± 0.00% | 28.81 ± 21.34% | 12.59 ± 8.36%
10 | 88.94 ± 6.52% | 76.46 ± 7.31% | 71.32 ± 10.49% | 61.19 ± 11.43% | 79.90 ± 5.05% | 75.95 ± 9.75% | 36.93 ± 10.25%
11 | 91.04 ± 7.49% | 71.40 ± 6.33% | 69.22 ± 12.69% | 81.05 ± 8.72% | 73.78 ± 6.78% | 93.81 ± 4.23% | 61.50 ± 3.96%
12 | 90.05 ± 4.14% | 90.50 ± 3.66% | 78.47 ± 17.31% | 44.31 ± 14.00% | 70.58 ± 6.99% | 74.08 ± 10.25% | 28.20 ± 5.78%
13 | 99.56 ± 0.15% | 99.33 ± 0.25% | 99.90 ± 0.22% | 98.34 ± 3.38% | 98.31 ± 2.97% | 75.32 ± 15.02% | 80.12 ± 6.41%
14 | 100.0 ± 0.00% | 98.18 ± 3.54% | 88.14 ± 2.69% | 95.15 ± 4.04% | 91.32 ± 5.03% | 98.31 ± 1.34% | 88.22 ± 4.03%
15 | 97.69 ± 6.92% | 91.62 ± 10.39% | 90.96 ± 11.42% | 64.91 ± 23.49% | 90.59 ± 10.07% | 77.14 ± 11.42% | 39.10 ± 8.79%
16 | 100.0 ± 0.00% | 98.67 ± 3.79% | 97.59 ± 1.50% | 84.46 ± 7.54% | 98.43 ± 1.14% | 92.62 ± 14.18% | 87.71 ± 20.71%
OA | 90.89 ± 2.98% | 82.74 ± 2.32% | 79.91 ± 2.60% | 68.95 ± 2.01% | 80.52 ± 2.03% | 80.86 ± 3.76% | 51.20 ± 3.92%
AA | 92.16 ± 6.77% | 90.48 ± 1.56% | 87.86 ± 1.53% | 71.39 ± 3.49% | 88.46 ± 1.29% | 74.99 ± 3.16% | 51.19 ± 3.22%
Kappa | 87.50 ± 3.33% | 80.51 ± 2.59% | 77.31 ± 2.93% | 65.02 ± 2.25% | 78.09 ± 2.23% | 78.45 ± 4.16% | 45.41 ± 4.11%
University of Pavia
Class | SGL | LCMR [16] | SC-MK [13] | EPF [45] | LBP [15] | IFRF [14] | SVM [1]
1 | 86.64 ± 4.39% | 79.29 ± 7.09% | 72.48 ± 13.89% | 94.80 ± 4.52% | 59.64 ± 5.07% | 68.30 ± 7.67% | 94.09 ± 5.44%
2 | 95.87 ± 3.17% | 87.67 ± 8.05% | 80.05 ± 8.03% | 89.55 ± 6.61% | 69.72 ± 8.12% | 94.90 ± 2.19% | 85.59 ± 2.55%
3 | 85.37 ± 10.53% | 90.96 ± 4.45% | 76.84 ± 9.10% | 62.03 ± 23.65% | 79.52 ± 7.30% | 53.78 ± 10.49% | 42.74 ± 13.46%
4 | 87.44 ± 3.77% | 95.10 ± 3.40% | 94.77 ± 2.74% | 57.08 ± 11.72% | 66.44 ± 7.33% | 66.44 ± 22.53% | 59.85 ± 10.48%
5 | 95.84 ± 2.91% | 97.03 ± 6.17% | 99.66 ± 0.08% | 91.20 ± 5.64% | 89.91 ± 12.78% | 99.63 ± 1.10% | 93.69 ± 5.78%
6 | 99.92 ± 0.19% | 95.37 ± 2.36% | 76.24 ± 6.62% | 49.32 ± 13.91% | 89.33 ± 4.03% | 82.47 ± 9.46% | 39.38 ± 9.21%
7 | 96.59 ± 1.10% | 92.58 ± 8.13% | 76.06 ± 14.92% | 66.86 ± 13.46% | 89.15 ± 8.91% | 63.32 ± 12.76% | 42.26 ± 10.38%
8 | 94.03 ± 5.63% | 84.67 ± 5.57% | 79.79 ± 3.85% | 75.57 ± 10.98% | 80.78 ± 16.55% | 55.33 ± 7.28% | 73.22 ± 5.74%
9 | 97.55 ± 0.43% | 93.80 ± 3.55% | 100.0 ± 0.00% | 98.48 ± 1.70% | 59.40 ± 7.01% | 49.33 ± 9.07% | 99.87 ± 0.10%
OA | 93.70 ± 1.35% | 88.29 ± 4.06% | 80.23 ± 4.06% | 73.92 ± 7.06% | 72.66 ± 4.29% | 76.36 ± 3.81% | 67.40 ± 4.66%
AA | 93.25 ± 5.03% | 90.72 ± 1.67% | 83.99 ± 2.15% | 76.10 ± 5.06% | 75.99 ± 2.71% | 70.39 ± 3.24% | 70.08 ± 2.48%
Kappa | 91.71 ± 1.73% | 84.91 ± 4.89% | 74.63 ± 4.60% | 67.40 ± 8.23% | 72.66 ± 4.25% | 69.70 ± 4.55% | 59.38 ± 4.88%
Salinas
Class | SGL | LCMR [16] | SC-MK [13] | EPF [45] | LBP [15] | IFRF [14] | SVM [1]
1 | 100.0 ± 0.00% | 99.95 ± 0.06% | 99.93 ± 0.13% | 100.0 ± 0.00% | 97.97 ± 2.64% | 95.77 ± 6.65% | 97.54 ± 2.53%
2 | 100.0 ± 0.00% | 93.21 ± 5.19% | 98.66 ± 1.82% | 99.87 ± 0.29% | 96.57 ± 2.58% | 100.0 ± 0.00% | 99.10 ± 0.49%
3 | 100.0 ± 0.00% | 99.56 ± 0.42% | 96.94 ± 4.04% | 93.84 ± 2.01% | 98.59 ± 2.03% | 99.32 ± 0.78% | 86.62 ± 3.31%
4 | 99.71 ± 0.01% | 100.0 ± 0.00% | 98.79 ± 0.77% | 97.70 ± 0.79% | 97.84 ± 2.98% | 87.42 ± 8.54% | 96.92 ± 0.73%
5 | 98.09 ± 0.00% | 96.88 ± 1.09% | 95.63 ± 1.92% | 99.48 ± 0.98% | 92.37 ± 4.34% | 99.92 ± 0.08% | 97.57 ± 2.25%
6 | 99.93 ± 0.02% | 98.53 ± 0.67% | 99.53 ± 0.81% | 99.98 ± 0.02% | 92.14 ± 4.40% | 100.0 ± 0.00% | 99.97 ± 0.05%
7 | 99.48 ± 0.98% | 97.57 ± 1.96% | 94.22 ± 5.79% | 97.92 ± 2.40% | 92.68 ± 6.81% | 98.88 ± 1.14% | 97.68 ± 1.84%
8 | 99.38 ± 0.62% | 87.84 ± 4.77% | 74.47 ± 11.72% | 84.19 ± 7.87% | 85.27 ± 6.12% | 96.83 ± 4.43% | 70.82 ± 3.92%
9 | 100.0 ± 0.00% | 96.90 ± 2.63% | 99.40 ± 0.81% | 99.47 ± 0.19% | 93.05 ± 2.72% | 98.82 ± 0.18% | 98.84 ± 0.90%
10 | 96.72 ± 2.92% | 93.71 ± 7.73% | 88.34 ± 7.20% | 86.13 ± 5.46% | 93.65 ± 3.28% | 99.21 ± 8.00% | 79.64 ± 4.15%
11 | 95.882 ± 2.19% | 99.94 ± 0.05% | 97.03 ± 3.65% | 91.81 ± 8.68% | 97.83 ± 3.26% | 98.96 ± 0.45% | 83.29 ± 6.77%
12 | 99.90 ± 0.00% | 99.60 ± 1.14% | 97.50 ± 6.22% | 99.42 ± 0.56% | 89.96 ± 4.07% | 98.19 ± 1.15% | 94.42 ± 1.60%
13 | 98.80 ± 0.00% | 98.65 ± 0.73% | 95.36 ± 4.40% | 96.32 ± 2.87% | 91.59 ± 6.07% | 92.20 ± 8.00% | 88.15 ± 8.68%
14 | 95.38 ± 1.48% | 95.23 ± 2.86% | 90.29 ± 6.73% | 95.27 ± 11.66% | 88.18 ± 6.84% | 87.05 ± 14.28% | 84.51 ± 17.07%
15 | 99.28 ± 0.07% | 88.12 ± 7.50% | 84.36 ± 6.15% | 56.02 ± 5.52% | 82.42 ± 12.45% | 86.51 ± 8.12% | 49.19 ± 2.71%
16 | 100.0 ± 0.00% | 94.46 ± 6.47% | 96.17 ± 3.18% | 98.67 ± 4.09% | 97.96 ± 3.91% | 99.77 ± 0.56% | 92.51 ± 8.59%
OA | 99.24 ± 0.16% | 93.90 ± 1.29% | 90.38 ± 2.42% | 86.53 ± 1.99% | 90.68 ± 1.35% | 95.87 ± 1.62% | 82.42 ± 1.15%
AA | 98.90 ± 1.51% | 96.26 ± 1.02% | 94.16 ± 1.11% | 93.51 ± 0.91% | 93.00 ± 1.03% | 96.24 ± 1.43% | 88.55 ± 0.99%
Kappa | 99.15 ± 0.17% | 93.22 ± 1.44% | 89.33 ± 2.663% | 85.10 ± 2.15% | 90.68 ± 1.36% | 95.41 ± 1.80% | 80.53 ± 1.25%
TABLE V: OA (%), AA (%), Kappa (%) and a class-by-class breakdown obtained by different classifiers with ten training samples per class. The best results are highlighted in green.