Graph Convolutional Subspace Clustering: A Robust Subspace Clustering Framework for Hyperspectral Image

04/22/2020, by Yaoming Cai, et al.

Hyperspectral image (HSI) clustering is a challenging task due to the high complexity of HSI data. Subspace clustering has been proven to be powerful for exploiting the intrinsic relationship between data points. Despite its impressive performance in HSI clustering, traditional subspace clustering methods often ignore the inherent structural information among data. In this paper, we revisit subspace clustering with graph convolution and present a novel subspace clustering framework called Graph Convolutional Subspace Clustering (GCSC) for robust HSI clustering. Specifically, the framework recasts the self-expressiveness property of the data into the non-Euclidean domain, which results in a more robust graph embedding dictionary. We show that traditional subspace clustering models are special forms of our framework with Euclidean data. Based on the framework, we further propose two novel subspace clustering models using the Frobenius norm, namely Efficient GCSC (EGCSC) and Efficient Kernel GCSC (EKGCSC). Both models have a globally optimal closed-form solution, which makes them easier to implement, train, and apply in practice. Extensive experiments on three popular HSI datasets demonstrate that EGCSC and EKGCSC achieve state-of-the-art clustering performance and dramatically outperform many existing methods with significant margins.




I Introduction

Hyperspectral images (HSIs) acquired by remote sensors contain rich spectral and spatial information, which enables us to accurately recognize the region of interest. Over the past decade, HSIs have been widely applied to various fields, ranging from geological exploration, marine monitoring, military reconnaissance to medical imaging and forensics [11, 2, 29].

HSI classification, which aims to assign every pixel a certain label, is the foundation of HSI applications [29, 4]. The most commonly used HSI classification approach is supervised classification [10, 35] based on label information. In recent years, supervised HSI classification has made great progress. For several popular HSI datasets, such as the Indian Pines, Salinas, and Pavia University images [11, 9], supervised methods have achieved excellent classification accuracy. In particular, deep learning models [27, 11, 16, 5], such as Convolutional Neural Networks (CNNs) [14, 23], have greatly narrowed the gap between human and machine performance. Unfortunately, supervised methods typically require a large amount of labeled data, which cannot be satisfied in HSI scenarios due to the high cost of labeling training data. Furthermore, supervised methods have difficulty dealing with unknown objects, since they are modeled on the known classes.

Fig. 1: The motivation of GCSC. In the figure, the blue and red points signify two different classes that lie in two different subspaces, respectively. The red point with a blue outline and the blue point with a red outline denote misclassified points. Intuitively, GCSC converts the traditional data into graph-structured data and adopts graph convolution to generate a robust embedding for the subsequent subspace clustering.

To avoid manual data annotation, many works have been dedicated to developing unsupervised HSI classification methods, namely HSI clustering. Instead of using label information, HSI clustering aims to find the intrinsic relationship between data points and automatically determine labels in an unsupervised manner [36]. The key to HSI clustering is to measure the similarity between data points [6]. Traditional clustering methods, e.g., K-means [53], frequently use a pair-wise distance, such as the Euclidean distance, as the similarity measurement. Owing to the mixed-pixel and redundant-band problems [18, 43], these methods often suffer from unreliable measurements, which makes HSI clustering greatly challenging. Compared with supervised classification, there are far fewer studies on HSI clustering [49], and they are usually uncompetitive in terms of accuracy.

Recently, subspace clustering [37] has drawn increasing attention in HSI clustering [19, 47, 45, 48, 40] due to its ability to handle high-dimensional data and its reliable performance. Technically, subspace clustering seeks to express the data points as linear combinations of a self-expressive dictionary in the same subspace [25]. A subspace clustering model typically consists of two steps, i.e., self-representation [20] and Spectral Clustering (SC). To improve the performance of subspace clustering, many works have been devoted to constructing a robust affinity matrix using various techniques. For example, Sparse Subspace Clustering (SSC) [7] uses an ℓ1-norm to encourage a sparse affinity matrix, while Low Rank Subspace Clustering (LRSC) [38] adopts a nuclear norm to enforce the affinity matrix to be low-rank. By considering the spectral and spatial properties of HSIs, Zhang et al. proposed a Spectral-Spatial Sparse Subspace Clustering (S4C) [48]. Kernel subspace clustering [34] was proposed as the nonlinear extension of the subspace clustering model by implicitly mapping data into a higher-dimensional kernel space. In [47], an improved kernel subspace clustering was applied to HSI clustering.

However, the previous subspace clustering models are based on Euclidean data and often ignore the inherent graph structure among the data points. On the one hand, the data points are usually corrupted by noise or may have entries with large errors. On the other hand, although manifold regularization is useful for incorporating graph information into subspace clustering, as in graph-regularized LRSC [46, 42], it usually requires an additional regularization term and a tradeoff parameter. The recent development of Graph Neural Networks (GNNs) [52, 41, 51] generalizes the powerful CNNs from Euclidean data to graph-structured data. This allows us to revisit traditional problems with GNNs [26, 21]. However, subspace clustering combined with graph learning has not yet attracted much attention.

To learn graph embedding and affinity simultaneously, in this paper we present a Graph Convolutional Subspace Clustering (GCSC) framework that recasts traditional subspace clustering into the non-Euclidean domain. Specifically, the GCSC framework calculates the self-representation coefficients of subspace clustering by leveraging a graph convolutional self-representation model that combines both graph and feature information. As a result, the proposed framework can circumvent noisy data and tends to produce a more robust affinity than traditional subspace clustering models. An intuitive description of the motivation of GCSC is illustrated in Fig. 1.

To sum up, the main contributions of this work are:

  1. A robust subspace clustering framework called GCSC is developed for HSI clustering, in which subspace clustering is recast into the non-Euclidean domain. Particularly, traditional subspace clustering can be viewed as a special form of the proposed framework.

  2. Based on the Frobenius norm, two novel and efficient subspace clustering models are proposed under the GCSC framework. We refer to them as Efficient GCSC (EGCSC) and Efficient Kernel GCSC (EKGCSC), respectively. Both EGCSC and EKGCSC have a closed-form solution, making them easier to implement, train, and apply in practice.

  3. Our experimental results on several HSI datasets show that the proposed subspace clustering models consistently outperform many existing clustering methods for HSI clustering. The successful attempt of GCSC offers an alternative direction for unsupervised learning.

The rest of the paper is structured as follows. We first briefly review subspace clustering, graph convolutional networks, and HSI clustering in Section II. Then, we describe the details of the developed GCSC framework and its two implementations in Section III. In Section IV, we give extensive experimental results and empirical analysis. Finally, we conclude with a summary and final remarks in Section V.

II Preliminaries and Related Work

II-A Notations

Throughout this paper, boldface lowercase italic symbols (e.g., x), boldface uppercase roman symbols (e.g., X), regular italic symbols (e.g., x), and calligraphic symbols (e.g., 𝒳) denote vectors, matrices, scalars, and sets, respectively. A graph is represented as 𝒢 = (𝒱, ℰ, A), where 𝒱 denotes the node set of the graph with |𝒱| = n, ℰ indicates the edge set, and A ∈ ℝ^{n×n} stands for an adjacency matrix. We define the diagonal degree matrix of the graph as D, where D_ii = Σ_j A_ij. The graph Laplacian is defined as L = D − A, and its normalized version is given by L_sym = I − D^{−1/2} A D^{−1/2}. In this paper, X^T denotes the transpose of matrix X and I denotes an identity matrix of appropriate size. The Frobenius norm of a matrix is defined as ||X||_F = (Σ_{i,j} X_{ij}^2)^{1/2} and the trace of a matrix is denoted as Tr(X).

II-B Subspace Clustering Models

Let X = [x_1, x_2, …, x_n] ∈ ℝ^{d×n} be a collection of data points drawn from a union of linear or affine subspaces ∪_{i=1}^{k} S_i, where n, d, and k denote the number of data points, features, and subspaces, respectively. The subspace clustering model for the given data set is defined as the following self-representation problem [7, 42]:

min_C ||C||_q   s.t.   X = XC,  diag(C) = 0,                    (1)

where C ∈ ℝ^{n×n} denotes the self-expressive coefficient matrix and the constraint diag(C) = 0 enforces the diagonal elements of C to be zero so that trivial solutions are avoided. ||C||_q denotes a q-norm of the matrix C, e.g., the ℓ1-norm (SSC [7]) and the ℓ0-norm (ℓ0-SSC [45]).

In the SSC model, the self-expressive coefficient matrix is assumed to be sparse, and the self-representation problem is often formulated as

min_C ||C||_1   s.t.   X = XC,  diag(C) = 0.                    (2)

Here, the ℓ1-norm tends to produce a sparse coefficient matrix. By using a nuclear norm, LRSC [54, 28, 42] reformulates the self-expressiveness property of data as

min_{C,E} ||C||_* + λ||E||_1   s.t.   X = XC + E,               (3)

where ||·||_* and ||·||_1 denote the nuclear norm and ℓ1-norm of a matrix, respectively. LRSC has been proven effective in incorporating the global structure of data. Furthermore, subspace clustering can use E to model corrupted data, i.e., X = XC + E, where E is arbitrary noise.

The above problems can be efficiently solved using convex optimization methods, such as the Alternating Direction Method of Multipliers (ADMM) [37, 30]. Once the coefficient matrix is found, subspace clustering segments the data by applying the Spectral Clustering (SC) method [40] to an affinity matrix built from the coefficients.
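To make the two-step pipeline concrete, the following is a minimal numpy sketch of an ADMM solver for an ℓ1-regularized self-representation (an SSC-style problem with the equality constraint relaxed into a least-squares term). The function name, the penalty parameter rho, and the fixed iteration count are our illustrative choices, not the exact solver used in the cited works.

```python
import numpy as np

def ssc_admm(X, lam=10.0, rho=10.0, n_iter=100):
    """Minimal ADMM sketch for an l1-regularized self-representation:
        min_C ||C||_1 + lam/2 * ||X - X C||_F^2,  diag(C) = 0,
    where X holds one data point per column. Illustrative only."""
    n = X.shape[1]
    G = X.T @ X                                   # Gram matrix
    inv = np.linalg.inv(lam * G + rho * np.eye(n))
    C = np.zeros((n, n))
    U = np.zeros((n, n))                          # scaled-dual-style multiplier
    for _ in range(n_iter):
        # least-squares step on the auxiliary variable Z (Z = C at convergence)
        Z = inv @ (lam * G + rho * C - U)
        # soft-thresholding step enforces sparsity on C
        V = Z + U / rho
        C = np.sign(V) * np.maximum(np.abs(V) - 1.0 / rho, 0.0)
        np.fill_diagonal(C, 0.0)                  # forbid trivial self-representation
        U += rho * (Z - C)
    return C
```

On toy data drawn from two orthogonal one-dimensional subspaces, the learned coefficients stay block-diagonal: each point is expressed only by points from its own subspace.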

Fig. 2: The flowchart of the proposed graph convolutional subspace clustering framework. The framework follows the basic steps of the existing subspace clustering methods but takes a feature matrix and an adjacency matrix of the graph as inputs, which results in a novel graph convolutional self-representation. The graph convolution generates a robust graph embedding that is used as a dictionary for the subsequent affinity learning.

II-C Graph Convolutional Networks

There is increasing interest in generalizing convolutions to the graph domain [52, 1]. The recent development of GNNs allows convolution on graph-structured data to be efficiently approximated. GNNs can typically be divided into two categories [51, 41]: spectral convolutions, which perform convolution by transforming node representations into the spectral domain using the graph Fourier transform or its extensions, and spatial convolutions, which perform convolution by considering node neighborhoods. Unless otherwise specified, the graph convolution involved in this paper is the spectral convolution.

One of the most representative graph convolution models is the Graph Convolutional Network (GCN) developed by Kipf et al. [21]. GCN simplifies the spectral convolution by approximating spectral filters with first-order Chebyshev polynomials and setting the largest eigenvalue of the normalized graph Laplacian to 2. Formally, GCN defines the spectral convolution over a graph as follows:

H = σ(D̂^{−1/2} Â D̂^{−1/2} X W),                                (4)

Here, Â = A + I is the adjacency matrix with self-loops, D̂ denotes the degree matrix of nodes whose elements are given by D̂_ii = Σ_j Â_ij, W denotes a trainable parameter matrix, and σ(·) is a nonlinear activation function. Specifically, GCN takes a node feature matrix X ∈ ℝ^{n×d} and an adjacency matrix A as inputs, and produces a graph embedding H ∈ ℝ^{n×d′}, where d′ is the output dimension.

GCN was originally developed for semi-supervised node classification. By stacking several graph convolution layers, GCN can learn deeper graph representations. Similar to traditional deep neural networks, GCN can easily be trained using gradient descent methods. Across different tasks, the GCN model allows many traditional problems to be revisited in the non-Euclidean domain.
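A single GCN propagation step can be sketched in a few lines of numpy. The function below is a hedged illustration of the renormalized propagation rule (self-loops plus symmetric degree normalization), with ReLU as an example activation; the function name is ours.

```python
import numpy as np

def gcn_layer(X, A, W):
    """One GCN propagation step: H = relu(D^{-1/2} (A+I) D^{-1/2} X W).
    X: (n, d) node features; A: (n, n) adjacency; W: (d, d') weights."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))    # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetric normalization
    return np.maximum(A_norm @ X @ W, 0.0)      # ReLU activation
```

For a two-node graph with a single edge, each output feature is simply the degree-normalized average of the node and its neighbor, mapped through W.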

II-D HSI Clustering

Although supervised methods have achieved great success in HSI classification [10, 16], they are limited by the lack of sufficient labeled data. To avoid data annotation, HSI clustering has attracted increasing attention. HSI clustering is based on the fact that the same land-cover object often shows similar spectral curves. Traditional centroid-based clustering methods, such as K-means [15] and fuzzy c-means (FCM) [24], are widely used and easy to implement. However, these methods are sensitive to the random initialization state, and thus their clustering results are hard to reproduce [49].

Due to their stable performance, subspace clustering methods are frequently used for HSI clustering. Zhang et al. [45, 47, 19, 48] have successfully applied various subspace clustering methods to HSI clustering, including Spectral-Spatial Sparse Subspace Clustering (S4C) [48], Kernel Sparse Subspace Clustering [47], Joint Sparsity Based Sparse Subspace Clustering (JSSC) [19], and so on. Benefiting from its ability to exploit the intrinsic structure of data, subspace clustering has achieved impressive performance. Recently, evolutionary optimization based clustering methods have attracted increasing interest, e.g., evolutionary multiobjective optimization based HSI clustering [33, 39]. It is well known that evolutionary algorithms are powerful for searching the globally optimal solution, but they often incur huge computational cost [8, 13, 12, 3].

Utilizing spectral and spatial information simultaneously has been proven effective for improving HSI clustering performance. In [49], Zhang et al. developed a state-of-the-art HSI clustering method called Robust Manifold Matrix Factorization (RMMF), which performs HSI dimensionality reduction and data clustering simultaneously. Kong et al. [22] proposed an Unsupervised Broad Learning (UBL) clustering method that combines clustering with broad representation learning. In our recent work [44], we proposed a deep subspace clustering method for HSI clustering, which further demonstrates the potential of combining clustering models with feature learning.

III Methodology

In this section, we first introduce the proposed Graph Convolutional Subspace Clustering (GCSC) framework. Then, we provide the details of two novel subspace clustering models based on the framework, i.e., Efficient Graph Convolutional Subspace Clustering (EGCSC) and Efficient Kernel Graph Convolutional Subspace Clustering (EKGCSC). We illustrate a schematic representation of the proposed framework in Fig. 2 and more details are given in the following subsections.

III-A Graph Convolutional Subspace Clustering Framework

Inspired by the recent development of GCNs, we present a novel subspace clustering framework by incorporating graph embedding into subspace clustering. We refer to the framework as GCSC. The goal of the GCSC framework is to utilize graph convolution to learn a robust affinity. For this purpose, we first modify the traditional self-representation as follows:

X = XĀC,                                                        (5)

Here, C ∈ ℝ^{n×n} is the self-representation coefficient matrix and Ā = D̂^{−1/2}(A + I)D̂^{−1/2} denotes the symmetrically normalized adjacency matrix with self-loops, so that XĀ is the graph convolution of Eq. (4) applied to the column-wise data matrix. Notably, XĀC can be treated as a special linear graph convolution operation (or a special graph auto-encoder) parameterized by C. We call Eq. (5) the graph convolutional self-representation.

Parallel to traditional subspace clustering models, the GCSC framework can be written as

min_C ||C||_p + (λ/2) ||X − XĀC||_q,                            (6)

where ||·||_p and ||·||_q denote any appropriate matrix norms, such as the ℓ1 and nuclear norms, and λ is a tradeoff coefficient. It is easy to verify that the traditional subspace models are special cases of our framework that depend only on data features, i.e., the cases with Ā = I. For example, when ||·||_p is the ℓ1-norm and Ā = I, Eq. (6) becomes the classical SSC [7], while when ||·||_p is the nuclear norm, Eq. (6) degenerates to LRSC [38, 28]. Eq. (6) can be effectively solved by the same methods adopted in traditional subspace clustering. Once the self-representation coefficient matrix is obtained, SC can be used to generate the clustering results.

III-B Efficient GCSC

Based on the proposed framework, we present the first novel subspace clustering model, namely Efficient GCSC (EGCSC), by setting both ||·||_p and ||·||_q to be the Frobenius norm. Formally, we formulate the EGCSC model as

min_C ||C||_F^2 + (λ/2) ||X − XĀC||_F^2.                        (7)

In [31], Pan et al. have proven that the Frobenius norm will not result in trivial solutions even without the constraint diag(C) = 0. This leads to a dense self-representation coefficient matrix that admits an efficient closed-form solution. Writing X̄ = XĀ, the solution is given by

C* = (X̄^T X̄ + (2/λ) I)^{−1} X̄^T X.                            (8)
The proof of Eq. (8) is given in the Appendix (Section VI).

Having obtained C*, we can use it to construct an affinity matrix for the SC. However, there is no globally accepted solution for this step in the literature. Most existing works typically compute the affinity matrix as W = |C*| + |C*|^T or W = (|C*| + |C*|^T)/2. In this paper, we use the heuristic adopted by Efficient Dense Subspace Clustering (EDSC) [31] to enhance the block structure, which proves beneficial for clustering accuracy. The pseudocode of EGCSC is given in Algorithm 1.

Input: data X, adjacency matrix A, tradeoff coefficient λ, and the number of clusters.
1 Compute Ā = D̂^{−1/2}(A + I)D̂^{−1/2} and X̄ = XĀ;
2 Compute the coefficient matrix C* = (X̄^T X̄ + (2/λ)I)^{−1} X̄^T X;
3 Construct the affinity matrix W from C*;
4 Apply spectral clustering on W;
Output: Clustering results.
Algorithm 1 EGCSC
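Because EGCSC has a closed-form solution, Algorithm 1 reduces to a handful of dense linear-algebra operations. Below is a hedged numpy sketch of its core steps; the symmetric affinity W = (|C| + |C^T|)/2 is used here as a simple stand-in for the EDSC post-processing heuristic, and the function name is ours.

```python
import numpy as np

def egcsc_coefficients(X, A, lam=100.0):
    """Closed-form EGCSC self-representation (sketch).
    X: (d, n) data, one point per column; A: (n, n) kNN adjacency.
    Solves min_C ||C||_F^2 + lam/2 * ||X - X A_bar C||_F^2."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                           # self-loops
    D = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))   # D_hat^{-1/2}
    A_bar = D @ A_hat @ D                           # normalized adjacency
    X_bar = X @ A_bar                               # graph-convolved dictionary
    C = np.linalg.solve(X_bar.T @ X_bar + (2.0 / lam) * np.eye(n),
                        X_bar.T @ X)
    W = 0.5 * (np.abs(C) + np.abs(C).T)             # simple symmetric affinity
    return C, W
```

On two well-separated clusters whose kNN graph has no cross-cluster edges, the resulting affinity matrix is exactly block-diagonal.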

III-C Efficient Kernel GCSC

We have proposed the EGCSC method; however, the EGCSC model is essentially built on linear subspaces. Due to the complexity and nonlinearity of HSI, a large number of works have demonstrated that nonlinear models yield better performance than their linear counterparts. In this subsection, we provide a nonlinear extension of EGCSC using the kernel trick. The extension is referred to as Efficient Kernel GCSC (EKGCSC).

Let Φ : ℝ^d → ℋ be a mapping from the input space to a reproducing kernel Hilbert space ℋ. We define a positive semidefinite kernel Gram matrix K ∈ ℝ^{n×n} as

K_ij = κ(x_i, x_j) = Φ(x_i)^T Φ(x_j),                           (9)

where κ(·, ·) denotes the kernel function. In this paper, the Gaussian kernel is used, i.e., κ(x_i, x_j) = exp(−||x_i − x_j||^2 / (2σ^2)), where σ is the parameter of the Gaussian kernel function. Formally, the EKGCSC model is expressed as

min_C ||C||_F^2 + (λ/2) ||Φ(X) − Φ(X)ĀC||_F^2.                  (10)

By using the kernel trick, Eq. (10) can be equivalently rewritten as

min_C ||C||_F^2 + (λ/2) Tr(K − 2KĀC + C^T Ā K Ā C).             (11)

The above problem can be solved by calculating the partial derivative with respect to C and setting it to zero (see the Appendix, Section VI). The closed-form solution of EKGCSC is given by

C* = (Ā K Ā + (2/λ) I)^{−1} Ā K.                                (12)
The EKGCSC model implicitly maps the original data points into a higher-dimensional space, and thus can make a linearly inseparable problem separable. We construct the affinity matrix in a manner similar to EGCSC and obtain the final clustering results by SC. The pseudocode of EKGCSC is given in Algorithm 2.

Input: data X, adjacency matrix A, tradeoff coefficient λ, kernel parameter σ, and the number of clusters.
1 Compute Ā = D̂^{−1/2}(A + I)D̂^{−1/2};
2 Compute the kernel matrix K according to Eq. (9);
3 Compute the coefficient matrix C* = (Ā K Ā + (2/λ)I)^{−1} Ā K;
4 Construct the affinity matrix W from C*;
5 Apply spectral clustering on W;
Output: Clustering results.
Algorithm 2 EKGCSC
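Analogously, EKGCSC reduces to one kernel Gram computation plus one linear solve. The sketch below assumes, as in our reading of the model, that the graph convolution acts in the kernel-induced feature space, so the solve involves ĀKĀ; the function names and default parameters are illustrative.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Gaussian (RBF) Gram matrix for column data points X of shape (d, n)."""
    sq = np.sum(X**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X   # pairwise squared distances
    return np.exp(-d2 / (2.0 * sigma**2))

def ekgcsc_coefficients(X, A, lam=100.0, sigma=1.0):
    """Closed-form kernel GCSC sketch:
        C* = (A_bar K A_bar + (2/lam) I)^{-1} A_bar K,
    with A_bar the normalized adjacency with self-loops."""
    n = A.shape[0]
    A_hat = A + np.eye(n)
    D = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_bar = D @ A_hat @ D
    K = gaussian_kernel(X, sigma)
    # ridge term (2/lam) I keeps the system positive definite and solvable
    C = np.linalg.solve(A_bar @ K @ A_bar + (2.0 / lam) * np.eye(n),
                        A_bar @ K)
    return C
```

The ridge term guarantees the matrix being inverted is positive definite, so the solution always exists, mirroring the global optimality claim for the closed form.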

III-D HSI Clustering Using the GCSC Models

We use the proposed GCSC models for HSI clustering. Two essential issues need to be tackled before using the GCSC models. First, HSI data often include many spectral bands with substantial redundancy, so using only spectral features makes it hard to achieve good performance. Second, the GCSC models are based on graph-structured data, whereas HSI is typically Euclidean data.

To remedy the first issue, the following procedures are employed. We first use Principal Component Analysis (PCA) to reduce the spectral dimensionality by preserving the top PCs. On the one hand, PCA reduces the redundant information contained in HSI data; on the other hand, it increases computational efficiency during model training. To take spectral and spatial information into consideration simultaneously, we represent every data point by extracting 3D patches. Specifically, every data point is represented by the center pixel and its neighboring pixels. This manner is widely adopted in various HSI spectral-spatial classification methods [2, 9, 16].
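As a rough illustration of this preprocessing, the snippet below reduces an (H, W, B) cube to its top principal components via an SVD and then flattens a square neighborhood around each pixel into one spectral-spatial feature vector. The helper names, the reflect padding at image borders, and the default patch size are our illustrative choices, not the authors' exact pipeline.

```python
import numpy as np

def pca_reduce(cube, n_pcs):
    """Project an (H, W, B) hyperspectral cube onto its top n_pcs components."""
    H, W, B = cube.shape
    flat = cube.reshape(-1, B).astype(float)
    flat -= flat.mean(axis=0)                       # center each band
    # principal axes = top right singular vectors of the centered data
    _, _, Vt = np.linalg.svd(flat, full_matrices=False)
    return (flat @ Vt[:n_pcs].T).reshape(H, W, n_pcs)

def extract_patches(cube, size=3):
    """Represent each pixel by its (size x size) spatial neighborhood,
    flattened into one feature vector (borders handled by reflect padding)."""
    r = size // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="reflect")
    H, W, B = cube.shape
    feats = np.empty((H * W, size * size * B))
    for i in range(H):
        for j in range(W):
            feats[i * W + j] = padded[i:i + size, j:j + size, :].ravel()
    return feats
```

Each row of the returned matrix is one spectral-spatial sample, ready to be used as a data point (one column of X after transposing) in the GCSC models.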

For the second issue, we construct a k-nearest neighbor (kNN) graph to represent the graph structure of the data points. Specifically, each data point x_i is viewed as a node of the graph, and the k nearest neighbors of x_i constitute its edge relationships. The adjacency matrix of a kNN graph is defined by

A_ij = 1 if x_j ∈ N_k(x_i), and A_ij = 0 otherwise,             (13)

where N_k(x_i) indicates the k nearest neighbors of x_i. The neighborhood relationship is obtained by computing the Euclidean distance.
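A kNN adjacency of this kind can be built directly from pairwise Euclidean distances. The sketch below additionally symmetrizes the result (an assumption on our part, since the rule above is one-directional) so that the normalized adjacency used by the graph convolution stays symmetric; the function name is ours.

```python
import numpy as np

def knn_adjacency(X, k=2):
    """Binary kNN adjacency for column data points X of shape (d, n),
    using Euclidean distance; symmetrized so A = A^T."""
    n = X.shape[1]
    sq = np.sum(X**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X   # squared distances
    np.fill_diagonal(d2, np.inf)                     # exclude self-matches
    A = np.zeros((n, n))
    idx = np.argsort(d2, axis=1)[:, :k]              # k nearest per point
    rows = np.repeat(np.arange(n), k)
    A[rows, idx.ravel()] = 1.0
    return np.maximum(A, A.T)                        # symmetrize
```

For four points at positions 0, 1, 10, 11 on a line, the 1-NN graph connects the two near pairs and leaves the two groups disconnected.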

III-E Remarks on the Proposed GCSC


Fig. 3: Visualization of data points selected from the Indian Pines data set, where we randomly select data points per class and reduce their feature dimensionality with t-SNE [17]. (a) Original data points. (b) Embedding using graph convolution. It can be seen that corn-till, soybean-notill, and soybean-mintill are mixed in the original data distribution. By contrast, they show better separability and are more compact after being transformed by graph convolution.

In this subsection, we provide deeper insight into the GCSC framework from the following viewpoints. Let X̄ = XĀ be the graph embedding; thus GCSC can be rewritten as

min_C ||C||_p + (λ/2) ||X − X̄C||_q.                             (14)

From the viewpoint of sparse representation, GCSC aims to use a self-expressive dictionary matrix X̄ to reconstruct the original data. Since X̄ incorporates the global structure information, noisy points are suppressed and a cleaner dictionary can be obtained, which is beneficial for producing a robust affinity matrix. As can be seen from Fig. 3, the resulting X̄ shows better clustering characteristics than the original X. This can be further explained from the viewpoint of graph representation learning. Graph convolution is essentially a special form of Laplacian smoothing [26], which combines the features of a node and its nearby neighbors. The operation makes the features of nodes in the same cluster similar, thus greatly easing the clustering task.

The main differences between the GCSC model and the traditional subspace model are as follows. First, GCSC is built in the non-Euclidean domain. Under the GCSC framework, the traditional subspace clustering models can be considered special cases in the Euclidean domain. Second, GCSC incorporates the graph structure via graph convolution, while the traditional subspace clustering models do so by manifold regularization. Therefore, GCSC exploits graph information in a more straightforward way.

Datasets | SalinasA | Indian Pines | Pavia University
Pixels | 83×86 | 85×70 | 140×150
Channels | 204 | 200 | 103
Clusters | 6 | 4 | 8
Samples | 5348 | 4391 | 6445
TABLE I: Summary of the SalinasA, Indian Pines, and Pavia University datasets.

IV Experiments

Datasets | EGCSC (λ, k) | EKGCSC (λ, k, σ)
SalinasA | 100, 30 | 100, 30, 0.2
Indian Pines | 100, 30 | 100000, 30, 6
Pavia University | 1000, 20 | 60000, 30, 100
TABLE II: The settings of the important hyper-parameters in EGCSC and EKGCSC.

In this section, we extensively evaluate the clustering performance of the proposed clustering methods on three frequently used HSI datasets. The source codes of EGCSC and EKGCSC are released at

IV-A Setup

IV-A1 Datasets and Preprocessing

We conduct experiments on three real HSI images acquired by the AVIRIS and ROSIS sensors, i.e., Salinas, Indian Pines, and Pavia University. For computational efficiency, we take a sub-scene of each dataset for evaluation, as done in [46, 32, 22]. Notice that the sub-scene taken from the Salinas dataset is also known as the SalinasA dataset. The details of the three datasets are summarized in Table I.

In data preprocessing, we perform PCA to reduce the number of spectral bands while preserving most of the cumulative percentage of variance. We construct spectral-spatial samples using the same neighborhood size for all the datasets. All data points are standardized by scaling before clustering.

IV-A2 Evaluation Metrics

Three popular metrics [48, 49, 22] are used to evaluate the clustering performance of the clustering models, i.e., Overall Accuracy (OA), Normalized Mutual Information (NMI), and the Kappa coefficient (Kappa). These metrics range in [0, 1], and the higher the scores, the more accurate the clustering results. Besides, to evaluate the computational complexity of our models, running time is compared in the experiments.
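Because cluster labels are only defined up to a permutation, computing OA for clustering requires matching predicted clusters to ground-truth classes before counting correct pixels. The brute-force matcher below illustrates the idea for a small number of clusters; it assumes equally many predicted and true classes, and is our illustrative helper, not the exact evaluation code.

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """Overall Accuracy (OA) for clustering: the best per-sample accuracy over
    all mappings of predicted cluster labels onto true class labels.
    Brute force over permutations; fine for a handful of clusters."""
    pred_labels = np.unique(y_pred)
    best = 0.0
    for perm in permutations(np.unique(y_true)):
        mapping = dict(zip(pred_labels, perm))     # cluster id -> class id
        acc = np.mean([mapping[p] == t for p, t in zip(y_pred, y_true)])
        best = max(best, acc)
    return best
```

For example, predictions that are a pure relabeling of the ground truth score a perfect OA of 1.0, even though no raw label matches.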

IV-A3 Compared Methods

We compare the proposed methods with several existing HSI clustering methods, including traditional clustering methods and state-of-the-art methods. Specifically, the compared traditional clustering methods contain Spectral Clustering (SC) [40], Sparse Subspace Clustering (SSC) [7], Efficient Dense Subspace Clustering (EDSC) [31], Low Rank Subspace Clustering (LRSC) [54], and ℓ0-norm based SSC (ℓ0-SSC) [45]. The compared state-of-the-art HSI clustering methods are Spectral-Spatial Sparse Subspace Clustering (S4C) [48], Unsupervised Broad Learning (UBL) clustering [22], and Robust Manifold Matrix Factorization (RMMF) [49].

For the HSI clustering methods ℓ0-SSC, S4C, UBL, and RMMF, we follow the settings reported in the corresponding literature. The hyper-parameters of EGCSC and EKGCSC are given in Table II. All the compared methods are implemented in Python 3.5 running on an Intel Xeon E5-2620 2.10 GHz CPU with 32 GB RAM.

Data | Metric | SC [40] | SSC [7] | LRSC [54] | ℓ0-SSC [45] | S4C [48] | UBL [22] | RMMF [49] | EDSC [31] | EGCSC | EKGCSC
SaA. | OA | 0.6806 | 0.7666 | 0.5613 | 0.6412 | 0.8631 | 0.9142 | 0.9820 | 0.8702 | 0.9985 | 1.0000
SaA. | NMI | 0.7464 | 0.7571 | 0.4242 | 0.6971 | 0.7977 | 0.8692 | 0.9483 | 0.9135 | 0.9949 | 1.0000
SaA. | Kappa | 0.6002 | 0.7138 | 0.4487 | 0.5546 | 0.8312 | 0.8943 | 0.9775 | 0.8384 | 0.9981 | 1.0000
InP. | OA | 0.6841 | 0.4937 | 0.5142 | 0.6645 | 0.7008 | 0.6258 | 0.7121 | 0.7126 | 0.8483 | 0.8761
InP. | NMI | 0.5339 | 0.2261 | 0.2455 | 0.3380 | 0.5445 | 0.6680 | 0.4985 | 0.4717 | 0.6422 | 0.6959
InP. | Kappa | 0.5055 | 0.2913 | 0.3145 | 0.5260 | 0.5825 | 0.4690 | 0.5609 | 0.5657 | 0.6422 | 0.8211
PaU. | OA | 0.7691 | 0.6146 | 0.4326 | 0.5842 | 0.6509 | 0.7083 | 0.7704 | 0.6175 | 0.8442 | 0.9736
PaU. | NMI | 0.6784 | 0.6545 | 0.3793 | 0.4942 | 0.7031 | 0.6874 | 0.7388 | 0.5750 | 0.8401 | 0.9529
PaU. | Kappa | 0.8086 | 0.4886 | 0.2549 | 0.3687 | 0.5852 | 0.6533 | 0.6804 | 0.4250 | 0.7968 | 0.9653
TABLE III: The clustering performance of the compared methods on the SalinasA, Indian Pines, and Pavia University datasets. The best results are highlighted in bold.

IV-B Results

IV-B1 Quantitative Results

Table III gives the clustering performance comparison of different methods evaluated on SalinasA, Indian Pines, and Pavia University datasets. As can be seen from the results, the proposed GCSC methods achieve the best clustering performance and significantly outperform the other clustering methods in terms of OA, NMI, and Kappa. We can further find the following tendencies from the results.

First, equipped with graph convolution, the traditional subspace clustering models achieve remarkable improvements over their traditional counterparts. For example, EGCSC is significantly better than EDSC. This signifies that the proposed GCSC framework is beneficial for subspace clustering. It can be further seen from Table III that few of the compared clustering methods achieve a high OA. On the contrary, the OAs yielded by the EGCSC and EKGCSC models are consistently higher on all the datasets. Particularly, on the SalinasA dataset, EKGCSC achieves a perfect (100%) clustering performance.

Second, EKGCSC outperforms EGCSC on all three datasets. Due to the complexity of HSI, linear models often cannot fully exploit the relationship among data points. By extending EGCSC into a nonlinear kernel space, EKGCSC's performance is dramatically enhanced. In other words, EKGCSC considers the nonlinear relationship between data points and makes the learned affinity matrix more robust. In the experiments, EKGCSC achieves OA improvements of 0.15%, 2.78%, and 12.94% over EGCSC on the SalinasA, Indian Pines, and Pavia University datasets, respectively.

Third, the results obtained by EGCSC and EKGCSC are comparable with many supervised HSI classification methods [16, 10, 50]. Specifically, the EKGCSC model achieves OAs of 100%, 87.61%, and 97.36% on the SalinasA, Indian Pines, and Pavia University datasets, respectively. The recent development of supervised HSI classification has enabled excellent results; however, unsupervised classification of HSI is still a challenging task. The state-of-the-art clustering performance of our methods narrows the gap between unsupervised and supervised HSI classification.

IV-B2 Results Visualization









Fig. 4: Clustering results obtained by different methods on the SalinasA dataset: (a) Ground truth, (b) SC, (c) SSC, (d) ℓ0-SSC, (e) LRSC, (f) RMMF, (g) EDSC, (h) EGCSC, and (i) EKGCSC.









Fig. 5: Clustering results obtained by different methods on the Indian Pines dataset: (a) Ground truth, (b) SC, (c) SSC, (d) ℓ0-SSC, (e) LRSC, (f) RMMF, (g) EDSC, (h) EGCSC, and (i) EKGCSC.









Fig. 6: Clustering results obtained by different methods on the Pavia University dataset: (a) Ground truth, (b) SC, (c) SSC, (d) ℓ0-SSC, (e) LRSC, (f) RMMF, (g) EDSC, (h) EGCSC, and (i) EKGCSC.

To visually observe the clustering results, we visualize the clustering maps of different methods in Figs. 4-6. Since the source codes of S4C and UBL have not been released, their class maps are not included in the figures, but this does not affect the analysis. Notice that the color of the same class may vary across class maps, because label values may be permuted by different clustering methods. As observed from Fig. 4, the class map obtained by EKGCSC on the SalinasA dataset is in complete agreement with the ground truth. For the Indian Pines and Pavia University datasets (i.e., Fig. 5 and Fig. 6), EKGCSC shows the best class maps, closest to the ground truths. Compared with the other competitors, EGCSC also shows better class maps, while those obtained by the other methods (e.g., SSC, LRSC, and EDSC) contain relatively more noisy points caused by misclassification. Briefly, the results demonstrate the effectiveness and superiority of the proposed GCSC framework.

IV-B3 Visualization of the Learned Affinity Matrix






Fig. 7: Visualization of the obtained affinity matrices of EGCSC and EKGCSC on (a)-(b) SalinasA, (c)-(d) Indian Pines, and (e)-(f) Pavia University datasets.

In Fig. 7, we visualize the affinity matrices learned by the EGCSC and EKGCSC models. For better presentation, we re-ordered the data points according to the ground truth before computing the affinity matrix. In the figures, each column or row of the affinity matrix contains the self-representation coefficients used when all data points represent the corresponding data point. Therefore, the larger a coefficient is, the more the corresponding data point contributes to the reconstruction. Ideally, if a group of data points belongs to the same cluster, their self-representation coefficients for each other will be non-zero; otherwise, they will be zero. Thus, an ideal affinity matrix is block-diagonal. From Fig. 7 (a)-(f), we can observe that the affinity matrices obtained by both EGCSC and EKGCSC are sparse and have an apparent block-diagonal structure. Furthermore, EKGCSC shows a better block structure than EGCSC, which demonstrates that EKGCSC can more accurately explore the intrinsic relationships between data points and thus achieve better performance.

IV-B4 Impact of λ and k






Fig. 8: Influence of λ and k for EGCSC and EKGCSC on (a)-(b) SalinasA, (c)-(d) Indian Pines, and (e)-(f) Pavia University datasets.

In this experiment, we investigate the impact of the two most important hyper-parameters involved in the GCSC framework, i.e., the regularization coefficient λ and the number of nearest neighbors k used to build the kNN graph. We search λ over a wide range for the SalinasA and Indian Pines datasets and over a similar range for the Pavia University dataset; for clarity, only a subset of λ values is used for plotting. For all datasets, we let k vary over a fixed range with a constant interval. The results are shown in Fig. 8. It can be seen that λ has a significant impact on clustering performance, and we can further observe a tendency that the clustering performance increases as λ increases. By contrast, both EGCSC and EKGCSC are insensitive to k. However, when k is too large, graph convolution will suffer from the over-smoothing problem, i.e., the graph embeddings of all data points become similar. Therefore, a too large k may negatively affect the clustering performance. Based on this empirical study, we provide the best hyper-parameter settings in Table II.
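A grid search of this kind is straightforward because EGCSC has a closed-form solution. The sketch below is an illustrative reimplementation under our notation (columns of `X` are samples; the grid values, `knn_normalized_adjacency`, and `egcsc_coefficients` are our own stand-ins, not the paper's exact settings or code):

```python
import numpy as np

def knn_normalized_adjacency(X, k):
    """Symmetrically normalized kNN graph with self-loops,
    A_hat = D^{-1/2} (A + I) D^{-1/2}; columns of X are samples."""
    n = X.shape[1]
    d2 = ((X.T[:, None, :] - X.T[None, :, :]) ** 2).sum(-1)
    A = np.zeros((n, n))
    nbrs = np.argsort(d2, axis=1)[:, 1:k + 1]  # skip the point itself
    for i in range(n):
        A[i, nbrs[i]] = 1.0
    A = np.maximum(A, A.T) + np.eye(n)         # symmetrize, add self-loops
    deg = A.sum(1)
    return A / np.sqrt(np.outer(deg, deg))

def egcsc_coefficients(X, A_hat, lam):
    """Closed-form coefficients with graph-convolved dictionary
    X_bar = X A_hat: C = (X_bar^T X_bar + lam I)^{-1} X_bar^T X."""
    Xb = X @ A_hat
    n = X.shape[1]
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(n), Xb.T @ X)

# hypothetical grid over lam and k (the paper's exact ranges are not shown)
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 30))                  # 30 samples, 10 spectral bands
for lam in (0.1, 1.0, 10.0):
    for k in (5, 10):
        C = egcsc_coefficients(X, knn_normalized_adjacency(X, k), lam)
        # 0.5 * (|C| + |C|^T) would then be fed to spectral clustering
```

Since each (λ, k) pair only requires one linear solve, scanning the grid is far cheaper than re-running an iterative solver per configuration.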

IV-B5 Impact of the Number of PCs



Fig. 9: Clustering OA under varying number of PCs for (a) SalinasA, (b) Indian Pines, and (c) Pavia University datasets.

To empirically study the influence of the number of PCs, we run the proposed methods while varying the number of PCs. The results are shown in Fig. 9. As shown in the figures, the clustering performance of the GCSC models increases with the number of PCs, because more spectral information is included when more PCs are considered. However, increasing the number of PCs does not always enhance model performance, since more redundancy may also be introduced; for example, Fig. 9 (c) shows that the best number of PCs for the Pavia University dataset is not the largest value considered. Although dimensionality reduction is an optional step in our framework, it achieves a good balance between computational efficiency and model performance.
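This preprocessing step can be sketched as follows (a minimal SVD-based PCA; the PC counts and data here are hypothetical, not the paper's settings):

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project samples (rows of X) onto the top principal components
    obtained from the SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 20))     # 50 pixels, 20 spectral bands
for p in (2, 4, 8):               # hypothetical numbers of PCs
    Z = pca_reduce(X, p)
    print(p, Z.shape)
```

Each reduced matrix `Z` would then replace the raw spectra as input to the clustering model, trading a small loss of spectral detail for lower dimensionality.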

IV-B6 Comparison of Running Time

Data   SC [40]   SSC [7]    LRSC [54]   ℓ2-SSC [45]   S4C [48]   UBL [22]   RMMF [49]   EDSC [31]   EGCSC     EKGCSC
SaA.   13.203    855.663    7030.710    4.717         9363.5     2509.68    9.448       42.762      92.951    131.485
InP.   8.587     653.998    3980.004    3.272         1567.9     104.90     1.494       24.093      69.366    99.052
PaU.   15.640    1022.382   15861.621   15.677        7398.3     237.44     4.310       98.533      124.333   264.649
TABLE IV: Running time of different methods (in seconds).

We compare our methods with the other competitors in terms of running time. Table IV lists the running time of the different clustering methods. Since EGCSC and EKGCSC have closed-form solutions and need no iterative optimization, they are significantly faster than SSC, LRSC, S4C, and UBL. Although SC, ℓ2-SSC, and RMMF take less running time, they cannot achieve better performance than our methods. Compared with EDSC, both of our methods take relatively more running time, because the proposed GCSC framework needs to construct a graph from the data points. Furthermore, EKGCSC needs to compute the kernel matrix, so its running time is higher than that of EGCSC. To sum up, the proposed EGCSC and EKGCSC models achieve a good balance between time cost and clustering accuracy.

V Conclusions

We have proposed a novel HSI clustering framework, termed GCSC, which introduces graph convolution into subspace clustering. The key to the proposed framework is a graph convolutional self-representation that incorporates the intrinsic structural information of the data points. Traditional subspace clustering models can be treated as special forms of the GCSC framework built on Euclidean data. Benefiting from graph convolution, the GCSC model tends to use a cleaner dictionary to learn a robust affinity matrix. Based on the proposed GCSC framework, we designed two efficient subspace clustering models (i.e., EGCSC and EKGCSC) using the Frobenius norm. The experimental results on three HSI datasets demonstrate that the proposed GCSC models achieve state-of-the-art performance with significant margins over many existing clustering models. In particular, the EKGCSC model achieves the highest clustering OA on the SalinasA, Indian Pines, and Pavia University datasets.

The success of the GCSC model signifies that considering the intrinsic graph structure within a data set is important for clustering, which offers an alternative direction for unsupervised learning. The proposed GCSC framework also enables us to revisit traditional clustering models in the non-Euclidean domain. There are many promising ways to improve the GCSC model; for example, one can incorporate deep graph embedding into GCSC. These issues will be studied further in our future work.

VI Appendix


The solution of EGCSC.

Let
\[
\mathcal{L}(\mathbf{C}) = \frac{1}{2}\left\| \bar{\mathbf{X}}\mathbf{C} - \mathbf{X} \right\|_F^2 + \frac{\lambda}{2}\left\| \mathbf{C} \right\|_F^2
\]
be the loss function of EGCSC, where \(\bar{\mathbf{X}} = \mathbf{X}\hat{\mathbf{A}}\) denotes the graph embedding dictionary. Then Eq. (7) can be rewritten as
\[
\mathcal{L}(\mathbf{C}) = \frac{1}{2}\operatorname{tr}\!\left( \mathbf{C}^\top\bar{\mathbf{X}}^\top\bar{\mathbf{X}}\mathbf{C} - 2\mathbf{C}^\top\bar{\mathbf{X}}^\top\mathbf{X} + \mathbf{X}^\top\mathbf{X} \right) + \frac{\lambda}{2}\operatorname{tr}\!\left( \mathbf{C}^\top\mathbf{C} \right).
\]
According to the properties of the matrix trace and matrix derivatives, the partial derivative of \(\mathcal{L}\) with respect to \(\mathbf{C}\) can be presented as
\[
\frac{\partial \mathcal{L}}{\partial \mathbf{C}} = \bar{\mathbf{X}}^\top\bar{\mathbf{X}}\mathbf{C} - \bar{\mathbf{X}}^\top\mathbf{X} + \lambda\mathbf{C}.
\]
Letting \(\partial \mathcal{L} / \partial \mathbf{C} = \mathbf{0}\), we get
\[
\left( \bar{\mathbf{X}}^\top\bar{\mathbf{X}} + \lambda\mathbf{I} \right)\mathbf{C} = \bar{\mathbf{X}}^\top\mathbf{X}.
\]
Finally, \(\mathbf{C}^*\) can be expressed as
\[
\mathbf{C}^* = \left( \bar{\mathbf{X}}^\top\bar{\mathbf{X}} + \lambda\mathbf{I} \right)^{-1}\bar{\mathbf{X}}^\top\mathbf{X}.
\]
Since \(\bar{\mathbf{X}}^\top\bar{\mathbf{X}}\) is positive semidefinite and \(\lambda > 0\), the matrix \(\bar{\mathbf{X}}^\top\bar{\mathbf{X}} + \lambda\mathbf{I}\) is positive definite and its inverse always exists.
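The closed form can be verified numerically. The sketch below (an illustration under our notation, using a random symmetric matrix as a stand-in for the normalized adjacency) checks that the gradient of the loss vanishes at the solution:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, lam = 8, 20, 0.5
X = rng.normal(size=(d, n))               # columns are data points
M = rng.normal(size=(n, n))
A_hat = 0.5 * (M + M.T) / n + np.eye(n)   # stand-in for the normalized adjacency
Xb = X @ A_hat                            # graph embedding dictionary X_bar

# closed-form solution C* = (X_bar^T X_bar + lam I)^{-1} X_bar^T X
C = np.linalg.solve(Xb.T @ Xb + lam * np.eye(n), Xb.T @ X)

# the gradient of the loss at C* should vanish (up to round-off)
grad = Xb.T @ (Xb @ C - X) + lam * C
print(np.abs(grad).max() < 1e-8)  # prints True
```

Because the solution reduces to one linear solve of size n, this is also the source of the efficiency reported in Table IV.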


The solution of EKGCSC.

Similar to the above proof, with the kernel matrix \(\mathbf{K} = \Phi(\mathbf{X})^\top\Phi(\mathbf{X})\), the loss function of EKGCSC (Eq. (10)) can be expressed as
\[
\mathcal{L}(\mathbf{C}) = \frac{1}{2}\operatorname{tr}\!\left( \mathbf{C}^\top\hat{\mathbf{A}}^\top\mathbf{K}\hat{\mathbf{A}}\mathbf{C} - 2\mathbf{C}^\top\hat{\mathbf{A}}^\top\mathbf{K} + \mathbf{K} \right) + \frac{\lambda}{2}\operatorname{tr}\!\left( \mathbf{C}^\top\mathbf{C} \right).
\]
The partial derivative of \(\mathcal{L}\) with respect to \(\mathbf{C}\) is then given by
\[
\frac{\partial \mathcal{L}}{\partial \mathbf{C}} = \hat{\mathbf{A}}^\top\mathbf{K}\hat{\mathbf{A}}\mathbf{C} - \hat{\mathbf{A}}^\top\mathbf{K} + \lambda\mathbf{C}.
\]
By setting \(\partial \mathcal{L} / \partial \mathbf{C} = \mathbf{0}\), we finally get the optimal solution of \(\mathbf{C}\) as follows:
\[
\mathbf{C}^* = \left( \hat{\mathbf{A}}^\top\mathbf{K}\hat{\mathbf{A}} + \lambda\mathbf{I} \right)^{-1}\hat{\mathbf{A}}^\top\mathbf{K}.
\]


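Analogously, the kernel solution can be sketched with an RBF kernel; the adjacency matrix and the value of gamma below are illustrative stand-ins, not the paper's settings:

```python
import numpy as np

def rbf_kernel(X, gamma):
    """K_ij = exp(-gamma * ||x_i - x_j||^2); columns of X are samples."""
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
d, n, lam = 8, 15, 0.5
X = rng.normal(size=(d, n))
M = rng.normal(size=(n, n))
A_hat = 0.5 * (M + M.T) / n + np.eye(n)   # stand-in normalized adjacency

K = rbf_kernel(X, gamma=0.1)
# closed-form EKGCSC solution C* = (A_hat^T K A_hat + lam I)^{-1} A_hat^T K
C = np.linalg.solve(A_hat.T @ K @ A_hat + lam * np.eye(n), A_hat.T @ K)

# the gradient vanishes at the optimum, confirming the derivation
grad = A_hat.T @ K @ A_hat @ C - A_hat.T @ K + lam * C
print(np.abs(grad).max() < 1e-8)  # prints True
```

Since \(\hat{\mathbf{A}}^\top\mathbf{K}\hat{\mathbf{A}}\) is positive semidefinite whenever \(\mathbf{K}\) is a valid kernel matrix, the regularized system is always solvable, mirroring the EGCSC case.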

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive suggestions and criticisms. We would also like to thank Prof. Lefei Zhang, who provided the source code of the RMMF algorithm.


  • [1] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst (2017) Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine 34 (4), pp. 18–42.
  • [2] Y. Cai, X. Liu, and Z. Cai (2020) BS-Nets: an end-to-end framework for band selection of hyperspectral image. IEEE Transactions on Geoscience and Remote Sensing 58 (3), pp. 1969–1984.
  • [3] Y. Cai, Z. Cai, M. Zeng, X. Liu, J. Wu, and G. Wang (2018) A novel deep learning approach: stacked evolutionary auto-encoder. In 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–8.
  • [4] Y. Cai, Z. Dong, Z. Cai, X. Liu, and G. Wang (2019) Discriminative spectral-spatial attention-aware residual network for hyperspectral image classification. In 2019 10th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), pp. 1–5.
  • [5] Y. Cai, X. Liu, Y. Zhang, and Z. Cai (2018) Hierarchical ensemble of extreme learning machine. Pattern Recognition Letters.
  • [6] J. Chang, G. Meng, L. Wang, S. Xiang, and C. Pan (2018) Deep self-evolution clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1.
  • [7] E. Elhamifar and R. Vidal (2013) Sparse subspace clustering: algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (11), pp. 2765–2781.
  • [8] X. Fang, Y. Cai, Z. Cai, X. Jiang, and Z. Chen (2020) Sparse feature learning of hyperspectral imagery via multiobjective-based extreme learning machine. Sensors 20 (5), pp. 1262.
  • [9] M. Fauvel, Y. Tarabalka, J. A. Benediktsson, J. Chanussot, and J. C. Tilton (2013) Advances in spectral-spatial classification of hyperspectral images. Proceedings of the IEEE 101 (3), pp. 652–675.
  • [10] P. Ghamisi, E. Maggiori, S. Li, R. Souza, Y. Tarablaka, G. Moser, A. De Giorgi, L. Fang, Y. Chen, M. Chi, S. B. Serpico, and J. A. Benediktsson (2018) New frontiers in spectral-spatial hyperspectral image classification: the latest advances based on mathematical morphology, Markov random fields, segmentation, sparse representation, and deep learning. IEEE Geoscience and Remote Sensing Magazine 6 (3), pp. 10–43.
  • [11] P. Ghamisi, N. Yokoya, J. Li, W. Liao, S. Liu, J. Plaza, B. Rasti, and A. Plaza (2017) Advances in hyperspectral image and signal processing: a comprehensive overview of the state of the art. IEEE Geoscience and Remote Sensing Magazine 5 (4), pp. 37–78.
  • [12] W. Gong, Y. Wang, Z. Cai, and L. Wang (2018) Finding multiple roots of nonlinear equation systems via a repulsion-based adaptive differential evolution. IEEE Transactions on Systems, Man, and Cybernetics: Systems, pp. 1–15.
  • [13] W. Gong and Z. Cai (2013) Parameter extraction of solar cell models using repaired adaptive differential evolution. Solar Energy 94, pp. 209–220.
  • [14] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, and T. Chen (2018) Recent advances in convolutional neural networks. Pattern Recognition 77, pp. 354–377.
  • [15] J. A. Hartigan and M. A. Wong (1979) Algorithm AS 136: a k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics) 28 (1), pp. 100–108.
  • [16] L. He, J. Li, C. Liu, and S. Li (2018) Recent advances on spectral-spatial hyperspectral image classification: an overview and new guidelines. IEEE Transactions on Geoscience and Remote Sensing 56 (3), pp. 1579–1597.
  • [17] L. van der Maaten and G. E. Hinton (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9, pp. 2579–2605.
  • [18] P. Hu, X. Liu, Y. Cai, and Z. Cai (2019) Band selection of hyperspectral images using multiobjective optimization-based sparse self-representation. IEEE Geoscience and Remote Sensing Letters 16 (3), pp. 452–456.
  • [19] S. Huang, H. Zhang, and A. Pižurica (2018) Joint sparsity based sparse subspace clustering for hyperspectral images. In 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 3878–3882.
  • [20] P. Ji, T. Zhang, H. Li, M. Salzmann, and I. Reid (2017) Deep subspace clustering networks. In Advances in Neural Information Processing Systems 30, pp. 24–33.
  • [21] T. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations.
  • [22] Y. Kong, Y. Cheng, C. L. P. Chen, and X. Wang (2019) Hyperspectral image clustering based on unsupervised broad learning. IEEE Geoscience and Remote Sensing Letters, pp. 1–5.
  • [23] Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. Nature 521 (7553), pp. 436–444.
  • [24] T. Lei, X. Jia, Y. Zhang, S. Liu, H. Meng, and A. K. Nandi (2019) Superpixel-based fast fuzzy c-means clustering for color image segmentation. IEEE Transactions on Fuzzy Systems 27 (9), pp. 1753–1766.
  • [25] Q. Li, W. Liu, and L. Li (2018) Affinity learning via a diffusion process for subspace clustering. Pattern Recognition 84, pp. 39–50.
  • [26] Q. Li, Z. Han, and X. Wu (2018) Deeper insights into graph convolutional networks for semi-supervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • [27] S. Li, W. Song, L. Fang, Y. Chen, P. Ghamisi, and J. A. Benediktsson (2019) Deep learning for hyperspectral image classification: an overview. IEEE Transactions on Geoscience and Remote Sensing 57 (9), pp. 6690–6709.
  • [28] G. Liu, Z. Lin, and Y. Yu (2010) Robust subspace segmentation by low-rank representation. In Proceedings of the 27th International Conference on Machine Learning, pp. 663–670.
  • [29] X. Liu, R. Wang, Z. Cai, Y. Cai, and X. Yin (2019) Deep multigrained cascade forest for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing 57 (10), pp. 8169–8183.
  • [30] C. Lu, J. Feng, Z. Lin, T. Mei, and S. Yan (2019) Subspace clustering by block diagonal representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (2), pp. 487–501.
  • [31] P. Ji, M. Salzmann, and H. Li (2014) Efficient dense subspace clustering. In IEEE Winter Conference on Applications of Computer Vision, pp. 461–468.
  • [32] Y. Pan, Y. Jiao, T. Li, and Y. Gu (2019) An efficient algorithm for hyperspectral image clustering. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2167–2171.
  • [33] A. Paoli, F. Melgani, and E. Pasolli (2009) Clustering of hyperspectral images based on multiobjective particle swarm optimization. IEEE Transactions on Geoscience and Remote Sensing 47 (12), pp. 4175–4188.
  • [34] V. M. Patel and R. Vidal (2014) Kernel sparse subspace clustering. In 2014 IEEE International Conference on Image Processing (ICIP), pp. 2849–2853.
  • [35] A. Qin, Z. Shang, J. Tian, Y. Wang, T. Zhang, and Y. Y. Tang (2019) Spectral-spatial graph convolutional networks for semisupervised hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters 16 (2), pp. 241–245.
  • [36] W. Sun, L. Zhang, B. Du, W. Li, and Y. M. Lai (2015) Band selection using improved sparse subspace clustering for hyperspectral imagery classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 8 (6), pp. 2784–2797.
  • [37] R. Vidal (2011) Subspace clustering. IEEE Signal Processing Magazine 28 (2), pp. 52–68.
  • [38] R. Vidal and P. Favaro (2014) Low rank subspace clustering (LRSC). Pattern Recognition Letters 43, pp. 47–61.
  • [39] Y. Wan, Y. Zhong, A. Ma, and L. Zhang (2020) Multi-objective sparse subspace clustering for hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing 58 (4), pp. 2290–2307.
  • [40] R. Wang, F. Nie, and W. Yu (2017) Fast spectral clustering with anchor graph for large hyperspectral images. IEEE Geoscience and Remote Sensing Letters 14 (11), pp. 2003–2007.
  • [41] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu (2020) A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, pp. 1–21.
  • [42] M. Yin, J. Gao, and Z. Lin (2016) Laplacian regularized low-rank representation and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (3), pp. 504–517.
  • [43] M. Zeng, Y. Cai, Z. Cai, X. Liu, P. Hu, and J. Ku (2019) Unsupervised hyperspectral image band selection based on deep subspace clustering. IEEE Geoscience and Remote Sensing Letters, pp. 1–5.
  • [44] M. Zeng, Y. Cai, X. Liu, Z. Cai, and X. Li (2019) Spectral-spatial clustering of hyperspectral image based on Laplacian regularized deep subspace clustering. In IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, pp. 2694–2697.
  • [45] H. Zhai, H. Zhang, L. Zhang, P. Li, and A. Plaza (2017) A new sparse subspace clustering algorithm for hyperspectral remote sensing imagery. IEEE Geoscience and Remote Sensing Letters 14 (1), pp. 43–47.
  • [46] H. Zhai, H. Zhang, L. Zhang, and P. Li (2018) Laplacian-regularized low-rank subspace clustering for hyperspectral image band selection. IEEE Transactions on Geoscience and Remote Sensing, pp. 1–18.
  • [47] H. Zhai, H. Zhang, X. Xu, L. Zhang, and P. Li (2017) Kernel sparse subspace clustering with a spatial max pooling operation for hyperspectral remote sensing data interpretation. Remote Sensing 9 (4).
  • [48] H. Zhang, H. Zhai, L. Zhang, and P. Li (2016) Spectral-spatial sparse subspace clustering for hyperspectral remote sensing images. IEEE Transactions on Geoscience and Remote Sensing 54 (6), pp. 3672–3684.
  • [49] L. Zhang, L. Zhang, B. Du, J. You, and D. Tao (2019) Hyperspectral image unsupervised classification by robust manifold matrix factorization. Information Sciences 485, pp. 154–169.
  • [50] L. Zhang, L. Zhang, and B. Du (2016) Deep learning for remote sensing data: a technical tutorial on the state of the art. IEEE Geoscience and Remote Sensing Magazine 4 (2), pp. 22–40.
  • [51] Z. Zhang, P. Cui, and W. Zhu (2020) Deep learning on graphs: a survey. IEEE Transactions on Knowledge and Data Engineering, pp. 1–1.
  • [52] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun (2018) Graph neural networks: a review of methods and applications. arXiv preprint arXiv:1812.08434.
  • [53] J. Zhu, Z. Jiang, G. D. Evangelidis, C. Zhang, S. Pang, and Z. Li (2019) Efficient registration of multi-view point sets by k-means clustering. Information Sciences 488, pp. 205–218.
  • [54] X. Zhu, S. Zhang, Y. Li, J. Zhang, L. Yang, and Y. Fang (2019) Low-rank sparse subspace for spectral clustering. IEEE Transactions on Knowledge and Data Engineering 31 (8), pp. 1532–1543.