I Introduction
Subspace clustering is one of the most popular techniques for data analysis and has attracted increasing interest from numerous areas, such as computer vision, image analysis, and signal processing [1]. Under the assumption that high-dimensional data lie in a union of low-dimensional subspaces, subspace clustering seeks a set of subspaces that fit a given data set and performs clustering based on the identified subspaces.
During the past decades, many subspace clustering methods have been proposed, which can be roughly classified into four categories: 1) iterative approaches [2]; 2) statistical approaches [3, 4]; 3) algebraic approaches [5, 6, 7]; and 4) spectral clustering-based approaches [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]. In recent years, spectral clustering-based approaches have achieved the state of the art in subspace clustering. Their key step is finding a block-diagonal affinity matrix, where each element of the matrix denotes the similarity between two data points and the block-diagonal structure means that only the similarities among intra-cluster data points are nonzero.
To obtain a block-diagonal affinity matrix, most recent spectral clustering-based approaches measure the similarity using so-called self-expression, i.e., representing each data point as a linear combination of the whole data set and then using the representation coefficients to build the affinity matrix. The major difference among these methods lies in the constraints enforced on the representation coefficients. For example, sparse subspace clustering (SSC) [11] assumes that each data point can be linearly represented by a few other points; to this end, SSC adopts an $\ell_1$-norm constraint. Low-rank representation (LRR) [12] encourages the coefficient matrix to be low rank so that it can capture the global structure of the data; to obtain low rankness, LRR enforces a nuclear-norm constraint on the coefficients. Different from SSC and LRR, truncated regression representation (TRR) [17, 19] adopts the Frobenius norm instead of the $\ell_1$- and nuclear norms and has shown promising performance in many real-world applications. Like most existing subspace clustering algorithms [21, 11, 12, 13], the major disadvantage of TRR is that it may not give a satisfactory clustering result when the data points cannot be linearly represented by each other. In fact, many real-world data are sampled from multiple nonlinear subspaces, which poses challenges to TRR and limits its applications in practice.
To group data drawn from multiple nonlinear subspaces, in this paper we propose a novel nonlinear subspace clustering method, termed kernel truncated regression representation (KTRR). Our basic idea rests on the following assumption: there exists a projection space in which the data can be linearly represented. To illustrate this simple but effective idea, we give a toy example in Fig. 1. The proposed method consists of the following steps: 1) projecting the input into another space via an implicit nonlinear transformation; 2) calculating the global self-expression of the whole data set in the projection space, in which the data can be linearly reconstructed; 3) eliminating the effect of errors such as Gaussian noise by zeroing trivial coefficients; 4) constructing a Laplacian graph using the obtained coefficients; 5) solving a generalized eigen-decomposition problem and obtaining the clustering with k-means. The contributions and novelty of this work can be summarized as follows:

- We propose a novel method which can cluster data points drawn from multiple nonlinear subspaces. To the best of our knowledge, this is the first nonlinear extension of TRR and one of only a few nonlinear subspace clustering approaches.
- We develop a closed-form solution to our method. This makes our method very efficient and useful for large-scale data sets.
- Different from most existing subspace clustering methods such as SSC and LRR, KTRR achieves robustness by eliminating the impact of noise in the projection space instead of the input space. In other words, KTRR does not require a prior on the structure of the errors, which makes it more competitive in handling corrupted subspaces. Extensive experimental results show that our method significantly outperforms ten other state-of-the-art subspace clustering algorithms regarding accuracy, robustness, and computational cost.
The rest of this paper is organized as follows. Section II discusses some related work. Section III presents the kernel truncated regression and the new robust subspace clustering method. Section IV provides experimental results to illustrate the effectiveness and the efficiency of the proposed algorithm. Section V concludes the paper.
Notations: In this paper, unless specified otherwise, lower-case bold letters represent column vectors, upper-case bold letters represent matrices, and the entries of matrices are denoted with subscripts. For instance, $\mathbf{x}$ is a column vector and $x_i$ is its $i$-th entry; $\mathbf{X}$ is a matrix, $X_{ij}$ is the entry in its $i$-th row and $j$-th column, and $\mathbf{x}_j$ denotes the $j$-th column of $\mathbf{X}$. Moreover, $\mathbf{X}^{\top}$ represents the transpose of $\mathbf{X}$, $\mathbf{X}^{-1}$ denotes the inverse matrix of $\mathbf{X}$, and $\mathbf{I}$ stands for the identity matrix. Table I summarizes some notations used throughout the paper.

Notation | Definition
---|---
$d$ | the dimension of input data points
$n$ | the number of input data points
$c$ | the number of underlying subspaces
$\lambda$ | the balance parameter
$\mathbf{x}_i$ | the $i$-th data point
$\mathbf{X}$ | the data matrix
$\mathbf{D}_i$ | the dictionary matrix for the data point $\mathbf{x}_i$
$\mathbf{K}$ | the kernel matrix of the input data points
$\mathbf{c}_i$ | the representation vector for the mapped data point $\phi(\mathbf{x}_i)$
$\mathbf{C}$ | the linear representation coefficient matrix
$\mathbf{W}$ | the similarity matrix among all data points
$\mathbf{L}$ | the normalized Laplacian matrix
$\phi: \mathbb{R}^d \rightarrow \mathcal{H}$ | the mapping from the input space to the kernel space
$\kappa(\cdot, \cdot)$ | the kernel function
II Related Work
During the past decades, several spectral clustering-based methods have been proposed to achieve subspace clustering in many applications such as image clustering [21], motion segmentation [22], and gene expression analysis [23]. The key of these methods is to obtain a block-diagonal similarity matrix whose nonzero elements are located only on the connections between points from the same subspace. There are two common strategies to compute the similarity matrix, i.e., the pairwise distance-based strategy and the linear representation-based strategy [16]. The pairwise distance-based strategy computes the similarity between two points according to their pairwise relationship; e.g., the original spectral clustering method adopts the Euclidean distance with the heat kernel to calculate the similarity, i.e.,
$$W_{ij} = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|_2^2}{2\sigma^2}\right), \qquad (1)$$
where $W_{ij}$ denotes the similarity between the data points $\mathbf{x}_i$ and $\mathbf{x}_j$, and the parameter $\sigma$ controls the width of the neighborhoods.
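As a concrete illustration, the following is a minimal NumPy sketch of the heat-kernel affinity in (1); the function name and the column-wise data layout are our own choices for illustration.

```python
import numpy as np

def heat_kernel_affinity(X, sigma):
    """Pairwise similarity of Eq. (1): W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    X is a (d, n) data matrix with one sample per column.
    """
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # squared Euclidean distances
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))
```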
Alternatively, linear representation-based approaches assume that each data point can be represented as a linear combination of other points from the same subspace. Based on this assumption, the linear representation coefficients can be used as a measurement of similarity and have achieved state-of-the-art performance in subspace clustering [11, 12, 13, 19, 24, 15, 25, 26], since they encode the global structure of the whole data set into the similarity.
For a given data matrix $\mathbf{X}$, these methods linearly represent $\mathbf{X}$ and obtain the coefficient matrix $\mathbf{C}$ in a self-expression manner by solving
$$\min_{\mathbf{C}} \ \mathcal{R}(\mathbf{C}) \quad \mathrm{s.t.} \quad \mathbf{X} = \mathbf{X}\mathbf{C}, \ \ \mathrm{diag}(\mathbf{C}) = \mathbf{0}, \qquad (2)$$
where the constraint $\mathrm{diag}(\mathbf{C}) = \mathbf{0}$ avoids the trivial solution in which each data point represents itself, by enforcing the diagonal elements of $\mathbf{C}$ to be zeros, and $\mathcal{R}(\mathbf{C})$ denotes the adopted prior structured regularization on $\mathbf{C}$. The major difference among most existing subspace clustering methods is the choice of $\mathcal{R}(\cdot)$. For example, SSC [11] enforces sparsity on $\mathbf{C}$ by adopting the $\ell_1$-norm via $\mathcal{R}(\mathbf{C}) = \|\mathbf{C}\|_1$, and LRR [12] obtains low rankness by using the nuclear norm with $\mathcal{R}(\mathbf{C}) = \|\mathbf{C}\|_*$. To further achieve robustness, (2) is extended as follows:
$$\min_{\mathbf{C}, \mathbf{E}} \ \mathcal{R}(\mathbf{C}) + \lambda\,\mathcal{P}(\mathbf{E}) \quad \mathrm{s.t.} \quad \mathbf{X} = \mathbf{X}\mathbf{C} + \mathbf{E}, \ \ \mathrm{diag}(\mathbf{C}) = \mathbf{0}, \qquad (3)$$
where $\mathbf{E}$ stands for the errors induced by noise and corruption, and $\mathcal{P}(\mathbf{E})$ measures the impact of the errors. Generally, $\|\mathbf{E}\|_F^2$ and $\|\mathbf{E}\|_1$ are used to describe Gaussian noise and Laplacian noise, respectively, where $\|\cdot\|_F$ denotes the Frobenius norm.
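For concreteness, the following is a schematic sketch of a sparsity-regularized instance of (3), written with the cvxpy modeling package; the penalized form, the variable names, and the squared-Frobenius error model are our own illustrative assumptions rather than the exact formulation of any particular method.

```python
import cvxpy as cp
import numpy as np

def sparse_self_expression(X, lam):
    """Sketch of an l1-regularized self-expression model in the spirit of Eq. (3)."""
    n = X.shape[1]
    C = cp.Variable((n, n))
    E = cp.Variable(X.shape)
    objective = cp.Minimize(cp.sum(cp.abs(C)) + lam * cp.sum_squares(E))
    constraints = [X == X @ C + E, cp.diag(C) == 0]
    cp.Problem(objective, constraints).solve()
    return C.value
```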
Due to the assumption of linear reconstruction, these methods fail to cluster nonlinear subspaces. To address this challenging issue, some recent works have been proposed [15, 24]; however, they have the following two disadvantages: 1) they are computationally inefficient since they involve solving $\ell_1$- or nuclear-norm minimization problems; 2) like SSC and LRR, they need a prior on the errors present in the data sets to obtain the correct mathematical formulation, and if this prior is inconsistent with the real situation, the methods may achieve inferior performance. To solve these issues, we propose a nonlinear subspace clustering method which is complementary to existing approaches. Note that Peng et al. recently proposed to achieve nonlinearity with deep structures [27, 28], which are among the first works to combine deep learning with subspace clustering; however, this is beyond the scope of this paper.
III The Proposed Subspace Clustering Method
This section gives the details of our proposed method, which consists of three steps: 1) calculating the kernel truncated regression representation over the whole data set; 2) eliminating the effect of possible errors such as noise from the representation and then building a graph Laplacian; 3) obtaining the clustering by performing the k-means algorithm on the leading eigenvectors of the graph Laplacian. Moreover, we also give the computational complexity of the proposed method.
III-A Kernel Truncated Regression Representation
For a given data set $\{\mathbf{x}_i\}_{i=1}^{n}$, where $\mathbf{x}_i \in \mathbb{R}^d$, we define a matrix $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] \in \mathbb{R}^{d \times n}$ and denote by $\mathbf{D}_i$ the dictionary for $\mathbf{x}_i$, i.e., $\mathbf{X}$ with its $i$-th column removed. Let $\phi: \mathbb{R}^d \rightarrow \mathcal{H}$ be a nonlinear mapping which transforms the input into a kernel space $\mathcal{H}$, and let $\Phi(\mathbf{X}) = [\phi(\mathbf{x}_1), \ldots, \phi(\mathbf{x}_n)]$. After mapping into the kernel space, the transformed data $\phi(\mathbf{x}_i)$ are generally believed to lie in linear subspaces [15, 24]. Based on this basic idea, we formulate the objective function of our KTRR as follows:
$$\min_{\mathbf{c}_i} \ \|\phi(\mathbf{x}_i) - \Phi(\mathbf{D}_i)\,\mathbf{c}_i\|_2^2 + \lambda \|\mathbf{c}_i\|_2^2, \qquad (4)$$
where the first term is the reconstruction error in the kernel space, the second term serves as an $\ell_2$-norm regularization, and $\lambda$ is a positive real number which controls the strength of the $\ell_2$-norm regularization term.
For each transformed data point $\phi(\mathbf{x}_i)$, solving the optimization problem (4) gives
$$\mathbf{c}_i^{\ast} = \left(\Phi(\mathbf{D}_i)^{\top}\Phi(\mathbf{D}_i) + \lambda\mathbf{I}\right)^{-1}\Phi(\mathbf{D}_i)^{\top}\phi(\mathbf{x}_i). \qquad (5)$$
Note that solving the above problems in this way requires one matrix inversion per data point, which becomes expensive for a large number of data points.
To solve (4) more efficiently, we rewrite it as
$$\min_{\mathbf{c}_i} \ \|\phi(\mathbf{x}_i) - \Phi(\mathbf{X})\,\mathbf{c}_i\|_2^2 + \lambda \|\mathbf{c}_i\|_2^2 \quad \mathrm{s.t.} \ \ \mathbf{e}_i^{\top}\mathbf{c}_i = 0, \qquad (6)$$
where $\mathbf{e}_i \in \mathbb{R}^{n}$ is a column vector with all zero elements except the $i$-th element, which is $1$, and the constraint $\mathbf{e}_i^{\top}\mathbf{c}_i = 0$ eliminates the trivial solution of writing a transformed point as a linear combination of itself.
Using the Lagrangian method, we obtain
$$\mathcal{L}(\mathbf{c}_i, \gamma) = \|\phi(\mathbf{x}_i) - \Phi(\mathbf{X})\,\mathbf{c}_i\|_2^2 + \lambda \|\mathbf{c}_i\|_2^2 + \gamma\,\mathbf{e}_i^{\top}\mathbf{c}_i, \qquad (7)$$
where $\gamma$ is the Lagrangian multiplier. Clearly,
$$\frac{\partial \mathcal{L}}{\partial \mathbf{c}_i} = 2\left(\Phi(\mathbf{X})^{\top}\Phi(\mathbf{X}) + \lambda\mathbf{I}\right)\mathbf{c}_i - 2\,\Phi(\mathbf{X})^{\top}\phi(\mathbf{x}_i) + \gamma\,\mathbf{e}_i = \mathbf{0}. \qquad (8)$$
Let $\mathbf{A} = \Phi(\mathbf{X})^{\top}\Phi(\mathbf{X}) + \lambda\mathbf{I}$ and $\mathbf{b}_i = \Phi(\mathbf{X})^{\top}\phi(\mathbf{x}_i)$, we get
$$\mathbf{c}_i = \mathbf{A}^{-1}\left(\mathbf{b}_i - \frac{\gamma}{2}\,\mathbf{e}_i\right). \qquad (9)$$
Multiplying $\mathbf{e}_i^{\top}$ on both sides of (9), and since $\mathbf{e}_i^{\top}\mathbf{c}_i = 0$, it holds that
$$\frac{\gamma}{2} = \frac{\mathbf{e}_i^{\top}\mathbf{A}^{-1}\mathbf{b}_i}{\mathbf{e}_i^{\top}\mathbf{A}^{-1}\mathbf{e}_i}, \qquad (10)$$
and therefore
$$\mathbf{c}_i = \mathbf{A}^{-1}\left(\mathbf{b}_i - \frac{\mathbf{e}_i^{\top}\mathbf{A}^{-1}\mathbf{b}_i}{\mathbf{e}_i^{\top}\mathbf{A}^{-1}\mathbf{e}_i}\,\mathbf{e}_i\right). \qquad (11)$$
One can see that the solution in (11) does not require $\phi(\cdot)$ to be computed explicitly, i.e., we only need the dot products between the mapped data points. Therefore, we can employ kernel functions to compute these dot products without explicitly performing the mapping $\phi$. For suitable choices of a kernel $\kappa: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}$, it has been shown [29] that $\kappa(\mathbf{x}_i, \mathbf{x}_j)$ gives the dot product in the kernel space induced by the mapping $\phi$.
We can collect all the dot products into a matrix $\mathbf{K} \in \mathbb{R}^{n \times n}$ whose elements are calculated as
$$K_{ij} = \kappa(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^{\top}\phi(\mathbf{x}_j), \qquad (12)$$
where $i, j = 1, 2, \ldots, n$. The matrix $\mathbf{K}$ is the kernel matrix, which is symmetric and positive semidefinite. Accordingly, (11) can be rewritten as
$$\mathbf{c}_i = \hat{\mathbf{A}}^{-1}\left(\mathbf{k}_i - \frac{\mathbf{e}_i^{\top}\hat{\mathbf{A}}^{-1}\mathbf{k}_i}{\mathbf{e}_i^{\top}\hat{\mathbf{A}}^{-1}\mathbf{e}_i}\,\mathbf{e}_i\right), \qquad (13)$$
where $\hat{\mathbf{A}} = \mathbf{K} + \lambda\mathbf{I}$ and $\mathbf{k}_i$ denotes the $i$-th column of $\mathbf{K}$.
It is notable that only one pseudo-inverse operation, on $\hat{\mathbf{A}}$, is needed for solving the representation problems of all data points, so the cost of calculating the optimal solutions via (13) is substantially lower than solving (5) separately for each of the $n$ data points.
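The following is a minimal NumPy sketch of the closed-form solution (13) as reconstructed above; the function name, the use of a precomputed kernel matrix, and the use of a pseudo-inverse are our own illustrative choices.

```python
import numpy as np

def ktrr_coefficients(K, lam):
    """Closed-form KTRR representation, a sketch of Eq. (13).

    K   : (n, n) kernel matrix of the input data.
    lam : balance parameter lambda > 0.
    Returns the (n, n) coefficient matrix C with a zero diagonal.
    """
    n = K.shape[0]
    A_inv = np.linalg.pinv(K + lam * np.eye(n))   # one pseudo-inverse serves all points
    C = np.zeros((n, n))
    for i in range(n):
        a_i = A_inv @ K[:, i]
        # subtract the Lagrangian correction so that c_i[i] = 0 (constraint e_i^T c_i = 0)
        C[:, i] = a_i - (a_i[i] / A_inv[i, i]) * A_inv[:, i]
    return C
```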
It has been proved that, under certain conditions, the coefficients over intra-subspace data points are larger than those over inter-subspace data points [17]. After representing the data set via the kernel matrix in (13), we handle the errors by performing a hard thresholding operator $\mathcal{T}_k(\cdot)$ over each column $\mathbf{c}_i$, which keeps the $k$ largest entries of $\mathbf{c}_i$ and sets the other entries to zero, as in TRR, i.e.,
$$\left[\mathcal{T}_k(\mathbf{c}_i)\right]_j = [\mathbf{c}_i]_j, \quad j \in \Omega_k(\mathbf{c}_i), \qquad (14)$$
and
$$\left[\mathcal{T}_k(\mathbf{c}_i)\right]_j = 0, \quad j \notin \Omega_k(\mathbf{c}_i), \qquad (15)$$
where $\Omega_k(\mathbf{c}_i)$ consists of the indices of the $k$ largest elements of $\mathbf{c}_i$. Typically, the optimal $k$ equals the dimensionality of the corresponding kernel subspace. In this manner, we avoid modeling the impact of the noise explicitly in the optimization problem and do not need prior knowledge about the errors.
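A short NumPy sketch of this column-wise hard thresholding follows; keeping the $k$ largest-magnitude entries and the handling of ties are our own assumptions for illustration.

```python
import numpy as np

def hard_threshold_columns(C, k):
    """Keep only the k largest-magnitude entries of every column of C, cf. Eqs. (14)-(15)."""
    C_t = np.zeros_like(C)
    for i in range(C.shape[1]):
        idx = np.argsort(np.abs(C[:, i]))[-k:]   # indices of the k largest entries
        C_t[idx, i] = C[idx, i]
    return C_t
```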
III-B KTRR for Robust Subspace Clustering
In this section, we present the method to achieve subspace clustering by incorporating KTRR into the spectral clustering framework [8].
For a given data set consisting of $n$ data points in $\mathbb{R}^d$, we assume that these points lie in a union of low-dimensional nonlinear subspaces. We propose to project the data points into another space, in which each mapped point can be linearly represented by the mapped points from the same subspace. From (11), we find that computing the representation coefficients does not require the projection function in explicit form; only dot products are needed. We can therefore use a kernel function to calculate these dot products and obtain the representation coefficients via (13).
Moreover, the presence of errors in the input data set leads to some erroneous connections among data points from different subspaces. We propose to remove these errors through hard thresholding of each column vector of the coefficient matrix via (14) and (15).
As claimed before, the representation coefficients can be seen as similarities among the input data points: the similarity between two intra-subspace data points is large, while that between two inter-subspace data points is zero or very close to zero. We can thus build a similarity matrix from the obtained coefficient matrix as
$$\mathbf{W} = \frac{1}{2}\left(|\mathbf{C}| + |\mathbf{C}|^{\top}\right), \qquad (16)$$
where $|\cdot|$ takes the absolute value element-wise. This yields a symmetric similarity matrix which is suitable for integration into the spectral clustering framework.
Then, we compute the normalized Laplacian matrix [8]
$$\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1/2}\,\mathbf{W}\,\mathbf{D}^{-1/2}, \qquad (17)$$
where $\mathbf{D}$ is a diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$. The matrix $\mathbf{L}$ is positive semi-definite and has an eigenvalue equal to $0$ with eigenvector $\mathbf{D}^{1/2}\mathbf{1}$ [9], where $\mathbf{1}$ denotes the all-one vector. Next, we calculate the first $c$ eigenvectors of $\mathbf{L}$, which correspond to its $c$ smallest nonzero eigenvalues, and stack them as the columns of a matrix $\mathbf{V} \in \mathbb{R}^{n \times c}$.
Finally, we apply the k-means clustering method to the rows of $\mathbf{V}$, treating each row vector as a point, to obtain the clustering membership. The proposed subspace clustering algorithm is summarized in Algorithm 1.
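For illustration, a minimal end-to-end sketch of Algorithm 1 is given below. It reuses the hypothetical helpers `ktrr_coefficients` and `hard_threshold_columns` sketched earlier and assumes a Gaussian kernel whose bandwidth is set to the mean pairwise distance, as described in Section IV-B.

```python
import numpy as np
from sklearn.cluster import KMeans

def ktrr_clustering(X, n_clusters, lam, k_keep):
    """End-to-end sketch of Algorithm 1; X is a (d, n) data matrix."""
    # Gaussian kernel; bandwidth = mean of the pairwise distances (Section IV-B).
    sq = np.sum(X ** 2, axis=0)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
    sigma = np.mean(np.sqrt(d2))
    K = np.exp(-d2 / (2.0 * sigma ** 2))

    C = ktrr_coefficients(K, lam)              # Eq. (13)
    C = hard_threshold_columns(C, k_keep)      # Eqs. (14)-(15)
    W = 0.5 * (np.abs(C) + np.abs(C).T)        # Eq. (16)

    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    L = np.eye(W.shape[0]) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]  # Eq. (17)

    _, vecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    V = vecs[:, :n_clusters]                   # eigenvectors of the smallest eigenvalues
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(V)
```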
III-C Computational Complexity Analysis
Given a data matrix $\mathbf{X}$, KTRR first computes the kernel matrix $\mathbf{K}$, then obtains the pseudo-inverse of $\hat{\mathbf{A}} = \mathbf{K} + \lambda\mathbf{I}$, calculates all the solutions in (13) from $\hat{\mathbf{A}}^{-1}$ and $\mathbf{K}$, and finally finds the $k$ largest coefficients in each column of the representation matrix $\mathbf{C}$. Putting these steps together, the computational complexity of KTRR is the same as that of TRR, and it is considerably less than that of KSSC [30] and KLRR [24], which are iterative algorithms whose costs grow with the total number of iterations and, for KLRR, with the partial SVD performed at each iteration.
IV Experimental Results and Analysis
In this section, we experimentally evaluate the performance of the proposed method. We consider the results in terms of three aspects: 1) accuracy, 2) robustness, and 3) computational cost. Robustness is evaluated by conducting experiments on samples with two different types of corruption, i.e., Gaussian noise and random pixel corruption.
IV-A Databases
Five popular image databases are used in our experiments, including Extended Yale Database B (ExYaleB) [31], Columbia Object Image Library (COIL 20) [32], Columbia Object Image Library (COIL 100) [33], USPS [34], and MNIST [35]. We give the details of these databases as follows:
- The ExYaleB database contains frontal face images of 38 subjects, with around 64 near-frontal images under different illuminations per individual, where each image is manually cropped and normalized to a fixed size [36].
- The COIL 20 and COIL 100 databases contain 20 and 100 objects, respectively. The images of each object were taken 5 degrees apart as the object rotated on a turntable, and each object has 72 images. All images are grey-scale and normalized to the same size [36].
- The USPS handwritten digit database (downloaded, together with the MNIST database, from http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html) includes ten classes (the digit characters 0-9) and 11000 samples in total. We use a popular subset of the handwritten digit images for the experiments, and all of these images are normalized to the same size. In the experiment, we randomly select samples of each subject from the database by following the strategy in [10].
- The MNIST handwritten digit database includes ten classes (the digit characters 0-9) and 60000 samples in total. We use the first portion of the training subset to conduct the experiments, and all of these images are normalized to the same size. In the experiment, we also randomly select samples of each subject from the database to evaluate the performance of the different algorithms.
The details of these real-world databases are summarized in Table II.

Dataset | # of classes | # of images | Original size | Normalised size
---|---|---|---|---
ExYaleB | 38 | 58 | |
COIL 20 | 20 | 72 | |
COIL 100 | 100 | 72 | |
USPS | 10 | 1000 | |
MNIST | 10 | 1000 | |
IV-B Baselines and Evaluation Metrics
We compare KTRR (the source code of our proposed method is available at https://www.dropbox.com/s/8vj1k1b184w2ksv/KTRR.zip?dl=0) with state-of-the-art subspace clustering algorithms including truncated regression representation (TRR) [17], kernel low-rank representation (KLRR) [24], kernel sparse subspace clustering (KSSC) [30], latent low-rank representation (LatLRR) [25], two variants of low-rank representation with different error norms (LRR1 and LRR2) [12], sparse subspace clustering (SSC) [11], sparse manifold clustering and embedding (SMCE) [26], local subspace analysis (LSA) [37], and standard spectral clustering (SC) [8].
For a fair comparison, we use the same spectral clustering framework [8] with different similarity matrices obtained by the tested algorithms. Like [24], for all kernel-based algorithms, we adopt the commonly used Gaussian kernel on all datasets and use the default bandwidth parameter which is set to the mean of the distances between all the samples.
Four popular metrics are adopted to evaluate the subspace clustering quality, i.e., accuracy (AC) [38, 10], normalized mutual information (NMI) [38, 10], the adjusted Rand index (ARI) [39], and Fscore [40]. Higher values of these four metrics indicate better clustering: a value of 1 means the predicted result perfectly matches the ground truth, whereas 0 indicates a total mismatch.
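For reference, the following is a sketch of how these metrics can be computed with scikit-learn and SciPy; AC is obtained by matching predicted and true labels with the Hungarian algorithm, and the Fscore shown here is the pair-counting variant, which is our assumption about the definition used.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score
from sklearn.metrics.cluster import contingency_matrix

def clustering_accuracy(y_true, y_pred):
    """AC: best one-to-one label matching via the Hungarian algorithm."""
    cm = contingency_matrix(y_true, y_pred)
    row, col = linear_sum_assignment(-cm)          # maximize the matched samples
    return cm[row, col].sum() / len(y_true)

def pairwise_fscore(y_true, y_pred):
    """Pair-counting F-score computed from the contingency table."""
    cm = contingency_matrix(y_true, y_pred).astype(float)
    tp = (cm * (cm - 1) / 2).sum()                              # pairs grouped together in both
    fp = (cm.sum(axis=0) * (cm.sum(axis=0) - 1) / 2).sum() - tp
    fn = (cm.sum(axis=1) * (cm.sum(axis=1) - 1) / 2).sum() - tp
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# NMI and ARI come directly from scikit-learn:
# normalized_mutual_info_score(y_true, y_pred) and adjusted_rand_score(y_true, y_pred)
```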
IV-C Visualisation of Representation and Similarity Matrices
Before evaluating the clustering performance of the proposed method, we illustrate the visualization of the KTRR coefficient matrix and the obtained similarity matrix. We obtain the result by using facial images of the first two subjects in the ExYaleB database, where the first group of samples belongs to the first subject and the remaining samples belong to the second subject, with the parameters $\lambda$ and $k$ set to fixed values. The representation matrix $\mathbf{C}$ in (13) and the constructed similarity matrix $\mathbf{W}$ are shown in the two panels of Fig. 2, respectively.
[Fig. 2: the KTRR representation matrix $\mathbf{C}$ and the constructed similarity matrix $\mathbf{W}$ for facial images of two ExYaleB subjects.]
From Fig. 2, we can see that the upper-left and bottom-right parts of the representation matrix are brighter than the upper-right and bottom-left parts, although some non-zero elements still exist in the upper-right and bottom-left parts. That is to say, the connections among samples of the same subject are much stronger than those among samples of different subjects, while many trivial connections remain between samples from different subjects, since all samples are facial images and share some common characteristics.
As we know, an ideal similarity matrix for the spectral clustering algorithm is a block-diagonal matrix, i.e., connections should exist only among data points from the same cluster [26, 12, 17, 15, 24]; therefore, a hard thresholding operation is performed. From the resulting similarity matrix in Fig. 2, we find that:
- Our method reveals the latent structure of the data even though the images belong to two subjects. Only a few bright spots remain in the upper-right and bottom-left parts of the obtained similarity matrix, i.e., the trivial connections among samples from different subjects have mostly been removed by the thresholding;
- Almost all of the bright spots lie in the diagonal blocks of the similarity matrix, i.e., the strong connections exist among samples from the same subject;
- The obtained similarity matrix is symmetric and can be directly used for subspace clustering under the framework of spectral clustering [8].
IV-D Clustering on Clean Images
In this experiment, we compare the KTRR method with ten other state-of-the-art approaches on four benchmark databases, i.e., Extended Yale Database B (ExYaleB) [31], Columbia Object Image Library (COIL 20) [32], USPS [34], and MNIST [35]. For each dataset, we run each algorithm several times, repeating the k-means clustering step in each run, and report the mean and the standard deviation of the used metrics. The clustering quality on the above four databases is shown in Table III - Table VI, where the best means for each database are highlighted in boldface. To draw statistically sound conclusions, Wilcoxon's rank sum test [41] is adopted to test the significance of the differences between the results obtained by the proposed method and all other algorithms. From the results, we can obtain the following conclusions.

Metric | KTRR | TRR | KLRR | KSSC | LatLRR | LRR1 | LRR2 | SSC | SMCE | LSA | SC
---|---|---|---|---|---|---|---|---|---|---|---
AC | | 67.04±2.93 | 52.30±4.31 | 58.41±3.19 | 51.40±3.36 | 50.32±2.68 | 49.80±4.72 | 52.87±5.46 | 48.91±3.71 | 33.97±3.95 | 19.69±1.70
NMI | | 72.20±2.61 | 61.98±2.45 | 64.41±1.10 | 54.41±1.76 | 53.31±1.42 | 53.26±2.22 | 58.02±3.44 | 60.22±1.28 | 47.38±1.87 | 32.96±1.54
ARI | | 41.07±6.91 | 36.06±3.49 | 32.40±5.82 | 27.10±2.26 | 26.42±2.17 | 25.63±2.70 | 24.20±4.74 | 30.46±3.06 | 20.98±1.45 | 10.16±1.01
Fscore | | 43.01±6.52 | 37.87±3.37 | 34.59±5.38 | 29.33±2.07 | 28.66±2.00 | 27.93±2.52 | 26.83±4.34 | 32.54±2.84 | 23.20±1.36 | 12.56±0.98

Metric | KTRR | TRR | KLRR | KSSC | LatLRR | LRR1 | LRR2 | SSC | SMCE | LSA | SC
---|---|---|---|---|---|---|---|---|---|---|---
AC | | 84.12±3.35 | 68.66±5.51 | 79.39±8.15 | 67.97±3.47 | 67.94±7.97 | 66.59±3.35 | 69.39±5.93 | 76.51±15.98 | 72.86±6.67 | 69.17±3.81
NMI | | 91.79±0.94 | 77.93±3.24 | 89.50±2.73 | 76.78±1.32 | 76.45±2.20 | 75.33±2.50 | 80.61±2.44 | 90.51±5.69 | 81.49±3.69 | 79.43±2.18
ARI | | 80.72±3.05 | 61.96±6.98 | 76.54±7.13 | 60.03±2.42 | 60.03±4.94 | 58.45±4.21 | 62.14±5.18 | 75.20±15.45 | 68.13±5.92 | 63.77±3.88
Fscore | | 81.76±2.80 | 63.92±6.55 | 77.81±6.65 | 62.03±2.30 | 62.09±4.66 | 60.57±3.98 | 64.17±4.80 | 76.61±14.41 | 69.74±5.57 | 65.60±3.68

Metric | KTRR | TRR | KLRR | KSSC | LatLRR | LRR1 | LRR2 | SSC | SMCE | LSA | SC
---|---|---|---|---|---|---|---|---|---|---|---
AC | | 60.24±16.98 | 70.72±2.63 | 75.17±2.89 | 70.97±4.10 | 70.16±4.29 | 70.91±3.96 | 26.86±13.39 | 73.77±7.85 | 68.51±8.95 | 70.79±8.58
NMI | | 59.55±7.61 | 66.23±3.35 | 73.97±2.41 | 66.55±4.37 | 66.69±4.63 | 66.87±4.35 | 20.93±13.44 | 71.29±9.69 | 64.80±7.93 | 62.72±4.55
ARI | | 46.06±13.33 | 57.21±3.76 | 65.11±3.67 | 57.20±4.51 | 57.18±4.79 | 57.50±4.48 | 9.95±13.56 | 62.08±13.10 | 55.58±9.00 | 53.55±5.24
Fscore | | 51.98±11.29 | 61.64±3.41 | 68.76±3.28 | 61.59±4.11 | 61.58±4.34 | 61.85±4.08 | 24.07±7.62 | 66.10±11.63 | 60.30±7.99 | 58.25±4.67

Metric | KTRR | TRR | KLRR | KSSC | LatLRR | LRR1 | LRR2 | SSC | SMCE | LSA | SC
---|---|---|---|---|---|---|---|---|---|---|---
AC | | 32.19±16.22 | 61.31±6.14 | 57.13±15.57 | 14.48±8.37 | 18.08±10.57 | 18.52±12.34 | 22.02±20.53 | 61.66±5.59 | 63.03±7.46 | 55.11±5.45
NMI | | 24.73±15.97 | 60.07±5.86 | 59.43±13.06 | 4.04±7.96 | 6.76±11.18 | 7.53±14.97 | 14.58±24.25 | 59.08±3.99 | 61.94±5.78 | 48.80±5.30
ARI | | 12.35±13.93 | 47.46±5.77 | 45.02±16.11 | 1.26±4.47 | 2.80±6.50 | 3.27±8.64 | 6.16±16.06 | 46.81±5.80 | 49.50±8.19 | 36.95±5.62
Fscore | | 24.31±8.95 | 52.99±4.92 | 50.99±14.05 | 18.13±2.74 | 18.42±4.13 | 18.43±5.15 | 21.67±9.34 | 52.39±5.05 | 54.78±7.28 | 43.38±5.01
(1) Evaluation on the ExYaleB facial database:
- The KTRR algorithm achieves the best results in these tests and gains a significant improvement over TRR. The means of AC, NMI, ARI, and Fscore of KTRR are markedly higher than those of TRR, and also higher than those of KLRR.
(2) Evaluation on the COIL 20 database:
- The KTRR algorithm obtains the best AC among all tested methods. Specifically, the AC of KTRR is higher than that of the second best method, TRR, and than that of the third best method, KLRR.
- The KLRR, LatLRR, and both LRR methods are all inferior to the standard spectral clustering method.
(3) Evaluation on the USPS handwriting database:
- SSC is inferior to LRR, while its kernel-based extension, KSSC, outperforms the kernel-based extension of LRR, i.e., KLRR. The implicit transformation of the USPS images allows the mapped data points to be much better represented by each other in a sparse representation form.
- The KTRR algorithm achieves the best results in these tests. The AC of KTRR is higher than that of TRR, KLRR, and KSSC, and its NMI, ARI, and Fscore are also greater than those of the other tested methods.
(4) Evaluation on the MNIST handwriting database:
- All the linear representation methods, i.e., TRR [17], LRR [12], and SSC [11], are inferior to their kernel-based extensions, i.e., KTRR, KLRR [24], and KSSC [15]. In particular, LRR performs poorly on this database, while its kernel-based version, KLRR, obtains much better clustering quality in terms of AC, NMI, ARI, and Fscore.
- The KTRR, KLRR, SMCE and LSA algorithms achieve the best clustering results on the MNIST handwriting images compared with the other methods. However, none of the tested methods performs well on this database.
- The proposed method, KTRR, achieves the best clustering result and obtains a significant improvement in AC over TRR. Its NMI, ARI, and Fscore are also higher than those of all other tested methods.
IV-E Clustering on Corrupted Images
To evaluate the robustness of the proposed method, we conduct experiments on the first several subjects of the COIL20 database and of the ExYaleB database, respectively. All used images are corrupted by additive white Gaussian noise or by random pixel corruption. Some corrupted image samples under different levels of noise are shown in Fig. 3, and a sketch of the two corruption models is given after the figure. For the additive Gaussian noise, we add noise with several values of the standard deviation $\sigma$; for the random pixel corruption, we adopt salt & pepper noise with several ratios of affected pixels.
[Fig. 3: sample images from the ExYaleB and COIL20 databases corrupted by additive white Gaussian noise and by salt & pepper noise at different levels.]
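The following is a small NumPy sketch of the two corruption models, with images assumed to be scaled to [0, 1]; the function names and the clipping behaviour are our own illustrative choices.

```python
import numpy as np

def add_gaussian_noise(img, sigma):
    """Additive white Gaussian noise with standard deviation sigma."""
    noisy = img + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_salt_pepper(img, ratio):
    """Set a fraction `ratio` of the pixels to 0 (pepper) or 1 (salt) at random."""
    noisy = img.copy()
    mask = np.random.rand(*img.shape) < ratio
    noisy[mask] = np.random.choice([0.0, 1.0], size=int(mask.sum()))
    return noisy
```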
The clustering quality of the compared methods on the two databases with additive Gaussian noise is shown in Fig. 4, from which we can make the following observations:
- Most of these spectral-based methods are relatively robust to additive Gaussian noise, while the performance of LRR1, LRR2, and LatLRR deteriorates sharply on these two databases. The main reason may be that the additive Gaussian noise destroys the underlying low-rank structure of the representation matrix.
- The accuracy of all tested methods on the COIL20 database is higher than that on the ExYaleB database, which is consistent with the results on clean images.
- The proposed KTRR is considerably more robust than the other methods to additive Gaussian noise. Specifically, KTRR obtains a much higher AC than all the other tested algorithms, especially SC, LSA, LRR1, and LRR2.
The clustering quality of the compared methods on the images with random pixel corruptions is shown in Fig. 5, from which we observe that:
- All the investigated methods perform worse than in the case of white Gaussian noise. This result is consistent with the widely accepted conclusion that non-additive corruptions are more challenging than additive ones in pattern recognition;
- All of the tested algorithms perform much better on the COIL20 database than on the ExYaleB database, where the AC of all algorithms is low under large ratios of corrupted pixels. From Fig. 3, we find that most pixel values of the images from the COIL20 database are close to the extremes of the grey-level range, which renders some of the corruptions ineffective and weakens their impact on the final clustering results;
- The KTRR algorithm is robust to random pixel corruptions. It achieves the best results under small ratios of affected pixels on the two tested databases, and it still obtains a reasonable AC on the COIL20 database under a large ratio of affected pixels, which is a very challenging situation as shown in Fig. 3. However, the AC of KTRR drops severely as the ratio of corrupted pixels increases and becomes lower than that of SSC under large corruption ratios; KTRR should be improved to handle images with salt & pepper corruptions.
IV-F Computational Time
To investigate the efficiency of KTRR, we compare its computational time with that of the other approaches on the clean images of the four databases. Our hardware configuration comprises a CPU and 16 GB of RAM. The time cost for building the similarity graph ($t_1$) and the whole time cost for clustering ($t_2$) are recorded to evaluate the efficiency of the compared methods.
Databases | Time | KTRR | TRR | KLRR | KSSC | LatLRR | LRR1 | LRR2 | SSC | SMCE | LSA | SC
---|---|---|---|---|---|---|---|---|---|---|---|---
ExYaleB | t1 | 22.96 | 23.71 | 45.82 | 5512.68 | 772.44 | 248.94 | 270.65 | 2301.75 | 10.15 | 198.48 | 0.33
 | t2 | 47.62 | 48.8 | 71.26 | 5543.4 | 806.43 | 286.91 | 311.34 | 2313.31 | 45.18 | 229.26 | 124.45
COIL 20 | t1 | 6.50 | 6.54 | 16.11 | 1466.12 | 579.01 | 430.23 | 454.87 | 121.76 | 5.76 | 61.14 | 0.15
 | t2 | 11.66 | 12.75 | 25.55 | 1472.07 | 584.46 | 436.59 | 460.07 | 126.88 | 10.12 | 66.01 | 7.61
USPS | t1 | 16.92 | 11.95 | 29.41 | 2752.97 | 50.05 | 43.34 | 49.10 | 62.25 | 67.44 | 108.67 | 0.14
 | t2 | 27.82 | 22.74 | 39.93 | 2763.19 | 58.98 | 51.88 | 58.90 | 98.79 | 76.50 | 120.46 | 11.73
MNIST | t1 | 22.56 | 22.43 | 34.20 | 5742.89 | 246.35 | 155.70 | 172.32 | 112.13 | 16.78 | 142.71 | 1.09
 | t2 | 33.96 | 32.35 | 44.64 | 5753.42 | 270.19 | 167.25 | 186.82 | 153.23 | 25.52 | 154.64 | 14.65
Table VII shows the time cost of different methods with the parameters which achieve their best results. We can see that:
- The time cost of the proposed method is very close to that of its linear version, TRR [17]. Specifically, TRR is faster than KTRR on the ExYaleB and USPS databases, slower than KTRR on the MNIST database, and comparable to KTRR on the COIL database. Wilcoxon's rank sum test [41] shows no significant difference between the time costs of KTRR and TRR, either for the similarity graph construction or for the whole clustering process, on the four tested databases.
- The KTRR and TRR [17] algorithms are much faster than the KSSC, SSC, KLRR, and LRR methods. This result is consistent with the fact that the theoretical computational complexities of KTRR and TRR are much lower than those of KSSC, SSC, KLRR, and LRR. Both KTRR and TRR have analytical solutions, and only one pseudo-inverse operation is required to solve the representation problems of all data points.
IV-G Clustering Performance with Varying Number of Subjects
In this subsection, we investigate the clustering performance of the proposed method with different numbers of subjects on the COIL 100 image database. The experiments are carried out on the first $c$ classes of the database, where $c$ is increased gradually at fixed intervals. The clustering results are shown in Fig. 6.
[Fig. 6: clustering performance of KTRR on the COIL 100 database with a varying number of subjects.]
From the results, we can see that:
- In general, as the number of subjects increases, the clustering performance decreases, since the clustering difficulty grows with the number of subjects.
- With an increasing number of subjects, the clustering quality of KTRR changes only slightly under some metrics; a possible reason is that these metrics are robust to the data distribution (increasing subject number) [19].
- The proposed method obtains satisfactory performance on the COIL 100 database. It achieves a perfect clustering result for a small number of subjects and still obtains satisfactory AC, NMI, ARI, and Fscore values when the number of subjects becomes large.
IV-H Parameter Analysis
KTRR has two parameters, the tradeoff parameter $\lambda$ and the thresholding parameter $k$. The selection of their values depends on the data distribution: a larger $\lambda$ is suitable for highly corrupted databases, and $k$ corresponds to the dimensionality of the corresponding subspace of the mapped data points.
To evaluate the impact of $\lambda$ and $k$, we conduct experiments on the ExYaleB and COIL20 databases, varying $\lambda$ and $k$ over wide ranges; the results are shown in Fig. 7 and Fig. 8, and a sketch of such a parameter sweep is given at the end of this subsection.
[Fig. 7 and Fig. 8: clustering quality of KTRR on the ExYaleB and COIL20 databases under different values of $\lambda$ and $k$.]
From the results, we get the following observations:
- The proposed method achieves the best clustering performance at particular combinations of $\lambda$ and $k$ on the ExYaleB database and on the COIL20 database, respectively.
- The proposed method obtains satisfactory performance over a wide range of $\lambda$ on both the ExYaleB and COIL20 databases, where the AC, NMI, ARI, and Fscore remain high. The performance of KTRR is therefore not sensitive to the parameter $\lambda$, which makes KTRR suitable for real applications.
- The clustering quality with $k$ in a moderate range on the ExYaleB and COIL20 databases is much better than in the other cases. This means that the thresholding process helps to improve the performance of KTRR and that the dimensionality of the underlying subspaces of the ExYaleB and COIL20 databases in the hidden space falls within this range.
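As a sketch of the parameter analysis described above, the hypothetical grid search below reuses the `ktrr_clustering` and `clustering_accuracy` sketches from the earlier sections; the parameter ranges are illustrative only and are not the ones used in the paper.

```python
import itertools
import numpy as np

def parameter_sweep(X, labels, n_clusters, lams, ks):
    """Grid search over the tradeoff parameter lambda and the threshold k."""
    results = {}
    for lam, k in itertools.product(lams, ks):
        pred = ktrr_clustering(X, n_clusters, lam, k)
        results[(lam, k)] = clustering_accuracy(labels, pred)
    return results

# Example call with illustrative ranges:
# scores = parameter_sweep(X, y, c, lams=np.logspace(-4, 2, 7), ks=range(2, 16))
```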
IV-I Different Kernel Functions
Dataset | Metric | Kernel 1 | Kernel 2 | Kernel 3 | Kernel 4 | Kernel 5 | Kernel 6
---|---|---|---|---|---|---|---
USPS | AC | 80.38±19.04 | 72.31±18.06 | 81.36±14.93 | 74.97±12.40 | 79.62±17.69 | 73.64±14.46
 | NMI | 76.08±9.75 | 67.38±14.41 | 78.04±6.63 | 75.59±9.22 | 74.18±13.08 | 74.58±8.01
 | ARI | 70.10±15.43 | 59.48±17.11 | 71.73±12.67 | 65.98±13.81 | 68.32±19.86 | 64.62±14.44
 | Fscore | 73.25±13.45 | 63.78±15.06 | 74.69±11.13 | 69.72±11.83 | 71.68±17.47 | 68.53±12.38
MNIST | AC | 65.58±11.65 | 66.48±14.54 | 63.97±8.11 | 59.21±14.27 | 61.62±12.43 | 62.06±9.67
 | NMI | 64.27±6.25 | 63.49±6.50 | 66.81±4.43 | 63.54±12.66 | 64.00±9.32 | 65.33±5.59
 | ARI | 51.62±10.21 | 51.54±11.54 | 52.63±5.04 | 48.62±16.10 | 50.18±11.25 | 51.20±7.18
 | Fscore | 56.70±8.92 | 56.61±10.09 | 57.92±4.16 | 54.50±13.94 | 55.63±9.71 | 56.66±6.13
The commonly used kernel functions include polynomial kernels, radial basis functions, and sigmoid kernels. To investigate the performance of the proposed method with different kernels, we study six different kernel functions. The results on the USPS and MNIST databases are shown in Table VIII, from which we can make the following observations (a small sketch of such kernel families follows this list):
- The kernel function that achieves the best performance on the USPS database differs from the one that obtains the best performance on the MNIST database.
- The kernel that performs best on the USPS database induces a much more nonlinear mapping; this is mainly because the images from the USPS database lie in much more strongly nonlinear subspaces than those from the MNIST database.
- The selection of the kernel results in a great difference in subspace clustering performance on both the USPS and MNIST databases.
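For illustration, the sketch below defines representatives of the kernel families mentioned above; the specific six kernels and the hyper-parameter values evaluated in Table VIII are not specified here, so these definitions and defaults are assumptions.

```python
import numpy as np

# X is a (d, n) data matrix with one sample per column, as in the earlier snippets.
def linear_kernel(X):
    return X.T @ X

def polynomial_kernel(X, degree=2, coef0=1.0):
    return (X.T @ X + coef0) ** degree

def rbf_kernel(X, sigma=1.0):
    sq = np.sum(X ** 2, axis=0)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def sigmoid_kernel(X, alpha=0.01, coef0=0.0):
    return np.tanh(alpha * (X.T @ X) + coef0)
```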
V Conclusion
In this paper, we have incorporated the kernel technique into the TRR method to achieve robust nonlinear subspace clustering. The proposed method does not need prior knowledge about the structure of the errors in the input data, and it remedies the drawback of the existing TRR method that it cannot handle data points from nonlinear subspaces. Moreover, through the theoretical analysis of the proposed mathematical model, we find that the developed optimization problem can be solved analytically, and the closed-form solution depends only on the kernel matrix. These advantages make our proposed method useful in many real-world applications. Comprehensive experiments on real-world image databases have demonstrated the effectiveness and the efficiency of the proposed method.
In the future, we plan to conduct a systematic investigation of the selection of the optimal kernel for our proposed method and to study how to determine the number of nonlinear subspaces automatically.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China under grants 61432012, 61329302, and the Engineering and Physical Sciences Research Council (EPSRC) of U.K. under grant EP/J017515/1.
References
- [1] L. Parsons, E. Haque, and H. Liu, “Subspace clustering for high dimensional data: a review,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 90–105, 2004.
- [2] P. S. Bradley and O. L. Mangasarian, “k-plane clustering,” Journal of Global Optimization, vol. 16, no. 1, pp. 23–32, 2000.
- [3] S. R. Rao, R. Tron, R. Vidal, and Y. Ma, “Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008, pp. 1–8.
- [4] H. Derksen, Y. Ma, W. Hong, and J. Wright, “Segmentation of multivariate mixed data via lossy coding and compression,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 9, pp. 1546–1562, 2007.
- [5] J. P. Costeira and T. Kanade, “A multibody factorization method for independently moving objects,” International Journal of Computer Vision, vol. 29, no. 3, pp. 159–179, 1998.
- [6] C. W. Gear, “Multibody grouping from motion images,” International Journal of Computer Vision, vol. 29, no. 2, pp. 133–150, 1998.
- [7] R. Vidal, Y. Ma, and S. Sastry, “Generalized principal component analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1945–1959, 2005.
- [8] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm,” in Advances in Neural Information Processing Systems, vol. 14, 2002, pp. 849–856.
- [9] U. Von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
- [10] B. Cheng, J. Yang, S. Yan, Y. Fu, and T. S. Huang, “Learning with $\ell_1$-graph for image analysis,” IEEE Transactions on Image Processing, vol. 19, no. 4, pp. 858–866, April 2010.
- [11] E. Elhamifar and R. Vidal, “Sparse subspace clustering: Algorithm, theory, and applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2765–2781, 2013.
- [12] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 171–184, 2013.
- [13] C. Y. Lu, H. Min, Z. Q. Zhao, L. Zhu, D. S. Huang, and S. C. Yan, “Robust and efficient subspace segmentation via least squares regression,” in Proc. of 12th Eur. Conf. Comput. Vis., Florence, Italy, Oct. 2012, pp. 347–360.
- [14] C. Lu, J. Tang, M. Lin, L. Lin, S. Yan, and Z. Lin, “Correntropy induced l2 graph for robust subspace clustering,” in 2013 IEEE International Conference on Computer Vision, Dec 2013, pp. 1801–1808.
- [15] V. M. Patel and R. Vidal, “Kernel sparse subspace clustering,” in Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014, Conference Proceedings, pp. 2849–2853.
- [16] L. Zhen, Z. Yi, X. Peng, and D. Peng, “Locally linear representation for image clustering,” Electronics Letters, vol. 50, no. 13, pp. 942–943, 2014.
- [17] X. Peng, Z. Yi, and H. Tang, “Robust subspace clustering via thresholding ridge regression,” in The Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015, pp. 3827–3833.
- [18] H. Liu, T. Liu, J. Wu, D. Tao, and Y. Fu, “Spectral ensemble clustering,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015, pp. 715–724.
- [19] X. Peng, H. Tang, L. Zhang, Z. Yi, and S. Xiao, “A unified framework for representation-based subspace clustering of out-of-sample and large-scale data,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 12, pp. 2499–2512, Dec 2016.
- [20] X. Peng, C. Lu, Y. Zhang, and H. Tang, “Connections between nuclear norm and frobenius norm based representation,” IEEE Transactions on Neural Networks and Learning Systems, vol. PP, no. 99, pp. 1–7, 2016.
- [21] E. Elhamifar and R. Vidal, “Sparse subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, Conference Proceedings, pp. 2790–2797.
- [22] S. R. Rao, R. Tron, R. Vidal, and Y. Ma, “Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008, pp. 1–8.
- [23] Z. Yu, H.-S. Wong, and H. Wang, “Graph-based consensus clustering for class discovery from gene expression data,” Bioinformatics, vol. 23, no. 21, pp. 2888–2896, 2007.
- [24] S. Xiao, M. Tan, D. Xu, and Z. Y. Dong, “Robust kernel low-rank representation,” IEEE Transactions on Neural Networks and Learning Systems, 2015.
- [25] G. Liu and S. Yan, “Latent low-rank representation for subspace segmentation and feature extraction,” in 2011 International Conference on Computer Vision, Nov 2011, pp. 1615–1622.
- [26] E. Elhamifar and R. Vidal, “Sparse manifold clustering and embedding,” in Advances in Neural Information Processing Systems, 2011, pp. 55–63.
- [27] X. Peng, S. Xiao, J. Feng, W. Yau, and Z. Yi, “Deep subspace clustering with sparsity prior,” in Proceedings of the 25 International Joint Conference on Artificial Intelligence, New York, NY, USA, 9-15 July 2016, pp. 1925–1931. [Online]. Available: http://www.ijcai.org/Abstract/16/275
- [28] X. Peng, J. Feng, J. Lu, W.-Y. Yau, and Z. Yi, “Cascade subspace clustering,” in Proceedings of the 31th AAAI Conference on Artificial Intelligence. SFO, USA: AAAI, Feb. 2017, pp. 2478–2484.
- [29] J. Shawe-Taylor and N. Cristianini, Kernel methods for pattern analysis. Cambridge university press, 2004.
- [30] V. M. Patel and R. Vidal, “Kernel sparse subspace clustering,” in 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 2014, pp. 2849–2853.
- [31] K.-C. Lee, J. Ho, and D. J. Kriegman, “Acquiring linear subspaces for face recognition under variable lighting,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 684–698, 2005.
- [32] S. A. Nene, S. K. Nayar, and H. Murase, “Columbia object image library (COIL-20),” Technical Report, 1996.
- [33] ——, “Columbia object image library (coil-100),” Report, 1996.
- [34] J. J. Hull, “A database for handwritten text recognition research,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 16, no. 5, pp. 550–554, 1994.
- [35] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
- [36] D. Cai, X. He, J. Han, and T. S. Huang, “Graph regularized nonnegative matrix factorization for data representation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 33, no. 8, pp. 1548–1560, 2011.
- [37] J. Yan and M. Pollefeys, “A general framework for motion segmentation: Independent, articulated, rigid, non-rigid, degenerate and non-degenerate,” in European conference on computer vision. Springer, 2006, pp. 94–106.
- [38] X. Zheng, D. Cai, X. He, W.-Y. Ma, and X. Lin, “Locality preserving clustering for image database,” in Proceedings of the 12th annual ACM international conference on Multimedia. ACM, Conference Proceedings, pp. 885–891.
- [39] L. Hubert and P. Arabie, “Comparing partitions,” Journal of classification, vol. 2, no. 1, pp. 193–218, 1985.
- [40] C. Goutte and E. Gaussier, A Probabilistic Interpretation of Precision, Recall and F-score, with Implication for Evaluation. Springer, 2005, pp. 345–359.
- [41] J. D. Gibbons and S. Chakraborti, Nonparametric Statistical Inference. Springer, 2011.