Kernel Truncated Regression Representation for Robust Subspace Clustering

05/15/2017 · Liangli Zhen et al. · Sichuan University · Microsoft · University of Birmingham

Subspace clustering aims to group data points into multiple clusters, each of which corresponds to one subspace. Most existing subspace clustering methods assume that the data points can be linearly represented by each other in the input space. In practice, however, this assumption is often violated. To achieve nonlinear subspace clustering, we propose a novel method which consists of the following three steps: 1) projecting the data into a hidden space in which the data can be linearly reconstructed from each other; 2) calculating the globally linear reconstruction coefficients in the kernel space; 3) truncating the trivial coefficients to achieve robustness and block-diagonality, and then obtaining the clustering by solving a graph Laplacian problem. Our method has the advantages of a closed-form solution and the capacity to cluster data points that lie in nonlinear subspaces. The first advantage makes our method efficient in handling large-scale data sets, and the second one enables the proposed method to address the nonlinear subspace clustering challenge. Extensive experiments on five real-world datasets demonstrate the effectiveness and the efficiency of the proposed method in comparison with ten state-of-the-art approaches regarding four evaluation metrics.


I Introduction

Subspace clustering is one of the most popular techniques for data analysis and has attracted increasing interest from numerous areas, such as computer vision, image analysis, and signal processing [1]. Under the assumption that high-dimensional data lie in a union of low-dimensional subspaces, subspace clustering aims to seek a set of subspaces that fit a given data set and to perform clustering based on the identified subspaces.

During the past decades, many subspace clustering methods have been proposed, which can be roughly classified into four categories: 1) iterative approaches [2]; 2) statistical approaches [3, 4]; 3) algebraic approaches [5, 6, 7]; and 4) spectral clustering-based approaches [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]. In recent years, spectral clustering-based approaches have achieved the state of the art in subspace clustering. Their key is to find a block-diagonal affinity matrix, where each element of the matrix denotes the similarity between two data points and the block-diagonal structure means that only the similarities among intra-cluster data points are nonzero.

To obtain a block-diagonal affinity matrix, most recent spectral clustering-based approaches measure the similarity using so-called self-expression, i.e., representing each data point as a linear combination of the whole data set and then using the representation coefficients to build the affinity matrix. The major difference among these methods is the constraint enforced on the representation coefficients. For example, sparse subspace clustering (SSC) [11] assumes that each data point can be linearly represented by a few other points; to this end, SSC adopts the $\ell_1$-norm constraint. Low-rank representation (LRR) [12] encourages the coefficient matrix to be low rank, such that it can capture the global structure of the data; to obtain low rankness, LRR enforces a nuclear-norm constraint on the coefficients. Different from SSC and LRR, truncated regression representation (TRR) [17, 19] adopts the Frobenius norm instead of the $\ell_1$- and nuclear norms, and has shown promising performance in many real-world applications. Like most existing subspace clustering algorithms [21, 11, 12, 13], the major disadvantage of TRR is that it may not give a satisfactory clustering result when data points cannot be linearly represented by each other. In fact, many real-world data are sampled from multiple nonlinear subspaces, which brings challenges to TRR and limits its applications in practice.

To group data drawn from multiple nonlinear subspaces, in this paper we propose a novel nonlinear subspace clustering method, termed kernel truncated regression representation (KTRR). Our method builds on the assumption that there exists a projection space in which the data can be linearly represented. To illustrate this simple but effective idea, we give a toy example in Fig. 1. The proposed method consists of the following steps: 1) projecting the input into another space via an implicit nonlinear transformation; 2) calculating the global self-expression of the whole data set in the projection space, in which the data can be linearly reconstructed; 3) eliminating the effect of errors such as Gaussian noise by zeroing trivial coefficients; 4) constructing a graph Laplacian using the obtained coefficients; 5) solving a generalized eigen-decomposition problem and obtaining the clustering with k-means. The contributions and novelty of this work can be summarized as follows:

Fig. 1: The basic idea of our method. By projecting the data into another space with an implicit nonlinear transformation, our method could solve the problem of nonlinear subspace clustering. The left and right plots correspond to the distribution of data in the input and hidden space, respectively.
  • We propose a novel method which can cluster data points drawn from multiple nonlinear subspaces. To the best of our knowledge, this is the first nonlinear extension of TRR and one of the first nonlinear subspace clustering approaches.

  • We develop a closed-form solution to our method. This makes our method very efficient, and useful for large-scale data sets.

  • Different from most existing subspace clustering methods such as SSC and LRR, KTRR achieves robustness by eliminating the impact of noise in the projection space instead of the input space. In other words, KTRR does not require prior knowledge of the structure of the errors, which makes it more competitive in handling corrupted subspaces. Extensive experimental results show that our method significantly outperforms ten other state-of-the-art subspace clustering algorithms regarding accuracy, robustness, and computational cost.

The rest of this paper is organized as follows. Section II discusses some related work. Section III presents the kernel truncated regression and the new robust subspace clustering method. Section IV provides experimental results to illustrate the effectiveness and the efficiency of the proposed algorithm. Section V concludes the paper.

Notations: In this paper, unless specified otherwise, lower-case bold letters represent column vectors, upper-case bold letters represent matrices, and the entries of matrices are denoted with subscripts. For instance, $\mathbf{x}$ is a column vector and $x_i$ is its $i$th entry; $\mathbf{X}$ is a matrix, $X_{ij}$ is the entry in its $i$th row and $j$th column, and $\mathbf{x}_j$ denotes the $j$th column of $\mathbf{X}$. Moreover, $\mathbf{X}^{\top}$ represents the transpose of $\mathbf{X}$, $\mathbf{X}^{-1}$ denotes the inverse matrix of $\mathbf{X}$, and $\mathbf{I}$ stands for the identity matrix. Table I summarizes some notations used throughout the paper.

Notation | Definition
$d$ | the dimension of the input data points
$n$ | the number of input data points
$k$ | the number of underlying subspaces
$\lambda$ | the balance parameter
$\mathbf{x}_i$ | the $i$th data point
$\mathbf{X}$ | the data matrix
$\mathbf{D}_i$ | the dictionary matrix for the data point $\mathbf{x}_i$
$\mathbf{K}$ | the kernel matrix of the input data points
$\mathbf{c}_i$ | the representation vector for the mapped data point $\phi(\mathbf{x}_i)$
$\mathbf{C}$ | the linear representation coefficient matrix
$\mathbf{W}$ | the similarity matrix among all data points
$\mathbf{L}$ | the normalized Laplacian matrix
$\phi: \mathbb{R}^{d} \to \mathcal{H}$ | the mapping from the input space to the kernel space
$\kappa(\cdot, \cdot)$ | the kernel function
TABLE I: Some notations used in this paper.

II Related Work

During the past decades, several spectral clustering-based methods have been proposed to achieve subspace clustering in many applications such as image clustering [21], motion segmentation [22], and gene expression analysis [23]. The key of these methods is to obtain a block-diagonal similarity matrix whose nonzero elements are located only on the connections between points from the same subspace. There are two common strategies to compute the similarity matrix, i.e., the pairwise distance-based strategy and the linear representation-based strategy [16]. The pairwise distance-based strategy computes the similarity between two points according to their pairwise relationship; e.g., the original spectral clustering method adopts the Euclidean distance with the heat kernel to calculate the similarity, i.e.,

$$W_{ij} = \exp\!\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|_2^2}{2\sigma^2}\right), \qquad (1)$$

where $W_{ij}$ denotes the similarity between data point $\mathbf{x}_i$ and data point $\mathbf{x}_j$, and the parameter $\sigma$ controls the width of the neighborhoods.

Alternatively, linear representation-based approaches assume that each data point can be represented as a linear combination of some points from the same subspace. Based on this assumption, the linear representation coefficients can be used as a measurement of similarity and have achieved the state of the art in subspace clustering [11, 12, 13, 19, 24, 15, 25, 26], since they encode the global structure of the whole data set into the similarity.

Given a data matrix $\mathbf{X}$, these methods linearly represent $\mathbf{X}$ and obtain the coefficient matrix $\mathbf{C}$ in a self-expression manner by solving

$$\min_{\mathbf{C}} \ \mathcal{R}(\mathbf{C}) \quad \mathrm{s.t.} \quad \mathbf{X} = \mathbf{X}\mathbf{C}, \ \ \mathrm{diag}(\mathbf{C}) = \mathbf{0}, \qquad (2)$$

where the constraint $\mathrm{diag}(\mathbf{C}) = \mathbf{0}$ avoids the trivial solution of representing each data point by itself, by enforcing the diagonal elements of $\mathbf{C}$ to be zero. $\mathcal{R}(\mathbf{C})$ denotes the adopted structured prior (regularization) on $\mathbf{C}$, and the major difference among most existing subspace clustering methods is the choice of $\mathcal{R}(\cdot)$. For example, SSC [11] enforces sparsity on $\mathbf{C}$ by adopting the $\ell_1$-norm via $\mathcal{R}(\mathbf{C}) = \|\mathbf{C}\|_1$, and LRR [12] obtains low rankness by using the nuclear norm with $\mathcal{R}(\mathbf{C}) = \|\mathbf{C}\|_*$. To further achieve robustness, (2) is extended as follows:

$$\min_{\mathbf{C}, \mathbf{E}} \ \mathcal{R}(\mathbf{C}) + \gamma \|\mathbf{E}\|_{\ell} \quad \mathrm{s.t.} \quad \mathbf{X} = \mathbf{X}\mathbf{C} + \mathbf{E}, \ \ \mathrm{diag}(\mathbf{C}) = \mathbf{0}, \qquad (3)$$

where $\mathbf{E}$ stands for the errors induced by noise and corruption, and the norm $\|\mathbf{E}\|_{\ell}$ measures the impact of the errors. Generally, the Frobenius norm $\|\cdot\|_F$ and the $\ell_1$-norm are used to describe Gaussian noise and Laplacian noise, respectively.
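To make the self-expression idea concrete, the following minimal sketch uses a ridge (Frobenius-norm) regularizer, the family that TRR builds on, to compute a coefficient matrix in closed form and turn it into a symmetric affinity. The function name, the regularization weight `lam`, and the toy data are illustrative and not taken from any of the cited methods; zeroing the diagonal after solving is a simplification of the exact constrained problem.

```python
import numpy as np

def self_expression_ridge(X, lam=0.1):
    """Minimal sketch of linear self-expression with a ridge regularizer.

    Approximately solves min_C ||X - XC||_F^2 + lam * ||C||_F^2, then zeroes
    the diagonal to suppress the trivial self-representation.
    X has shape (d, n): one data point per column.
    """
    d, n = X.shape
    G = X.T @ X                                   # Gram matrix, n x n
    C = np.linalg.solve(G + lam * np.eye(n), G)   # ridge-regression coefficients
    np.fill_diagonal(C, 0.0)                      # remove self-representation
    W = 0.5 * (np.abs(C) + np.abs(C).T)           # symmetric affinity from coefficients
    return C, W

# toy usage: 20 points from two 1-D subspaces embedded in R^3
rng = np.random.default_rng(0)
X = np.hstack([np.outer([1, 0, 0], rng.standard_normal(10)),
               np.outer([0, 1, 1], rng.standard_normal(10))])
C, W = self_expression_ridge(X)
print(W.shape)  # (20, 20); large entries concentrate within each subspace
```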

Due to the assumption of linear reconstruction, these methods fail to achieve nonlinear subspace clustering. To address this challenging issue, some recent works have been proposed [15, 24]; however, these methods have two disadvantages: 1) they are computationally inefficient since they involve solving an $\ell_1$- or nuclear-norm minimization problem; 2) like SSC and LRR, they need prior knowledge of the errors existing in the data sets to obtain the correct mathematical formulation, and if this prior is inconsistent with the real situation, the methods may achieve inferior performance. To solve these issues, we propose a nonlinear subspace clustering method which is complementary to existing approaches. Note that Peng et al. recently proposed to achieve nonlinearity with deep structures [27, 28], which are among the first works to combine deep learning and subspace clustering. However, that direction is beyond the scope of this paper.

III The Proposed Subspace Clustering Method

This section gives the details of the proposed method, which consists of three steps: 1) calculating the kernel truncated regression representation over the whole data set; 2) eliminating the effect of possible errors such as noise from the representation and then building a graph Laplacian; 3) obtaining the clustering by performing the k-means algorithm on the leading eigenvectors of the graph Laplacian. Moreover, we also give the computational complexity of the proposed method.

III-A Kernel Truncated Regression Representation

For a given data set $\{\mathbf{x}_i\}_{i=1}^{n}$, where $\mathbf{x}_i \in \mathbb{R}^{d}$, we define the data matrix $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n]$. Let $\phi: \mathbb{R}^{d} \to \mathcal{H}$ be a nonlinear mapping which transforms the input into a kernel space $\mathcal{H}$, and let $\phi(\mathbf{X}) = [\phi(\mathbf{x}_1), \phi(\mathbf{x}_2), \ldots, \phi(\mathbf{x}_n)]$. After mapping into the kernel space, the transformed data $\phi(\mathbf{X})$ are generally believed to lie in linear subspaces [15, 24]. Based on this idea, we formulate the objective function of our KTRR as follows:

$$\min_{\mathbf{c}_i} \ \left\|\phi(\mathbf{x}_i) - \phi(\mathbf{D}_i)\,\mathbf{c}_i\right\|_2^2 + \lambda \left\|\mathbf{c}_i\right\|_2^2, \qquad (4)$$

where $\mathbf{D}_i$ denotes the dictionary for $\mathbf{x}_i$ (the data matrix with the $i$-th column removed), the first term is the reconstruction error in the kernel space, the second term serves as an $\ell_2$-norm regularization, and $\lambda$ is a positive real number which controls the strength of the $\ell_2$-norm regularization term.

For each transformed data point $\phi(\mathbf{x}_i)$, solving the optimization problem (4) gives

$$\mathbf{c}_i^{*} = \left(\phi(\mathbf{D}_i)^{\top}\phi(\mathbf{D}_i) + \lambda \mathbf{I}\right)^{-1} \phi(\mathbf{D}_i)^{\top}\phi(\mathbf{x}_i). \qquad (5)$$

Note that solving these problems separately requires one matrix inversion per data point, i.e., roughly $O(n^{4} + n^{3}d)$ operations in total for $n$ data points with dimensionality $d$.

To solve (4) more efficiently, we rewrite it as

$$\min_{\mathbf{c}_i} \ \left\|\phi(\mathbf{x}_i) - \phi(\mathbf{X})\,\mathbf{c}_i\right\|_2^2 + \lambda \left\|\mathbf{c}_i\right\|_2^2 \quad \mathrm{s.t.} \quad \mathbf{e}_i^{\top}\mathbf{c}_i = 0, \qquad (6)$$

where $\mathbf{e}_i$ is a column vector whose elements are all zero except that the $i$-th element is $1$, and the constraint eliminates the trivial solution of writing a transformed point as a linear combination of itself.

Using the Lagrangian method, we obtain

$$\mathcal{L}(\mathbf{c}_i, \mu) = \left\|\phi(\mathbf{x}_i) - \phi(\mathbf{X})\,\mathbf{c}_i\right\|_2^2 + \lambda \left\|\mathbf{c}_i\right\|_2^2 + \mu\,\mathbf{e}_i^{\top}\mathbf{c}_i, \qquad (7)$$

where $\mu$ is the Lagrangian multiplier. Clearly,

$$\frac{\partial \mathcal{L}}{\partial \mathbf{c}_i} = 2\left(\phi(\mathbf{X})^{\top}\phi(\mathbf{X}) + \lambda \mathbf{I}\right)\mathbf{c}_i - 2\,\phi(\mathbf{X})^{\top}\phi(\mathbf{x}_i) + \mu\,\mathbf{e}_i = \mathbf{0}. \qquad (8)$$

Let $\mathbf{A} = \left(\phi(\mathbf{X})^{\top}\phi(\mathbf{X}) + \lambda \mathbf{I}\right)^{-1}$; we get

$$\mathbf{c}_i = \mathbf{A}\left(\phi(\mathbf{X})^{\top}\phi(\mathbf{x}_i) - \frac{\mu}{2}\,\mathbf{e}_i\right). \qquad (9)$$

Multiplying $\mathbf{e}_i^{\top}$ on both sides of (9), and since $\mathbf{e}_i^{\top}\mathbf{c}_i = 0$, it holds that

$$\frac{\mu}{2} = \frac{\mathbf{e}_i^{\top}\mathbf{A}\,\phi(\mathbf{X})^{\top}\phi(\mathbf{x}_i)}{\mathbf{e}_i^{\top}\mathbf{A}\,\mathbf{e}_i}. \qquad (10)$$

Substituting (10) into (9), the optimal solution is given as

$$\mathbf{c}_i^{*} = \mathbf{A}\left(\phi(\mathbf{X})^{\top}\phi(\mathbf{x}_i) - \frac{\mathbf{e}_i^{\top}\mathbf{A}\,\phi(\mathbf{X})^{\top}\phi(\mathbf{x}_i)}{\mathbf{e}_i^{\top}\mathbf{A}\,\mathbf{e}_i}\,\mathbf{e}_i\right), \qquad (11)$$

where $\mathbf{A} = \left(\phi(\mathbf{X})^{\top}\phi(\mathbf{X}) + \lambda \mathbf{I}\right)^{-1}$ and $\phi(\mathbf{X})^{\top}\phi(\mathbf{x}_i)$ collects the dot products between $\phi(\mathbf{x}_i)$ and all mapped data points.

One can see that the solution in (11) does not require the mapped points to be explicitly computed, i.e., we only need their dot products. Therefore, we can employ kernel functions to compute these dot products without explicitly performing the mapping $\phi$. For suitable choices of a kernel $\kappa: \mathbb{R}^{d} \times \mathbb{R}^{d} \to \mathbb{R}$, it has been shown [29] that $\kappa(\mathbf{x}_i, \mathbf{x}_j)$ equals the dot product in the kernel space induced by the mapping $\phi$.

We can collect all the dot products in a matrix $\mathbf{K}$ whose elements are calculated as

$$K_{ij} = \kappa(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^{\top}\phi(\mathbf{x}_j), \qquad (12)$$

where $i, j \in \{1, 2, \ldots, n\}$. The matrix $\mathbf{K}$ is the kernel matrix, which is symmetric and positive semidefinite. Accordingly, (11) can be rewritten as

$$\mathbf{c}_i^{*} = \mathbf{B}\left(\mathbf{k}_i - \frac{\mathbf{e}_i^{\top}\mathbf{B}\,\mathbf{k}_i}{\mathbf{e}_i^{\top}\mathbf{B}\,\mathbf{e}_i}\,\mathbf{e}_i\right), \qquad (13)$$

where $\mathbf{B} = \left(\mathbf{K} + \lambda \mathbf{I}\right)^{-1}$ and $\mathbf{k}_i$ denotes the $i$-th column of $\mathbf{K}$.

It is notable that only one pseudo-inverse operation, that of $\mathbf{K} + \lambda\mathbf{I}$, is needed for solving the representation problems of all data points. The computational complexity of calculating the optimal solutions in (13) thus decreases to $O(n^{3} + n^{2}d)$ for $n$ data points with $d$ dimensions.
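To make the closed form concrete, the following NumPy sketch computes the coefficients of all data points from a precomputed kernel matrix as in (13), sharing the single inverse across points; the function name and the default value of `lam` are ours, not the paper's.

```python
import numpy as np

def ktrr_coefficients(K, lam=1.0):
    """Closed-form KTRR coefficients as in Eq. (13).

    K   : (n, n) kernel matrix.
    lam : balance parameter lambda > 0.
    Returns C whose i-th column is c_i^*; its diagonal is zero,
    so no point is represented by itself.
    """
    n = K.shape[0]
    B = np.linalg.inv(K + lam * np.eye(n))   # single inverse shared by all points
    BK = B @ K                                # column i equals B k_i
    scale = np.diag(BK) / np.diag(B)          # (e_i^T B k_i) / (e_i^T B e_i)
    return BK - B * scale[np.newaxis, :]      # c_i = B k_i - scale_i * B e_i

# quick check on a random positive semidefinite kernel matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
K = A @ A.T
C = ktrr_coefficients(K, lam=0.5)
print(np.allclose(np.diag(C), 0.0))  # True: the constraint e_i^T c_i = 0 holds
```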

It has been proved that, under certain conditions, the coefficients over intra-subspace data points are larger than those over inter-subspace data points [17]. After representing the data set through the kernel matrix via (13), we handle the errors by performing a hard thresholding operator $\mathcal{T}_s(\cdot)$ over each $\mathbf{c}_i^{*}$, which, like TRR, keeps the $s$ largest entries in $\mathbf{c}_i^{*}$ and sets the other entries to zero, i.e.,

$$\tilde{\mathbf{c}}_i = \mathcal{T}_s(\mathbf{c}_i^{*}), \qquad (14)$$

and

$$\left[\mathcal{T}_s(\mathbf{c}_i^{*})\right]_j = \begin{cases} c_{ij}^{*}, & c_{ij}^{*} \in \Omega_s(\mathbf{c}_i^{*}), \\ 0, & \text{otherwise}, \end{cases} \qquad (15)$$

where $\Omega_s(\mathbf{c}_i^{*})$ consists of the $s$ largest elements of $\mathbf{c}_i^{*}$. Typically, the optimal $s$ equals the dimensionality of the corresponding kernel subspace. In this manner, we avoid explicitly modeling the impact of the noise in the optimization problem and do not need prior knowledge about the errors.
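A minimal sketch of the column-wise hard thresholding operator of (14)-(15) is given below; keeping the $s$ largest entries by magnitude is our reading of the operator, and the function name is illustrative.

```python
import numpy as np

def hard_threshold_columns(C, s):
    """Keep the s largest-magnitude entries of each column of C and zero the rest,
    mirroring the operator in Eqs. (14)-(15). Using absolute values is our
    assumption; the text only says "largest entries"."""
    C_t = np.zeros_like(C)
    for i in range(C.shape[1]):
        idx = np.argsort(np.abs(C[:, i]))[-s:]   # indices of the s largest |C_ji|
        C_t[idx, i] = C[idx, i]
    return C_t

C = np.random.default_rng(0).standard_normal((50, 50))
print(np.count_nonzero(hard_threshold_columns(C, 4), axis=0))  # 4 per column
```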

III-B KTRR for Robust Subspace Clustering

In this section, we present the method to achieve subspace clustering by incorporating KTRR into the spectral clustering framework [8].

For a given data set which consists of $n$ data points in $\mathbb{R}^{d}$, we assume that these points lie in a union of low-dimensional nonlinear subspaces. We propose to project the data points into another space, in which the mapped points can be linearly represented by the mapped points from the same subspace. From (11), we find that computing the representation coefficients does not require the projection function $\phi$ in explicit form; only dot products are needed. We can therefore adopt a kernel function to calculate these dot products and obtain the representation coefficients via (13).

Moreover, the existence of errors in the input data set leads to some erroneous connections among data points from different subspaces. We propose to remove these errors through hard thresholding on each column vector of the coefficient matrix via (14).

As claimed before, these representation coefficients can be seen as similarities among the input data points: the similarity between two intra-subspace data points is large, and that between two inter-subspace data points is zero or very close to zero. We can therefore build a similarity matrix from the obtained coefficient matrix $\tilde{\mathbf{C}}$ as

$$\mathbf{W} = \frac{1}{2}\left(|\tilde{\mathbf{C}}| + |\tilde{\mathbf{C}}|^{\top}\right). \qquad (16)$$

This is a symmetric similarity matrix which is suitable for integration into the spectral clustering framework.

Then, we compute the normalized Laplacian matrix [8]

$$\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1/2}\,\mathbf{W}\,\mathbf{D}^{-1/2}, \qquad (17)$$

where $\mathbf{D}$ is a diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$. The matrix $\mathbf{L}$ is positive semidefinite and has an eigenvalue equal to $0$ with eigenvector $\mathbf{D}^{1/2}\mathbf{1}$ [9], where $\mathbf{1}$ denotes the all-ones vector.

Next, we calculate the first $k$ eigenvectors of $\mathbf{L}$, corresponding to its smallest nonzero eigenvalues, and stack them as the columns of a matrix $\mathbf{V}$.

Finally, we apply the k-means clustering method to the matrix $\mathbf{V}$, treating each row vector as a point, to obtain the clustering membership. The proposed subspace clustering algorithm is summarized in Algorithm 1.

0:  A given data set $\mathbf{X}$, the tradeoff parameter $\lambda$, the thresholding parameter $s$, and the number of subspaces $k$.
0:  The clustering labels of the input data points.
1:  Calculate the kernel matrix $\mathbf{K}$ and the matrix $\mathbf{B} = (\mathbf{K} + \lambda\mathbf{I})^{-1}$ in (13) and store them.
2:  For each point $\mathbf{x}_i$, calculate its linear representation coefficients $\mathbf{c}_i^{*}$ in the kernel space via (13).
3:  Remove the trivial coefficients from $\mathbf{c}_i^{*}$ by performing the hard thresholding operator $\mathcal{T}_s(\cdot)$, i.e., keeping the $s$ largest entries in $\mathbf{c}_i^{*}$ and zeroing all other elements.
4:  Construct a symmetric similarity matrix $\mathbf{W}$ via (16).
5:  Calculate the normalised Laplacian matrix $\mathbf{L}$ via (17).
6:  Compute the eigenvector matrix $\mathbf{V}$ that consists of the first $k$ normalized eigenvectors of $\mathbf{L}$ corresponding to its smallest nonzero eigenvalues.
7:  Perform the k-means clustering algorithm over the rows of $\mathbf{V}$ to obtain the clustering membership.
Algorithm 1 Learning kernel truncated regression representation for robust subspace clustering
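For readers who wish to trace Algorithm 1 end to end, the following compact Python sketch strings the steps together with a Gaussian kernel, the closed form (13), hard thresholding, a normalized Laplacian, and k-means; all function names, default parameter values, the row normalization of the eigenvector matrix, and the scikit-learn usage are our own choices and only approximate the algorithm described above.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def ktrr_clustering(X, n_clusters, lam=1.0, s=10):
    """Sketch of Algorithm 1: KTRR + spectral clustering.

    X          : (n, d) array, one data point per row.
    n_clusters : number of subspaces k.
    lam, s     : tradeoff and thresholding parameters (illustrative defaults).
    """
    n = X.shape[0]
    # Step 1: Gaussian kernel matrix, bandwidth = mean pairwise distance.
    dists = cdist(X, X)
    sigma = dists.mean()
    K = np.exp(-dists**2 / (2 * sigma**2))
    B = np.linalg.inv(K + lam * np.eye(n))
    # Step 2: closed-form coefficients of Eq. (13) for all points at once.
    BK = B @ K
    C = BK - B * (np.diag(BK) / np.diag(B))[np.newaxis, :]
    # Step 3: hard thresholding - keep s largest-magnitude entries per column.
    C_t = np.zeros_like(C)
    for i in range(n):
        idx = np.argsort(np.abs(C[:, i]))[-s:]
        C_t[idx, i] = C[idx, i]
    # Step 4: symmetric similarity matrix.
    W = 0.5 * (np.abs(C_t) + np.abs(C_t).T)
    # Step 5: normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(n) - (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    # Step 6: eigenvectors of L for its smallest eigenvalues, row-normalized.
    eigvals, eigvecs = np.linalg.eigh(L)
    V = eigvecs[:, :n_clusters]
    V = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)
    # Step 7: k-means on the rows of V.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(V)

# toy usage on two noisy circles (a nonlinear two-subspace example)
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
X = np.vstack([np.c_[np.cos(t[:100]), np.sin(t[:100])],
               3 * np.c_[np.cos(t[100:]), np.sin(t[100:])]])
X += 0.05 * rng.standard_normal(X.shape)
labels = ktrr_clustering(X, n_clusters=2, lam=0.1, s=10)
print(np.bincount(labels))
```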

III-C Computational Complexity Analysis

Given the data matrix $\mathbf{X} \in \mathbb{R}^{d \times n}$, KTRR takes $O(n^{2}d)$ operations to compute the kernel matrix $\mathbf{K}$. It then takes $O(n^{3})$ to obtain the matrix $\mathbf{B} = (\mathbf{K} + \lambda\mathbf{I})^{-1}$ and $O(n^{3})$ to calculate all the solutions in (13) from the matrices $\mathbf{B}$ and $\mathbf{K}$. Finally, it requires about $O(n^{2})$ operations to find the $s$ largest coefficients in each column of the representation matrix $\mathbf{C}$. Putting these steps together, the computational complexity of KTRR is $O(n^{3} + n^{2}d)$. This is the same as that of TRR and considerably less than that of KSSC [30] and KLRR [24], whose costs additionally grow with the total number of iterations of the corresponding algorithm, the rank of the coefficient matrix, and the rank used for the partial SVD at each iteration of KLRR.

IV Experimental Results and Analysis

In this section, we experimentally evaluate the performance of the proposed method. We consider the results in terms of three aspects: 1) accuracy, 2) robustness, and 3) computational cost. Robustness is evaluated by conducting experiments on samples with two different types of corruption, i.e., Gaussian noise and random pixel corruption.

IV-A Databases

Five popular image databases are used in our experiments, including Extended Yale Database B (ExYaleB) [31], Columbia Object Image Library (COIL 20) [32], Columbia Object Image Library (COIL 100) [33], USPS [34], and MNIST [35]. We give the details of these databases as follows:

  • The ExYaleB database contains frontal face images of 38 subjects, with around 64 near-frontal images under different illuminations per individual, where each image is manually cropped and normalized to a fixed size in pixels [36].

  • The COIL 20 and COIL 100 databases contain 20 and 100 objects, respectively. The images of each object were taken 5 degrees apart as the object was rotated on a turntable, so each object has 72 images. Each image is normalized in size, with 256 grey levels per pixel [36].

  • The USPS handwritten digit database includes ten classes (digits 0-9) and 11000 samples in total. (The USPS and MNIST databases used in this paper are downloaded from http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html.) We use a popular subset of the handwritten digit images for the experiments, and all of these images are normalized to the same size. In the experiment, we randomly select 1000 samples of each subject from the database by following the strategy in [10].

  • The MNIST handwritten digit database includes ten classes (digits 0-9) and 60000 training samples in total. We use an initial subset of the training images to conduct the experiments, and all of these images are normalized to the same size. In the experiment, we also randomly select 1000 samples of each subject from the database to evaluate the performance of the different algorithms.

The details of these real-world databases are summarized in Table II.

Dataset | $c$ | $n_i$
ExYaleB | 38 | 58
COIL 20 | 20 | 72
COIL 100 | 100 | 72
USPS | 10 | 1000
MNIST | 10 | 1000
TABLE II: Details of the five data sets used in the experiments. For simplicity, $c$ denotes the total number of clusters, and $n_i$ stands for the number of samples in each cluster.

IV-B Baselines and Evaluation Metrics

We compare KTRR (the source code of our proposed method is available at https://www.dropbox.com/s/8vj1k1b184w2ksv/KTRR.zip?dl=0) with ten state-of-the-art subspace clustering algorithms, including truncated regression representation (TRR) [17], kernel low-rank representation (KLRR) [24], kernel sparse subspace clustering (KSSC) [30], latent low-rank representation (LatLRR) [25], low-rank representation with two different norms on the error term (LRR1 and LRR2) [12], sparse subspace clustering (SSC) [11], sparse manifold clustering and embedding (SMCE) [26], local subspace analysis (LSA) [37], and standard spectral clustering (SC) [8].

For a fair comparison, we use the same spectral clustering framework [8] with the different similarity matrices obtained by the tested algorithms. Like [24], for all kernel-based algorithms, we adopt the commonly used Gaussian kernel on all datasets and set the default bandwidth parameter to the mean of the distances between all samples.
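As a quick illustration of this kernel setting, the sketch below builds a Gaussian kernel matrix whose bandwidth equals the mean pairwise distance; it is our own minimal reading of that rule, not code from the compared methods.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_kernel_mean_bandwidth(X):
    """Gaussian kernel K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), with sigma
    set to the mean of all pairwise distances (the default rule mentioned above)."""
    dists = cdist(X, X)           # (n, n) pairwise Euclidean distances
    sigma = dists.mean()          # mean distance over all sample pairs
    return np.exp(-dists**2 / (2 * sigma**2))

X = np.random.default_rng(0).standard_normal((100, 64))
K = gaussian_kernel_mean_bandwidth(X)
print(K.shape, K.diagonal().min())   # (100, 100) with ones on the diagonal
```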

Four popular metrics are adopted to evaluate subspace clustering quality, i.e., accuracy (AC) [38, 10], normalized mutual information (NMI) [38, 10], adjusted Rand index (ARI) [39], and Fscore [40]. Higher values of these metrics indicate better clustering: a value of 1 indicates that the predicted result perfectly matches the ground truth, whereas a value close to 0 indicates a complete mismatch.
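For reference, the following sketch computes these four metrics with NumPy, SciPy, and scikit-learn (clustering accuracy via Hungarian matching of predicted to true labels, and Fscore as the pairwise F-measure); this is a common way to compute them and is our addition, not the paper's evaluation code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    """Best accuracy over all one-to-one matchings of predicted to true labels."""
    labels_true, labels_pred = np.unique(y_true), np.unique(y_pred)
    cost = np.zeros((labels_pred.size, labels_true.size))
    for i, p in enumerate(labels_pred):
        for j, t in enumerate(labels_true):
            cost[i, j] = -np.sum((y_pred == p) & (y_true == t))
    rows, cols = linear_sum_assignment(cost)
    return -cost[rows, cols].sum() / y_true.size

def pairwise_fscore(y_true, y_pred):
    """Pairwise F-score: every sample pair is a binary 'same cluster?' decision."""
    same_true = y_true[:, None] == y_true[None, :]
    same_pred = y_pred[:, None] == y_pred[None, :]
    iu = np.triu_indices(y_true.size, k=1)
    tp = np.sum(same_true[iu] & same_pred[iu])
    precision = tp / max(np.sum(same_pred[iu]), 1)
    recall = tp / max(np.sum(same_true[iu]), 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)

y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([1, 1, 0, 0, 0, 0])
print("AC    :", clustering_accuracy(y_true, y_pred))
print("NMI   :", normalized_mutual_info_score(y_true, y_pred))
print("ARI   :", adjusted_rand_score(y_true, y_pred))
print("Fscore:", pairwise_fscore(y_true, y_pred))
```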

IV-C Visualisation of Representation and Similarity Matrices

Before evaluating the clustering performance of the proposed method, we visualize the KTRR coefficient matrix and the resulting similarity matrix. We obtain the result using the facial images of the first two subjects in the ExYaleB database, where the first group of samples belongs to the first subject and the remaining samples belong to the second subject. The parameters $\lambda$ and $s$ are set to fixed values. The representation matrix in (13) and the constructed similarity matrix are shown in Fig. 2(a) and Fig. 2(b), respectively.

Fig. 2: Visualization of the representation matrix and the similarity matrix on facial images from the first two subjects in the ExYaleB database. (a) The representation matrix in (13). (b) The similarity matrix obtained by our algorithm. The top rows and the right columns illustrate some images of these two subjects. The dotted lines split each matrix into four parts. The upper-left part: the similarity relationships among the images of the first subject. The bottom-right part: the similarity relationships among the images of the second subject. The upper-right and bottom-left parts: the similarity relationships among images from different subjects. From the connections, it is easy to see that the upper-left and bottom-right parts are much brighter than the upper-right and bottom-left parts, which means that our method reflects the correct relationships among the samples from different subjects.

From Fig. 2, we can see that the upper-left and bottom-right parts are much brighter than the upper-right and bottom-left parts, but there still exist some nonzero elements in the upper-right and bottom-left parts. That is to say, the connections among samples of the same subject are much stronger than those among different subjects, while there remain some trivial connections among samples from different subjects, since the samples are all facial images and share some common characteristics.

As we know, an ideal similarity matrix for the spectral clustering algorithm is block diagonal, i.e., connections should exist only among data points from the same cluster [26, 12, 17, 15, 24]; for this reason, a hard thresholding operation has been executed. From the similarity matrix in Fig. 2, we find that:

  • Our method reveals the latent structure of the data even though these images belong to two subjects. There exist only a few bright spots in the upper-right and bottom-left parts of the obtained similarity matrix, i.e., the trivial connections among samples from different subjects have been mostly removed by the thresholding process;

  • Almost all of the bright spots lie in the diagonal blocks of the similarity matrix, i.e., the strong connections exist among the samples from the same subject;

  • The obtained similarity matrix is a symmetric matrix which can be directly used for subspace clustering under the framework of spectral clustering [8].

IV-D Clustering on Clean Images

In this experiment, we compare the KTRR method with the other ten state-of-the-art approaches on four benchmark databases, i.e., Extended Yale Database B (ExYaleB) [31], Columbia Object Image Library (COIL 20) [32], USPS [34], and MNIST [35]. For each dataset, we run each algorithm multiple times; in each run the k-means clustering step is repeated several times, and we report the mean and the standard deviation of the adopted metrics. The clustering quality on the above four databases is shown in Table III - Table VI, with the best means for each database highlighted in boldface. To draw statistically sound conclusions, the Wilcoxon rank sum test [41] is adopted to test the significance of the differences between the results obtained by the proposed method and all other algorithms. From the results, we can draw the following conclusions.

Metric KTRR TRR KLRR KSSC LatLRR LRR1 LRR2 SSC SMCE LSA SC
AC 67.04+2.93 52.30+4.31 58.41+3.19 51.40+3.36 50.32+2.68 49.80+4.72 52.87+5.46 48.91+3.71 33.97+3.95 19.69+1.70
NMI 72.20+2.61 61.98+2.45 64.41+1.10 54.41+1.76 53.31+1.42 53.26+2.22 58.02+3.44 60.22+1.28 47.38+1.87 32.96+1.54
ARI 41.07+6.91 36.06+3.49 32.40+5.82 27.10+2.26 26.42+2.17 25.63+2.70 24.20+4.74 30.46+3.06 20.98+1.45 10.16+1.01
Fscore 43.01+6.52 37.87+3.37 34.59+5.38 29.33+2.07 28.66+2.00 27.93+2.52 26.83+4.34 32.54+2.84 23.20+1.36 12.56+0.98
TABLE III: Clustering performance (%) of different methods on the ExYaleB database. The best mean results for each metric are in bold. Marked values are significantly different from those of all other methods according to the Wilcoxon rank sum test.
Metric KTRR TRR KLRR KSSC LatLRR LRR1 LRR2 SSC SMCE LSA SC
AC 84.12+3.35 68.66+5.51 79.39+8.15 67.97+3.47 67.94+7.97 66.59+3.35 69.39+5.93 76.51+15.98 72.86+6.67 69.17+3.81
NMI 91.79+0.94 77.93+3.24 89.50+2.73 76.78+1.32 76.45+2.20 75.33+2.50 80.61+2.44 90.51+5.69 81.49+3.69 79.43+2.18
ARI 80.72+3.05 61.96+6.98 76.54+7.13 60.03+2.42 60.03+4.94 58.45+4.21 62.14+5.18 75.20+15.45 68.13+5.92 63.77+3.88
Fscore 81.76+2.80 63.92+6.55 77.81+6.65 62.03+2.30 62.09+4.66 60.57+3.98 64.17+4.80 76.61+14.41 69.74+5.57 65.60+3.68
TABLE IV: Clustering performance (%) of different methods on the COIL 20 database. The best mean results for each metric are in bold. Marked values are significantly different from those of all other methods according to the Wilcoxon rank sum test.
Metric KTRR TRR KLRR KSSC LatLRR LRR1 LRR2 SSC SMCE LSA SC
AC 60.24+16.98 70.72+2.63 75.17+2.89 70.97+4.10 70.16+4.29 70.91+3.96 26.86+13.39 73.77+7.85 68.51+8.95 70.79+8.58
NMI 59.55+7.61 66.23+3.35 73.97+2.41 66.55+4.37 66.69+4.63 66.87+4.35 20.93+13.44 71.29+9.69 64.80+7.93 62.72+4.55
ARI 46.06+13.33 57.21+3.76 65.11+3.67 57.20+4.51 57.18+4.79 57.50+4.48 9.95+13.56 62.08+13.10 55.58+9.00 53.55+5.24
Fscore 51.98+11.29 61.64+3.41 68.76+3.28 61.59+4.11 61.58+4.34 61.85+4.08 24.07+7.62 66.10+11.63 60.30+7.99 58.25+4.67
TABLE V: Clustering performance (%) of different methods on the USPS handwriting database. The best mean results for each metric are in bold. Marked values are significantly different from those of all other methods according to the Wilcoxon rank sum test.
Metric KTRR TRR KLRR KSSC LatLRR LRR1 LRR2 SSC SMCE LSA SC
AC 32.19+16.22 61.31+6.14 57.13+15.57 14.48+8.37 18.08+10.57 18.52+12.34 22.02+20.53 61.66+5.59 63.03+7.46 55.11+5.45
NMI 24.73+15.97 60.07+5.86 59.43+13.06 4.04+7.96 6.76+11.18 7.53+14.97 14.58+24.25 59.08+3.99 61.94+5.78 48.80+5.30
ARI 12.35+13.93 47.46+5.77 45.02+16.11 1.26+4.47 2.80+6.50 3.27+8.64 6.16+16.06 46.81+5.80 49.50+8.19 36.95+5.62
Fscore 24.31+8.95 52.99+4.92 50.99+14.05 18.13+2.74 18.42+4.13 18.43+5.15 21.67+9.34 52.39+5.05 54.78+7.28 43.38+5.01
TABLE VI: Clustering performance (%) of different methods on the MNIST handwriting database. The best mean results for each metric are in bold. Marked values are significantly different from those of all other methods according to the Wilcoxon rank sum test.

(1) Evaluation on the ExYaleB facial database:

  • All linear and non-linear representation methods, i.e., KTRR, TRR [17], KLRR [24], KSSC [15], LRR [12], SSC [11], outperform the standard spectral clustering method [8].

  • All the linear representation methods, i.e., TRR [17], LRR [12], and SSC [11], are inferior to their kernel-based extensions, i.e., KTRR, KLRR [24], and KSSC [15]. It means that the non-linear representation methods are more suitable to model the ExYaleB facial images.

  • The KTRR algorithm achieves the best results in these tests and gains a significant improvement over TRR. The mean AC, NMI, ARI, and Fscore of KTRR are clearly higher than those of TRR and also higher than those of KLRR (see Table III).

(2) Evaluation on the COIL 20 database:

  • All the linear representation methods, i.e., TRR [17], LRR [12], and SSC [11], are still inferior to their kernel-based extensions, i.e., KTRR, KLRR [24], and KSSC [15]; the non-linear versions obtain clear improvements.

  • The KTRR algorithm achieves the best AC among all tested methods, clearly higher than that of the second-best method TRR and the third-best method KLRR (see Table IV).

  • The KLRR, LatLRR and two types of LRR methods are all inferior to the standard spectral method.

(3) Evaluation on the USPS handwriting database:

  • All the linear representation methods, i.e., TRR [17], LRR [12], and SSC [11], are inferior to their kernel-based extensions, i.e., KTRR, KLRR [24], and KSSC [15]. The performance improvement is considerable; e.g., KSSC clearly outperforms SSC.

  • SSC is inferior to LRR, while its kernel-based extension KSSC outperforms the kernel-based extension of LRR (KLRR). The implicit transformation of the USPS images allows the mapped data points to be much better represented by each other in a sparse representation form.

  • The KTRR algorithm achieves the best results in these tests. The AC of KTRR is higher than that of TRR, KLRR, and KSSC, and the NMI, ARI, and Fscore of KTRR are also greater than those of the other tested methods.

(4) Evaluation on the MNIST handwriting database:

  • All the linear representation methods, i.e., TRR [17], LRR [12], and SSC [11], are inferior to their kernel-based extensions, i.e., KTRR, KLRR [24], and KSSC [15]. In particular, LRR yields poor performance on this database, while its kernel-based version, KLRR, obtains much better clustering quality in terms of AC, NMI, ARI, and Fscore.

  • The KTRR, KLRR, SMCE, and LSA algorithms achieve the best clustering results on the MNIST handwriting images compared with the other methods. However, none of the tested methods performs particularly well.

  • The proposed method KTRR achieves the best clustering result and obtains a significant improvement in AC over TRR. The NMI, ARI, and Fscore of KTRR are also higher than those of all other tested methods.

IV-E Clustering on Corrupted Images

To evaluate the robustness of the proposed method, we conduct experiments on the first few subjects of the COIL20 and ExYaleB databases, respectively. All used images are corrupted by additive white Gaussian noise or random pixel corruptions. Some corrupted image samples under different levels of noise are shown in Fig. 3. Specifically, for the additive Gaussian noise, we add noise at several noise levels; for the random pixel corruptions, we adopt salt & pepper noise with several ratios of affected pixels.
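For concreteness, here is a small sketch of how such corruptions can be generated with NumPy; the noise levels are illustrative placeholders, not the paper's settings.

```python
import numpy as np

def add_gaussian_noise(img, sigma=0.1, rng=None):
    """Additive white Gaussian noise on an image scaled to [0, 1]."""
    rng = rng or np.random.default_rng()
    return np.clip(img + sigma * rng.standard_normal(img.shape), 0.0, 1.0)

def add_salt_pepper(img, ratio=0.1, rng=None):
    """Randomly set a given ratio of pixels to 0 or 1 (salt & pepper)."""
    rng = rng or np.random.default_rng()
    out = img.copy()
    mask = rng.random(img.shape) < ratio
    out[mask] = rng.integers(0, 2, size=mask.sum()).astype(img.dtype)
    return out

img = np.random.default_rng(0).random((32, 32))
noisy = add_gaussian_noise(img, sigma=0.2)
corrupted = add_salt_pepper(img, ratio=0.3)
```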

Fig. 3: (a) Corrupted samples with additive Gaussian noise at increasing noise levels from left to right. (b) Corrupted samples with salt & pepper noise at increasing ratios of affected pixels from left to right.
Fig. 4: Clustering results on images with different levels of additive Gaussian noise. (a) Clustering accuracy on the ExYaleB database. (b) Clustering accuracy on the COIL 20 database.
Fig. 5: Clustering results on images with different ratios of salt & pepper corruptions. (a) Clustering accuracy on the ExYaleB database. (b) Clustering accuracy on the COIL 20 database.

The clustering quality of the compared methods on the two databases with additive Gaussian noise is shown in Fig. 4, from which we can make the following observations:

  • Most of these spectral-based methods are relatively robust to additive Gaussian noise, while the performance of LRR1, LRR2, and LatLRR deteriorates sharply on these two databases. The main reason may be that the additive Gaussian noise destroys the underlying low-rank structure of the representation matrix.

  • The accuracy of all tested methods on the COIL20 database is higher than that on the ExYaleB database, which is consistent with the results on clean images.

  • The proposed KTRR is considerably more robust to additive Gaussian noise than the other methods. Specifically, KTRR maintains a high AC even at the strongest noise level, much higher than that of all other tested algorithms, especially SC, LSA, LRR1, and LRR2.

The clustering quality of the compared methods on the images with random corruptions is shown in Fig. 5, from which we observe that:

  • All the investigated methods perform worse than in the case of white Gaussian noise. This result is consistent with the widely accepted conclusion that non-additive corruptions are more challenging than additive ones in pattern recognition;

  • All of the tested algorithms perform much better on the COIL20 database than on the ExYaleB database; the AC of all algorithms is low on ExYaleB at the higher ratios of corrupted pixels. From Fig. 3, we find that most pixel values of the images from the COIL20 database are close to the minimum or maximum grey level. This renders some of the corruptions ineffective and weakens their impact on the final clustering results.

  • The KTRR algorithm is robust to random pixel corruptions. It achieves the best results at low ratios of affected pixels on the two tested databases, and it maintains a high AC on the COIL20 database even at a high corruption ratio, which is a very challenging situation as can be seen in Fig. 3. However, the AC of KTRR drops severely as the ratio of corrupted pixels increases, and it falls below that of SSC at the highest corruption ratios. KTRR should therefore be improved to better handle images with salt & pepper corruptions.

IV-F Computational Time

To investigate the efficiency of KTRR, we compare its computational time with that of the other approaches on the clean images of the four databases. Our hardware configuration comprises a standard desktop CPU and 16 GB of RAM. The time cost for building the similarity graph ($t_1$) and the whole time cost for clustering ($t_2$) are recorded to evaluate the efficiency of the compared methods.

Databases KTRR TRR KLRR KSSC LatLRR LRR1 LRR2 SSC SMCE LSA SC
ExYaleB t1 22.96 23.71 45.82 5512.68 772.44 248.94 270.65 2301.75 10.15 198.48 0.33
t2 47.62 48.8 71.26 5543.4 806.43 286.91 311.34 2313.31 45.18 229.26 124.45
COIL 20 t1 6.50 6.54 16.11 1466.12 579.01 430.23 454.87 121.76 5.76 61.14 0.15
t2 11.66 12.75 25.55 1472.07 584.46 436.59 460.07 126.88 10.12 66.01 7.61
USPS t1 16.92 11.95 29.41 2752.97 50.05 43.34 49.10 62.25 67.44 108.67 0.14
t2 27.82 22.74 39.93 2763.19 58.98 51.88 58.90 98.79 76.50 120.46 11.73
MNIST t1 22.56 22.43 34.20 5742.89 246.35 155.70 172.32 112.13 16.78 142.71 1.09
t2 33.96 32.35 44.64 5753.42 270.19 167.25 186.82 153.23 25.52 154.64 14.65
TABLE VII: Computational time (seconds) of different methods on the ExYaleB, COIL 20, USPS, and MNIST databases. $t_1$ and $t_2$ denote the time cost of the similarity graph construction and the time cost of the whole clustering process of each method, respectively. The best results are in bold.

Table VII shows the time cost of different methods with the parameters which achieve their best results. We can see that:

  • The standard SC [8] is the fastest since its similarity graph is computed via the pairwise kernel distances among the input samples, while KSSC [15] is the most time-consuming method.

  • The time cost of the proposed method is very close to that of its linear version TRR [17]. Specifically, TRR is faster than KTRR on the ExYaleB and USPS databases, slower than KTRR on the MNIST database, and comparable on the COIL database. The Wilcoxon rank sum test [41] shows no significant difference between the time costs of KTRR and TRR, either for the similarity graph construction or for the whole clustering process, on the four tested databases.

  • The KTRR and TRR [17] algorithms are much faster than the KSSC, SSC, KLRR, and LRR methods. These results are consistent with the fact that the theoretical computational complexities of KTRR and TRR are much lower than those of KSSC, SSC, KLRR, and LRR. Both KTRR and TRR have analytical solutions, and only one pseudo-inverse operation is required for solving the representation problems of all data points.

IV-G Clustering Performance with Varying Number of Subjects

In this subsection, we investigate the clustering performance of the proposed method with different numbers of subjects on the COIL 100 image database. The experiments are carried out on the first $c$ classes of the database, where $c$ is increased over a range of values at a fixed interval. The clustering results are shown in Fig. 6.

Fig. 6: The clustering quality of the proposed method on the first $c$ subjects of the COIL 100 database.

From the results, we can see that:

  • In general, as the number of subjects increases, the clustering performance decreases, since the clustering difficulty grows with the number of subjects.

  • With an increasing number of subjects, the NMI of KTRR changes only slightly. A possible reason is that this metric is robust to the data distribution (i.e., the increasing subject number) [19].

  • The proposed method obtains satisfactory performance on the COIL 100 database. It achieves a perfect clustering result for small numbers of subjects and still attains satisfactory AC, NMI, ARI, and Fscore for the largest numbers of subjects tested.

IV-H Parameter Analysis

KTRR has two parameters, the tradeoff parameter $\lambda$ and the thresholding parameter $s$. The choice of their values depends on the data distribution: a larger $\lambda$ is suitable for highly corrupted databases, and $s$ corresponds to the dimensionality of the corresponding subspace of the mapped data points.

To evaluate the impact of $\lambda$ and $s$, we conduct experiments on the ExYaleB and COIL20 databases, varying $\lambda$ and $s$ over wide ranges; the results are shown in Fig. 7 and Fig. 8.

Fig. 7: Clustering performance of the proposed method on the ExYaleB database. (a) Clustering performance versus different values of $\lambda$ and $s$. (b) Clustering performance versus different values of $\lambda$ with $s$ fixed. (c) Clustering performance versus different values of $s$ with $\lambda$ fixed.
Fig. 8: Clustering performance of the proposed method on the COIL20 database. (a) Clustering performance versus different values of $\lambda$ and $s$. (b) Clustering performance versus different values of $\lambda$ with $s$ fixed. (c) Clustering performance versus different values of $s$ with $\lambda$ fixed.

From the results, we get the following observations:

  • The proposed method achieves its best clustering performance at particular combinations of $\lambda$ and $s$ on the ExYaleB database and on the COIL20 database, respectively.

  • The proposed method obtains satisfactory performance over a wide range of $\lambda$ on the ExYaleB database, where AC, NMI, ARI, and Fscore all remain high, and likewise over a wide range of $\lambda$ on the COIL20 database. The performance of KTRR is not sensitive to the parameter $\lambda$, which makes KTRR suitable for real applications.

  • The clustering quality within a moderate range of $s$ on the ExYaleB and COIL20 databases is much better than in the other cases. This means that the thresholding process helps to improve the performance of KTRR, and that the dimensionality of the underlying subspaces of the ExYaleB and COIL20 data in the hidden space lies within this range.

IV-I Different Kernel Functions

Function
USPS AC 80.38+19.04 72.31+18.06 81.36+14.93 74.97+12.40 79.62+17.69 73.64+14.46
NMI 76.08+9.75 67.38+14.41 78.04+6.63 75.59+9.22 74.18+13.08 74.58+8.01
ARI 70.10+15.43 59.48+17.11 71.73+12.67 65.98+13.81 68.32+19.86 64.62+14.44
Fscore 73.25+13.45 63.78+15.06 74.69+11.13 69.72+11.83 71.68+17.47 68.53+12.38
MNIST AC 65.58+11.65 66.48+14.54 63.97+8.11 59.21+14.27 61.62+12.43 62.06+9.67
NMI 64.27+6.25 63.49+6.50 66.81+4.43 63.54+12.66 64.00+9.32 65.33+5.59
ARI 51.62+10.21 51.54+11.54 52.63+5.04 48.62+16.10 50.18+11.25 51.20+7.18
Fscore 56.70+8.92 56.61+10.09 57.92+4.16 54.50+13.94 55.63+9.71 56.66+6.13
TABLE VIII: Performance comparison of different kernel functions used in the proposed method. The bandwidth parameter of the Gaussian kernel is set to the mean of the distances between all samples. The best mean results for each metric are in bold.

Commonly used kernel functions include polynomial kernels, radial basis functions, and sigmoid kernels. To investigate the performance of the proposed method with different kernels, we study six different kernel functions. The results on the USPS and MNIST databases are shown in Table VIII, from which we can make the following observations:

  • One of the kernel functions achieves the best performance on the USPS database, while a different kernel function obtains the best performance on the MNIST database.

  • The kernel function that performs best on the USPS database differs from the one that performs best on the MNIST database. This is mainly because the images from the USPS database lie in more strongly nonlinear subspaces than those from the MNIST database, and the former kernel induces a more nonlinear mapping.

  • The choice of kernel results in a considerable difference in subspace clustering performance on both the USPS and MNIST databases; a sketch of several common kernel functions is given after these observations.
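The following sketch lists the three kernel families named above as plain Python functions; the specific hyperparameter values (degree, bandwidth, offsets) are illustrative placeholders, not the settings used in Table VIII.

```python
import numpy as np
from scipy.spatial.distance import cdist

def polynomial_kernel(X, Y, degree=2, c=1.0):
    """k(x, y) = (x^T y + c)^degree"""
    return (X @ Y.T + c) ** degree

def rbf_kernel(X, Y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2))"""
    return np.exp(-cdist(X, Y) ** 2 / (2 * sigma ** 2))

def sigmoid_kernel(X, Y, alpha=0.01, c=0.0):
    """k(x, y) = tanh(alpha * x^T y + c); not always positive semidefinite."""
    return np.tanh(alpha * (X @ Y.T) + c)

X = np.random.default_rng(0).standard_normal((50, 16))
for kernel in (polynomial_kernel, rbf_kernel, sigmoid_kernel):
    K = kernel(X, X)
    print(kernel.__name__, K.shape)
```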

V Conclusion

In this paper, we have incorporated the kernel technique into the TRR method to achieve robust nonlinear subspace clustering. The proposed method does not need prior knowledge about the structure of the errors in the input data and remedies the drawback of the existing TRR method that it cannot deal with data points drawn from nonlinear subspaces. Moreover, through theoretical analysis of the proposed mathematical model, we show that the resulting optimization problem can be solved analytically, and that the closed-form solution depends only on the kernel matrix. These advantages make our proposed method useful in many real-world applications. Comprehensive experiments on five real-world image databases have demonstrated the effectiveness and the efficiency of the proposed method.

In the future, we plan to conduct a systematic investigation into the selection of the optimal kernel for the proposed method and to study how to determine the number of nonlinear subspaces automatically.

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under grants 61432012, 61329302, and the Engineering and Physical Sciences Research Council (EPSRC) of U.K. under grant EP/J017515/1.

References

  • [1] L. Parsons, E. Haque, and H. Liu, “Subspace clustering for high dimensional data: a review,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 90–105, 2004.
  • [2] P. S. Bradley and O. L. Mangasarian, “k-plane clustering,” Journal of Global Optimization, vol. 16, no. 1, pp. 23–32, 2000.
  • [3] S. R. Rao, R. Tron, R. Vidal, and Y. Ma, “Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on.    IEEE, 2008, pp. 1–8.
  • [4] H. Derksen, Y. Ma, W. Hong, and J. Wright, “Segmentation of multivariate mixed data via lossy coding and compression,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 9, pp. 1546–1562, 2007.
  • [5] J. P. Costeira and T. Kanade, “A multibody factorization method for independently moving objects,” International Journal of Computer Vision, vol. 29, no. 3, pp. 159–179, 1998.
  • [6] C. W. Gear, “Multibody grouping from motion images,” International Journal of Computer Vision, vol. 29, no. 2, pp. 133–150, 1998.
  • [7] R. Vidal, Y. Ma, and S. Sastry, “Generalized principal component analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1945–1959, 2005.
  • [8] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm,” in Advances in Neural Information Processing Systems, vol. 14, 2002, pp. 849–856.
  • [9] U. Von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
  • [10] B. Cheng, J. Yang, S. Yan, Y. Fu, and T. S. Huang, “Learning with $\ell_1$-graph for image analysis,” IEEE Transactions on Image Processing, vol. 19, no. 4, pp. 858–866, April 2010.
  • [11] E. Elhamifar and R. Vidal, “Sparse subspace clustering: Algorithm, theory, and applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2765–2781, 2013.
  • [12] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 171–184, 2013.
  • [13] C. Y. Lu, H. Min, Z. Q. Zhao, L. Zhu, D. S. Huang, and S. C. Yan, “Robust and efficient subspace segmentation via least squares regression,” in Proc. of 12th Eur. Conf. Comput. Vis., Florence, Italy, Oct. 2012, pp. 347–360.
  • [14] C. Lu, J. Tang, M. Lin, L. Lin, S. Yan, and Z. Lin, “Correntropy induced l2 graph for robust subspace clustering,” in 2013 IEEE International Conference on Computer Vision, Dec 2013, pp. 1801–1808.
  • [15] V. M. Patel and R. Vidal, “Kernel sparse subspace clustering,” in Image Processing (ICIP), 2014 IEEE International Conference on.    IEEE, 2014, Conference Proceedings, pp. 2849–2853.
  • [16] L. Zhen, Z. Yi, X. Peng, and D. Peng, “Locally linear representation for image clustering,” Electronics Letters, vol. 50, no. 13, pp. 942–943, 2014.
  • [17] X. Peng, Z. Yi, and H. Tang, “Robust subspace clustering via thresholding ridge regression,” in The Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015, pp. 3827–3833.
  • [18] H. Liu, T. Liu, J. Wu, D. Tao, and Y. Fu, “Spectral ensemble clustering,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.    ACM, 2015, pp. 715–724.
  • [19] X. Peng, H. Tang, L. Zhang, Z. Yi, and S. Xiao, “A unified framework for representation-based subspace clustering of out-of-sample and large-scale data,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 12, pp. 2499–2512, Dec 2016.
  • [20] X. Peng, C. Lu, Y. Zhang, and H. Tang, “Connections between nuclear norm and frobenius norm based representation,” IEEE Transactions on Neural Networks and Learning Systems, vol. PP, no. 99, pp. 1–7, 2016.
  • [21] E. Elhamifar and R. Vidal, “Sparse subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, Conference Proceedings, pp. 2790–2797.
  • [22] S. R. Rao, R. Tron, R. Vidal, and Y. Ma, “Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on.    IEEE, 2008, pp. 1–8.
  • [23] Z. Yu, H.-S. Wong, and H. Wang, “Graph-based consensus clustering for class discovery from gene expression data,” Bioinformatics, vol. 23, no. 21, pp. 2888–2896, 2007.
  • [24] S. Xiao, M. Tan, D. Xu, and Z. Y. Dong, “Robust kernel low-rank representation,” IEEE Transactions on Neural Networks and Learning Systems, 2015.
  • [25] G. Liu and S. Yan, “Latent low-rank representation for subspace segmentation and feature extraction,” in 2011 International Conference on Computer Vision, Nov 2011, pp. 1615–1622.
  • [26] E. Elhamifar and R. Vidal, “Sparse manifold clustering and embedding,” in Advances in Neural Information Processing Systems, 2011, Conference Proceedings, pp. 55–63.
  • [27] X. Peng, S. Xiao, J. Feng, W. Yau, and Z. Yi, “Deep subspace clustering with sparsity prior,” in Proceedings of the 25 International Joint Conference on Artificial Intelligence, New York, NY, USA, 9-15 July 2016, pp. 1925–1931. [Online]. Available: http://www.ijcai.org/Abstract/16/275
  • [28] X. Peng, J. Feng, J. Lu, W.-Y. Yau, and Z. Yi, “Cascade subspace clustering,” in Proceedings of the 31th AAAI Conference on Artificial Intelligence.    SFO, USA: AAAI, Feb. 2017, pp. 2478–2484.
  • [29] J. Shawe-Taylor and N. Cristianini, Kernel methods for pattern analysis.    Cambridge university press, 2004.
  • [30] V. M. Patel and R. Vidal, “Kernel sparse subspace clustering,” in 2014 IEEE International Conference on Image Processing (ICIP).    IEEE, 2014, pp. 2849–2853.
  • [31] K.-C. Lee, J. Ho, and D. J. Kriegman, “Acquiring linear subspaces for face recognition under variable lighting,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 684–698, 2005.
  • [32] S. A. Nene, S. K. Nayar, and H. Murase, “Columbia object image library (COIL-20),” Technical Report, 1996.
  • [33] ——, “Columbia object image library (COIL-100),” Technical Report, 1996.
  • [34] J. J. Hull, “A database for handwritten text recognition research,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 16, no. 5, pp. 550–554, 1994.
  • [35] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
  • [36] D. Cai, X. He, J. Han, and T. S. Huang, “Graph regularized nonnegative matrix factorization for data representation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 33, no. 8, pp. 1548–1560, 2011.
  • [37] J. Yan and M. Pollefeys, “A general framework for motion segmentation: Independent, articulated, rigid, non-rigid, degenerate and non-degenerate,” in European conference on computer vision.    Springer, 2006, pp. 94–106.
  • [38] X. Zheng, D. Cai, X. He, W.-Y. Ma, and X. Lin, “Locality preserving clustering for image database,” in Proceedings of the 12th annual ACM international conference on Multimedia.    ACM, Conference Proceedings, pp. 885–891.
  • [39] L. Hubert and P. Arabie, “Comparing partitions,” Journal of classification, vol. 2, no. 1, pp. 193–218, 1985.
  • [40] C. Goutte and E. Gaussier, “A probabilistic interpretation of precision, recall and F-score, with implication for evaluation,” Springer, 2005, pp. 345–359.
  • [41] J. D. Gibbons and S. Chakraborti, Nonparametric statistical inference.    Springer, 2011.