1 Introduction
Despite the majority of subspace clustering methods [23] show good performance in various applications, the similarity among data points is measured in the original data domain. Specifically, this similarity is often measured between the sparse [5] or lowrank representations [14] of data points , by exploiting the selfexpressive
property of the data, in terms of Euclidean alike distance. In general, the representation for each data point is its linear regression coefficient on the rest of the data subject to sparsity or lowrank constraint. Unfortunately, this assumption may not be always true for many highdimensional data in real world where data may be better modeled by nonlinear manifolds
[9, 11, 16]. In this case, the selfexpressive based algorithms, such as Sparse Subspace Clustering (SSC) [6] and Low Rank Subspace Clustering (LRSC) [24], are no longer applicable. For example, the human facial images are regarded as samples from a nonlinear submanifold [12, 17].Recently, a useful image and video descriptor, the covariance descriptor which is a symmetric positive definite (SPD) matrix[21], has attracted a lot of attention. By using this descriptor, a promising classification performance can be achieved [9]. However, the traditional subspace learning mainly focuses on the problem associated with vectorvalued data. It is known that SPD matrices form a Lie group, a well structured Riemannian manifold. The naive way of vectorizing SPD matrices first and applying any of the available vectorbased techniques usually makes the task less intuitive and short of proper interpretation. The underlying reason is the lack of vector space structures in Riemannian manifold. That is, the direct application of linear reconstruction model for this type of data will result in inaccurate representation and hence compromised performance.
To handle this special case, a few solutions have been recently proposed to address sparse coding problems on Riemannian manifolds, such as [3, 10, 19]. While for subspace clustering, a nonlinear LRR model is proposed to extend the traditional LRR from Euclidean space to Stiefel manifold [29], SPD manifold [7] and abstract Grassmann manifold [26] respectively. The common strategy used in the above research is to make the data selfexpressive principle work by using the logarithm mapping “projecting” data onto the tangent space at each data point where a “normal” Euclidean linear reconstruction of a given sample is well defined [22]. This idea was first explored by Ho et al. [10] and they proposed a nonlinear generalization of sparse coding to handle the nonlinearity of Riemannian manifolds, by flattening the manifold using a fixed tangent space.
For the special Riemannian manifold of SPD, many researchers [3, 19] took advantage of a nice property of this manifold, namely that the manifold is closed under positive linear combination, and exploited appropriate nonlinear metrics such as logdeterminant divergence to measure errors in the sparse model formulation. This type of strategies fall in the second category of methods dealing with problems on manifolds by embedding manifold onto a larger flatten Euclidean space. For example, the LRR model on abstract Grassmann manifold is proposed based on the embedding technique [27] and its kernelization [25].
It is noteworthy that a nonlinear extension of SSC method for manifold clustering has been proposed in [17]. Unfortunately, the authors simply used the kernel trick to map data onto a highdimensional feature space for vector data, not considering the intrinsic geometric structure. To rectify this, in this paper, we propose to use a kernel feature mapping to embed the SPD Riemannian manifold into a high dimension feature space and preserve its intrinsic Riemannian geometry within data. We call this method kernel sparse subspace clustering on Riemannian manifold (KSSCR). An overview of the proposed method is illustrated in Fig. 1. Different from the work in [17], our motivation is to map SPD matrices into Reproducing Kernel Hilbert Space (RKHS) with LogEuclidean Gaussian kernel based on Riemannian metric. As a result, the linear reconstruction can be naturally implemented in the feature space associated with the kernel where the original SSC can be applied. The proposed method not only effectively characterizes the geodesic distance between pair of SPD matrices but also uncovers the underlying lowdimensional subspace structure.
The remainder of the paper is organized as follows. In Section 2, we give a brief review on related work. Section 3 is dedicated to introducing the novel kernel sparse subspace clustering on Riemannian manifold. The experimental results are given in Section 4 and Section 5 concludes the paper with a brief summary.
2 Related Work
Before introducing the proposed model, in this section, we briefly review the recent development of subspace clustering methods [6] and the analysis of Riemannian geometry of SPD Manifold [18]. Throughout the paper, capital letters denote matrices (e.g., ) and bold lowercase letters denote column vectors (e.g., x). is the th element of vector x. Similarly, denotes the th entry of matrix . and are the and norms respectively, where is the transpose operation. is the matrix Frobenius norm defined as . The space of SPD matrices is denoted by . The tangent space at a point on is defined by , which is a vector space including the tangent vectors of all possible curves passing over .
2.1 Subspace Sparse Representation
Sparse representation, which has been proved to be a useful tool for representing and compressing highdimensional signals, provides a statistical model for finding efficient and compact signal representations. Among them, SSC [6] is of particular interests as it clusters data points to a union of lowdimensional subspaces without referring to a library. Let be a matrix of data. Each datum is drawn from a lowdimensional subspace denoted by for . By exploiting the selfexpressive property of the data, the formulation of SSC is written as follows,
(1) 
where is the coefficient matrix whose column ^{1}^{1}1Matrix is bold, while other matrices are not.is the sparse representation vector corresponding to the th data point. denotes the reconstruction error components.
As to the coefficient matrix , besides the interpretation as new sparse representation of the data, each element in can also be regarded as a similarity measure between the data pair and . In this sense, C
is sometimes called an affinity matrix. Therefore, a clustering algorithm such as Kmeans can be subsequently applied to
C for the final segmentation solution. This is a common practice of subspace clustering based on finding new representation.2.2 Spare Representation on SPD matrices
Since SPD matrices belong to a Lie group which is a Riemannian manifold [1], it cripples many methods that rely on linear reconstruction. Generally, there are two methods to deal with the nonlinearity of Riemannian manifolds. One is to locally flatten the manifold to tangent spaces[22]. The underlying idea is to exploit the geometry of the manifold directly. The other is to map the data into a feature space usually a Hilbert space [11]. Precisely, it is to project the data into RKHS through kernel mapping [8]. Both of these methods are seeking a transformation so that the linearity reemerges.
A typical example of the former method is the one in [10]. Let be a SPD matrix and hence a point on . is a dictionary. An optimization problem for sparse coding of on a manifold is formulated as follows
(2) 
where denotes Log map from SPD manifold to a tangent space at , is the sparse vector and is the norm associated with . Because , the second term in Eq.(2) is essentially the error of linearly reconstructing by others on the tangent space of , As this tangent space is a vector space, this reconstruction is well defined. As a result, the traditional sparse representation model can be performed on Riemannian manifold.
However, it turns out that quantifying the reconstruction error is not at all straightforward. Although norm is commonly used in the Euclidean space, using Riemannian metrics would be better in since they can accurately measure the intrinsic distance between SPD matrices. In fact, a natural way to measure closeness of data on a Riemannian manifold is geodesics, i.e. curves analogous to straight lines in
. For any two data points on a manifold, geodesic distance is the length of the shortest curve on the manifold connecting them. For this reason, the affine invariant Riemannian metric (AIRM) is probably the most popular Riemannian metric defined as follows
[18]. Given , the AIRM of two tangent vectors is defined asThe geodesic distance between points induced from AIRM is then
(3) 
3 Kernel Subspace Clustering on SPD Matrices
Motivated by the above issues, in this section, we propose a novel kernel sparse subspace clustering algorithm which enables SSC to handle data on Riemannian manifold by incorporating the intrinsic geometry of the manifold. The idea is quite simple but effective: map the data points into RKHS first and then perform SSC with some modifications in RKHS. Compared with the original SSC, the advantages of this approach include simpler solutions and better representation due to the capability of learning the underlying nonlinear structures. The following is the detail of our method. Given a data set on SPD manifold, we seek its sparse representation via exploiting the selfexpressive property of the data. Thus, the objective of our kernel sparse subspace representation algorithm on Riemannian manifold is formulated as follows
(4) 
where denotes a feature mapping function that projects SPD matrices into RKHS such that where is a positive definite (PD) kernel.
3.1 LogEuclidean Kernels for SPD Matrices
Although locally flattening Riemannian manifolds via tangent spaces [10] can handle their nonlinearity, it inevitably leads to very demanding computation due to switching back and forth between tangent spaces and the manifold. Furthermore, linear reconstruction of SPD matrices is not as natural as in Euclidean space and this may incur errors.
The recent work in [9] shows the property of Stein divergence is akin to AIRM. Furthermore, a PD kernel can be induced from Stein divergence under some conditions [20]. Concretely, a Stein metric[20], also known as JensenBregman LogDet divergence (JBLD) [4], derived from Bregman matrix divergence is given by,
where denotes determinant. Accordingly a kernel function based on Stein divergence for SPD can be defined as , though it is guaranteed to be positive definite only when or [20].
However, the problem is that Stein divergence is only an approximation to Riemannian metric and cannot be a PD kernel without more restricted conditions [12]. If one uses this kernel, the reconstruction error for Riemannian metric will be incurred. To address this problem, a family of LogEuclidean kernels were proposed in [12], i.e., a polynomial kernel, an exponential kernel and a Gaussian kernel, tailored to model data geometry more accurately. These LogEuclidean kernels were proven to well characterize the true geodesic distance between SPD matrices, especially the LogEuclidean Gaussian kernel
which is a PD kernel for any . Owing to its superiority, we select LogEuclidean Gaussian kernel to transform the SPD matrices into RKHS.
3.2 Optimization
In this subsection, we solve the objective of kernel sparse subspace learning in (4) via alternating direction method of multipliers (ADMM) [2]. Expanding the Frobenius norm in (4) and applying kernel trick leads to the following equivalent problem
(5)  
where is the kernel Gram matrix, and .
By introducing an auxiliary matrix , the above problem can be rewritten as follows
(6) 
The augmented Lagrangian is then
(7) 
on which ADMM is carried out. Note that is a Lagrangian multiplier matrix with compatible dimension.
The optimization algorithm is detailed as follows.

Update .
(8) Let , the subproblem can be formulated by,
(9) Setting the derivative w.r.t. to zero results in a closedform solution to subproblem (8) given by
(10) where
is an identity matrix.

Update C.
(11) The above subproblem has the following closedform solution given by shrinkage operator
(12) (13) where is a shrinkage operator acting on each element of the given matrix, and is defined as .

Update .
(14)
These steps are repeated until .
3.3 Subspace Clustering
As discussed earlier, C is actually a new representation of data learned found by using data selfexpressive property. After solving problem (4), the next step is to segment C
to find the final subspace clusters. Here we apply a spectral clustering method to the affinity matrix given by
to separate the data into clusters, which is equivalent to subspaces. The complete clustering method is outlined in Algorithm 1.
Compute the affinity matrix by,

Apply normalized spectral clustering method to to obtain the final clustering solution .
3.4 Complexity Analysis and Convergence
The computational cost of the proposed algorithm is mainly determined by the steps in ADMM. The total complexity of KSSCR is, as a function of the number of data points, where is the total number of iterations. The soft thresholding to update the sparse matrix C in each step is relatively cheaper, much less than . For updating we can precompute the Cholesky decomposition of at the cost of less than , then compute new by using (10) which has a complexity of in general.
4 Experimental Results
In this section, we present several experimental results to demonstrate the effectiveness of KSSCR. To comprehensively evaluate the performance of KSSCR, we tested it on texture images and human faces. Some sample images of test databases are shown in Figure 2. The clustering results are shown in Section 4.1 and Section 4.2 respectively. All test data lie on Riemannian manifold.
In this work, we adopt two criteria, i.e. Normalized Mutual Information (NMI) and subspace clustering accuracy, to quantify the clustering performance more precisely. NMI aims to measure how similar different cluster sets are. Meanwhile, to extensively assess how the proposed algorithm improves the performance of data clustering, the following four stateoftheart subspace clustering methods are compared against:

Sparse Subspace Clustering(SSC) [6].

Lowrank Representation(LRR) [15].

LowRank Subspace Clustering (LRSC) [24], which aims to seek a lowrank representation by decomposing the corrupted data matrix as the sum of a clean and selfexpressive dictionary.

Kernel SSC on Euclidean space (KSSCE) [17], which embeds data onto to a nonlinear manifold by using the kernel trick and then apply SSC based on Euclidean metric.
We also included Kmeans clustering algorithm (Kmeans) as a baseline.
4.1 Texture Clustering
In this subsection, a subset of the Brodatz database [12], i.e., 16texture (‘16c’) mosaic, was chosen for clustering performance evaluation. There are 16 objects in this subset in which each class contains only one image. Before clustering, we downsampled each image to 256 256 and then split into 64 regions of size 32 32. To obtain their region covariance matrices (RCM), a feature vector for any pixel was extracted, e.g., . Then, each region can be characterized by a 5 5 covariance descriptor. Totally, 1024 RCM were collected. We randomly chose some data from Brodatz database with the number of clusters, i.e., , ranging from 2 to 16. The final performance scores were computed by averaging the scores from 20 trials. The detailed clustering results are summarized in Tables 1 and 2. We set the parameters as and for KSSCR. Also, the tuned parameters are reported for the results achieved by other methods. The bold numbers highlight the best results.
From the tables, we observe that the proposed method outperforms other methods in most cases while KSSCE achieved the second best performance. This is due to the nonlinear subspace clustering in Euclidean space without using Riemannian metric. In addition, we find SSC type methods can better discover the intrinsic structure than LRR type ones. In order to show the underlying lowdimensional structure within data, we provide a visual comparison of affinity matrices obtained by different methods in Figure 3. Due to the fact that LRSC cannot recover the subspace effectively, we exclude its affinity matrix. It is clear that the affinity matrix achieved by KSSCR effectively reflects the structure of data so as to benefit the subsequent clustering task.
2  4  8  10  12  16  
Kmeans  73.440  63.28 0  76.95 0  61.88 0  59.24 0  60.450 
SSC (100)  1000  79.300  78.035.18  61.770.86  68.273.26  60.271.15 
LRSC(1, 0.01)  53.900  0  22.630.30  21.880.50  16.87 0.33  15.950.46 
LRR(0.4)  71.880  42.580  52.171.66  53.520.22  50.770.81  48.341.84 
KSSCE(0.4)  78.91 0  53.52 0  77.64 0.93  72.36 0.77  82.810.65  64.93 2.89 
KSSCR  1000  85.940  97.850  74.640.08  75.70.04  83.620.14 
2  4  8  10  12  16  
Kmeans  28.70 0  45.49 0  75.76 0  63.76 0  62.27 0  63.88 0 
SSC(100)  1000  63.840  80.212.60  64.17 0.30  70.651.93  67.030.58 
LRSC(1, 0.01)  0.44 0  2.080  10.860.4  14.17 0.26  10.23 0.43  15.37 0.43 
LRR(0.4)  21.240  15.15 0  50.58 1.65  48.32 0.36  54.21 0.62  55.14 0.89 
KSSCE(0.4)  28.65 0  35.95 0  73.52 0.53  70.35 0.63  79.420.08  66.35 2.06 
KSSCR  1000  73.620  95.260  76.710.07  79.050.04  82.380.11 
4.2 Face Clustering
In this test, we used the “b” subset of FERET database which consists of 1400 images with the size of 80 80 from 200 subjects (about 7 images each subject). The images of each individuals were taken under different expression and illumination conditions, marked by ‘ba’, ‘bd’, ‘be’, ‘bf’, ‘bg’, ‘bj’, and ‘bk’. To represent a facial image, similar to the work [28], we created a 4343 region covariance matrix, i.e., a specific SPD matrix, which is summarised over intensity value, spatial coordinates, 40 Gabor filters at 8 orientations and 5 scales.
We first tested KSSCR on seven subsets from FERET database which randomly covers some different clusters. The clustering results of the tested approaches are shown in Figure 4 and 5 with varying number of clusters. From Figure 4, we observe that the clustering accuracy of the proposed approach is better than other stateoftheart methods in most cases, peaking on 30cluster subset with , , and . On average, the clustering rate on all seven subsets are 44.49, 49.33, 78.56, 47.54, 72.50, and 80.42 for Kmeans, SSC, LRSC, LRR, KSSCE, and KSSCR, respectively. In Figure 5, the NMI is presented to show the performance of different methods. The average scores are 66.65, 68.31, 88.43, 69.08, 83.01, and 88.46 for Kmeans, SSC, LRSC, LRR, KSSCE, and KSSCR, respectively. As can be seen, KSSCR achieves the comparable results and a little bit better than that of LRSC. While compared to KSSCE, KSSCR is leading by a large margin. This further verifies the advantage of fully considering the intrinsic geometry structure within data.
Next, we tested the effect of parameter in LogEuclidean Gaussian kernel by fixing the number of clusters to 30. Fig. 6 shows the clustering performance versus parameter on the FERET dataset. From the figure, we can see that the clustering score increases as gets larger, reaching peak at about and decreasing afterwards. This helps to determine the value of parameter in the clustering experiments.
5 Conclusion
In this paper, we proposed a novel algorithm called kernel sparse subspace clustering (KSSCR) for SPD matrices, a special type of data that cripples original sparse representation methods based on linear reconstruction. By using an appropriate LogEuclidean kernel, we exploited the selfexpressive property to obtain sparse representation of the original data mapped into kernel reproducing Hilbert space based on Riemannian metric. Experimental results show that the proposed method can provide better clustering solutions than the stateoftheart approaches thanks to incorporating Riemannian geometry structure.
Acknowledgement
The funding information is hid for the purpose of the anonymous review process.
References
 [1] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache. Geometric means in a novel vector space structure on symmetric positivedefinite matrices. SIAM Journal on Matrix Analysis and Applications, 29(1):328–347, 2007.

[2]
S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein.
Distributed Optimization and Statistical Learning via the
Alternating Direction Method of Multipliers, volume 3.
Foundations and Trends in Machine Learning, 2011.
 [3] A. Cherian and S. Sra. Riemannian sparse coding for positive definite matrices. In Proceedings of ECCV, volume 8691, pages 299–314. Springer International Publishing, 2014.
 [4] A. Cherian, S. Sra, A. Banerjee, and N. Papanikolopoulos. Jensenbregman logdet divergence with application to efficient similarity search for covariance matrices. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(9):2161–2174, Sept 2013.
 [5] D. Donoho, M. Elad, and V. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Transactions on Information Theory, 52(1):6–18, 2006.
 [6] E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2765–2781, 2013.
 [7] Y. Fu, J. Gao, X. Hong, and D. Tien. Low rank representation on Riemannian manifold of symmetric positive deffinite matrices. In Proceedings of SDM, 2015, DOI:10.1137/1.9781611974010.36.
 [8] M. Harandi, C. Sanderson, R. Hartley, and B. Lovell. Sparse coding and dictionary learning for symmetric positive definite matrices: A kernel approach. In Proceedings of ECCV, pages 216–229, 2012.
 [9] M. T. Harandi, R. Hartley, B. C. Lovell, and C. Sanderson. Sparse coding on symmetric positive definite manifolds using bregman divergences. IEEE Transactions on Neural Networks and Learning Systems, 2015, DOI:10.1109/TNNLS.2014.2387383.
 [10] J. Ho, Y. Xie, and B. C. Vemuri. On a nonlinear generalization of sparse coding and dictionary learning. In Proceedings of ICML, volume 28, pages 1480–1488, 2013.
 [11] S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and M. T. Harandi. Kernel methods on the Riemannian manifold of symmetric positive definite matrices. In Proceedings of CVPR, pages 73–80, June 2013.
 [12] P. Li, Q. Wang, W. Zuo, and L. Zhang. LogEuclidean kernels for sparse representation and dictionary learning. In Proceedings of ICCV, pages 1601–1608, Dec 2013.
 [13] Z. Lin, R. Liu, and H. Li. Linearized alternating direction method with parallel splitting and adaptive penalty for separable convex programs in machine learning. Machine Learning, 99(2):287–325, 2015.
 [14] G. Liu, Z. Lin, S. Yan, J. Sun, and Y. Ma. Robust recovery of subspace structures by lowrank representation. IEEE Transactions on Pattern Analysis and Machince Intelligence, 35(1):171 – 184, Jan. 2013.
 [15] G. Liu, Z. Lin, and Y. Yu. Robust subspace segmentation by lowrank representation. In Proceedings of ICML, pages 663–670, 2010.

[16]
H. Nguyen, W. Yang, F. Shen, and C. Sun.
Kernel lowrank representation for face recognition.
Neurocomputing, 155:32–42, 2015.  [17] V. M. Patel and R. Vidal. Kernel sparse subspace clustering. In Proceedings of ICIP, pages 2849–2853, Oct 2014.

[18]
X. Pennec, P. Fillard, and N. Ayache.
A Riemannian framework for tensor computing.
International Journal Of Computer Vision, 66:41–66, 2006.  [19] R. Sivalingam, D. Boley, V. Morellas, and N. Papanikolopoulos. Tensor sparse coding for positive definite matrices. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(3):592–605, 2014.
 [20] S. Sra. A new metric on the manifold of kernel matrices with application to matrix geometric means. In F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, Proceedings of NIPS, pages 144–152, 2012.
 [21] O. Tuzel, F. Porikli, and P. Meer. Region Covariance: A Fast Descriptor for Detection And Classification. In Proceedings of ECCV, pages 589–600, 2006.
 [22] O. Tuzel, F. Porikli, and P. Meer. Pedestrian detection via classification on Riemannian manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(10):1713–1727, Oct 2008.
 [23] R. Vidal. Subspace clustering. IEEE Signal Processing Magazine, 28(2):52–68, 2011.
 [24] R. Vidal and P. Favaro. Low rank subspace clustering (LRSC). Pattern Recognition Letters, 43(1):47–61, 2014.
 [25] B. Wang, Y. Hu, J. Gao, Y. Sun, and B. Yin. Kernelized low rank representation on Grassmann manifolds. arXiv preprint arXiv:1504.01806.
 [26] B. Wang, Y. Hu, J. Gao, Y. Sun, and B. Yin. Low rank representation on Grassmann manifolds: An extrinsic perspective. arXiv preprint arXiv:1504.01807.
 [27] B. Wang, Y. Hu, J. Gao, Y. Sun, and B. Yin. Low rank representation on Grassmann manifold. In Proceedings of ACCV, pages 81–96, Singapore, November 2014.
 [28] M. Yang, L. Zhang, S. C. Shiu, and D. Zhang. Gabor feature based robust representation and classification for face recognition with Gabor occlusion dictionary. Pattern Recognition, 46(7):1865 – 1878, 2013.
 [29] M. Yin, J. Gao, and Y. Guo. Nonlinear lowrank representation on Stiefel manifolds. Electronics Letters, 51(10):749–751, 2015.
Comments
There are no comments yet.