I. Introduction
With the ability to disentangle the latent structure of data in an unsupervised manner [2, 3, 4], subspace clustering is regarded as an important technique in the data mining community and for various computer vision applications [5, 6, 7, 8]. Traditional subspace clustering methods approximate a set of high-dimensional data samples by a union of lower-dimensional linear subspaces [9], where each subspace usually contains a subset of the samples. In recent years, spectral clustering based methods have achieved state-of-the-art performance using the following two-step framework. First, by optimizing a self-representation problem [4, 10], a similarity matrix (and hence a similarity graph) is constructed to depict the relationships (or connections) among samples. Second, spectral clustering [11] is employed to calculate the final assignment based on an eigendecomposition of the affinity graph. Note that in practice, both the number of subspaces and their dimensionalities are usually unknown [4, 12]. Hence, the goals of subspace clustering include finding the appropriate number of clusters and grouping the data points into them [13, 14].
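As a minimal sketch of this two-step framework (in Python, assuming the number of clusters $k$ is given a priori, which is exactly the assumption this paper aims to remove), the affinity produced by any self-representation scheme can be handed to an off-the-shelf spectral clustering routine:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def two_step_subspace_clustering(A, k):
    """Classical pipeline, step 2: spectral clustering on a precomputed
    affinity matrix A (step 1 builds A by solving a self-representation
    problem). A must be symmetric and non-negative; k is given."""
    model = SpectralClustering(n_clusters=k, affinity="precomputed",
                               assign_labels="kmeans", random_state=0)
    return model.fit_predict(A)  # one cluster label per sample
```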
Nevertheless, it is challenging to estimate the number of clusters in a unified optimization framework, since the definition of a cluster is subjective, especially in a high-dimensional ambient space [13]. Moreover, samples that belong to different clusters but lie near the intersection of two subspaces may be even closer to each other than samples from the same cluster. This may lead to a wrong estimate with several redundant clusters, namely the over-segmentation problem. Therefore, most spectral based subspace clustering algorithms depend on a manually given and fixed number of clusters, which does not generalize across applications [4].
Most clustering schemes group similar patterns into the same cluster by jointly minimizing the inter-cluster similarity and the intra-cluster dissimilarity [15]. Considering the complexity of the high-dimensional ambient data space, an effective way to estimate the number of clusters is to first map the raw samples into an intrinsic correlation space, namely the similarity matrix, followed by an iterative optimization according to the local and global similarity relationships derived from the projection. Elhamifar et al. [16] show that the permuted similarity matrix can be block-diagonal, where the number of blocks is identical to the number of clusters. Moreover, Peng et al. [3] verify the intra-subspace projection dominance (IPD) of such a similarity matrix, which applies to self-representation optimizations with various kinds of regularization. The IPD theory states that, in a noise-free system, given two samples from the same subspace and one from another, the similarity generated between the former pair is always larger than that between samples from different subspaces.
Accordingly, considering an affinity graph derived from the similarity matrix [17, 18], where the vertices denote data samples and the edge weights denote similarities, an automatic subgraph segmentation can be conducted greedily via the following two steps, inspired by density based algorithms [19]: 1) constructing a proper number of initialized cluster centers by minimizing the weighted sum of all inter-cluster connections and maximizing the intra-cluster ones; 2) merging each remaining sample into an existing cluster by maximizing the weighted connections between the sample and the cluster.
Yet, such greedy iterative schemes also face difficulties. Since the ambient space can be very dense [20], two points that are close in terms of pairwise distance may not belong to the same subspace, especially for samples near the intersection of two subspaces. Consequently, hypergraphs, in which each edge can connect more than two samples [21, 22], have been proposed to overcome this limitation of traditional pairwise graphs. In this paper, we further introduce a novel data structure, termed the triplet relationship, to explore the local geometric structure of the projected space with hyper-correlations. Each triplet consists of three points and their correlations, and is considered a meta-element for clustering. We require that all three correlations are large enough, which indicates that the three points are strongly connected according to the IPD property [3].
In contrast to evaluating similarities using pairwise distances, the proposed triplet relationship demonstrates favorable performance for two reasons. On one hand, it is more robust when partitioning samples near the intersection of two subspaces, since the mutual relevance among multiple samples provides complementary information when calculating the local segmentation. On the other hand, the triplet evokes a hyper-similarity by efficiently counting the frequency of intra-triplet samples, which enables a greedy way to calculate the assignments.
Based on the newly defined triplets, we further propose a unified framework, termed autoSC, to jointly estimate the number of clusters and group the samples by exploring the local density derived from the triplet relationships. Specifically, we first calculate the self-representation of each sample via an off-the-shelf optimization scheme, followed by extracting the triplet relationships of all samples. Then, we greedily initialize a proper number of clusters by optimizing a new model selection reward, which is achieved by maximizing the inter-cluster dissimilarity among triplets. Finally, we merge each of the remaining samples into an existing cluster to maximize the intra-cluster similarity by optimizing a new fusion reward. We also fuse groups to avoid over-segmentation.
The main contributions of this paper are summarized as follows:

First, we define a hyper-dimensional triplet relationship which ensures high relevance and density among three samples to reflect their local similarity. We also validate the effectiveness of the triplets and distinguish them from the standard pairwise relation.

Second, we design a unified framework, i.e., autoSC, based on the intrinsic geometric structures depicted by our triplet relationships. The proposed autoSC can simultaneously estimate the number of clusters and perform subspace clustering in a greedy way.
Third, extensive experiments on benchmark datasets indicate that our autoSC outperforms the state-of-the-art methods in both effectiveness and efficiency.
This paper is an extended version of our earlier conference paper [1], whose contributions we enrich in the following five aspects: (1) We add a detailed analysis of the proposed algorithm to distinguish it from comparative methods; for example, we add analysis and experimental validation of the computational complexity. (2) We provide a visual illustration of the proposed autoSC for clearer presentation. (3) We propose a relaxation termed neighboring based autoSC (autoSC-N), which calculates the neighborhood relationship directly from the raw data space and is more efficient than autoSC. (4) We conduct experiments evaluating the influence of the parameter $K$ (the number of preserved neighbors for each sample). (5) We experimentally evaluate our method on a real-world application, i.e., motion segmentation, which also demonstrates the benefits of the proposed method.
II. Related Work
Automatically approximating samples in a high-dimensional ambient space by a union of low-dimensional linear subspaces is considered a crucial task in computer vision [9, 23, 24, 25, 26]. In this section, we review the related literature from three aspects: self-representation calculation, estimation of the number of clusters, and hypergraph clustering.
II-A Calculating Self-Representation
To separate a collection of data samples drawn from a high-dimensional space according to their latent low-dimensional structure, traditional self-expressiveness based subspace clustering methods calculate a linear representation of each sample using the remaining samples as a basis set or dictionary [27, 28]. Subspace clustering assumes that the data samples are drawn from a union of multiple subspaces which best fits the ambient space [16]. Numerous real applications satisfy this assumption with varying degrees of exactness [29], e.g., face recognition, motion segmentation, etc. By solving an optimization problem with a self-representation loss and regularizations, subspace clustering [30, 31] calculates a similarity matrix in which each entry indicates the relevance between two samples. Different regularization schemes with various norms of the similarity matrix, e.g., sparsity-inducing norms [9], smoothness terms [32], the elastic net [33] or the nuclear norm [34], explore different intrinsic properties of the neighborhood space. There are mainly three types of regularization terms: sparse-oriented, densely-connected and mixed norms.
Algorithms based on sparsity-inducing norms [10, 35], e.g., the $\ell_0$ and $\ell_1$ norms, eliminate most of the non-zero values in the similarity matrix to ensure that there are no connections between samples from different clusters. Elhamifar and Vidal [9] propose a sparse representation based on $\ell_1$-norm optimization. The obtained similarity matrix recovers a sparse subspace representation, but may not satisfy the graph connectivity if the dimension of the subspace is greater than three [17]. In addition, $\ell_0$-based subspace clustering methods aim to compute a sparse and subspace-preserving representation for each data sample. Yang et al. [10] present a sparse clustering method with an $\ell_0$-norm regularizer, solved by the proximal gradient descent method. Numerous alternative methods have been proposed to approximate the $\ell_0$ minimization while avoiding non-convex problems, e.g., orthogonal matching pursuit [36] and nearest subspace neighbor [37]. The scalable sparse subspace clustering by orthogonal matching pursuit (SSC-OMP) method [27] compares the elements in each column of the dot-product matrix to determine which positions of the similarity matrix should be non-zero. However, such a general pairwise relationship does not reflect the sample correlations well, especially for data pairs near the intersection of two subspaces [38].
In contrast, dense connection based methods, such as smooth representation [32] and nuclear norm based low-rank representation methods [39, 40], propose to preserve many non-zero values in the similarity matrix to ensure the connectivity among intra-cluster samples [41, 42, 43]. In these densely connected frameworks [44, 45], the similarity matrix is interpreted as a projected representation of the raw samples. Each column of the matrix is considered the self-representation of one sample, and should be dense to achieve mapping invariance (also termed the grouping effect [46, 32]). Low-rank clustering methods [47, 48] solve a nuclear norm based optimization problem with the aim of generating a block-diagonal solution with dense connections. However, the nuclear norm does not enforce subset selection well in the presence of noise, and the resulting self-representation is too dense to be an efficient feature.
Neither a sparse nor a dense similarity matrix reveals a comprehensive correlation structure among samples, due to their conflicting natures [49, 50, 51]. Consequently, to achieve a trade-off between sparsity and the grouping effect, numerous mixed norms, e.g., trace Lasso [52] and the elastic net [33], have been integrated into the optimization objective. Nevertheless, the structure of the resulting data correlations depends on the data matrix, and the mixed norm is not effective for structure selection. Therefore, these methods do not perform consistently well across different applications.
Recently, many frameworks that incorporate additional constraints into the optimization objective have been proposed to detect different intrinsic properties of the subspaces [34, 53, 54]. For instance, to handle sequential samples, Guo et al. [55] explore the neighboring relationship by incorporating a new penalty, i.e., a lower triangular matrix with $-1$ on the diagonal and $1$ on the second diagonal, to force consecutive columns of the similarity matrix to be close. In this paper, based on the intrinsic neighboring relevance and geometric structures depicted in the similarity matrix, we calculate triplet relationships to form a hyper-correlation constraint for the clustering system. We validate the robustness of the proposed triplet relationship on top of different similarity matrices with various intrinsic properties.
II-B Estimating the Number of Clusters
Most real applications in computer vision require estimating the number of clusters according to the latent distribution of the data samples [9]. Three main techniques exist to solve this problem: singular value based Laplacian matrix decomposition, density based greedy assignment, and hypergraph based segmentation.
Singular value based Laplacian matrix decomposition is common in subspace clustering [56] and spectral clustering [57] due to the availability of the similarity matrix. Liu et al. [58] propose a heuristic estimator inspired by the block-diagonal structure of the similarity matrix [42]. Specifically, they estimate the number of clusters by counting the singular values of a normalized Laplacian matrix that fall below a given cut-off threshold. These singular value based methods [59, 9] depend on a large gap between singular values, which restricts them to applications in which the subspaces are sparsely distributed in the ambient space. Meanwhile, the matrix decomposition process is time-consuming when extended to large-scale problems. Recently, Li et al. propose SCAMS [60, 29], which estimates the number of clusters by minimizing the rank of a binary relationship matrix that encodes the pairwise relevance among all data samples. Simultaneously, they incorporate a penalty on the clustering cost by minimizing the Frobenius inner product of the similarity matrix and the binary relationship matrix.

Density based methods [61] greedily discover both the optimal number of clusters and the assignment of the data to the clusters according to local and global densities calculated from pairwise distances in the ambient space. Rodriguez et al. [19] automatically cluster samples based on the assumption that each cluster center is characterized by a higher density than all of its neighbors in the weight space, while different centers should be far enough apart to avoid redundancy. Specifically, for each sample, its Euclidean based local density and its distance to any point of higher density are iteratively calculated and updated. In each iteration, the algorithm finds a trade-off between the density of the cluster centers and the inter-cluster distance to update the assignments. Wang et al. [12] employ a Bayesian nonparametric method based on a Dirichlet process and propose DP-space, which exploits a trade-off between data fitness and model complexity. DP-space is more tolerant to noise and outliers than alternative algebraic and geometric solutions. Recently, correlation clustering (CC) [62] first constructs an undirected graph with positive and negative edge weights, and then minimizes the sum of the cut weights during segmentation, so that the clustering assignments can be optimized with a greedy scheme. Nevertheless, most of these density based algorithms rely only on pairwise correlations when evaluating the similarity of data samples, which is not robust for densely distributed subspaces.

II-C Hypergraph Clustering
To tackle the limitations of pairwise relation based methods, hypergraph relations [63, 64, 65] have been proposed, and the related literature follows two directions. Some methods transform the hyper-correlation into a simpler pairwise graph [21, 66], followed by a standard graph clustering method, e.g., normalized cut [11], to calculate the assignments. Other methods [39, 13] explore a generalized way of extending the pairwise graph to hypergraph or hyper-dimensional tensor analysis. For instance, Li et al. propose a tensor affinity variant of SCAMS, i.e., SCAMS-TA [13], which exploits higher-order mathematical structures by providing multiple groups of nodes in the binary matrix derived from an outer product of multiple indicator vectors. However, estimating the number of clusters from the rank of the affinity matrix only works well in the ideal case, and can hardly be extended to complex applications, since noise can have a significant impact on the rank of the affinity matrix.
In this paper, we estimate the number of clusters by initializing cluster centers with maximal inter-cluster dissimilarity and maximal local density. We calculate the initialization according to the local correlations reflected by the proposed triplet relationships, each of which depicts a hyper-dimensional similarity among three samples together with easily evaluated relevances to other triplets. Both the theoretical analysis of the triplets and the experimental results demonstrate the effectiveness of the proposed method.
III. Methodology
III-A Preliminary
The main notations used in this manuscript and their descriptions are listed in Table I. Given a set of $N$ data samples $\mathbf{X} \in \mathbb{R}^{D \times N}$ lying in $k^*$ subspaces, where $D$ denotes the dimensionality of each sample, spectral based subspace clustering usually takes a two-step approach to calculate the clustering assignment. First, it learns a self-representation for each sample to disentangle the subspace structure. The algorithm then employs spectral clustering [11] on the learned similarity graph derived from the self-representation for the final assignment. Note that in practice, both the number of subspaces and their dimensions are usually unknown [4, 12]. Hence, the goals of subspace clustering include finding the appropriate $k^*$ and assigning the data points into clusters [16, 13].
In this paper, inspired by the block-diagonal structure of the similarity matrix [67], we propose to simultaneously estimate the number of clusters and assign the samples to the clusters in a greedy manner. We design a novel meta-sample, which we call the triplet relationship, and then optimize both a model selection reward and a fusion reward for clustering.
III-B Learning the Self-Representation
To explore the neighboring relationships in the ambient space $\mathbb{R}^D$, typical subspace clustering methods first optimize a linear representation of each data sample using the remaining samples as a dictionary. Specifically, spectral based subspace clustering calculates a similarity matrix $\mathbf{C} \in \mathbb{R}^{N \times N}$ by solving a self-representation optimization problem of the form

$\min_{\mathbf{C}} \; \mathcal{L}(\mathbf{X} - \mathbf{X}\mathbf{C}) + \lambda\, \Omega(\mathbf{C})$,   (1)

where $\mathcal{L}(\cdot)$ denotes the reconstruction loss, $\lambda$ is the trade-off parameter, and $\Omega(\cdot)$ denotes the regularization term, with different choices of $\Omega$ leading to various norms [4, 10], e.g., the $\ell_1$ norm [9, 16], the nuclear norm [58, 42], the Frobenius norm [46], or mixed norms such as trace Lasso [52] and the elastic net [33].
The matrix $\mathbf{C}$ in (1) can be interpreted as a new representation of $\mathbf{X}$, where each sample $\mathbf{x}_i$ is mapped to its coefficient vector $\mathbf{c}_i$. Furthermore, $\mathbf{C}$ serves as a pairwise affinity matrix in which each entry $c_{ij}$ reflects the similarity between the samples $\mathbf{x}_i$ and $\mathbf{x}_j$. Nevertheless, pairwise similarities have poor discriminative capacity when partitioning samples near the intersection of two subspaces. To handle this problem, we explore a higher-dimensional similarity, termed the triplet relationship, which is based on a greedy combination of the pairwise similarities reflected by $\mathbf{C}$.
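For concreteness, the following sketch instantiates (1) with the least squares regression (LSR) regularizer [46], whose ridge-type objective admits a closed-form solution; zeroing the diagonal and symmetrizing afterwards are common post-processing heuristics rather than parts of (1) itself:

```python
import numpy as np

def lsr_similarity(X, lam=0.1):
    """LSR instance of (1): min_C ||X - XC||_F^2 + lam ||C||_F^2, with the
    closed form C = (X^T X + lam I)^{-1} X^T X. Columns of X are samples."""
    N = X.shape[1]
    G = X.T @ X                                   # Gram matrix, N x N
    C = np.linalg.solve(G + lam * np.eye(N), G)   # closed-form coefficients
    np.fill_diagonal(C, 0.0)                      # heuristic: suppress trivial self-loops
    return 0.5 * (np.abs(C) + np.abs(C).T)        # symmetric, non-negative affinity
```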
III-C Discovering Triplet Relationships
TABLE I: Main notations and their descriptions (symbols follow the notation used throughout this section).

| Notation | Description |
| --- | --- |
| $\mathbf{x}_i$, $\mathbf{X}$ | data sample, the set of data samples |
| $\mathbf{X}_{(i)}$ | samples in $\mathbf{X}$ without $\mathbf{x}_i$ |
| $\mathcal{S}$ | a subspace in the ambient space |
| $\tau$, $\mathcal{T}$ | triplet, the set of triplets |
| $d$, $D$ | dimensionality of a subspace, dimensionality of a sample |
| $M$, $N$ | number of triplets, number of samples |
| $k^*$, $k_0$, $\hat{k}$ | real number of clusters, initialized number of cluster centers, estimated number of clusters |
| $\mathbf{C}$ | similarity matrix derived from the subspace representation method |
| $\mathbf{W}$ | binary similarity matrix preserving the top $K$ values in each row of $\mathbf{C}$, set to $1$ |
| $w_{ij}$ | $(i,j)$-th entry of the binary similarity matrix |
| $\mathbf{w}_i$ | $i$-th column of $\mathbf{W}$ |
| $\mathcal{N}_i$ | set of $K$ nearest neighbors of $\mathbf{x}_i$ |
| $\mathcal{C}_c$ | cluster center which contains part of the samples of one subspace |
| $\mathcal{T}^{in}_t$, $\mathcal{T}^{out}_t$ | sets of triplets already / not yet assigned to clusters in the $t$-th iteration |
| $\mathcal{S}^{in}_t$, $\mathcal{S}^{out}_t$ | sets of samples already / not yet assigned to clusters in the $t$-th iteration, preserving the frequency |
| $R(\mathcal{C}_c)$ | model selection reward of $\mathcal{C}_c$ |
| $F(\mathbf{x}_i, \mathcal{C}_c)$ | fusion reward for fusing $\mathbf{x}_i$ into $\mathcal{C}_c$ |
| $s(\mathbf{x}_i, \mathbf{x}_j)$ | connection score of $\mathbf{x}_i$ toward $\mathbf{x}_j$ |
| $\rho(\tau)$ | local density of $\tau$ against $\mathcal{T}^{out}$ |
| $\mathcal{G}_c$, $\mathcal{G}$ | one of the result groups, the set of result groups |
| NC | deviation rate between the estimated $\hat{k}$ and the real $k^*$ |
| $e_\tau$ | error rate of the triplets |
Given the similarity matrix $\mathbf{C}$, where each entry $c_{ij}$ reflects the pairwise relationship between $\mathbf{x}_i$ and $\mathbf{x}_j$, we propose to find the neighboring structure in a greedy way. For each data sample $\mathbf{x}_i$, subspace clustering algorithms calculate a projected adjacent space based on the self-expressive property, i.e., each data sample can be reconstructed by a linear combination of the other points in the dataset [16, 68]. Therefore, $\mathbf{x}_i$ is represented as

$\mathbf{x}_i = \mathbf{X}_{(i)} \mathbf{c}_i$,   (2)

where $\mathbf{X}_{(i)}$ includes all samples except $\mathbf{x}_i$ and is considered a self-expression dictionary, and $\mathbf{c}_i$ records the coefficients of this combination system. With the regularization imposed by various well-designed norms on $\mathbf{c}_i$, the optimized result of (2) is capable of preserving only the linear combinations of samples from the same subspace as $\mathbf{x}_i$ while eliminating the others. Inspired by [3, 37], for each sample $\mathbf{x}_i$ we first collect its $K$ nearest neighbors, i.e., the samples with the top $K$ coefficients in $\mathbf{c}_i$. The $K$ nearest neighbors are defined as follows.
Definition 1 ($K$ Nearest Neighbors). Let $\mathcal{N}_i$ denote the set of $K$ nearest neighbors of the data point $\mathbf{x}_i$:

$\mathcal{N}_i = \{\mathbf{x}_j \mid j \in \Lambda_i\}, \quad \Lambda_i = \arg\max_{\Lambda:\, |\Lambda| = K} \sum_{j \in \Lambda} |c_{ij}|$,   (3)

where $\Lambda_i$ denotes the set of indices of the $K$ nearest neighbors, and $c_{ij}$ denotes the coefficient between $\mathbf{x}_i$ and $\mathbf{x}_j$.
According to Definition 1, we obtain $\mathcal{N}_i$ for each $\mathbf{x}_i$, which contains the samples with the $K$ largest coefficients in $\mathbf{c}_i$. The number of preserved neighbors, i.e., the parameter $K$, reflects the intrinsic dimension of the low-dimensional subspaces [69, 70]; we evaluate it empirically in the experiment section. Based on the $K$ nearest neighbors, we define the triplet relationship to explore the local hyper-correlation among samples.
Definition 2 (Triplet Relationship). A triplet $\tau = \{\mathbf{x}_i, \mathbf{x}_j, \mathbf{x}_k\}$ includes three samples and their relationships, if and only if $\mathbf{x}_i$, $\mathbf{x}_j$ and $\mathbf{x}_k$ satisfy

$\mathbb{1}(w_{ij}) \cdot \mathbb{1}(w_{ik}) \cdot \mathbb{1}(w_{jk}) = 1$,   (4)

where $w_{ij}$ is the corresponding entry of the binarized similarity matrix $\mathbf{W}$ (see Section III-E), and $\mathbb{1}(\cdot)$ denotes the indicator function, which equals $1$ if $w > 0$ and $0$ otherwise.
Based on Definition 2, we obtain $M$ triplets, where each sample is typically included in multiple triplet relationships. For clarity of presentation, we define a triplet matrix $\mathbf{T} \in \mathbb{N}^{M \times 3}$ for the data samples, where each row of $\mathbf{T}$ records the indices of the samples of one triplet $\tau$.
Compared with the traditional pairwise relationships evoked from $\mathbf{C}$, the triplet incorporates complementary constraints through (4), which yields a more robust capacity for partitioning samples near the intersection of two subspaces. Each triplet depicts a local geometric structure, which enables a better estimate of the density of each sample. Furthermore, the overlapping samples in multiple triplets reflect a global hyper-similarity among one another, which can be measured efficiently. Therefore, based on the triplet relationships, we can jointly estimate the number of subspaces and calculate the clustering assignment in a greedy manner.
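A direct sketch of Definitions 1 and 2 follows, assuming a similarity matrix C from Section III-B; symmetrizing the binarized matrix is one reasonable implementation choice, and enumerating the pairs inside each K-neighborhood touches on the order of N*K^2 candidate triples:

```python
import numpy as np
from itertools import combinations

def knn_binary_matrix(C, K=8):
    """Definition 1: keep the K largest coefficients per column of |C|
    and binarize them, yielding the matrix W used in (4) and (10)."""
    W = np.zeros_like(C)
    for i in range(C.shape[1]):
        top = np.argsort(np.abs(C[:, i]))[-K:]   # K nearest neighbors of sample i
        W[top, i] = 1.0
    return np.maximum(W, W.T)                    # symmetrization (a design choice)

def extract_triplets(W):
    """Definition 2 / Eq. (10): (i, j, k) forms a triplet iff all three
    pairwise entries of W are non-zero."""
    triplets = []
    for i in range(W.shape[0]):
        nbrs = np.flatnonzero(W[i])
        for j, k in combinations(nbrs[nbrs > i], 2):  # enforce i < j < k
            if W[j, k] > 0:
                triplets.append((i, j, k))
    return triplets
```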
III-D Modeling Clustering Rewards
Given the triplet set $\mathcal{T}$, we iteratively group the data samples into clusters $\{\mathcal{C}_1, \dots, \mathcal{C}_{\hat{k}}\}$, where $\hat{k}$ denotes the estimated number of subspaces. Following the greedy strategy, in the $t$-th iteration the triplets are divided into two subsets, i.e., the “in-cluster” triplets $\mathcal{T}^{in}_t$, which are already assigned to clusters, and the “out-of-cluster” triplets $\mathcal{T}^{out}_t$, which are still to be assigned in subsequent iterations. For clearer presentation, we reshape both triplet matrices into the sample vectors $\mathcal{S}^{in}_t$ and $\mathcal{S}^{out}_t$, which preserve the frequency of each sample. In each iteration, we optimize two new rewards, i.e., the model selection reward and the fusion reward, to simultaneously estimate the number of clusters and merge the samples into their respective clusters.
Definition 3 (Model Selection Reward). Given $\mathcal{T}^{in}_t$ and $\mathcal{T}^{out}_t$ in the $t$-th iteration, the model selection reward for an initialized cluster $\mathcal{C}_c$ is defined as

$R(\mathcal{C}_c) = \sum_{\mathbf{x} \in \mathcal{C}_c} f(\mathbf{x}, \mathcal{S}^{out}_t) - \gamma \sum_{\mathbf{x} \in \mathcal{C}_c} f(\mathbf{x}, \mathcal{S}^{in}_t)$,   (5)

where $f(\mathbf{x}, \mathcal{S})$ is a counting function on the frequency with which $\mathbf{x}$ occurs in $\mathcal{S}$, and $\gamma$ denotes the trade-off parameter.
By maximizing the model selection reward $R(\mathcal{C}_c)$, we generate initialized clusters $\mathcal{C}_1, \dots, \mathcal{C}_{k_0}$, where $k_0$ is the initialized number of cluster centers, with the following two advantages (see Fig. 1 for a visualization). First, the local density of each selected sample is high, i.e., it has a large number of correlated samples in $\mathcal{S}^{out}_t$, which enables many samples to be merged in the next iteration. Second, each $\mathcal{C}_c$ has little correlation with the samples in $\mathcal{S}^{in}_t$, which eliminates the overlap between clusters. Consequently, we can simultaneously estimate the number of clusters and initialize the clusters by optimizing the model selection reward.
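The counting function $f$ and the reward (5) can be sketched as below; since the exact arithmetic of (5) is our reconstruction from the surrounding text, treat the formula in the code as an assumption rather than the definitive definition:

```python
def freq(x, triplets):
    """f(x, .): frequency with which sample index x occurs in a triplet set."""
    return sum(x in t for t in triplets)

def model_selection_reward(cluster, T_out, T_in, gamma=1.0):
    """Assumed form of Definition 3: reward clusters whose samples occur
    often in the unassigned triplets T_out and rarely in the assigned
    triplets T_in; gamma is the trade-off parameter of (5)."""
    return (sum(freq(x, T_out) for x in cluster)
            - gamma * sum(freq(x, T_in) for x in cluster))
```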
Definition 4 (Fusion Reward). Given the initialized clusters $\{\mathcal{C}_c\}_{c=1}^{k_0}$, the fusion reward is defined as the probability that $\mathbf{x}_i$ is assigned into $\mathcal{C}_c$:

$F(\mathbf{x}_i, \mathcal{C}_c) = |\mathcal{N}_i \cap \mathcal{C}_c| + \eta\, |\mathcal{N}_i \cap \mathcal{N}_{\mathcal{C}_c}|$,   (6)

where $\mathcal{N}_i$ denotes the $K$ nearest neighbors of $\mathbf{x}_i$, $\mathcal{N}_{\mathcal{C}_c}$ denotes the set of $K$ nearest neighbors of the samples in $\mathcal{C}_c$, and $\eta$ denotes the trade-off parameter.
In the optimization procedure, we calculate the fusion reward $F(\mathbf{x}_i, \mathcal{C}_c)$ for each remaining sample $\mathbf{x}_i$, which represents the probability that $\mathbf{x}_i$ is assigned to the cluster $\mathcal{C}_c$. We then merge $\mathbf{x}_i$ into the cluster with the largest fusion reward, and move $\mathbf{x}_i$ from $\mathcal{S}^{out}$ to $\mathcal{S}^{in}$.
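A corresponding sketch of the fusion reward (6) and the merge rule; the exact arithmetic of (6) is again our reconstruction (hence hypothetical), and knn[i] is assumed to hold the K-nearest-neighbor indices of sample i:

```python
def fusion_reward(i, cluster, knn, eta=0.5):
    """Assumed form of Definition 4: overlap between x_i's neighborhood and
    the cluster, plus a discounted overlap with the cluster's neighborhood."""
    Ni = set(knn[i])
    Nc = set().union(*(set(knn[j]) for j in cluster))
    return len(Ni & set(cluster)) + eta * len(Ni & Nc)

def assign_sample(i, clusters, knn, eta=0.5):
    """Eq. (15): index of the cluster with the largest fusion reward for x_i."""
    rewards = [fusion_reward(i, c, knn, eta) for c in clusters]
    return max(range(len(clusters)), key=rewards.__getitem__)
```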
III-E Automatic Subspace Clustering Algorithm
The first triplet used to initialize a new cluster is chosen to have the maximal local density, which is defined as follows.
Definition 5 (Local Density). The local density of a triplet $\tau$ with respect to $\mathcal{T}^{out}$ is defined as

$\rho(\tau) = \frac{1}{|\mathcal{T}^{out}|} \sum_{i \in I_\tau} f(\mathbf{x}_i, \mathcal{S}^{out})$,   (7)

where $\mathbf{x}_i$ denotes a sample in the current triplet, $I_\tau$ is the set of their indices, and $|\mathcal{T}^{out}|$ denotes the scale of $\mathcal{T}^{out}$.
Also, to measure the hyper-similarity between samples and determine the optimal triplet to merge into an initialized cluster, we define the connection score as follows.
Definition 6 (Connection Score). The connection score between the samples $\mathbf{x}_i$ and $\mathbf{x}_j$ is defined as

$s(\mathbf{x}_i, \mathbf{x}_j) = \frac{1}{M} \sum_{m=1}^{M} g_m(i, j)$,   (8)

where $g_m(i, j)$ is equal to $1$ when $\tau_m$ contains both $\mathbf{x}_i$ and $\mathbf{x}_j$ and $0$ otherwise, and $M$ is the number of all triplets in $\mathcal{T}$.
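Definitions 5 and 6 translate into the following sketch (reusing freq from the sketch in Section III-D); the normalizations are our reconstruction of (7) and (8):

```python
def local_density(triplet, T_out):
    """Assumed form of Definition 5 / Eq. (7): average frequency of the
    triplet's three samples among the unassigned triplets T_out."""
    if not T_out:
        return 0.0
    return sum(freq(x, T_out) for x in triplet) / len(T_out)

def connection_score(i, j, T):
    """Assumed form of Definition 6 / Eq. (8): fraction of triplets in T
    that contain both x_i and x_j."""
    return sum((i in t) and (j in t) for t in T) / max(len(T), 1)
```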
We greedily optimize the proposed model selection reward and fusion reward in autoSC to simultaneously estimate the number of clusters and generate the segmentation of the samples:

$\max_{\mathcal{G}, \hat{k}} \ \sum_{c=1}^{\hat{k}} \Big( R(\mathcal{G}_c) + \sum_{\mathbf{x}_i \in \mathcal{G}_c} F(\mathbf{x}_i, \mathcal{G}_c) \Big) \quad \text{s.t.} \quad \mathcal{G}_c \cap \mathcal{G}_{c'} = \emptyset \ (c \neq c'), \quad \bigcup_{c=1}^{\hat{k}} \mathcal{G}_c = \Omega$,   (9)

where $\mathcal{G} = \{\mathcal{G}_c\}$ denotes the set of result groups, $\hat{k}$ is the estimated number of clusters, and $\Omega$ denotes the universal ordinal set of samples.
We present the proposed autoSC in Fig. 1 and Algorithm 1. Specifically, the optimization includes three steps: 1) generating the triplet relationships from the similarity matrix $\mathbf{C}$; 2) estimating the number of clusters and initializing the clusters; 3) assigning the remaining samples to the proper clusters.
Calculating Triplets: The similarity matrix $\mathbf{C}$ reflects the correlations among samples [9], where a larger value indicates a stronger belief in the correlation between two samples. For instance, $c_{ij} > c_{ik}$ indicates a larger probability that $\mathbf{x}_i$ and $\mathbf{x}_j$ lie in the same cluster than that $\mathbf{x}_i$ and $\mathbf{x}_k$ do. Accordingly, we explore the intrinsic local correlations among samples via the proposed triplets derived from $\mathbf{C}$.
Many subspace representations guarantee the mapping invariance via a dense similarity matrix $\mathbf{C}$. However, the generation of triplets should rely only on the strongest connections to avoid wrong assignments. Therefore, for each column $\mathbf{c}_i$ of $\mathbf{C}$, we preserve only the top $K$ values and set them to $1$, which yields the new binary similarity matrix $\mathbf{W}$.
Then, we extract each triplet from $\mathbf{W}$ by the following rule:

$\tau = \{\mathbf{x}_i, \mathbf{x}_j, \mathbf{x}_k\} \quad \text{s.t.} \quad w_{ij} \cdot w_{ik} \cdot w_{jk} = 1$,   (10)

where $w_{ij}$ denotes the $j$-th value of $\mathbf{w}_i$. Note that each sample can appear in many triplets. Therefore, we consider each $\tau$ as a meta-element in the clustering, which improves the robustness due to the complementarity constraints.
Initializing Clusters: In the $t$-th iteration, we first determine an initial triplet (termed $\tau^*_t$) from $\mathcal{T}^{out}_t$ to initialize the cluster $\mathcal{C}_t$, followed by merging the most correlated samples of $\mathcal{S}^{out}_t$ into $\mathcal{C}_t$.
Following [19], we initialize a new cluster using the triplet $\tau^*_t$ with the highest local density:

$\tau^*_t = \arg\max_{\tau \in \mathcal{T}^{out}_t} \rho(\tau)$,   (11)

where $\rho(\cdot)$ calculates the local density defined in Definition 5. A high local density of the triplet reflects the largest number of connections between $\tau^*_t$ and the other triplets, which in turn produces the largest number of connections between $\mathcal{C}_t$ and the other samples in $\mathcal{S}^{out}_t$.
Once the initial triplet is determined, we iteratively extend the initialized cluster by fusing the most confident triplets. For each triplet $\tau$ in $\mathcal{T}^{out}_t$, we calculate the sum of the connection scores with respect to the samples in $\mathcal{C}_t$ to greedily determine whether the samples of $\tau$ should be assigned to $\mathcal{C}_t$:

$S(\tau, \mathcal{C}_t) = \sum_{i \in I_\tau} \sum_{j \in I_{\mathcal{C}_t}} s(\mathbf{x}_i, \mathbf{x}_j)$,   (12)

where $I_\tau$ and $I_{\mathcal{C}_t}$ denote the sets of indices of the samples in $\tau$ and $\mathcal{C}_t$, respectively. We iteratively update the auxiliary sets $\mathcal{T}^{in}$, $\mathcal{T}^{out}$, $\mathcal{S}^{in}$ and $\mathcal{S}^{out}$ during the iterations.
Terminating: We terminate the process of estimating the number of clusters, obtaining $k_0$ clusters, if and only if the seed triplet $\tau^*$ satisfies

$\sum_{i \in I_{\tau^*}} \mathbb{1}(\mathbf{x}_i \in \mathcal{S}^{in}) = 3$.   (13)

Specifically, if the samples of the triplet with the highest local density in $\mathcal{T}^{out}$ are of high frequency in $\mathcal{S}^{in}$, i.e., already contained in the existing clusters, we consider these clusters sufficient for modeling the intrinsic subspaces.
Avoiding Over-Segmentation: We also introduce an alternative step that checks the redundancy among the initialized clusters to avoid over-segmentation. We calculate the connection scores of small-scale clusters against the others, and merge the highly correlated clusters $\mathcal{C}_c$ and $\mathcal{C}_{c'}$ if

$\frac{1}{|\mathcal{C}_c|\,|\mathcal{C}_{c'}|} \sum_{i \in I_{\mathcal{C}_c}} \sum_{j \in I_{\mathcal{C}_{c'}}} s(\mathbf{x}_i, \mathbf{x}_j) > \theta$,   (14)

where $|\mathcal{C}_c|$ denotes the number of samples in $\mathcal{C}_c$ and $\theta$ is a merging threshold. We then obtain the initialized clusters $\{\mathcal{C}_c\}_{c=1}^{\hat{k}}$, where $\hat{k}$ is the estimated number of clusters and $\hat{k} \leq k_0$.
Assigning the Remaining Samples: Given the initialized clusters, we assign each of the remaining samples to the cluster which evokes the optimal fusion reward. For $\mathbf{x}_i \in \mathcal{S}^{out}$, we find its optimal cluster by

$c^* = \arg\max_{c \in \{1, \dots, \hat{k}\}} F(\mathbf{x}_i, \mathcal{C}_c)$,   (15)

where $F(\cdot, \cdot)$ is the fusion reward defined in (6).
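Putting the pieces together, the greedy loop of Algorithm 1 can be sketched as below, reusing the helper functions from the previous sketches; the zero growth threshold, the termination test and the omitted over-segmentation check (14) are simplifications of the procedure described above:

```python
import numpy as np

def auto_sc(C, K=8, eta=0.5):
    """Greedy autoSC sketch: extract triplets, seed clusters by local
    density (11), grow them via connection scores (12), stop per the
    heuristic in (13), then assign leftovers by the fusion reward (15)."""
    W = knn_binary_matrix(C, K)
    knn = [list(np.flatnonzero(W[i])) for i in range(W.shape[0])]
    T_out, clusters = extract_triplets(W), []
    while T_out:
        seed = max(T_out, key=lambda t: local_density(t, T_out))
        if any(x in c for c in clusters for x in seed):
            break                                 # termination heuristic, cf. (13)
        cluster = set(seed)
        for t in list(T_out):                     # grow the cluster, cf. (12)
            if sum(connection_score(a, b, T_out) for a in t for b in cluster) > 0:
                cluster |= set(t)
                T_out.remove(t)
        clusters.append(cluster)
    if clusters:                                  # assign remaining samples, cf. (15)
        rest = set(range(W.shape[0])) - set().union(*clusters)
        for i in rest:
            clusters[assign_sample(i, [sorted(c) for c in clusters], knn, eta)].add(i)
    return clusters
```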
III-F An Extension: Neighboring Based autoSC Algorithm (autoSC-N)
In Definition 1, we collect the $K$ nearest neighbors according to the magnitudes of the similarities between a sample $\mathbf{x}_i$ and all other samples, as depicted in $\mathbf{C}$, which is obtained by optimizing (1) with a reconstruction loss term and a regularization term. In this subsection, we extend autoSC with an alternative technique that finds $\mathcal{N}_i$ for each $\mathbf{x}_i$ by greedy search.
For each data sample $\mathbf{x}_i$, let $\mathcal{S}^t_i$ be the subspace spanned by $\mathbf{x}_i$ and its neighbors in the $t$-th iteration, where the neighbor set is initialized as $\mathcal{N}_i = \emptyset$ and $\mathcal{S}^0_i = \mathrm{span}(\mathbf{x}_i)$. In each iteration, we measure the projected similarity between $\mathcal{S}^t_i$ and every non-neighbor sample by calculating its orthonormal coordinates in the spanned subspace. For example, to calculate the similarity between $\mathcal{S}^t_i$ and $\mathbf{x}_j$ in the $t$-th iteration, we have

$p^t_{ij} = \|\mathbf{U}_t^{\top} \mathbf{x}_j\|_F$,   (16)

where $\|\cdot\|_F$ denotes the Frobenius norm and $\mathbf{U}_t$ is an orthonormal basis of $\mathcal{S}^t_i$. Consequently, for $\mathbf{x}_i$ in the $t$-th iteration, we find the closest neighbor and update the spanned subspace as follows:

$j^* = \arg\max_{j \notin \mathcal{N}_i} p^t_{ij}, \quad \mathcal{N}_i \leftarrow \mathcal{N}_i \cup \{\mathbf{x}_{j^*}\}, \quad \mathcal{S}^{t+1}_i = \mathrm{span}(\{\mathbf{x}_i\} \cup \mathcal{N}_i)$.   (17)
Here, we find one neighbor in each iteration and update the spanned subspace accordingly. The newly spanned subspace reflects more of the local structure of the ambient space, which is assumed to cover the current sample. The neighbor set is also updated by adding the new neighbor found in the $t$-th iteration. Finally, after $K$ iterations for each sample, we obtain an alternative $K$ nearest neighbor set $\mathcal{N}_i$.
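This greedy search can be sketched with an explicit orthonormal basis of the spanned subspace, so that the projection in (16) reduces to a matrix-vector product; the Gram-Schmidt update below is one standard way to realize (17):

```python
import numpy as np

def greedy_neighbors(X, i, K=8):
    """autoSC-N neighbor search sketch (Eqs. (16)-(17)): grow the subspace
    spanned by x_i and its chosen neighbors, and in each of K iterations
    pick the sample with the largest projection onto it. X is D x N."""
    N = X.shape[1]
    U = X[:, [i]] / np.linalg.norm(X[:, i])       # orthonormal basis of span(x_i)
    neighbors, candidates = [], set(range(N)) - {i}
    for _ in range(K):
        proj = {j: np.linalg.norm(U.T @ X[:, j]) for j in candidates}  # Eq. (16)
        j_star = max(proj, key=proj.get)          # closest non-neighbor, Eq. (17)
        neighbors.append(j_star)
        candidates.remove(j_star)
        r = X[:, j_star] - U @ (U.T @ X[:, j_star])   # Gram-Schmidt residual
        nr = np.linalg.norm(r)
        if nr > 1e-10:                            # extend the basis if informative
            U = np.hstack([U, (r / nr)[:, None]])
    return neighbors
```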
Given the neighbor matrix, we propose the neighboring based autoSC algorithm (autoSC-N) to directly discover the triplet relationships among the data samples, followed by optimizing both the model selection and fusion rewards for clustering. The main steps of autoSC-N are summarized in Algorithm 2.
III-G Computational Complexity Analysis
In a traditional subspace clustering system, the calculation of the self-representation requires solving $N$ convex optimization problems, each over $N$ variables [20]. Spectral clustering is based on an eigendecomposition of the Laplacian matrix followed by K-means on the eigenvectors, both of which are time-consuming, involving a complex algebraic decomposition and an iterative optimization, respectively [68, 71]. The overall computational complexity can therefore exceed $O(N^3)$. For the proposed autoSC, it takes $O(K^2 N)$ operations to collect the triplet relationships for the $N$ samples in the space spanned by the $K$ nearest neighbors. Since $K \ll N$, the complexity of collecting the triplet relationships is linear in $N$. The optimization of both the model selection and fusion rewards takes $O(M)$, where the number of triplets $M$ has the same order of magnitude as $N$. Specifically, we have $M = O(N)$, and thus the complexity of the clustering step is also linear in $N$. For the extension, i.e., autoSC-N, collecting the neighbor matrix takes $O(K N^2)$ basic operations, each of which is a dot product of $D$-dimensional vectors. This avoids solving any convex optimization problem.
TABLE II: NC and NMI of the compared automatic clustering methods on the extended Yale B (8/15/25/30/38 subjects) and COIL20 (5/10/15/20 subjects) datasets.

| Representation | Method | Metric | 8 | 15 | 25 | 30 | 38 | 5 | 10 | 15 | 20 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LSR [46] | SCAMS [13, 29] | NC | 5.21 | 14.00 | 17.12 | 21.25 | 23.00 | 4.36 | 9.00 | 18.32 | 21.00 |
| | | NMI | 0.1652 | 0.0643 | 0.1544 | 0.3821 | 0.4236 | 0.2435 | 0.1524 | 0.1124 | 0.1728 |
| SMR [32] | SCAMS [13, 29] | NC | 9.26 | 23.60 | 41.39 | 76.22 | 81.00 | 8.48 | 19.72 | 32.40 | 37.00 |
| | | NMI | 0.7183 | 0.7272 | 0.6992 | 0.7266 | 0.7425 | 0.5885 | 0.6527 | 0.6668 | 0.6712 |
| LSR [46] | DP [19] | NC | 7.90 | 98.38 | 127.92 | 308.00 | 341.00 | 10.90 | 14.70 | 301.05 | 228.00 |
| | | NMI | 0.7060 | 0.6067 | 0.6245 | 0.6516 | 0.6611 | 0.7060 | 0.4984 | 0.6516 | 0.5283 |
| SMR [32] | DP [19] | NC | 3.06 | 7.84 | 14.62 | 24.76 | 29.00 | 2.22 | 5.30 | 9.72 | 11.00 |
| | | NMI | 0.6196 | 0.5026 | 0.4391 | 0.2166 | 0.2384 | 0.6864 | 0.4467 | 0.3643 | 0.3547 |
| LSR [46] | SVD [58] | NC | 7.00 | 9.42 | 21.04 | 41.23 | 44.00 | 2.76 | 9.00 | 12.05 | 14.00 |
| | | NMI | 0.2412 | 0.4304 | 0.5567 | 0.6523 | 0.6726 | 0.6210 | 0.1302 | 0.4092 | 0.4125 |
| SMR [32] | SVD [58] | NC | 2.40 | 9.06 | 11.65 | 24.00 | 28.00 | 0.48 | 2.58 | 8.36 | 12.00 |
| | | NMI | 0.7078 | 0.4993 | 0.3739 | 0.2808 | 0.2766 | 0.7024 | 0.7127 | 0.7224 | 0.7035 |
| – | DP-space [12] | NC | 2.08 | 8.96 | 15.75 | 23.92 | 26.00 | 0.78 | 4.78 | 9.38 | 14.00 |
| | | NMI | 0.0343 | 0.0226 | 0.0432 | 0.0406 | 0.0525 | 0.0904 | 0.0829 | 0.0718 | 0.0834 |
| – | autoSC-N | NC | 0.87 | 3.16 | 4.32 | 7.68 | 9.00 | 0.75 | 2.21 | 2.42 | 5.00 |
| | | NMI | 0.8306 | 0.7328 | 0.7165 | 0.6566 | 0.6871 | 0.7933 | 0.6216 | 0.7895 | 0.7126 |
| LSR [46] | autoSC | NC | 1.08 | 3.32 | 5.79 | 10.50 | 12.00 | 1.50 | 3.40 | 2.00 | 4.00 |
| | | NMI | 0.8251 | 0.7375 | 0.6871 | 0.5972 | 0.5833 | 0.7786 | 0.5581 | 0.8670 | 0.8239 |
| SMR [32] | autoSC | NC | 0.76 | 2.08 | 3.15 | 4.98 | 4.00 | 0.38 | 1.18 | 0.80 | 2.00 |
| | | NMI | 0.9062 | 0.8589 | 0.8432 | 0.8287 | 0.7943 | 0.8315 | 0.7701 | 0.7266 | 0.7568 |
IV. Experiments
IV-A Experimental Setup
In the experiments, we compare the automatic methods on two benchmark datasets, i.e., the extended Yale B [72] and COIL20 [73] datasets, followed by verifying the robustness of the proposed method to different similarity matrices derived from various self-representation schemes, combined with different methods for estimating the number of clusters and segmenting the samples. We adopt comprehensive evaluation metrics to validate the clustering performance, i.e., the error rate of the number of clusters and the error rate of the triplets. For all experiments on subsets, the reported results are averaged over repeated trials. We also conduct experiments on a motion segmentation task using the Hopkins 155 dataset.

IV-A1 Datasets
The extended Yale B dataset [72] is a widely used face clustering dataset which contains 2,414 frontal face images of 38 subjects captured under varying illumination conditions, i.e., about 64 images per subject.
The COIL20 dataset [73] consists of 20 different real objects, including cups, bottles and so on. For each object, there are 72 images captured from different camera viewpoints.
The Hopkins 155 dataset [74] consists of 155 video sequences, each containing two or three motions.
IV-A2 Comparative Methods

We make comparisons with the following methods: SCAMS [13, 29], a density peak based method (DP) [19], a singular value decomposition based method (SVD) [58], and DP-space [12]. Besides, we utilize the following subspace representation methods to generate different coefficient matrices: LRR [39], CASS [52], LSR [46], SMR [32] and ORGEN [33]. The resulting similarity matrix is then used to calculate the triplet relationships for autoSC.

IV-A3 Evaluation Metrics
To evaluate the quality of the proposed triplets, we define their error rate as

$e_\tau = \frac{1}{3M} \sum_{m=1}^{M} \big( 3 - \max_{s} f(\tau_m, \mathcal{S}^{gt}_s) \big)$,   (18)

where $M$ denotes the number of triplets and $f(\tau_m, \mathcal{S}^{gt}_s)$ is the counting function on the frequency with which the samples of $\tau_m$ occur in $\mathcal{S}^{gt}_s$; its output ranges from $0$ to $3$. The dynamic set $\mathcal{S}^{gt}_s$ consists of the samples of one subspace according to the ground truth, and the maximization selects the subspace that contains as many samples of $\tau_m$ as possible.
We introduce the error rate of the number of clusters (NC) as the primary evaluation metric for clustering methods that estimate the number of clusters automatically:

$\mathrm{NC} = \frac{1}{T} \sum_{t=1}^{T} |\hat{k}_t - k^*|$,   (19)

where $k^*$ is the real number of clusters, $T$ is the number of trials, and $\hat{k}_t$ is the estimated number of clusters in the $t$-th trial. We also use the standard normalized mutual information (NMI) [75] to measure the similarity between two clustering distributions, i.e., the prediction and the ground truth. In NMI, the entropy quantifies the non-determinacy of one clustering with respect to the other, and the mutual information quantifies the amount of information that one clustering obtains from the other.
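Both metrics are straightforward to compute; (19) is implemented below as the mean absolute deviation reconstructed above, and NMI is taken from a standard library implementation:

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def nc_error(k_true, k_estimates):
    """Eq. (19) as reconstructed here: mean absolute deviation between the
    estimated and the true number of clusters over repeated trials."""
    return float(np.mean([abs(k - k_true) for k in k_estimates]))

def clustering_nmi(labels_true, labels_pred):
    """NMI [75] between the ground-truth and predicted assignments."""
    return normalized_mutual_info_score(labels_true, labels_pred)
```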
IV-A4 Parameter $K$
The parameter $K$ in Definition 1, i.e., the number of preserved neighbors for each sample, is related to the intrinsic dimension of the subspaces. We empirically evaluate the influence of $K$ on both the extended Yale B and COIL20 datasets with 15 subjects, using subspace representations derived from SMR. The results are shown in Table III: the proposed method achieves its best performance with $K = 8$ in most cases. Moreover, the parameter is robust, since the performance remains stable for $K \geq 7$.
TABLE III: Influence of the number of preserved neighbors $K$ (SMR representations, 15 subjects).

| Dataset | Metric | $K$=5 | 6 | 7 | 8 | 9 | 10 | 11 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| extended Yale B | NC | 4.52 | 3.68 | 3.13 | 2.08 | 2.12 | 2.29 | 2.18 |
| | NMI | 0.7125 | 0.7736 | 0.8047 | 0.858 | 0.8551 | 0.8423 | 0.8536 |
| COIL20 | NC | 1.68 | 1.29 | 0.92 | 0.80 | 0.88 | 0.76 | 0.98 |
| | NMI | 0.5647 | 0.6157 | 0.6774 | 0.7266 | 0.7107 | 0.7211 | 0.7120 |
TABLE IV: Error rate $e_\tau$ of the triplets derived from different self-representation schemes on the extended Yale B (8/15/25/30/38 subjects) and COIL20 (5/10/15/20 subjects) datasets.

| Representation | 8 | 15 | 25 | 30 | 38 | 5 | 10 | 15 | 20 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LRR [39] | 0.0155 | 0.0147 | 0.0158 | 0.0176 | 0.0169 | 0.0185 | 0.0252 | 0.0224 | 0.0231 |
| CASS [52] | 0.0158 | 0.0148 | 0.0140 | 0.0157 | 0.0162 | 0.0195 | 0.0198 | 0.0203 | 0.0193 |
| LSR [46] | 0.0144 | 0.0148 | 0.0162 | 0.0181 | 0.0172 | 0.0188 | 0.0188 | 0.0212 | 0.0199 |
| SMR [32] | 0.0135 | 0.0149 | 0.0154 | 0.0181 | 0.0161 | 0.0175 | 0.0182 | 0.0196 | 0.0202 |
| ORGEN [33] | 0.0166 | 0.0145 | 0.0151 | 0.0177 | 0.0169 | 0.0196 | 0.0210 | 0.0215 | 0.0220 |
IV-B Comparisons among Automatic Clustering Methods
We conduct experiments on the extended Yale B and COIL20 datasets with different numbers of subjects, and compare four automatic methods with the proposed autoSC and autoSC-N on the NC and NMI metrics. For SCAMS [13, 29], DP [19], SVD [58] and our autoSC, the optimization module of SMR [32] is employed to generate the similarity matrix (results with LSR [46] are also reported in Table II). The DP-space method simultaneously estimates the number of clusters and finds the subspaces without requiring a similarity matrix. All parameters of the compared methods are tuned to provide their best performance.
Fig. 2 and Table II report the performance. As shown in Table II, when combined with SMR, the average NC of autoSC is smaller than that of the other methods under all experimental configurations, indicating that it closely estimates the number of clusters. For example, on the extended Yale B dataset with 38 subjects, the estimated number of clusters deviates from the ground truth by only 4.00, while the NMI remains above 0.79. AutoSC-N obtains the second best performance in most configurations, which demonstrates the effectiveness of both the triplet relationship and the reward optimization. In contrast, SVD achieves comparable results on the small-scale configuration of each dataset, but its performance degrades as the number of samples increases, mainly because the largest gap between consecutive singular values shrinks as the number of clusters grows. When combined with SMR, SCAMS performs comparably in terms of NMI on both datasets; however, as illustrated in Fig. 2(a) and Table II, it estimates a much larger number of clusters than the ground truth, e.g., NC = 81.00 with 38 subjects on the extended Yale B dataset. Since NMI does not strongly penalize over-segmentation, NC is the primary evaluation metric for SCAMS. DP-space performs well on NC but poorly on NMI, because most samples are assigned to a single cluster while the other clusters remain small. In addition, when combined with LSR, as shown in Table II, the performance of all methods decreases, while the proposed autoSC still achieves the best performance in most configurations, which demonstrates its generalization ability.
IV-C Robustness to Self-Representation Schemes
SCAMS [13, 29], DP [19], SVD [58] and the proposed autoSC all require the similarity matrix as input. Additionally, DP [19] requires distances between samples; we calculate the distance between two samples from the learned similarities rather than using the simple Euclidean distance. To verify the robustness of the proposed autoSC with respect to various subspace representations, we calculate the similarity matrix using five subspace representation modules, followed by combinations with the methods that automatically estimate the number of clusters and segment the samples.
Table IV shows the error rate $e_\tau$ of the triplets on both datasets for these combinations, while the NC and NMI on the extended Yale B dataset are reported in Fig. 3. Moreover, we visualize the similarity matrices derived from the subspace representation modules in Fig. 4. As shown in Fig. 3, the SCAMS, DP and SVD methods are sensitive to the choice of the subspace representation module. For example, DP estimates a number of clusters relatively close to the ground truth when combined with SMR, but generates a totally wrong estimate when combined with LRR. Different subspace representation modules generate coefficient matrices with different intrinsic properties [4], so the parameter for the truncation error needs to be tuned carefully.
The proposed autoSC, in contrast, is stable across the different combinations in terms of both NC and $e_\tau$, which demonstrates its complementary ability. For all combinations, the error rate of the triplets obtained by (10) is less than 0.03, which guarantees the consistency of the proposed autoSC with different kinds of similarity matrices. Furthermore, Fig. 3 shows better performance on both metrics when autoSC is combined with CASS, LSR or SMR than with the other representations. The reason lies in the guarantee of mapping invariance, termed the grouping effect [52, 46, 32], together with the filtering of weak connections and the self-constraint among the samples within triplets. As shown in Fig. 4(b), (c) and (d), the corresponding coefficient matrices are dense, and the matrix in Fig. 4(d) exhibits a block-diagonal structure in which each block corresponds to one cluster. Therefore, the $K$ nearest neighbors used to generate the triplets can be chosen precisely. The performance decreases when combined with ORGEN, since the similarity matrix derived from ORGEN is sparse and offers fewer locations for constructing effective triplets.
TABLE V: Runtime comparison on subsets of the extended Yale B dataset with varying numbers of subjects.

| Subjects | SCAMS | DP | SVD | DP-space | autoSC-N | autoSC |
| --- | --- | --- | --- | --- | --- | --- |
| 8 | 12.45 | 6.92 | 17.32 | 9.52 | 1.69 | 2.02 |
| 15 | 30.12 | 13.04 | 44.13 | 18.05 | 4.72 | 5.79 |
| 25 | 125.66 | 33.76 | 146.78 | 36.94 | 10.28 | 19.81 |
| 30 | 175.80 | 59.02 | 225.93 | 67.89 | 16.33 | 36.06 |
| 38 | 267.07 | 97.69 | 314.28 | 104.50 | 29.45 | 62.35 |
IV-D Time Efficiency
Table V shows the runtime of the compared methods on subsets of the extended Yale B dataset; all experiments are conducted on the same machine. AutoSC-N requires the least runtime among all compared methods for two reasons. First, autoSC-N explores the neighborhood relationships in the raw data space rather than solving a convex optimization problem. Second, it employs a greedy optimization scheme to estimate the number of clusters and calculate the clustering assignment, rather than a complex procedure such as computing a singular value decomposition. Note that the proposed autoSC achieves the second best runtime among the compared methods.
TABLE VI: Comparison on the Hopkins 155 dataset for motion segmentation.

| Metric | SCAMS | DP | SVD | DP-space | autoSC-N | autoSC |
| --- | --- | --- | --- | --- | --- | --- |
| NC | 3.67 | 2.12 | 1.29 | 2.97 | 0.52 | 0.18 |
| NMI | 0.7892 | 0.8233 | 0.8670 | 0.7921 | 0.9155 | 0.9871 |
| Time | 2.68 | 1.22 | 2.45 | 1.56 | 0.26 | 0.32 |
IV-E Real Application: Motion Segmentation
Motion segmentation refers to the task of segmenting a video sequence that contains multiple rapidly moving foreground objects into spatiotemporal regions corresponding to the individual motions. Following the traditional scheme [16], we consider the Hopkins 155 dataset [74] and solve the motion segmentation problem by first extracting a set of feature points in each frame and then clustering them according to their motions. Table VI reports the comparison against four automatic clustering methods. For SCAMS [13, 29], DP [19] and SVD [58], SMR [32] is first applied to calculate the similarity matrix. As shown in the table, the proposed autoSC achieves the best performance on both metrics, indicating that autoSC is effective at both estimating the number of motions (an NC of only 0.18) and segmenting the feature points (an NMI of more than 0.98). In addition, it shows favorable efficiency on the motion segmentation task. AutoSC-N is the most efficient method (a runtime of 0.26 per sequence, cf. Table VI) and achieves the second best performance on NC and NMI. The SVD method obtains the best results among the remaining methods, but it consumes much more time (about 2.45 per sequence) due to the singular value decomposition process.
V. Conclusion
In this paper, we propose a joint model to estimate the number of clusters and segment the samples of a dataset. Based on the self-representation of the dataset, we first design a hyper-correlation oriented meta-element, termed the triplet relationship, which indicates a compact local structure among three samples. The triplet is more robust than pairwise relationships when partitioning samples near the intersection of two subspaces, due to the complementarity of the mutual restrictions. Accordingly, we propose the autoSC method to simultaneously optimize two reward functions, of which the model selection reward constrains the number of clusters and the fusion reward facilitates the clustering assignment of the samples. Both functions are greedily maximized during the clustering process. In addition, we provide an extension of autoSC that calculates the neighboring relationships directly in the raw data space rather than in a similarity space spanned by the self-representation. Experimental results on face clustering, synthetic dataset clustering and motion segmentation tasks demonstrate the effectiveness and efficiency of our approaches.
References

[1] J. Yang, J. Liang, K. Wang, Y.-L. Yang, and M.-M. Cheng, “Automatic model selection in subspace clustering via triplet relationships,” in AAAI Conference on Artificial Intelligence, 2018.
[2] X. Wang, X. Guo, Z. Lei, C. Zhang, and S. Z. Li, “Exclusivity-consistency regularized multi-view subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 923–931.
[3] X. Peng, Z. Yu, Z. Yi, and H. Tang, “Constructing the L2-graph for robust subspace learning and subspace clustering,” IEEE Transactions on Cybernetics, vol. 47, no. 4, pp. 1053–1066, 2017.
 [4] R. Vidal, “Subspace clustering,” IEEE Signal Processing Magazine, vol. 28, no. 2, pp. 52–68, 2011.

[5]
H. Jia and Y.M. Cheung, “Subspace clustering of categorical and numerical
data with an unknown number of clusters,”
IEEE Transactions on Neural Networks and Learning Systems
, vol. 29, no. 8, pp. 3308–3325, 2018. 
[6] X. Xu, Z. Huang, D. Graves, and W. Pedrycz, “A clustering-based graph Laplacian framework for value function approximation in reinforcement learning,” IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2613–2625, 2014.
[7] P. Zhu, W. Zhu, Q. Hu, C. Zhang, and W. Zuo, “Subspace clustering guided unsupervised feature selection,” Pattern Recognition, vol. 66, pp. 364–374, 2017.
[8] X. Cao, C. Zhang, C. Zhou, H. Fu, and H. Foroosh, “Constrained multi-view video face clustering,” IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 4381–4393, 2015.
 [9] E. Elhamifar and R. Vidal, “Sparse subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 2790–2797.
[10] Y. Yang, J. Feng, N. Jojic, J. Yang, and T. S. Huang, “$\ell_0$-sparse subspace clustering,” in European Conference on Computer Vision, 2016, pp. 731–747.
 [11] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, 2000.

[12] Y. Wang and J. Zhu, “DP-space: Bayesian nonparametric subspace clustering with small-variance asymptotics,” in International Conference on Machine Learning, 2015, pp. 862–870.
[13] Z. Li, S. Yang, L. F. Cheong, and K. C. Toh, “Simultaneous clustering and model selection for tensor affinities,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5347–5355.
[14] S. Javed, A. Mahmood, T. Bouwmans, and S. K. Jung, “Background-foreground modeling based on spatio-temporal sparse subspace clustering,” IEEE Transactions on Image Processing, vol. 26, no. 12, pp. 5840–5854, 2017.
 [15] D. Kumar, J. C. Bezdek, M. Palaniswami, S. Rajasegarar, C. Leckie, and T. C. Havens, “A hybrid approach to clustering in big data,” IEEE Transactions on Cybernetics, vol. 46, no. 10, pp. 2372–2385, 2016.
 [16] E. Elhamifar and R. Vidal, “Sparse subspace clustering: Algorithm, theory, and applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2765–2781, 2013.
 [17] B. Nasihatkon and R. Hartley, “Graph connectivity in sparse subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 2137–2144.
[18] K. Zhan, C. Zhang, J. Guan, and J. Wang, “Graph learning for multi-view clustering,” IEEE Transactions on Cybernetics, no. 99, pp. 1–9, 2017.
 [19] A. Rodriguez and A. Laio, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
 [20] X. Peng, L. Zhang, and Z. Yi, “Scalable sparse subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 430–437.
 [21] S. Gao, I. W. Tsang, and L. T. Chia, “Laplacian sparse coding, hypergraph laplacian sparse coding, and applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 92–104, 2013.
[22] S. Kim, D. Y. Chang, S. Nowozin, and P. Kohli, “Image segmentation using higher-order correlation clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 9, pp. 1761–1774, 2014.
[23] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815–823.
[24] H. Wang, T. Li, T. Li, and Y. Yang, “Constraint neighborhood projections for semi-supervised clustering,” IEEE Transactions on Cybernetics, vol. 44, no. 5, pp. 636–643, 2014.
 [25] C.G. Li, C. You, and R. Vidal, “Structured sparse subspace clustering: A joint affinity learning and subspace clustering framework,” IEEE Transactions on Image Processing, vol. 26, no. 6, pp. 2988–3001, 2017.
 [26] C. Zhang, H. Fu, Q. Hu, P. Zhu, and X. Cao, “Flexible multiview dimensionality coreduction,” IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 648–659, 2017.
 [27] C. You, D. Robinson, and R. Vidal, “Scalable sparse subspace clustering by orthogonal matching pursuit,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3918–3927.
 [28] Y. Cheng, Y. Wang, M. Sznaier, and O. Camps, “Subspace clustering with priors via sparse quadratically constrained quadratic programming,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5204–5212.
 [29] Z. Li, L.F. Cheong, S. Yang, and K.C. Toh, “Simultaneous clustering and model selection: Algorithm, theory and applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 8, pp. 1964–1978, 2018.
[30] F. Wu, Y. Hu, J. Gao, Y. Sun, and B. Yin, “Ordered subspace clustering with block-diagonal priors,” IEEE Transactions on Cybernetics, vol. 46, no. 12, pp. 3209–3219, 2016.
 [31] C. G. Li and R. Vidal, “Structured sparse subspace clustering: A unified optimization framework,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 277–286.
 [32] H. Hu, Z. Lin, J. Feng, and J. Zhou, “Smooth representation clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3834–3841.
 [33] C. You, C. G. Li, D. P. Robinson, and R. Vidal, “Oracle based active set algorithm for scalable elastic net subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3928–3937.
[34] X. Fang, Y. Xu, X. Li, Z. Lai, and W. K. Wong, “Robust semi-supervised subspace clustering via non-negative low-rank representation,” IEEE Transactions on Cybernetics, vol. 46, no. 8, pp. 1828–1838, 2016.
 [35] M. Rahmani and G. Atia, “Innovation pursuit: A new approach to the subspace clustering problem,” in International Conference on Machine Learning, 2017, pp. 2874–2882.
 [36] E.L. Dyer, A.C. Sankaranarayanan, and R.G. Baraniuk, “Greedy feature selection for subspace clustering,” Journal of Machine Learning Research, vol. 14, no. 1, pp. 2487–2517, 2013.
 [37] D. Park, C. Caramanis, and S. Sanghavi, “Greedy subspace clustering,” in Advances in Neural Information Processing Systems, 2014, pp. 2753–2761.
 [38] P. Purkait, T.J. Chin, A. Sadri, and D. Suter, “Clustering with hypergraphs: the case for large hyperedges,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 9, pp. 1697–1711, 2017.
 [39] H. Liu, L. Latecki, and S. Yan, “Robust clustering as ensembles of affinity relations,” in Advances in Neural Information Processing Systems, 2010, pp. 1414–1422.
[40] C.-G. Li and R. Vidal, “A structured sparse plus structured low-rank framework for subspace clustering and completion,” IEEE Transactions on Signal Processing, vol. 64, no. 24, pp. 6557–6570, 2016.
 [41] Y. Guo, J. Gao, and F. Li, “Spatial subspace clustering for drill hole spectral data,” Journal of Applied Remote Sensing, vol. 8, no. 1, p. 083644, 2014.
[42] J. Feng, Z. Lin, H. Xu, and S. Yan, “Robust subspace segmentation with block-diagonal prior,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3818–3825.
 [43] B. Wang, Y. Hu, J. Gao, Y. Sun, and B. Yin, “Product Grassmann manifold representation and its LRR models,” in AAAI Conference on Artificial Intelligence, 2016, pp. 2122–2129.
 [44] B. Liu, X.T. Yuan, Y. Yu, Q. Liu, and D.N. Metaxas, “Decentralized robust subspace clustering,” in AAAI Conference on Artificial Intelligence, 2016, pp. 3539–3545.
 [45] S. Xiao, W. Li, D. Xu, and D. Tao, “FaLRR: A fast low rank representation solver,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4612–4620.
 [46] C. Y. Lu, H. Min, Z. Q. Zhao, L. Zhu, D. S. Huang, and S. Yan, “Robust and efficient subspace segmentation via least squares regression,” in European Conference on Computer Vision, 2012, pp. 347–360.
 [47] G. Liu, H. Xu, J. Tang, Q. Liu, and S. Yan, “A deterministic analysis for LRR,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 3, pp. 417–430, 2016.
[48] C. Xu, Z. Lin, and H. Zha, “A unified convex surrogate for the Schatten-p norm,” in AAAI Conference on Artificial Intelligence, 2017, pp. 926–932.
 [49] Y.X. Wang, H. Xu, and C. Leng, “Provable subspace clustering: When LRR meets SSC,” in Advances in Neural Information Processing Systems, 2013, pp. 64–72.
[50] H. Lai, Y. Pan, C. Lu, Y. Tang, and S. Yan, “Efficient k-support matrix pursuit,” in European Conference on Computer Vision, 2014, pp. 617–631.
[51] E. Kim, M. Lee, and S. Oh, “Robust elastic-net subspace representation,” IEEE Transactions on Image Processing, vol. 25, no. 9, pp. 4245–4259, 2016.
 [52] C. Lu, J. Feng, Z. Lin, and S. Yan, “Correlation adaptive subspace segmentation by Trace Lasso,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1345–1352.
 [53] J. Xu, K. Xu, K. Chen, and J. Ruan, “Reweighted sparse subspace clustering,” Computer Vision and Image Understanding, vol. 138, pp. 25–37, 2015.
 [54] A. A. Abin, “Querying beneficial constraints before clustering using facility location analysis,” IEEE Transactions on Cybernetics, vol. 48, no. 1, pp. 312–323, 2018.
 [55] Y. Guo, J. Gao, and F. Li, “Spatial subspace clustering for hyperspectral data segmentation,” in International Conference on Digital Information Processing and Communications, 2013, pp. 180–190.
[56] J. Wang, X. Wang, F. Tian, C. H. Liu, and H. Yu, “Constrained low-rank representation for robust subspace clustering,” IEEE Transactions on Cybernetics, vol. 47, no. 12, pp. 4534–4546, 2017.
 [57] Y. Yang, Z. Ma, Y. Yang, F. Nie, and H. T. Shen, “Multitask spectral clustering by exploring intertask correlation,” IEEE Transactions on Cybernetics, vol. 45, no. 5, pp. 1083–1094, 2015.
 [58] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by lowrank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 171–184, 2013.
 [59] P. Favaro, R. Vidal, and A. Ravichandran, “A closed form solution to robust subspace estimation and clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1801–1807.
 [60] Z. Li, L. F. Cheong, and S. Z. Zhou, “SCAMS: Simultaneous clustering and model selection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 264–271.
[61] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in International Conference on Knowledge Discovery and Data Mining, 1996, pp. 226–231.
 [62] T. Beier, F. A. Hamprecht, and J. H. Kappes, “Fusion moves for correlation clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3507–3516.

[63] C. Lu, J. Feng, Y. Chen, W. Liu, Z. Lin, and S. Yan, “Tensor robust principal component analysis: Exact recovery of corrupted low-rank tensors via convex optimization,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2080–2088.
[64] P. Purkait, T. J. Chin, H. Ackermann, and D. Suter, “Clustering with hypergraphs: The case for large hyperedges,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 9, pp. 1697–1711, 2017.
[65] X. Li, G. Cui, and Y. Dong, “Graph regularized non-negative low-rank matrix factorization for image clustering,” IEEE Transactions on Cybernetics, vol. 47, no. 11, pp. 3840–3853, 2017.
 [66] B. Schölkopf, J. Platt, and T. Hofmann, “Learning with hypergraphs: Clustering, classification, and embedding,” in Advances in Neural Information Processing Systems, 2006, pp. 1601–1608.
[67] M. Lee, J. Lee, H. Lee, and N. Kwak, “Membership representation for detecting block-diagonal structure in low-rank or sparse subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1648–1656.
 [68] M. Belkin and P. Niyogi, “Laplacian eigenmaps and spectral techniques for embedding and clustering,” in Advances in Neural Information Processing Systems, 2002, pp. 585–591.
 [69] E. Elhamifar and R. Vidal, “Sparse manifold clustering and embedding,” in Advances in Neural Information Processing Systems, 2011, pp. 55–63.
 [70] C. Li, J. Guo, and H. Zhang, “Learning bundle manifold by double neighborhood graphs,” in Asian Conference on Computer Vision, 2009, pp. 321–330.
 [71] U. Von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
 [72] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, “From few to many: Illumination cone models for face recognition under variable lighting and pose,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643–660, 2001.
[73] S. A. Nene, S. K. Nayar, and H. Murase, “Columbia object image library (COIL-20),” Columbia University, Tech. Rep. CUCS-005-96, 1996.
 [74] R. Tron and R. Vidal, “A benchmark for the comparison of 3D motion segmentation algorithms,” in IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
 [75] M. Li, X. Chen, X. Li, and B. Ma, “Clustering by compression,” in IEEE International Symposium on Information Theory, vol. 51, no. 4, 2003, pp. 1523–1545.