1 Introduction
Subspace Clustering (SC) is the de facto method in various clustering tasks such as motion segmentation [17, 7, 10, 11], face clustering [9, 8] and image segmentation [36, 24]. As the name implies, the underlying assumption in SC is that samples forming a cluster can be adequately described by a subspace. Such data modeling is natural in many applications. One prime example is face clustering, in which it has been shown that the face images of one subject obtained with a fixed pose and varying illumination lie in a low-dimensional subspace [20].
Most recent subspace clustering methods [8, 21] assume that data points lie on a union of linear subspaces and construct an affinity matrix for spectral clustering. Although promising results are obtained on certain datasets, performance degrades significantly when non-linearity arises in the data. Moreover, constructing the affinity matrix and performing clustering demand hefty memory footprints and processing power. To benefit from the concept of SC and its unique features, two issues should be addressed:
Non-linearity: The majority of SC algorithms target clustering with linear subspaces. This is a very bold assumption and can hardly be met in practice. Some studies [26, 38, 34, 12] employ kernel methods to alleviate this limitation. Nevertheless, kernel methods still suffer from scalability issues [40]. To make things more complicated, there is no guideline as to how to choose a kernel function and its parameters that are truly well-suited to subspace clustering.
Scalability: With the current trend in analyzing big data, SC algorithms should be able to deal with large volumes of data. However, most of the state-of-the-art methods for SC make use of an affinity matrix along with norm regularization (e.g., sparse [7, 8], dense [13] or nuclear norm [21, 32]). Not only does building an affinity matrix demand solving large-scale optimization problems, but performing spectral clustering on an affinity matrix, whose size is dictated by the number of samples, is also overwhelming.
In this paper, instead of constructing an affinity matrix for spectral clustering, we revisit the k-subspace clustering (k-SC) method [5, 30, 2] to design a novel and scalable approach. In order to handle non-linear subspaces, we propose to utilize deep neural networks to project data to a latent space where k-SC can be easily applied. Our contributions in this paper are threefold:

We bypass the steps of constructing an affinity matrix and performing spectral clustering, which are used in mainstream subspace clustering algorithms, and accelerate the computation by using a variant of k-subspace clustering. As a result, our method can handle datasets that are orders of magnitude larger than those considered in traditional methods.

In order to address non-linearity, we equip deep neural networks with subspace priors. This in turn enables us to learn an explicit non-linear mapping of the data that is well-suited for subspace clustering.

We propose novel strategies to update the subspace bases. When the size of the dataset at hand is manageable, we update the subspaces in closed form using the Singular Value Decomposition (SVD), with a simple mechanism to rule out outliers. For large datasets, we update the subspaces using stochastic optimization methods on Grassmann manifolds.
Empirically, evaluations on relatively large datasets such as MNIST and Fashion-MNIST [33] show that our proposed method achieves state-of-the-art results in terms of clustering accuracy and speed.
2 Related Work
Linear subspace clustering methods can be classified into algebraic algorithms, iterative methods, statistical methods and spectral clustering-based methods [31]. Among them, spectral clustering-based methods [7, 21, 13, 16, 14, 39] have become dominant in the literature. In general, spectral clustering-based methods solve the problem in two steps: encode a notion of similarity between pairs of data points into an affinity matrix; then apply normalized cuts [29] or spectral clustering [25] on this affinity matrix. To construct the affinity matrix, recent methods tend to rely on the concept of self-expressiveness, which seeks to express each point in a cluster as a linear combination of other points sharing some common notion (e.g., coming from the same subspace).

The literature on true end-to-end learning of subspace clustering is surprisingly limited. Furthermore, and to the best of our knowledge, none of the deep algorithms can handle medium-size datasets, let alone large ones (among all the datasets that have been tested, COIL100 with 7,200 images seems to be the largest). In hybrid methods such as [28], hand-crafted features (e.g., SIFT [23] or HOG [6]
) are fed into a deep auto-encoder with a sparse subspace clustering (SSC) prior. The final clustering is then obtained by applying k-means or SSC on the learned auto-encoder features. Instead of using hand-crafted features, Deep Subspace Clustering Networks (DSC-Net) [15] employ a deep convolutional auto-encoder to non-linearly map the images to a latent space, and make use of a self-expressive layer between the encoder and the decoder to learn the affinities between all data points. By learning the affinity matrix within the neural network, state-of-the-art results on several traditional small datasets are reported in [15]. Nevertheless, relying on the whole dataset to create the affinity matrix, DSC-Net cannot scale to large datasets.

SSC by Orthogonal Matching Pursuit (SSC-OMP) [40]
is probably the only subspace clustering method which could be considered "scalable". The main idea is to replace the large-scale convex optimization procedure with the OMP algorithm when constructing the affinity matrix. Having said this, SSC-OMP makes use of spectral clustering and hence still fails to truly push subspace clustering to large-scale datasets.
k-Subspace Clustering [30, 2], an iterative method, can be considered a generalization of the k-means algorithm. k-SC shows fast convergence behavior and can handle both linear and affine subspaces explicitly. However, k-SC methods are sensitive to outliers and initialization. Attempts to make k-SC methods more robust include the work of Zhang et al. [41] and Balzano et al. [3]. In the former, the best subspaces from a large number of candidate subspaces are selected using a greedy combinatorial algorithm [41] to make the algorithm robust to data corruption. Balzano et al. propose a variant of the k-subspaces method named GROUSE which can handle missing data in subspace clustering. However, the resulting methods do not seem to produce competitive results compared to methods relying on affinity matrices.
In this paper, we propose k-Subspace Clustering (k-SC) networks, which incorporate k-SC into a deep neural network embedding. This lets us not only bypass the affinity construction and spectral clustering procedure, but also handle data points lying in non-linear subspaces.
3 k-Subspace Clustering (k-SC) Networks
Our k-subspace clustering networks leverage the properties of deep convolutional auto-encoders and k-subspace clustering. In this section, we discuss the subspace property and the whole framework in detail.
3.1 k-Subspace Clustering
Consider a collection of $N$ points $\mathcal{X} = \{x_j \in \mathbb{R}^n\}_{j=1}^N$ belonging to a union of $k$ subspaces $\{\mathcal{S}_i\}_{i=1}^k$ of dimensions $\{p_i\}_{i=1}^k$, respectively (we assume $p_i = p$ for all $i$ in the remainder). With a slight abuse of notation, we will use $U_i \in \mathbb{R}^{n \times p}$ to represent the basis of the subspace indexed by $i$, that is $\mathcal{S}_i = \mathrm{span}(U_i)$ and $U_i^\top U_i = I_p$, with $I_p$ denoting the $p \times p$ identity matrix. The goal of k-subspace clustering is to learn the subspaces and assign points to their nearest subspaces. Once every data point is assigned to a subspace, the corresponding subspace basis can be recalculated by SVD (as will be shown shortly). Different from self-expressiveness-based methods, which obtain the affinity matrix by solving large-scale optimization problems, k-SC seeks to minimize the sum of residuals of points to their nearest subspaces. The cost function of k-SC can be written as

$$\min_{\{U_i\}, \{w_{ij}\}} \; \sum_{j=1}^{N} \sum_{i=1}^{k} w_{ij} \, \| x_j - U_i U_i^\top x_j \|_2^2 \quad \text{s.t.} \;\; w_{ij} \in \{0, 1\}, \;\; \sum_{i=1}^{k} w_{ij} = 1. \tag{1}$$
Given the subspace bases $\{U_i\}$, the optimal value for $w_{ij}$ can be written as

$$w_{ij} = \begin{cases} 1, & \text{if } i = \arg\min_{l} \| x_j - U_l U_l^\top x_j \|_2^2, \\ 0, & \text{otherwise.} \end{cases} \tag{2}$$
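Putting (1) and (2) together, the classical k-SC alternation can be sketched in a few lines of numpy. This is a minimal illustration under the paper's assumptions (a common dimension $p$, subspaces through the origin); the random initialization and iteration count are placeholders, not the paper's choices:

```python
import numpy as np

def k_subspace_clustering(X, k, p, n_iter=30, seed=0):
    """Classical k-subspace clustering on the columns of X (n x N).

    Alternates the assignment rule (2) with an SVD basis update,
    decreasing the sum of squared residuals in (1).
    """
    rng = np.random.default_rng(seed)
    n, N = X.shape
    # initialize k random orthonormal bases U_i in R^{n x p}
    U = [np.linalg.qr(rng.standard_normal((n, p)))[0] for _ in range(k)]
    labels = np.zeros(N, dtype=int)
    for _ in range(n_iter):
        # assignment step: residual of projecting each point onto each subspace
        res = np.stack([np.sum((X - Ui @ (Ui.T @ X)) ** 2, axis=0) for Ui in U])
        labels = np.argmin(res, axis=0)
        # update step: top-p left singular vectors of each cluster's points
        for i in range(k):
            Xi = X[:, labels == i]
            if Xi.shape[1] >= p:
                U[i] = np.linalg.svd(Xi, full_matrices=False)[0][:, :p]
    return labels, U
```

Each pass is one EM-style iteration: the assignment step realizes (2), and the SVD step re-fits each basis to its assigned points.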
For the sake of discussion, let us arrange the $w_{ij}$ into a membership matrix $W \in \{0,1\}^{N \times k}$. Beginning with an initialization of candidate subspace bases, k-SC updates the membership assignments and the subspaces in an alternating fashion: 1) cluster the points by assigning each to its nearest subspace as in Eqn. (2); 2) re-estimate the new subspace bases by performing SVD on the points of each cluster (the columns $x_j$ whose corresponding entry in the $i$-th column of $W$ is 1). Similar to k-means, the whole algorithm works in an Expectation-Maximization (EM) style, and is guaranteed to converge to a local minimum in a finite number of iterations. We will shortly show how stochastic optimization techniques can be applied to minimize the problem depicted in (1), equipping our solution with the ability to handle large-scale data.

3.2 k-SC with Convolutional Auto-Encoder Network
Denoising fully-connected auto-encoders (AEs) are widely used with generic clustering algorithms [37, 35]. We have found such structures difficult to train (due to the large number of parameters in the fully-connected layers) and propose to use convolutional AEs to learn the embeddings for k-SC.
Specifically, let $\Theta$ denote the AE parameters, which can be decomposed into encoder parameters $\Theta_e$ and decoder parameters $\Theta_d$. Let $f_{\Theta_e}(\cdot)$ be the encoder mapping function and $g_{\Theta_d}(\cdot)$ the decoder mapping function, both of which are composed of a sequence of convolution kernels and non-linear activation functions. Our overall loss can be written as

$$\min_{\Theta, \{U_i\}, W} \; \mathcal{L}_{\mathrm{rec}}(\Theta) + \lambda \, \mathcal{L}_{\mathrm{sub}}\big(\{U_i\}, W, \Theta_e\big), \tag{3}$$
where $\lambda$ is a regularization parameter that balances the reconstruction loss and the k-subspace clustering loss. The auto-encoder reconstruction loss is defined as

$$\mathcal{L}_{\mathrm{rec}}(\Theta) = \sum_{j=1}^{N} \big\| x_j - g_{\Theta_d}\big(f_{\Theta_e}(x_j)\big) \big\|_2^2. \tag{4}$$
The term $\mathcal{L}_{\mathrm{sub}}$ is the loss for k-subspace clustering and is written as

$$\mathcal{L}_{\mathrm{sub}} = \sum_{j=1}^{N} \sum_{i=1}^{k} w_{ij} \, \big\| f_{\Theta_e}(x_j) - U_i U_i^\top f_{\Theta_e}(x_j) \big\|_2^2 \quad \text{s.t.} \;\; U_i \in \mathrm{Gr}(p, n), \tag{5}$$

where $\mathrm{Gr}(p, n)$ denotes the Grassmann manifold consisting of $p$-dimensional subspaces of an ambient space of dimension $n$.
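For concreteness, the combined objective (3)-(5) can be evaluated as follows, given fixed embeddings, bases and assignments. This is a numpy sketch under assumed shapes; the encoder and decoder are represented only by their outputs `Z` and `X_rec`:

```python
import numpy as np

def ksc_total_loss(X, X_rec, Z, U, labels, lam):
    """Loss (3): reconstruction term (4) plus lam times subspace term (5).

    X, X_rec : inputs and their AE reconstructions (features x N)
    Z        : encoder embeddings f(x_j) as columns (n x N)
    U        : list of k orthonormal bases, each n x p
    labels   : index of the assigned subspace for each column of Z
    """
    loss_rec = np.sum((X - X_rec) ** 2)          # Eqn. (4)
    loss_sub = 0.0
    for j in range(Z.shape[1]):                  # Eqn. (5), w_ij one-hot
        Ui = U[labels[j]]
        r = Z[:, j] - Ui @ (Ui.T @ Z[:, j])      # residual to assigned subspace
        loss_sub += r @ r
    return loss_rec + lam * loss_sub
```

When the embeddings of a cluster lie exactly in the span of its basis, the subspace term vanishes, which is the configuration the network is driven towards.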
As a pre-processing step, some traditional algorithms such as [40, 41] use PCA to project images onto a low-dimensional space. However, the mapping obtained by PCA projection is linear and fixed. By contrast, our encoder function can update its parameters to adapt to a space which is subspace-clustering-friendly.
4 Optimization
The cost function (3) is highly non-convex, and three sets of variables (i.e., $\Theta$, $\{U_i\}$ and $W$) should be updated alternately. It is known that alternating optimization problems are not without difficulties. A strategy such as wake-and-sleep is a common practice to update one set of variables while fixing the others. As mentioned before, we first pre-train a CAE without having any information about $\{U_i\}$ and $W$. Therefore, it is natural to obtain an initial state for $\{U_i\}$ and $W$ directly from the output of the pre-trained CAE. This is exactly how we initialize $\{U_i\}$ and $W$.
As shown in Fig. 1, the gradient with respect to the encoder comes from both the reconstruction loss and the k-subspace clustering loss, i.e.,

$$\frac{\partial \mathcal{L}}{\partial \Theta_e} = \frac{\partial \mathcal{L}_{\mathrm{rec}}}{\partial \Theta_e} + \lambda \frac{\partial \mathcal{L}_{\mathrm{sub}}}{\partial \Theta_e}. \tag{6}$$
By fixing $\{U_i\}$, the assignments $W$ for a mini-batch can be obtained easily, and the gradient required for updating the CAE follows by back-propagating the error. The most difficult part of our problem is to find a way to update the subspaces efficiently and accurately. Here we explain two approaches for updating the subspaces: the first is based on the SVD, and the second makes use of the Riemannian geometry of the Grassmannian.
4.1 SVD Update
Although the SVD is computationally more expensive, we empirically observe that it provides satisfactory results. In our optimization, we update the encoder through back-propagation, batch by batch, and update the subspaces by employing the SVD once per epoch. This is mainly because updating the subspaces more frequently hinders convergence. Intuitively, if the gradient takes the network in a bad direction, updating the subspaces accordingly could amplify the negative effect and worsen the CAE. Empirically, we observe that updating the subspaces after every epoch neutralizes the good and bad directions of the gradient, yielding a stable framework.
Outliers may badly affect the clustering, especially for k-subspace clustering. Therefore, when updating each subspace, we rule out the farthest points as outliers. That is, after back-propagation on the CAE, we pass all the data through the encoder and assign their memberships. We then sort the distances between each sample and the subspace it belongs to, and remove the outliers. Finally, we apply SVD on the remaining points assigned to a subspace to obtain its new basis. Note that we only need to compute the $p$ largest singular values and the corresponding singular vectors to update a subspace. Specifically, after fixing $W$ and $\Theta$ in Eqn. (5), updating the subspace basis $U_i$ translates to solving the following problem:

$$\min_{U_i^\top U_i = I_p} \; \| Z_i - U_i U_i^\top Z_i \|_F^2, \tag{7}$$

where $Z_i$ consists of the embedded samples $f_{\Theta_e}(x_j)$ (as columns) that belong to cluster $i$. The solution to (7) corresponds to the column space of $Z_i$, which can be obtained by applying SVD on $Z_i$ and taking the top $p$ left singular vectors.
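The trimmed SVD update described above might look as follows in numpy. The fraction of points discarded as outliers, `trim_frac`, is an assumed hyper-parameter; the text does not fix its value here:

```python
import numpy as np

def update_basis_svd(Z_i, U_i, p, trim_frac=0.1):
    """Solve (7) for one cluster after discarding the farthest points.

    Z_i : embedded points of cluster i as columns (n x N_i)
    U_i : current basis (n x p), used to score points for trimming
    """
    # distance of each point to the current subspace
    res = np.sum((Z_i - U_i @ (U_i.T @ Z_i)) ** 2, axis=0)
    n_keep = max(p, int(np.ceil((1.0 - trim_frac) * Z_i.shape[1])))
    keep = np.argsort(res)[:n_keep]              # closest points survive
    # top-p left singular vectors of the trimmed cluster
    return np.linalg.svd(Z_i[:, keep], full_matrices=False)[0][:, :p]
```

Scoring points against the current basis before re-fitting keeps a single gross outlier from dragging the new basis away from the bulk of the cluster.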
4.2 Gradient-based Update
If more frequent updates are required, the SVD solution can be replaced by a Riemannian gradient descent method based on the geometry of the Grassmannian. In particular, let $G$ be the gradient of the loss with respect to $U_i$ after an iteration (or the accumulated gradient after a few iterations). In Riemannian optimization, $U_i$ is updated according to the following rule:

$$U_i^{(t+1)} = r_{U_i^{(t)}}\Big( \pi_{U_i^{(t)}}\big( -\eta \, G \big) \Big). \tag{8}$$
We explain Eqn. (8) with the aid of Fig. 2. First, note that a global coordinate system cannot be defined on a Riemannian manifold. As such, Riemannian techniques make extensive use of the tangent bundle of the manifold to achieve their goal. Note that moving in the direction of $G$ would take us off the manifold. For a Riemannian manifold embedded in a Euclidean space (our case here), an ambient vector such as $G$ can be projected orthogonally onto the tangent space at the current solution $U_i^{(t)}$. We denote this operator by $\pi$ in Eqn. (8). The resulting tangent vector, shown by the green arrow in Fig. 2, identifies a geodesic on the manifold. Moving along this geodesic (sufficiently) is guaranteed to decrease the loss while preserving the orthogonality of the solution. In Riemannian optimization, this is achieved by a retraction, which is a local approximation to the exponential map of the manifold. We denote the retraction by $r$ in Eqn. (8). The only remaining bit is $\eta$, the learning rate. For the Grassmannian, we have
$$\pi_{U}(G) = \big( I - U U^\top \big) G, \tag{9}$$

$$r_{U}(\xi) = \mathrm{qf}\big( U + \xi \big). \tag{10}$$

In Eqn. (10), $\mathrm{qf}(\cdot)$ returns the Q part of the QR decomposition, which is much faster to compute than the SVD. Although the SVD update performs well enough in our experiments, we provide this faster alternative in order to deal with very large datasets.
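A single Riemannian step (8)-(10) is then just a tangent projection followed by a QR-based retraction. A minimal numpy sketch (the sign convention of the Q factor is left as numpy returns it, which still lands on the manifold):

```python
import numpy as np

def grassmann_step(U, G, lr):
    """One gradient step on the Grassmannian, Eqns. (8)-(10).

    U : current basis (n x p) with orthonormal columns
    G : Euclidean gradient of the loss with respect to U
    """
    # (9): project the ambient gradient onto the tangent space at U
    G_tan = (np.eye(U.shape[0]) - U @ U.T) @ G
    # (10): move against the gradient and retract via the Q factor of QR
    Q, _ = np.linalg.qr(U - lr * G_tan)
    return Q
```

The QR retraction is what keeps the updated basis orthonormal without ever forming an explicit constraint; `np.linalg.qr` returns the reduced factorization, so `Q` has the same `n x p` shape as `U`.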
5 Experiments
We use TensorFlow [1] to build our networks. We used the MNIST dataset [19] in our first experiment. MNIST is not considered a standard dataset for previous subspace clustering algorithms, since its size is far beyond what traditional algorithms can handle. In addition, the original images do not follow the structure of linear subspaces. Taking advantage of a CAE with our k-subspace clustering module, we aim to project all the MNIST data into a space which is more friendly to subspace clustering. To reinforce our conclusions, we also evaluate our method on the Fashion-MNIST dataset [33], a dataset similar to MNIST but with fashion images. Fashion-MNIST has 10 classes of grayscale images of size 28×28. The images of Fashion-MNIST come from fashion products which are classified based on a certain assortment, manually labeled by in-house fashion experts and reviewed by a separate team. It contains more variations within each class and is thus more challenging than MNIST.

5.0.1 Baseline Methods
For most of the baselines, as well as our method, we evaluate on the whole MNIST and Fashion-MNIST datasets with all 70,000 images (including both training and testing sets). We compare our solution with the following generic clustering algorithms:
1) k-means [22]: k-means finds clusters based on spatial closeness. As an EM method, it heavily relies on good initialization. Hence, for k-means (and other k-means-based methods), we run the algorithm 20 times with different centroid seeds and report the best result.
2) Deep Embedded Clustering (DEC) [35]: A rich structure for the MNIST dataset is proposed in [35], which we follow here. In particular, a stacked auto-encoder (SAE) [4] along with layer-wise pre-training is considered. The structure of the network reads 784-500-500-2000-10. Image brightness is scaled from the range 0–1 to 0–5.2 to boost performance. We observe that this method is highly sensitive to the network parameters, in the sense that even a small change in the structure results in a significant performance drop. However, the features extracted by the pre-trained model are very discriminative, i.e., even simply applying k-means on top of them achieves competitive results. We call the features extracted by this network the SAE features in the sequel.

3) Deep Clustering Network (DCN) [37]: Based on the vanilla SAE, Yang et al. propose to add a k-means clustering loss in addition to the data reconstruction loss of the SAE.
4) Stacked Auto-Encoder followed by k-means (SAE-KM): extract features with the SAE and then apply k-means.
5) PCA followed by k-subspace clustering (PCA-KS): the original data are first projected onto a low-dimensional space by PCA, and k-subspace clustering is then used to obtain the final results. Since PCA is a linear projection, this baseline helps the reader understand where the improvements come from compared to our non-linear projection. The results are reported based on 10 trials, due to the randomness of initialization when employing k-subspace clustering.
6) Convolutional Auto-Encoder followed by k-means (CAE-KM): extract features with the CAE and then apply k-means. This is also the initialization of our method, and can thus be considered an evaluation of the quality of our initialization.
Since subspace clustering algorithms that rely on affinity matrix construction and spectral clustering are not scalable to the whole dataset, we report their results on the test sets (with 10,000 images) only. We list several state-of-the-art subspace clustering algorithms as baselines: Sparse Subspace Clustering (SSC) [8], Low Rank Representation (LRR) [21], Kernel Sparse Subspace Clustering (KSSC) [26], SSC by Orthogonal Matching Pursuit (SSC-OMP) [40], and the latest one, Deep Subspace Clustering Networks (DSC-Net) [15].
5.0.2 Evaluation Metric
For all quantitative evaluations, we make use of the unsupervised clustering accuracy rate, defined as

$$\mathrm{ACC} = \max_{M} \; \frac{1}{N} \sum_{j=1}^{N} \mathbf{1}\{ y_j = M(c_j) \}, \tag{11}$$

where $y_j$ is the ground-truth label, $c_j$ is the subspace assignment produced by the algorithm, and $M$ ranges over all possible one-to-one mappings between subspaces and labels. The mapping can be computed efficiently by the Hungarian algorithm. We also use normalized mutual information (NMI) as an additional quantitative measure. NMI ranges from 0 to 1, where a smaller value indicates less correlation between the predicted labels and the ground-truth labels. Another quantitative metric is the adjusted Rand index (ARI), which ranges between −1 and 1; the larger the ARI, the better the clustering performance.
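For a small number of clusters, the accuracy in (11) can be computed by brute force over all one-to-one mappings, as sketched below; the Hungarian algorithm (e.g., `scipy.optimize.linear_sum_assignment`) is the efficient choice for larger k:

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """Unsupervised clustering accuracy, Eqn. (11): the best one-to-one
    relabeling of predicted clusters against the ground-truth labels."""
    labels = sorted(set(y_true) | set(y_pred))
    best = 0.0
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))        # candidate mapping M
        acc = np.mean([mapping[c] == y for c, y in zip(y_pred, y_true)])
        best = max(best, acc)
    return best
```

For instance, predictions that are a pure relabeling of the ground truth score 1.0, since some permutation aligns them exactly.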
5.0.3 Implementation
We build our CAE in a bottleneck structure, meaning we decrease the number of channels and the size of the feature maps layer by layer. We design a six-layer convolutional auto-encoder, whose kernel sizes and channel counts decrease through the encoder; the decoder mirrors the encoder, as the two are symmetric in structure. Between layers, we set the stride to 2 in both horizontal and vertical directions, and use rectified linear units (ReLU) as the non-linear activations. We use the same structure for both the MNIST and Fashion-MNIST datasets.
Instead of greedy layer-wise pre-training [37, 35], we pre-train our network end-to-end from random initialization, until the reconstructed images are similar to the input ones (200 epochs suffice for pre-training). For subspace initialization, we randomly sample 2,000 images and use the DSC network to generate the clusters and the corresponding subspaces. We noticed that initialization by the DSC subspaces leads to a model that initially underperforms compared to the k-means algorithm. Nevertheless, our algorithm successfully recovers from such an initialization in all the experiments. During the optimization we use the Adam optimizer [18], an adaptive momentum-based gradient descent method, to minimize the loss, with the same learning rate in both the pre-training and fine-tuning stages. For different datasets, the only two parameters that need tuning are the weight $\lambda$ in (3) and the subspace intrinsic dimension $p$, since the ambient dimension $n$ is fixed by the number of feature maps of the CAE.
5.1 MNIST Dataset
In this section, we report and discuss results on the MNIST dataset. To the best of our knowledge, existing subspace clustering methods with raw images as input have not achieved satisfactory results on this dataset. As far as we know, the best reported performance is that of [27], where DSIFT features are employed.
On MNIST, we fix the subspace dimension to 7, which means each subspace is a point on a Grassmann manifold of 7-dimensional subspaces. The weight $\lambda$ is set to 0.08, balancing the k-subspace clustering loss and the CAE data reconstruction. Table (1) reports the results of all the baselines, including both subspace clustering algorithms and generic clustering algorithms. k-SCN-S refers to updating the subspaces by employing the SVD, and k-SCN-G stands for updating the subspaces by Grassmannian gradients, which is empirically not as stable as the SVD updating scheme, probably due to the stochastic nature of each gradient step. The Grassmannian update, however, runs faster and takes less time to converge. We run our methods 15 times and report the average. The results of DEC are taken from the original paper. We tuned the parameters for DCN very carefully and report the best results.
Among all the algorithms, ours achieves the best performance in ACC and ARI. For ACC in particular, ours is around 3% higher than the second best, namely DEC. From the results, it is not difficult to conclude that DEC and DCN perform only marginally better than SAE-KM, which is the initialization for both: DEC improves over this initialization by around 3%, and DCN by only around 2%. By contrast, our method starts from CAE-KM (51% ACC) and improves it by more than 36%, to 87.14% ACC. The improvement can be visualized in Fig. (3), which shows two-dimensional projections of the CAE feature space and the latent space of our network. Compared to the CAE features, which are all mixed up, our latent space is well separated, even though a two-dimensional projection is not well suited to visualizing subspace structure, as the subspaces reside in a high-dimensional ambient space.
For traditional subspace clustering algorithms, around 37 gigabytes of memory would be required to store the affinity matrix for the full dataset, which is computationally prohibitive. Therefore, we contrast our algorithm against SSC, LRR, KSSC, SSC-OMP and Deep Subspace Clustering Networks on a smaller experiment, namely using only the 10,000 test images of the MNIST dataset (see Table (2) for results). Note that SSC-OMP completely fails to deal with the features generated by the SAE and CAE, achieving very low ACC and NMI. Generally speaking, with more samples, better accuracies are expected. We can see that all the subspace clustering algorithms perform better using the SAE features than using the CAE features. To some extent, this shows that there exists a non-linear mapping which is more favorable to subspace clustering. At the same time, our algorithm still achieves the best results among all subspace clustering algorithms, even higher than DSC-Net.
     SAE-KM  CAE-KM  k-means  PCA-KS  DEC     DCN     k-SCN-G  k-SCN-S
ACC  81.29%  51%     53%      68.53%  84.3%   83.31%  82.22%   87.14%
NMI  73.78%  44.87%  50%      64.17%  80%     80.86%  73.93%   78.15%
ARI  67%     33.52%  37%      54.17%  75%     74.87%  71.10%   75.81%
           MNIST            Fashion-MNIST
           ACC     NMI      ACC     NMI
SSC-SAE    75.49%  66.26%   52.33%  51.26%
SSC-CAE    43.03%  56.81%   35.31%  18.10%
LRR-SAE    74.09%  66.97%   58.09%  59.19%
LRR-CAE    51.37%  66.59%   34.43%  18.57%
KSSC-SAE   81.53%  84.53%   57.10%  60.40%
KSSC-CAE   56.42%  65.66%   35.41%  18.18%
DSC-Net    53.20%  47.90%   55.81%  54.80%
k-SCN-S    83.30%  77.38%   60.02%  62.30%
5.2 FashionMNIST
Unlike MNIST, which only contains simple digits, every class in Fashion-MNIST has different styles and comes from different gender groups: men, women, kids and neutral. Fashion-MNIST contains 60,000 training images and 10,000 test images. In our case, we pre-trained and fine-tuned the network using the whole dataset. On Fashion-MNIST, we fix the subspace dimension to 11 and set $\lambda$ to 0.11.
Consistent with the MNIST results, DCN slightly improves upon its initialization (SAE-KM) in terms of ACC and NMI. Moreover, we find that the DCN algorithm works better with smaller learning rates, which in turn requires more epochs to converge properly. From Table (3), we can see that our method still improves the accuracy by around 24% compared to our initialization, and outperforms the other algorithms. The t-SNE maps in Fig. (4) show that a subspace structure exists in our latent space, even in two dimensions.
Table (2) shows that the subspace clustering algorithms also achieve acceptable results on the 10,000 test images, with our algorithm being the best among all. Compared to other subspace clustering algorithms, ours runs much faster, requiring less than 8 minutes (including pre-training and fine-tuning with k-subspace clustering) to generate the final results, whereas the traditional algorithms need at least 40 minutes to process these 10,000 samples, even after dimensionality reduction.
     SAE-KM  CAE-KM  k-means  PCA-KS  DCN     k-SCN-G  k-SCN-S
ACC  54.35%  39.84%  47.58%   53.41%  56.14%  58.67%   63.78%
NMI  58.54%  39.80%  51.24%   57.5%   59.4%   52.88%   62.04%
ARI  41.86%  25.93%  34.86%   41.17%  43.04%  42%      48.04%
5.3 Further Discussion
Based on the above experiments, we observe that our algorithm consistently achieves higher accuracies than DCN (even with initialization using the CAE). One may argue that the performance gain over DCN is due to the fact that, unlike the SAE, the CAE can be trained easily (in our experiments, the number of parameters in the SAE is 2,600 times that of the CAE). To verify that this is not the case, we replace the SAE with the CAE in DCN to see whether DCN can generate competitive results. Table (4) demonstrates that even with the CAE, DCN cannot boost the clustering results as much as ours. On MNIST, DCN-CAE can hardly improve the accuracy and NMI; on Fashion-MNIST, it increases the accuracy by more than 3 percent (and NMI by around 1 percent). This can be attributed to the k-means loss used in DCN, compared to our k-subspace clustering loss, which we believe is more robust. In other words, a subspace structure may be more desirable than cluster centroids in a high-dimensional space.
                MNIST            Fashion-MNIST
                ACC     NMI      ACC     NMI
DCN-CAE         51.10%  45.18%   45.64%  47.8%
Initialization  50.98%  44.87%   42.38%  46.75%
6 Conclusions
In this paper, we proposed a scalable deep k-subspace clustering algorithm, which combines k-subspace clustering and a convolutional auto-encoder in a principled way. Our algorithm makes it possible to scale subspace clustering to large datasets. Furthermore, we proposed two efficient and robust schemes to update the subspaces. These allow our k-SC networks to iteratively fit every sample to its corresponding subspace and update the subspaces accordingly, even from a bad initialization (as observed in our experiments).
Our extensive experiments on the MNIST and Fashion-MNIST datasets demonstrated that our deep k-subspace clustering method provides significant improvements over various state-of-the-art subspace clustering solutions in terms of clustering accuracy and efficiency.
References
 [1] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., et al.: TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467 (2016)
 [2] Agarwal, P.K., Mustafa, N.H.: K-means projective clustering. In: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. pp. 155–165. ACM (2004)
 [3] Balzano, L., Szlam, A., Recht, B., Nowak, R.: K-subspaces with missing data. In: Statistical Signal Processing Workshop (SSP), 2012 IEEE. pp. 612–615. IEEE (2012)
 [4] Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: NIPS. pp. 153–160 (2007)
 [5] Bradley, P.S., Mangasarian, O.L.: K-plane clustering. Journal of Global Optimization 16(1), 23–32 (2000)
 [6] Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR 2005. pp. 886–893. IEEE (2005)
 [7] Elhamifar, E., Vidal, R.: Sparse subspace clustering. In: CVPR. pp. 2790–2797 (2009)
 [8] Elhamifar, E., Vidal, R.: Sparse subspace clustering: Algorithm, theory, and applications. IEEE Trans. on Pattern Analysis and Machine Intelligence 35(11), 2765–2781 (2013)
 [9] Ho, J., Yang, M.H., Lim, J., Lee, K.C., Kriegman, D.: Clustering appearances of objects under varying illumination conditions. In: CVPR. vol. 1, pp. 11–18. IEEE (2003)
 [10] Ji, P., Li, H., Salzmann, M., Dai, Y.: Robust motion segmentation with unknown correspondences. In: ECCV. pp. 204–219. Springer (2014)
 [11] Ji, P., Li, H., Salzmann, M., Zhong, Y.: Robust multi-body feature tracker: a segmentation-free approach. In: CVPR. pp. 3843–3851 (2016)
 [12] Ji, P., Reid, I., Garg, R., Li, H., Salzmann, M.: Low-rank kernel subspace clustering. arXiv preprint arXiv:1707.04974 (2017)

 [13] Ji, P., Salzmann, M., Li, H.: Efficient dense subspace clustering. In: IEEE Winter Conf. on Applications of Computer Vision (WACV). pp. 461–468. IEEE (2014)
 [14] Ji, P., Salzmann, M., Li, H.: Shape interaction matrix revisited and robustified: Efficient subspace clustering with corrupted and incomplete data. In: ICCV. pp. 4687–4695 (2015)
 [15] Ji, P., Zhang, T., Li, H., Salzmann, M., Reid, I.: Deep subspace clustering networks. In: Advances in Neural Information Processing Systems. pp. 23–32 (2017)
 [16] Ji, P., Zhong, Y., Li, H., Salzmann, M.: Null space clustering with applications to motion segmentation and face clustering. In: (ICIP). pp. 283–287. IEEE (2014)
 [17] Kanatani, K.i.: Motion segmentation by subspace separation and model selection. In: ICCV. vol. 2, pp. 586–591. IEEE (2001)
 [18] Kingma, D., Ba, J.: Adam: A method for stochastic optimization. arXiv:1412.6980 (2014)
 [19] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)

 [20] Lee, K.C., Ho, J., Kriegman, D.J.: Acquiring linear subspaces for face recognition under variable lighting. TPAMI 27(5), 684–698 (2005)
 [21] Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., Ma, Y.: Robust recovery of subspace structures by low-rank representation. IEEE Trans. on Pattern Analysis and Machine Intelligence 35(1), 171–184 (2013)
 [22] Lloyd, S.: Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–137 (1982)
 [23] Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV 60(2), 91–110 (2004)
 [24] Ma, Y., Derksen, H., Hong, W., Wright, J.: Segmentation of multivariate mixed data via lossy data coding and compression. TPAMI 29(9) (2007)

 [25] Ng, A.Y., Jordan, M.I., Weiss, Y., et al.: On spectral clustering: Analysis and an algorithm. In: NIPS. vol. 14, pp. 849–856 (2001)
 [26] Patel, V.M., Vidal, R.: Kernel sparse subspace clustering. In: ICIP. pp. 2849–2853. IEEE (2014)
 [27] Peng, X., Feng, J., Xiao, S., Lu, J., Yi, Z., Yan, S.: Deep sparse subspace clustering. arXiv preprint arXiv:1709.08374 (2017)
 [28] Peng, X., Xiao, S., Feng, J., Yau, W.Y., Yi, Z.: Deep subspace clustering with sparsity prior. In: IJCAI (2016)
 [29] Shi, J., Malik, J.: Normalized cuts and image segmentation. TPAMI 22(8), 888–905 (2000)
 [30] Tseng, P.: Nearest q-flat to m points. Journal of Optimization Theory and Applications 105(1), 249–252 (2000)
 [31] Vidal, R.: Subspace clustering. IEEE Signal Processing Magazine 28(2), 52–68 (2011)
 [32] Vidal, R., Favaro, P.: Low rank subspace clustering (LRSC). Pattern Recognition Letters 43, 47–61 (2014)
 [33] Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms (2017)
 [34] Xiao, S., Tan, M., Xu, D., Dong, Z.Y.: Robust kernel low-rank representation. IEEE Transactions on Neural Networks and Learning Systems 27(11), 2268–2281 (2016)
 [35] Xie, J., Girshick, R., Farhadi, A.: Unsupervised deep embedding for clustering analysis. In: International conference on machine learning. pp. 478–487 (2016)
 [36] Yang, A.Y., Wright, J., Ma, Y., Sastry, S.S.: Unsupervised segmentation of natural images via lossy data compression. CVIU 110(2), 212–225 (2008)
 [37] Yang, B., Fu, X., Sidiropoulos, N.D., Hong, M.: Towards k-means-friendly spaces: Simultaneous deep learning and clustering. In: ICML. pp. 3861–3870 (2017)
 [38] Yin, M., Guo, Y., Gao, J., He, Z., Xie, S.: Kernel sparse subspace clustering on symmetric positive definite manifolds. In: CVPR. pp. 5157–5164 (2016)
 [39] You, C., Li, C.G., Robinson, D.P., Vidal, R.: Oracle based active set algorithm for scalable elastic net subspace clustering. In: CVPR. pp. 3928–3937 (2016)
 [40] You, C., Robinson, D., Vidal, R.: Scalable sparse subspace clustering by orthogonal matching pursuit. In: CVPR. pp. 3918–3927 (2016)
 [41] Zhang, T., Szlam, A., Wang, Y., Lerman, G.: Hybrid linear modeling via local best-fit flats. International Journal of Computer Vision 100(3), 217–240 (2012)