Introduction
In recent years, learning from multi-view data has attracted increasing research attention in many real-world applications, because data represented by different features or collected from different sources are very common. For instance, documents can be written in different languages; web pages can be described by different characteristics, e.g., hyperlinks and texts; and images can be described by different kinds of features, such as color or texture. Different views or features capture distinct, complementary perspectives of the data. Thus, how to integrate these heterogeneous features and uncover the underlying structure of the data is a critical problem for multi-view learning. In this paper, we focus on an unsupervised scenario, i.e., multi-view spectral clustering.
In the past decades, various spectral clustering algorithms have been proposed [Shi and Malik, 2000, Ng et al., 2002, Zelnik-Manor and Perona, 2005, Von Luxburg, 2007, Nie et al., 2014, Chang et al., 2015]. These methods can achieve promising clustering performance on an individual view. However, multiple views containing different information can describe the data more accurately and improve clustering performance. [Zhou and Burges, 2007] generalize the single-view spectral clustering method normalized cut to the multi-view case. [Blaschko and Lampert, 2008] introduce Canonical Correlation Analysis (CCA) to map multi-view data into a low-dimensional subspace. There are also methods using co-training or co-regularization strategies to integrate the information of different views [Kumar and Daumé, 2011, Kumar et al., 2011]. In addition, [Cai et al., 2011] integrate heterogeneous features to learn a shared Laplacian matrix and improve model robustness with a non-negative constraint. [Wang et al., 2014] utilize minimax optimization to obtain a universal feature embedding and a consensus clustering result. [Nie et al., 2017] simultaneously perform local structure learning and multi-view clustering, where the weight of each view is determined automatically. Recently, self-representation subspace-based multi-view spectral clustering methods have been developed due to their effectiveness [Cao et al., 2015, Gao et al., 2015, Yin et al., 2015, Zhang et al., 2015, Wang et al., 2017]. These methods aim to discover the underlying subspaces embedded in the original data in order to cluster accurately.
Although the previous multi-view spectral clustering methods can achieve promising performance, drawbacks remain. First, spectral methods need a high-quality similarity matrix. Previous methods learn the similarity matrix directly from the original data; however, real-world data often contain noise and outliers, so such a similarity matrix is unreliable. Second, different views contribute differently to data clustering, but previous methods use the Euclidean metric to learn the similarity matrix: for given data, the Euclidean distances among them are fixed, so the different contributions of the views cannot be taken into account. Finally, for multi-view spectral clustering, the k-means procedure in spectral clustering is sensitive to its initialization, which influences the final clustering performance [Ng et al., 2002].
In this paper, we propose a novel subspace-based multi-view spectral clustering method, named Multi-view Subspace Clustering unifying Adaptive neighbours and Metric learning (MSCAM), to address the aforementioned problems. In this method, we learn the subspace representations of the original data for each view. By utilizing these subspace representations to adaptively learn a consensus similarity matrix, we can alleviate the influence of noise and outliers. Meanwhile, for each view, we learn the most suitable Mahalanobis matrix to parameterize the squared distance. The motivation is that, due to the complexity of noise and outliers, different views contribute differently to clustering; we therefore use the Mahalanobis metric to dynamically rescale the data of each view, learning a different Mahalanobis matrix to weigh the contribution of each view. Finally, we constrain the graph constructed from the similarity matrix to have exactly $c$ connected components, where $c$ is the number of clusters. In this way, the learned graph can be used for clustering directly, without the k-means procedure.
The main contributions of our work are as follows:

We adaptively learn a consensus similarity matrix in the subspace rather than the original space, which may contain noise and outliers.

The Mahalanobis metric is employed to parameterize the squared distance of each view; unlike the Euclidean metric, it accounts for the different contributions of the views to data clustering.

We add a constraint on the graph constructed from the similarity matrix to replace the k-means procedure.

Extensive comparison experiments demonstrate that our MSCAM method outperforms other state-of-the-art multi-view clustering approaches.
Related Work
Notation Summary
Lowercase letters denote scalars, bold lowercase letters denote vectors, and bold uppercase letters denote matrices. For an arbitrary matrix $M$, $m_i$ denotes the $i$-th column of $M$ and $m_{ij}$ stands for the element in the $i$-th row and $j$-th column of $M$. $M^T$ and $\mathrm{tr}(M)$ denote the transpose and trace of $M$, respectively. $\|\cdot\|_2$ and $\|\cdot\|_F$ represent the $\ell_2$ norm and the Frobenius norm, respectively. For two matrices of the same size, $\langle A, B \rangle = \mathrm{tr}(A^T B)$ represents the inner product. Moreover, $\mathbf{1}$ and $I$ represent the vector of all ones and the identity matrix with proper sizes, respectively.
Adaptive Neighbours Clustering
For clustering tasks, the local correlation of the original data plays an important role. Recently, many clustering methods considering local correlation have been developed [Nie et al., 2014, Guo, 2015, Nie et al., 2016, Zhao et al., 2016]. Let $X = [x_1, \dots, x_n] \in \mathbb{R}^{d \times n}$ be the data matrix with $n$ data points, where $d$ is the dimension of the features. The squared Euclidean distance is used as a measure to decide the $k$ nearest neighbours of each data point. For each data point $x_i$, every data point $x_j$ can be a neighbour of $x_i$ with probability $s_{ij}$. Generally, a smaller distance indicates that a larger probability should be allocated. Therefore, the probabilities can be determined by solving the following problem:

$$\min_{s_i^T \mathbf{1} = 1,\; 0 \le s_{ij} \le 1} \; \sum_{j=1}^{n} \|x_i - x_j\|_2^2 \, s_{ij} \quad (1)$$

where $s_i \in \mathbb{R}^{n}$ is a vector whose $j$-th element is $s_{ij}$.
For problem (1), to rule out the trivial solution in which only the nearest neighbour of $x_i$ is assigned probability 1 and all other points receive probability 0, a penalty term is added to regularize the probabilities $s_{ij}$. For all the data points, the model with adaptive neighbours is

$$\min_{S} \; \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \|x_i - x_j\|_2^2 \, s_{ij} + \gamma s_{ij}^2 \right), \quad \text{s.t. } \forall i,\; s_i^T \mathbf{1} = 1,\; 0 \le s_{ij} \le 1 \quad (2)$$

where $\gamma$ is the trade-off parameter. After obtaining the similarity matrix $S$, spectral clustering [Ng et al., 2002] can be performed to get the final clustering results.
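Problem (2) admits a simple neighbour-wise closed-form solution once the number of nearest neighbours $k$ is fixed (the derivation appears in the optimization section below). The following sketch, with a hypothetical helper `adaptive_neighbours` that is ours rather than the paper's, illustrates the resulting assignment under plain squared Euclidean distances:

```python
import numpy as np

def adaptive_neighbours(X, k=5):
    """Closed-form adaptive-neighbour assignment in the style of problem (2).

    For each row x_i of X (n x d), the k nearest points receive non-zero
    probabilities; smaller squared Euclidean distances get larger values,
    and each probability vector sums to one.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    np.fill_diagonal(D, np.inf)                      # a point is not its own neighbour
    S = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(D[i])                     # neighbours by ascending distance
        d = D[i, order]
        denom = k * d[k] - d[:k].sum()               # d[k] is the (k+1)-th smallest distance
        S[i, order[:k]] = (d[k] - d[:k]) / max(denom, 1e-12)
    return S
```

Each row of the returned matrix is a valid probability vector with exactly $k$ non-zero entries, which is the sparsity behaviour the penalty term in (2) is designed to produce.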
Multi-view Subspace Clustering (MSC)
Subspace clustering aims to obtain a similarity matrix in the learned underlying subspace of the original data and then perform spectral clustering [Lu et al., 2012, Elhamifar and Vidal, 2013, Liu et al., 2013]. In the dataset, each data point can be reconstructed by a linear combination of the other points, i.e., $X = XZ + E$, where $Z \in \mathbb{R}^{n \times n}$ is the subspace representation matrix and $E$ is the error matrix.
Usually, the subspace clustering model can be written in the following form:

$$\min_{Z} \; \mathcal{L}(E) + \lambda \, \Omega(Z), \quad \text{s.t. } X = XZ + E \quad (3)$$

where $\mathcal{L}(E)$ and $\Omega(Z)$ denote the error loss term and the regularization term, respectively, and $\lambda$ is the trade-off parameter. We obtain the subspace representation $Z$, whose nonzero elements indicate that the corresponding data points come from the same subspace. Then the similarity matrix can be constructed, and spectral clustering [Ng et al., 2002] is performed on it to obtain the clustering results.
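As a concrete single-view instance of the form (3), choosing the Frobenius norm for both terms yields the Least Squares Regression model of [Lu et al., 2012], which has a closed-form solution. A minimal sketch (the helper name `lsr_subspace_representation` is ours, not from the paper):

```python
import numpy as np

def lsr_subspace_representation(X, lam=0.1):
    """Least Squares Regression subspace clustering (Lu et al. 2012):
        min_Z ||X - X Z||_F^2 + lam * ||Z||_F^2,
    where the columns of X (d x n) are data points.  The closed-form
    solution Z = (X^T X + lam I)^{-1} X^T X is symmetrised into a
    similarity matrix for spectral clustering.
    """
    n = X.shape[1]
    G = X.T @ X
    Z = np.linalg.solve(G + lam * np.eye(n), G)     # normal-equation solution
    W = (np.abs(Z) + np.abs(Z).T) / 2.0             # symmetric similarity matrix
    return Z, W

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 12))                    # d = 5 features, n = 12 points
Z, W = lsr_subspace_representation(X, lam=0.1)
```

The symmetrised matrix `W` plays the role of the similarity matrix on which spectral clustering is subsequently run.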
Recently, subspace clustering has been extended to the multi-view setting because of its effectiveness. Generally, these multi-view subspace clustering models can be formulated as

$$\min_{\{Z^{(v)}\}} \; \sum_{v=1}^{V} \left( \mathcal{L}(E^{(v)}) + \lambda \, \Omega(Z^{(v)}) \right), \quad \text{s.t. } X^{(v)} = X^{(v)} Z^{(v)} + E^{(v)} \quad (4)$$

where $X^{(v)} \in \mathbb{R}^{d_v \times n}$, $Z^{(v)}$, and $E^{(v)}$ ($d_v$ is the feature dimension of the $v$-th view) denote the data matrix, the subspace representation, and the error matrix of the $v$-th view ($v = 1, \dots, V$, where $V$ is the number of views), respectively. However, these spectral clustering methods construct graphs from original data that may be corrupted, and they are sensitive to initialization, which leads to suboptimal results. In contrast, we construct the graph in the underlying subspace of the original data. We also learn a different Mahalanobis matrix for each view to account for the different contributions of the views, and we remove the k-means procedure from spectral clustering.
Methodology
In this section, we present the proposed MSCAM method, which utilizes subspace representations and the Mahalanobis metric to adaptively learn a consensus similarity matrix. In addition, we add a constraint on the constructed graph so that spectral clustering no longer needs the k-means procedure.
MSC with Adaptive Neighbours
For multi-view data, $X^{(v)}$ denotes the original data of the $v$-th view. We extend the basic subspace clustering model to the multi-view setting as follows:

$$\min_{\{Z^{(v)}\}} \; \sum_{v=1}^{V} \left( \|X^{(v)} - X^{(v)} Z^{(v)}\|_F^2 + \lambda \|Z^{(v)}\|_F^2 \right) \quad (5)$$

where $\lambda$ is the regularization parameter and $Z^{(v)}$ is the learned subspace representation of each view. In our objective function, the Frobenius-norm regularization on $Z^{(v)}$ improves model robustness, following [Lu et al., 2012].
Adaptive neighbours explore the local correlation of the original data to improve clustering performance. Moreover, subspace representations can uncover the underlying subspace structure of the original data and alleviate the influence of noise and outliers. Therefore, we utilize the subspace representations rather than the original data to learn a consensus similarity matrix for all views:

$$\min_{\{Z^{(v)}\}, S} \; \sum_{v=1}^{V} \left( \|X^{(v)} - X^{(v)} Z^{(v)}\|_F^2 + \lambda \|Z^{(v)}\|_F^2 \right) + \sum_{i,j} \left( \beta \sum_{v=1}^{V} \|z_i^{(v)} - z_j^{(v)}\|_2^2 \, s_{ij} + \gamma s_{ij}^2 \right), \quad \text{s.t. } \forall i,\; s_i^T \mathbf{1} = 1,\; 0 \le s_{ij} \le 1 \quad (6)$$

where $\lambda$, $\beta$, and $\gamma$ are three trade-off parameters. For the data point $x_i$, $z_i^{(v)}$ is its subspace representation in the $v$-th view. For all views, we learn a consensus similarity matrix $S$ with adaptive neighbours. Therefore, the data points in each view can be allocated to the most suitable cluster, and the consistency of clustering results across views is preserved. We name this method Multi-view Subspace Clustering with Adaptive Neighbours (MSCAN).
Joint Adaptive Neighbours and Metric Learning
According to MSCAN, we employ the subspace representations to adaptively learn a consensus similarity matrix, which alleviates the influence of noise and outliers in the original data to some extent. However, due to the complexity of noise and outliers in multi-view data, different views may contribute differently to clustering. The MSCAN model utilizes the Euclidean distance as the metric to learn the similarity matrix, so for given subspace representations the similarities among them never change, which results in a suboptimal graph. By contrast, we use the Mahalanobis metric to learn the similarity matrix for two reasons. For one thing, Mahalanobis metric learning finds a Mahalanobis matrix that parameterizes the squared distance [Xing et al., 2003]; learning the Mahalanobis matrix is equivalent to learning a rescaling of the data, which means the similarities among subspace representations can adapt until a satisfactory similarity matrix is obtained. For another, the Mahalanobis matrix can be decomposed into a matrix product, i.e., $M = P P^T$, which makes our model easy to solve. Our goal is to utilize the subspace representations to adaptively learn the similarity matrix. Therefore, considering the contributions of different views, we employ the Mahalanobis metric to rescale the subspace representations of the different views, which changes the similarity relationships among them.
Mahalanobis Metric Learning
First, we introduce Mahalanobis-distance-based metric learning. Given a dataset $\{x_i\}_{i=1}^{n}$, consider learning a Mahalanobis metric of the form

$$d_M(x_i, x_j) = \|x_i - x_j\|_M = \sqrt{(x_i - x_j)^T M (x_i - x_j)} \quad (7)$$

where $M$ is a positive semidefinite Mahalanobis matrix. Setting $M = I$, formula (7) reduces to the Euclidean distance; if $M$ is diagonal, different weights are assigned to different axes of the metric; more generally, $M$ parameterizes a family of squared distances. In addition, Mahalanobis metric learning can be viewed as rescaling each data point $x$ to $A^T x$ with $M = A A^T$, after which an ordinary Euclidean metric can be employed on the rescaled data points. The effectiveness of Mahalanobis metric learning has been verified in clustering [Xing et al., 2003] and classification [Weinberger and Saul, 2009, Cao et al., 2013]. In our paper, we employ the Mahalanobis distance as the metric since it can improve clustering performance by properly weighing the contributions of different views.
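The equivalence between the factorized Mahalanobis distance and Euclidean distance on rescaled points is easy to verify numerically; a small sketch with illustrative values only:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(4), rng.standard_normal(4)

# Any positive semidefinite Mahalanobis matrix factors as M = A A^T.
A = rng.standard_normal((4, 3))
M = A @ A.T

d_mahalanobis = (x - y) @ M @ (x - y)           # (x - y)^T M (x - y)
d_rescaled = np.sum((A.T @ x - A.T @ y) ** 2)   # squared Euclidean distance after rescaling
```

Both quantities coincide, which is exactly why learning $M$ can be replaced by learning the projection $A$ and measuring ordinary Euclidean distances afterwards.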
Mahalanobis Metric Induced MSC
Considering the contributions of different views, we learn a different Mahalanobis matrix in the underlying subspace of each view. The Mahalanobis-metric-induced objective function can thus be formulated as

$$\min_{\{Z^{(v)}\}, \{M^{(v)}\}, S} \; \sum_{v=1}^{V} \left( \|X^{(v)} - X^{(v)} Z^{(v)}\|_F^2 + \lambda \|Z^{(v)}\|_F^2 \right) + \sum_{i,j} \left( \beta \sum_{v=1}^{V} (z_i^{(v)} - z_j^{(v)})^T M^{(v)} (z_i^{(v)} - z_j^{(v)}) \, s_{ij} + \gamma s_{ij}^2 \right), \quad \text{s.t. } \forall i,\; s_i^T \mathbf{1} = 1,\; 0 \le s_{ij} \le 1 \quad (8)$$

where each $M^{(v)}$ is a positive semidefinite Mahalanobis matrix. Usually, $M^{(v)}$ can be decomposed as $M^{(v)} = P^{(v)} P^{(v)T}$ with $P^{(v)} \in \mathbb{R}^{n \times p}$, $p \le n$. In this way, Mahalanobis metric learning can be seen as finding a linear projection $P^{(v)}$. Therefore, we can rewrite model (8) as

$$\min_{\{Z^{(v)}\}, \{P^{(v)}\}, S} \; \sum_{v=1}^{V} \left( \|X^{(v)} - X^{(v)} Z^{(v)}\|_F^2 + \lambda \|Z^{(v)}\|_F^2 \right) + \sum_{i,j} \left( \beta \sum_{v=1}^{V} \|P^{(v)T} (z_i^{(v)} - z_j^{(v)})\|_2^2 \, s_{ij} + \gamma s_{ij}^2 \right), \quad \text{s.t. } \forall i,\; s_i^T \mathbf{1} = 1,\; 0 \le s_{ij} \le 1,\; P^{(v)T} P^{(v)} = I \quad (9)$$

In this model, the orthogonality constraint on $P^{(v)}$ actually aims to learn a linear projection that transfers the original $n$-dimensional subspace representations into a $p$-dimensional uncorrelated space.
Graph Constraint Term
After obtaining the similarity matrix $S$, a spectral clustering method [Ng et al., 2002] is performed to get the final clustering results. However, due to the k-means procedure, spectral clustering depends on the initialization, which influences the final clustering performance.
Inspired by [Nie et al., 2017], we constrain the graph associated with $S$ to have exactly $c$ connected components, which yields explicit clustering results directly from the similarity matrix $S$. We first introduce Theorem 1 as follows.

Theorem 1. In the graph associated with the similarity matrix $S$, the number of connected components is equal to the multiplicity of 0 as an eigenvalue of the (nonnegative) Laplacian matrix $L_S$.

Therefore, a graph with $c$ connected components means that the Laplacian matrix $L_S = D_S - (S + S^T)/2$, where $D_S$ is the degree matrix whose $i$-th diagonal element is $\sum_j (s_{ij} + s_{ji})/2$, should have $c$ zero eigenvalues. According to the theorem in [Fan, 1949], the sum of the $c$ smallest eigenvalues of $L_S$ can be written as

$$\min_{F \in \mathbb{R}^{n \times c},\; F^T F = I} \; \mathrm{tr}(F^T L_S F) \quad (10)$$

where the columns of $F$ span the projection of dimension $c$. Hence, we obtain our final multi-view subspace clustering objective function:

$$\min_{\{Z^{(v)}\}, \{P^{(v)}\}, S, F} \; \sum_{v=1}^{V} \left( \|X^{(v)} - X^{(v)} Z^{(v)}\|_F^2 + \lambda \|Z^{(v)}\|_F^2 \right) + \sum_{i,j} \left( \beta \sum_{v=1}^{V} \|P^{(v)T} (z_i^{(v)} - z_j^{(v)})\|_2^2 \, s_{ij} + \gamma s_{ij}^2 \right) + \rho \, \mathrm{tr}(F^T L_S F), \quad \text{s.t. } \forall i,\; s_i^T \mathbf{1} = 1,\; 0 \le s_{ij} \le 1,\; P^{(v)T} P^{(v)} = I,\; F^T F = I \quad (11)$$

where $\rho$ is chosen large enough to enforce the $c$ zero eigenvalues of $L_S$.
The graph constraint term leads to explicit clustering results without the k-means procedure, which improves the final clustering performance.
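Theorem 1 can be checked numerically on a toy graph: for a block-structured similarity matrix, the multiplicity of the zero eigenvalue of the Laplacian equals the number of blocks. A small sketch:

```python
import numpy as np

# Block-structured similarity matrix: two obvious connected components
# (a triangle on {0, 1, 2} and an edge on {3, 4}).
S = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

A = (S + S.T) / 2.0               # symmetrised adjacency
D = np.diag(A.sum(axis=1))        # degree matrix
L = D - A                         # (unnormalised) graph Laplacian
eigvals = np.linalg.eigvalsh(L)   # real eigenvalues in ascending order

# Theorem 1: multiplicity of eigenvalue 0 = number of connected components.
n_components = int(np.sum(np.abs(eigvals) < 1e-10))
```

Here `n_components` comes out as 2, matching the two blocks, which is precisely the structure the constraint in (11) forces the learned graph to have with $c$ components.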
Optimization
The model (11) is not jointly convex, so we solve it by alternately optimizing each block of variables while fixing the others. In addition, we provide a convergence analysis.
Optimization Procedure
Fixing $S$, Update $Z^{(v)}$ and $P^{(v)}$
When $S$ is fixed, the model (11) can be formulated as

$$\min_{\{Z^{(v)}\}, \{P^{(v)}\}} \; \sum_{v=1}^{V} \left( \|X^{(v)} - X^{(v)} Z^{(v)}\|_F^2 + \lambda \|Z^{(v)}\|_F^2 + 2\beta \, \mathrm{tr}(P^{(v)T} Z^{(v)} L_S Z^{(v)T} P^{(v)}) \right), \quad \text{s.t. } P^{(v)T} P^{(v)} = I \quad (12)$$

For convenience, we temporarily omit the view index and rewrite the above formula as

$$\min_{Z, P} \; \|X - XZ\|_F^2 + \lambda \|Z\|_F^2 + 2\beta \, \mathrm{tr}(P^T Z L_S Z^T P), \quad \text{s.t. } P^T P = I \quad (13)$$
To minimize problem (13), we introduce auxiliary variables and utilize the Alternating Direction Method of Multipliers (ADMM) [Boyd et al., 2011]. The augmented Lagrangian function is given by
(14) 
where the two Lagrange multiplier matrices and the penalty parameter are introduced by ADMM.
1) Update . We solve the following problem to update
(15) 
whose solution is given by
(16) 
where .
2) Update . We solve the following problem to update
(17) 
whose solution is given by
(18) 
where .
3) Update . We update with the following problem
(19) 
thus we obtain
4) Update . We update with the following problem
(20) 
which is equivalent to the following problem:
(21) 
where the two auxiliary matrices are defined from the fixed variables. First, we compute a Cholesky factorization, whose factor is a lower triangular matrix. Then, we perform the Singular Value Decomposition (SVD) on the transformed matrix. Therefore, we get

(22) 
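The $P$-update above is an orthogonality-constrained trace problem, and such problems are solved by an SVD via the classical Procrustes-type argument. A sketch with a hypothetical helper `orthogonal_update` and a generic objective $\max_{P^T P = I} \mathrm{tr}(P^T G)$ standing in for the paper's exact subproblem:

```python
import numpy as np

def orthogonal_update(G):
    """Solve max_{P^T P = I} tr(P^T G): if G = U S V^T is the thin SVD,
    the maximiser is P = U V^T and the optimal value is the sum of the
    singular values of G (von Neumann's trace inequality)."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(1)
G = rng.standard_normal((6, 3))   # tall matrix: 6-dim space, 3 orthonormal columns sought
P = orthogonal_update(G)
```

The returned `P` has orthonormal columns and attains the upper bound $\sum_i \sigma_i(G)$ on the trace, which is why an SVD step suffices for this kind of update.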
5) Update and . For the Lagrange multipliers, we can update them as
(23) 
Fixing $Z^{(v)}$ and $P^{(v)}$, Update $S$
When $Z^{(v)}$ and $P^{(v)}$ are fixed, the model (11) can be formulated as

$$\min_{S, F} \; \sum_{i,j} \left( \beta \sum_{v=1}^{V} \|P^{(v)T} (z_i^{(v)} - z_j^{(v)})\|_2^2 \, s_{ij} + \gamma s_{ij}^2 \right) + \rho \, \mathrm{tr}(F^T L_S F), \quad \text{s.t. } \forall i,\; s_i^T \mathbf{1} = 1,\; 0 \le s_{ij} \le 1,\; F^T F = I \quad (24)$$
For problem (24), with $F$ fixed and due to the independence between different $s_i$, we can individually solve the following problem for each $s_i$:

$$\min_{s_i^T \mathbf{1} = 1,\; 0 \le s_{ij} \le 1} \; \sum_{j=1}^{n} \left( \beta \sum_{v=1}^{V} \|P^{(v)T} (z_i^{(v)} - z_j^{(v)})\|_2^2 \, s_{ij} + \gamma s_{ij}^2 + \frac{\rho}{2} \|f_i - f_j\|_2^2 \, s_{ij} \right) \quad (25)$$

which is equivalent to the following form

$$\min_{s_i^T \mathbf{1} = 1,\; 0 \le s_{ij} \le 1} \; \sum_{j=1}^{n} \left( d_{ij} \, s_{ij} + \gamma s_{ij}^2 \right) \quad (26)$$

where $d_{ij} = \beta \sum_{v=1}^{V} \|P^{(v)T} (z_i^{(v)} - z_j^{(v)})\|_2^2 + \frac{\rho}{2} \|f_i - f_j\|_2^2$, $f_i$ denotes the $i$-th row of $F$, and $d_i$ is the vector whose $j$-th element is $d_{ij}$. Then, the above problem can be rewritten as

$$\min_{s_i^T \mathbf{1} = 1,\; 0 \le s_{ij} \le 1} \; \left\| s_i + \frac{d_i}{2\gamma} \right\|_2^2 \quad (27)$$
The Lagrangian function of the above problem is given by

$$\mathcal{L}(s_i, \eta, \mu_i) = \left\| s_i + \frac{d_i}{2\gamma} \right\|_2^2 - \eta \, (s_i^T \mathbf{1} - 1) - \mu_i^T s_i \quad (28)$$

where $\eta$ and $\mu_i \ge 0$ are Lagrangian multipliers. For each data point $x_i$, we set the number of nearest neighbours to $k$, so we can obtain the optimal solution of $s_{ij}$ from the Karush-Kuhn-Tucker (KKT) conditions:

$$s_{ij} = \left( -\frac{d_{ij}}{2\gamma} + \eta \right)_+ \quad (29)$$

In formula (29), we let $d_{i1}, d_{i2}, \dots, d_{in}$ be sorted in ascending order. Then, to let each $s_i$ have exactly $k$ nonzero elements, we have

$$\gamma_i = \frac{k}{2} d_{i,k+1} - \frac{1}{2} \sum_{j=1}^{k} d_{ij} \quad (30)$$

Hence, we can determine the final parameter $\gamma$ by computing the average of the $\gamma_i$ [Nie et al., 2017]:

$$\gamma = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{k}{2} d_{i,k+1} - \frac{1}{2} \sum_{j=1}^{k} d_{ij} \right) \quad (31)$$
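The parameter-free choice of $\gamma$ in (30)-(31) can be sketched as follows, with plain squared Euclidean distances standing in for the learned distances $d_{ij}$ (the helper name `estimate_gamma` is ours):

```python
import numpy as np

def estimate_gamma(X, k=5):
    """Average of the per-point gamma_i of formula (30), each chosen so that
    the probability vector s_i has exactly k non-zero entries (following
    Nie et al. 2017).  Plain squared Euclidean distances between the rows
    of X stand in for the learned metric distances d_ij.
    """
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D, np.inf)                           # exclude self-distances
    Dsort = np.sort(D, axis=1)                            # ascending per row
    gamma_i = 0.5 * (k * Dsort[:, k] - Dsort[:, :k].sum(axis=1))
    return gamma_i.mean()
```

Because $d_{i,k+1}$ is at least as large as each of the $k$ smaller distances, every $\gamma_i$ is non-negative, and the averaged value keeps each $s_i$ roughly $k$-sparse without any manual tuning.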
Convergence Analysis
The original problem (11) is not jointly convex, but it can be divided into two optimization subproblems. Each of them is a convex minimization problem for which we can obtain the optimal solution. Therefore, the original objective function is non-increasing over the iterations until Algorithm 2 converges.
Experiments
In this section, we evaluate the proposed MSCAN and MSCAM methods on a synthetic dataset and three real-world benchmark datasets to demonstrate their effectiveness.
Dataset Descriptions
The descriptions of the datasets used in our experiments are summarized in Table 1.
Synthetic dataset is constructed according to [Gao et al., 2016]. Specifically, this synthetic dataset comprises three views with two clusters. Each view is generated from a two-component Gaussian mixture model, and 1,000 data points are sampled as instances in each view. Finally, we uniformly corrupt a given percentage of entries with noise at random; e.g., we choose a column to corrupt by adding Gaussian noise (zero mean and a given variance) to its observed vector.

Oxford Flowers dataset is composed of 1,360 examples from 17 flower categories, with 80 images per category. Different visual features (color, texture, shape) are used to describe each image, and the distance matrices for these three features are utilized to construct three views.
Handwritten numerals (HW) dataset contains 2,000 examples from the digit classes 0 to 9, with 200 examples per class. We use six public feature sets: 240-dimension pixel averages in windows (PIX), 216-dimension profile correlations (FAC), 76-dimension Fourier coefficients of character shapes (FOU), 47-dimension Zernike moments (ZER), 64-dimension Karhunen-Loève coefficients (KAR), and 6-dimension morphological (MOR) features.
NUS-WIDE-Object (NUS) dataset is a web image dataset for object recognition. It consists of 30,000 images in 31 categories. In our experiments, we randomly extract 100 images per category. We use six low-level features to describe each image: 64-dimension color histogram (CH), 225-dimension block-wise color moments (CM), 144-dimension color correlogram (CORR), 73-dimension edge direction histogram (EDH), 128-dimension wavelet texture (WT), and 500-dimension bag-of-words SIFT (BoW).
Experiment Setup
We evaluate MSCAN and MSCAM by comparing them with other state-of-the-art clustering methods. Specifically, we compare with the single-view clustering methods Spectral Clustering (SC) [Ng et al., 2002] and Sparse Subspace Clustering (SSC) [Elhamifar and Vidal, 2013]. The multi-view clustering methods Co-regularized Spectral Clustering (CoReg) [Kumar et al., 2011], Multi-Modal Spectral Clustering (MMSC) [Cai et al., 2011], Diversity-induced Multi-view Subspace Clustering (DiMSC) [Cao et al., 2015], and Multi-view Learning with Adaptive Neighbours (MLAN) [Nie et al., 2017] are also used for comparison. The detailed information of these comparison approaches is as follows.

SC-BSV: SC is a classic spectral clustering method. In our experiments, we perform SC on each single-view feature and report the best results.

SSC-BSV: SSC is a representative subspace clustering method based on self-representation. The subspace representation is obtained first, and then spectral clustering is performed on it. In our experiments, we apply SSC to each single view and report the best results.

SC-Concat: We first concatenate all features into a long vector and then perform SC to get the final clustering results.

SSC-Concat: Analogously to SC-Concat, we perform SSC on the concatenated features to get the final clustering results.

CoReg: This method introduces a centroid-based co-regularization term to make all views share the same clustering result.

MMSC: This method learns a shared Laplacian matrix by integrating multi-view heterogeneous image features. In addition, a non-negative constraint is utilized to improve the robustness of the model.

DiMSC: This method utilizes the Hilbert-Schmidt Independence Criterion (HSIC) as a diversity term to explore the complementary information of multi-view data.

MLAN: This approach simultaneously performs multi-view clustering and local structure learning. Moreover, the weight of each view is determined automatically without additional penalty parameters.
Datasets  #.Size  #.View  #.Cluster 
Synthetic Dataset  1000  3  2 
Oxford Flowers  1360  3  17 
Handwritten  2000  6  10 
NUS-WIDE-Object  3100  6  31 
There are three trade-off parameters $\lambda$, $\beta$, and $\gamma$ in our model, where $\gamma$ can be determined according to the property of adaptive neighbours [Nie et al., 2017]. Therefore, to obtain the best clustering results, we only need to adjust $\lambda$ and $\beta$. In addition, for each data point, the number of nearest neighbours $k$ is set to 9. For all compared methods, we also tune their parameters to obtain the best results. Besides, we run each experiment 10 times and report the average results for each dataset. All experiments are performed in MATLAB on a computer with an Intel Xeon E5-2650 v2 CPU (2.6GHz) and 32GB RAM.
Corruptions (%)  0  10  20  30  40  50  60  70  80  90 
MLAN  0.849  0.841  0.833  0.830  0.787  0.732  0.683  0.668  0.600  0.579 
MSCAN  0.890  0.841  0.836  0.832  0.813  0.805  0.763  0.758  0.748  0.702 
MSCAM  0.874  0.872  0.868  0.846  0.840  0.833  0.826  0.817  0.788  0.780 
Experiment Results on Synthetic Dataset
We compare MSCAN and MSCAM with the recently proposed MLAN [Nie et al., 2017], which learns the similarity matrix from the original data. Experimental results (accuracy) are shown in Table 2. It is noteworthy that the proposed MSCAN and MSCAM methods consistently outperform MLAN. In addition, when the noise level is high (a corruption percentage of 90%), the accuracy of MLAN drops to 57.9%, whereas MSCAN and MSCAM still achieve promising performance, i.e., 70.2% and 78.0%, respectively. Further, MSCAN obtains a better result than MSCAM on the uncorrupted synthetic dataset, but MSCAM consistently outperforms MSCAN on the corrupted versions. This indicates that MSCAM is more robust than MSCAN and MLAN.
Dataset  Oxford Flowers  HW  NUS 
SCBSV  0.411(0.003)  0.723(0.000)  0.131(0.002) 
SSCBSV  0.356(0.008)  0.767(0.004)  0.147(0.001) 
SCConcat  0.428(0.012)  0.752(0.000)  0.142(0.002) 
SSCConcat  0.365(0.006)  0.815(0.009)  0.152(0.001) 
CoReg  0.433(0.011)  0.804(0.059)  0.188(0.007) 
MMSC  0.442(0.013)  0.840(0.011)  0.154(0.005) 
DiMSC  0.431(0.021)  0.907(0.004)  0.109(0.003) 
MLAN  0.459(0.001)  0.973(0.000)  0.104(0.002) 
MSCAN  0.527(0.004)  0.975(0.000)  0.179(0.005) 
MSCAM  0.530(0.004)  0.978(0.001)  0.190(0.002) 
Clustering ACC (mean and standard deviation) of different methods. The best results are in bold font.
Dataset  Oxford Flowers  HW  NUS 
SCBSV  0.426(0.005)  0.667(0.000)  0.042(0.003) 
SSCBSV  0.373(0.005)  0.759(0.001)  0.013(0.000) 
SCConcat  0.434(0.010)  0.709(0.000)  0.089(0.002) 
SSCConcat  0.410(0.005)  0.850(0.007)  0.031(0.001) 
CoReg  0.423(0.008)  0.778(0.035)  0.209(0.006) 
MMSC  0.446(0.004)  0.892(0.008)  0.172(0.004) 
DiMSC  0.443(0.011)  0.841(0.003)  0.102(0.004) 
MLAN  0.476(0.002)  0.939(0.001)  0.091(0.003) 
MSCAN  0.540(0.002)  0.942(0.000)  0.206(0.003) 
MSCAM  0.522(0.002)  0.948(0.001)  0.214(0.002) 
Clustering NMI (mean and standard deviation) of different methods. The best results are in bold font.
Experiment Results on Benchmark Datasets
In our experiments, we utilize accuracy (ACC) and normalized mutual information (NMI) as the two evaluation metrics. Table 3 and Table 4 show the clustering results on the three benchmark datasets. In general, the multi-view clustering methods achieve superior results to the single-view approaches (SC and SSC). Additionally, we can observe that the proposed MSCAM method achieves the best clustering performance in comparison with the other state-of-the-art multi-view clustering methods (CoReg, MMSC, DiMSC, and MLAN). For the HW dataset, the recently proposed MLAN method already achieves high performance, so the improvement of our MSCAM method is less pronounced. However, for the large NUS dataset, MSCAM significantly improves the clustering performance compared with MLAN. This is because MLAN constructs the graph from original data, which contain noise and outliers in a large dataset. By contrast, MSCAM obtains a better graph by learning different subspace representations and Mahalanobis matrices for different views, and thus achieves better clustering performance.
Additionally, to better evaluate MSCAM, we further compare the clustering performance of MSCAN and MSCAM. We observe that MSCAN achieves competitive or even better clustering results than MSCAM on the small-scale datasets (Oxford Flowers and HW). However, due to the complexity of noise and outliers in the large NUS dataset, MSCAM outperforms MSCAN there. This is because the fixed similarity relationships among subspace representations lead to suboptimal performance for MSCAN, whereas MSCAM learns a different Mahalanobis matrix for each view, which can adapt the similarities among subspace representations and thus obtain a better result. Consequently, the proposed MSCAM is robust and achieves superior performance compared to the other clustering algorithms.
Parameter Sensitivity
In our method, there are two parameters to tune, $\lambda$ and $\beta$. Figure 1 shows the parameter sensitivity of MSCAM on the HW dataset. Our proposed method is not very sensitive to these parameters and achieves satisfactory clustering performance within a large range of parameter values.
Conclusion
In this paper, we propose a novel multi-view subspace clustering method, MSCAM, which unifies adaptive neighbours and metric learning. Our method learns subspace representations of the original data and uses them to adaptively learn a consensus similarity matrix. Meanwhile, considering the different contributions of the views, we utilize the Mahalanobis metric to learn a different projection for each view. Further, we add a graph constraint to remove the k-means procedure. We develop an iterative optimization algorithm for MSCAM. Extensive experimental results on a synthetic dataset and three real-world datasets demonstrate that MSCAM is robust and outperforms other state-of-the-art clustering algorithms.
References

[Blaschko and Lampert, 2008] Matthew B. Blaschko and Christoph H. Lampert. Correlational spectral clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1-8, 2008.
[Boyd et al., 2011] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.
[Cai et al., 2011] Xiao Cai, Feiping Nie, Heng Huang, and Farhad Kamangar. Heterogeneous image feature integration via multi-modal spectral clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1977-1984, 2011.
[Cao et al., 2013] Qiong Cao, Yiming Ying, and Peng Li. Similarity metric learning for face recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 2408-2415, 2013.
[Cao et al., 2015] Xiaochun Cao, Changqing Zhang, Huazhu Fu, Si Liu, and Hua Zhang. Diversity-induced multi-view subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586-594, 2015.
[Chang et al., 2015] Xiaojun Chang, Feiping Nie, Zhigang Ma, Yi Yang, and Xiaofang Zhou. A convex formulation for spectral shrunk clustering. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2532-2538, 2015.
[Elhamifar and Vidal, 2013] Ehsan Elhamifar and Rene Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2765-2781, 2013.
[Fan, 1949] Ky Fan. On a theorem of Weyl concerning eigenvalues of linear transformations I. Proceedings of the National Academy of Sciences, 35(11):652-655, 1949.
[Gao et al., 2015] Hongchang Gao, Feiping Nie, Xuelong Li, and Heng Huang. Multi-view subspace clustering. In Proceedings of the IEEE International Conference on Computer Vision, pages 4238-4246, 2015.
[Gao et al., 2016] Hang Gao, Yuxing Peng, and Songlei Jian. Incomplete multi-view clustering. In Proceedings of the International Conference on Intelligent Information Processing, pages 245-255, 2016.
[Guo, 2015] Xiaojie Guo. Robust subspace segmentation by simultaneously learning data representations and their affinity matrix. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 3547-3553, 2015.
[Kumar and Daumé, 2011] Abhishek Kumar and Hal Daumé. A co-training approach for multi-view spectral clustering. In Proceedings of the International Conference on Machine Learning, pages 393-400, 2011.
[Kumar et al., 2011] Abhishek Kumar, Piyush Rai, and Hal Daumé. Co-regularized multi-view spectral clustering. In Proceedings of Advances in Neural Information Processing Systems, pages 1413-1421, 2011.
[Liu et al., 2013] Guangcan Liu, Zhouchen Lin, Shuicheng Yan, Ju Sun, Yong Yu, and Yi Ma. Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):171-184, 2013.
[Lu et al., 2012] Canyi Lu, Hai Min, Zhongqiu Zhao, Lin Zhu, Deshuang Huang, and Shuicheng Yan. Robust and efficient subspace segmentation via least squares regression. In Proceedings of the European Conference on Computer Vision, pages 347-360, 2012.
[Ng et al., 2002] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Proceedings of Advances in Neural Information Processing Systems, pages 849-856, 2002.
[Nie et al., 2014] Feiping Nie, Xiaoqian Wang, and Heng Huang. Clustering and projected clustering with adaptive neighbors. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 977-986, 2014.
[Nie et al., 2016] Feiping Nie, Xiaoqian Wang, Michael I. Jordan, and Heng Huang. The constrained Laplacian rank algorithm for graph-based clustering. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1969-1976, 2016.
[Nie et al., 2017] Feiping Nie, Guohao Cai, and Xuelong Li. Multi-view clustering and semi-supervised classification with adaptive neighbours. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2408-2414, 2017.
[Shi and Malik, 2000] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888-905, 2000.
[Von Luxburg, 2007] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395-416, 2007.
[Wang et al., 2014] Hongxing Wang, Chaoqun Weng, and Junsong Yuan. Multi-feature spectral clustering with minimax optimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4106-4113, 2014.
[Wang et al., 2017] Xiaobo Wang, Xiaojie Guo, Zhen Lei, Changqing Zhang, and Stan Z. Li. Exclusivity-consistency regularized multi-view subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 923-931, 2017.
[Weinberger and Saul, 2009] Kilian Q. Weinberger and Lawrence K. Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10(1):207-244, 2009.
[Xing et al., 2003] Eric P. Xing, Michael I. Jordan, Stuart J. Russell, and Andrew Y. Ng. Distance metric learning with application to clustering with side-information. In Proceedings of Advances in Neural Information Processing Systems, pages 521-528, 2003.
[Yin et al., 2015] Qiyue Yin, Shu Wu, Ran He, and Liang Wang. Multi-view clustering via pairwise sparse subspace representation. Neurocomputing, 156:12-21, 2015.
[Zelnik-Manor and Perona, 2005] Lihi Zelnik-Manor and Pietro Perona. Self-tuning spectral clustering. In Proceedings of Advances in Neural Information Processing Systems, pages 1601-1608, 2005.
[Zhang et al., 2015] Changqing Zhang, Huazhu Fu, Si Liu, Guangcan Liu, and Xiaochun Cao. Low-rank tensor constrained multi-view subspace clustering. In Proceedings of the IEEE International Conference on Computer Vision, pages 1582-1590, 2015.
[Zhao et al., 2016] Handong Zhao, Hongfu Liu, and Yun Fu. Incomplete multi-modal visual data grouping. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 2392-2398, 2016.
[Zhou and Burges, 2007] Dengyong Zhou and Christopher J. C. Burges. Spectral clustering and transductive learning with multiple views. In Proceedings of the International Conference on Machine Learning, pages 1159-1166, 2007.