1 Introduction
Many real-world data have representations in multiple feature types and modalities. For example, images can be represented by different types of features such as SIFT or HOG, and web pages usually consist of hyperlinks and texts. Each such representation is usually referred to as a view of a multi-view dataset. As different views are often collected from diverse domains using different feature extractors, they may contain information complementary to other views. As such, multi-view data can provide richer and more relevant information compared with single-view data. Accordingly, multi-view data processing has become a popular topic in machine learning and pattern recognition [Xu et al., 2016; Sun et al., 2015; Cao et al., 2015; Zhao and Fu, 2015].

Multi-view clustering is an unsupervised multi-view data analysis problem that aims to group a multi-view dataset into distinct clusters. One common strategy for multi-view clustering is to learn a consensus representation that can reflect the latent clustering structure shared by different views [Liu et al., 2016; Gao et al., 2015; Xu et al., 2015; Xu et al., 2013; Liu et al., 2013b]. As multi-view data are often collected from diverse domains and the statistical properties of views may vary, different views may have different confidence levels when pursuing a consensus representation (here, the confidence level of a view or a sample refers to its ability to uncover the underlying clustering structure). For example, a view with a good clustering structure should have a higher confidence level, and vice versa. Existing works usually address this by assigning distinctive weights to different views (e.g., assigning larger weights to views with higher confidence levels), either manually or automatically [Xu et al., 2016; Wang et al., 2014; Wang et al., 2013b; Xia et al., 2010]. However, due to the complexity and noisy nature of multi-view data in real-world applications, the confidence levels of samples within the same view may also vary (e.g., outliers or noisy samples should have lower confidence levels than the uncorrupted ones in the same view). Thus, assigning a single unified weight to a view may result in suboptimal solutions.
In this paper, we propose a novel robust localized multi-view subspace clustering model that considers the confidence levels of both samples and views when learning a consensus representation. Specifically, we introduce a nonnegative weight parameter for each sample under each view. By assigning weights to samples under each view properly (e.g., assigning smaller weights to samples with lower confidence levels), we can obtain a robust consensus representation by fusing the noiseless structures among different views and samples. Meanwhile, as we usually do not have prior information about the confidence levels of samples and views in the unsupervised setting, it may be impractical to manually assign a weight to each sample, especially for large datasets. We therefore design a regularizer on the weight parameters based on convex conjugacy theory, so that the sample weights are adapted during the optimization. We then develop an efficient iterative algorithm to solve this problem, and perform extensive experiments on four benchmarks to demonstrate the correctness and effectiveness of the proposed method.
The main contributions of our work are: (1) We propose to consider the confidence levels of both views and samples when learning a consensus representation for multi-view clustering. (2) We design a regularizer on the sample weights so that they are adaptively assigned during the optimization process. The learned weights can reflect the samples' confidence levels to some extent. The whole formulation is easy to optimize and its convergence is guaranteed. (3) The proposed model outperforms related state-of-the-art multi-view clustering methods on four real-world multi-view datasets.
2 Related Work
Recently, many spectral-based subspace clustering methods have been proposed [Hu et al., 2014; Feng et al., 2014; He et al., 2014; Elhamifar and Vidal, 2013; Liu et al., 2013a; Zhang et al., 2013]. These approaches first learn a similarity matrix based on the data's self-representation. They then obtain clustering results by applying a spectral clustering algorithm such as Normalized Cuts [Shi and Malik, 2000] to the learned similarity matrix. For example, Sparse Subspace Clustering (SSC) [Elhamifar and Vidal, 2013] constructs the similarity matrix by finding a sparse linear representation of each sample. Low-Rank Representation (LRR) [Liu et al., 2013a] enforces a low-rank property on the data's self-representation during the optimization. Though proved effective in many applications, these methods mainly focus on single-view data.

To explore the complementary information contained in multi-view data, several multi-view clustering algorithms have been proposed [Lu et al., 2016; Li et al., 2016; Wang et al., 2013a; Kumar et al., 2011; Kumar and Daumé, 2011]. For example, [Kumar et al., 2011] propose a co-regularization framework to regularize the difference between the views' Laplacian embeddings. [Liu et al., 2013b] propose a multi-view nonnegative matrix factorization framework to pursue a consensus representation. To extend single-view subspace clustering to the multi-view setting, [Cao et al., 2015] propose a diversity-induced multi-view subspace model that explicitly enforces the diversity of different representations. [Gao et al., 2015] propose to learn a consistent clustering structure and the subspace representation of each view simultaneously. Instead of considering each view equally, [Xia et al., 2010] propose to learn an optimal weighting to linearly combine the different views' representations. [Wang et al., 2014] develop a minimax optimization framework to minimize the loss of the worst case with maximum disagreements. However, these methods usually learn a unified weight per view and do not consider the variation in confidence level among samples in the same view. Recently, [Gönen and Margolin, 2014] propose a localized multiple kernel learning framework that assigns sample-specific weights to different kernels. However, the kernels are pre-specified in their work, and they further constrain the weights to sum to one for each sample. In contrast, we propose to learn the sample weights and the consensus representation jointly, and we develop a novel weighting strategy based on convex conjugacy theory.
3 The Proposed Method
3.1 Terms and Notations
We use ‖·‖_F, ‖·‖_1, ‖·‖_2 and ‖·‖_{2,1} to denote the Frobenius norm, the ℓ1 norm (sum of absolute values), the ℓ2 norm and the ℓ2,1 norm (sum of the ℓ2 norms of the columns of a matrix), respectively. |·| takes the element-wise absolute value of a matrix. For a vector a, Diag(a) denotes the square diagonal matrix with the elements of a on its main diagonal, and a_i is its ith entry. For a matrix A, diag(A) is the vector formed from its main diagonal elements. We use a_i and a^i to denote the ith column and the ith row of matrix A, respectively. A_{ij} denotes the entry of A at the ith row and jth column.

3.2 Proposed Formulation
In this section, we introduce a novel approach that extends the single-view subspace clustering model to the multi-view setting. Specifically, we demonstrate our main ideas based on the SSC model due to its good interpretability and effectiveness [Elhamifar and Vidal, 2013]. Given a multi-view dataset {X^1, ..., X^m} with m views sampled from c clusters, let n be the total number of samples. X^v ∈ R^{d_v×n} is the feature matrix corresponding to the vth view and x_i^v is its ith feature vector, v = 1, ..., m. The single-view SSC model corresponding to the vth view can be described as
\min_{Z^v} \|X^v - X^v Z^v\|_F^2 + \lambda \|Z^v\|_1, \quad \mathrm{s.t.}\ \mathrm{diag}(Z^v) = 0, \qquad (1)
where Z^v ∈ R^{n×n} is the learned sparse self-representation of the vth view, z_i^v is the self-representation of the ith sample under the vth view, and λ is a nonnegative parameter that trades off the reconstruction error ‖X^v − X^v Z^v‖_F^2 against the sparse constraint ‖Z^v‖_1. After obtaining the optimal Z^v, S^v = |Z^v| + |Z^v|^T is used to build a similarity matrix, and the final clustering result of the vth view is obtained by applying spectral clustering to S^v.
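As a concrete illustration, the per-view SSC problem above can be approximated column by column with an ℓ1-regularized least-squares solver. The sketch below is a minimal, hypothetical implementation (the choice of solver and the way the zero-diagonal constraint is handled are our assumptions, not the authors' code):

```python
import numpy as np
from sklearn.linear_model import Lasso

def ssc_single_view(X, lam=0.1):
    """Sparse self-representation: each column x_i is coded by the
    remaining columns of X (the diagonal of Z is forced to zero)."""
    d, n = X.shape
    Z = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]      # exclude x_i itself
        # sklearn's Lasso minimizes 1/(2*d) ||y - Aw||^2 + alpha ||w||_1,
        # so alpha = lam / (2d) matches ||y - Aw||^2 + lam ||w||_1 up to scale
        model = Lasso(alpha=lam / (2 * d), fit_intercept=False, max_iter=5000)
        model.fit(X[:, idx], X[:, i])
        Z[idx, i] = model.coef_
    return Z

# toy data: columns drawn from two rank-1 subspaces in R^5
rng = np.random.default_rng(0)
X = np.hstack([np.outer(rng.standard_normal(5), rng.standard_normal(4))
               for _ in range(2)])
Z = ssc_single_view(X, lam=0.01)
S = np.abs(Z) + np.abs(Z).T        # symmetric similarity matrix for spectral clustering
```

The resulting S would then be fed to a spectral clustering routine, as described above.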
To explore the complementary information contained in multi-view data, a direct approach is to perform the single-view subspace clustering model (1) on each view separately and then build a similarity matrix, e.g., as S = (1/m) Σ_{v=1}^{m} (|Z^v| + |Z^v|^T). However, this naïve extension cannot make good use of the complementary information in multi-view data because 1) it considers each view independently and 2) it simply treats all views and samples in an equal manner. As multi-view data are often collected from diverse domains by different types of feature extractors and noise is inevitable in real-world applications, different views usually have different confidence levels, and the confidence levels of samples within the same view may also vary (e.g., outliers or noisy samples should have low confidence levels). Thus, naïvely treating each view equally or assigning a unified weight to a view may both lead to suboptimal solutions. To better explore the complementary information in multi-view data, we propose a robust localized multi-view subspace clustering model that considers the confidence levels of both views and samples:
\min_{\{Z^v\}, Z, W} \sum_{v=1}^{m} \sum_{i=1}^{n} w_i^v \left( \|x_i^v - X^v z_i^v\|_2^2 + \lambda \|z_i - z_i^v\|_2^2 \right) + \sum_{v=1}^{m} \sum_{i=1}^{n} \Psi(w_i^v) + \beta \|Z\|_1, \quad \mathrm{s.t.}\ w_i^v \ge 0,\ \mathrm{diag}(Z^v) = 0, \qquad (2)
where Z ∈ R^{n×n} is the learned sparse consensus representation, and λ and β are two nonnegative tradeoff parameters. In model (2), we explicitly introduce a nonnegative weight parameter for each sample under each view to reflect its confidence level. W ∈ R^{m×n} denotes the weight matrix and w_i^v is the weight of the ith sample under the vth view. By properly assigning a weight to each sample under each view (e.g., assigning small weights to outliers and noisy samples), Z in model (2) can better explore the complementary information contained in each view. As we usually do not have prior information about the confidence levels of samples and views in the unsupervised setting, it is impractical to obtain the sample weights manually. Thus, we incorporate the updating of W into the model optimization through the regularizer Ψ on W, so that W is adapted during the optimization.
3.3 Discussion of the Regularizer Ψ
One key issue in model (2) is to design a proper regularizer Ψ, which determines the updating of the weight parameters. To obtain a robust consensus representation among views and samples, outliers or noisy samples under each view should be assigned smaller weights, while samples with good local structures should obtain relatively larger weights. Denote the loss of the ith sample under the vth view as l_i^v = ‖x_i^v − X^v z_i^v‖_2^2 + λ‖z_i − z_i^v‖_2^2. Model (2) with respect to W becomes
\min_{W \ge 0} \sum_{v=1}^{m} \sum_{i=1}^{n} \left( w_i^v\, l_i^v + \Psi(w_i^v) \right). \qquad (3)
Based on convex conjugacy theory [Boyd and Vandenberghe, 2004], we have

Lemma 1. Problem (3) is related to a certain latent function ψ(l) = −Ψ*(−l), where Ψ* denotes the convex conjugate of the function Ψ.^1

^1 The convex conjugate of a function f is defined as f*(y) = sup_x ( ⟨y, x⟩ − f(x) ) [Boyd and Vandenberghe, 2004].

Proof. Based on convex conjugacy theory, we have min_{w ≥ 0} ( w l + Ψ(w) ) = −sup_{w ≥ 0} ( (−l) w − Ψ(w) ) = −Ψ*(−l) = ψ(l). ψ is concave due to the fact that the convex conjugate Ψ* is convex. ∎
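This conjugacy relation can be sanity-checked numerically for a concrete choice of Ψ. The snippet below assumes the entropy-like regularizer Ψ(w) = γ(w log w − w + 1) (an illustrative choice on our part, for which the minimizer is exp(−l/γ) and the latent loss is γ(1 − exp(−l/γ))) and verifies both facts with a one-dimensional solver:

```python
import numpy as np
from scipy.optimize import minimize_scalar

gamma = 2.0
# assumed regularizer Psi(w) = gamma * (w log w - w + 1), defined for w > 0
Psi = lambda w: gamma * (w * np.log(w) - w + 1)

def latent_and_minimizer(l):
    """Numerically solve min_{w>0} w*l + Psi(w) for a given loss l.
    Returns (psi(l), w*(l)): the latent loss value and the optimal weight."""
    res = minimize_scalar(lambda w: w * l + Psi(w),
                          bounds=(1e-9, 10.0), method="bounded")
    return res.fun, res.x

for l in [0.1, 1.0, 5.0]:
    psi_num, w_num = latent_and_minimizer(l)
    # minimizer matches the closed form delta(l) = exp(-l/gamma)
    assert abs(w_num - np.exp(-l / gamma)) < 1e-4
    # latent loss matches gamma * (1 - exp(-l/gamma)), a concave function of l
    assert abs(psi_num - gamma * (1 - np.exp(-l / gamma))) < 1e-6
```

Note how larger losses l yield smaller optimal weights, which is exactly the behavior the minimizer function δ is designed to have.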
This, in turn, encourages us to design the regularizer Ψ from a latent loss function. Given a concave loss function ψ, we can define Ψ*(−l) = −ψ(l) and obtain Ψ accordingly. For instance, we can define Ψ as
\Psi(w) = \gamma \left( w \log w - w + 1 \right), \qquad (4)
where γ is a nonnegative hyperparameter and Ψ is a convex function for w ≥ 0. Its corresponding latent loss function is ψ(l) = γ(1 − e^{−l/γ}), which is concave and monotonically increasing in l. By substituting eq (4) into problem (3), the optimal weight for the ith sample under the vth view in (3) is calculated as
w_i^v = \delta(l_i^v) = \exp\left( -l_i^v / \gamma \right), \qquad (5)
where δ(·) is named the minimizer function. Figure 1 gives a graphical illustration of ψ and δ. We can see that δ is monotonically decreasing with respect to the loss l, so samples with larger losses get smaller weights. As outliers or noisy samples usually deviate from the majority of normal ones, they tend to cause larger reconstruction losses and accordingly receive smaller weights. Meanwhile, samples with good local structures can usually be reconstructed very well and consequently receive larger weights. In this way, the learned weight matrix W can reflect the confidence levels of samples under each view; its correctness and effectiveness are further demonstrated in the experimental section. By substituting eq (4) into model (2), we obtain the proposed robust multi-view subspace clustering model.

4 Optimization
4.1 Optimization Algorithm
Although problem (2) is not jointly convex in all its variables, it is convex with respect to each variable when the others are fixed. We can therefore develop an efficient block coordinate descent algorithm, summarized in Algorithm 1. Specifically, the variables W, {Z^v} and Z are updated as follows:
W-step: update the weight matrix W with {Z^v} and Z fixed.
According to the discussion in Section 3.3, optimizing each w_i^v amounts to the convex optimization problem (3), which has the closed-form solution

w_i^v = \exp\left( -l_i^v / \gamma \right), \qquad (6)

where l_i^v = ‖x_i^v − X^v z_i^v‖_2^2 + λ‖z_i − z_i^v‖_2^2 and γ is the hyperparameter in eq (4).
Z^v-step: update each Z^v while fixing W and Z. The optimization of z_i^v is equivalent to

\min_{z_i^v} w_i^v \left( \|x_i^v - X^v z_i^v\|_2^2 + \lambda \|z_i - z_i^v\|_2^2 \right), \quad \mathrm{s.t.}\ \langle e_i, z_i^v \rangle = 0. \qquad (7)

Its Lagrange function is

\mathcal{L} = w_i^v \left( \|x_i^v - X^v z_i^v\|_2^2 + \lambda \|z_i - z_i^v\|_2^2 \right) + \eta \langle e_i, z_i^v \rangle, \qquad (8)

where η is the Lagrange multiplier, e_i is the ith standard basis vector, and ⟨·,·⟩ denotes the inner product of two vectors. By setting the derivatives of L with respect to z_i^v and η to zero, we have

2 w_i^v \left( X^{v\top} X^v + \lambda I \right) z_i^v - 2 w_i^v \left( X^{v\top} x_i^v + \lambda z_i \right) + \eta\, e_i = 0, \quad \langle e_i, z_i^v \rangle = 0. \qquad (9)

Therefore, z_i^v can be obtained as

z_i^v = M^v b_i^v - \frac{e_i^\top M^v b_i^v}{e_i^\top M^v e_i}\, M^v e_i, \qquad (10)

where I is an n×n identity matrix, M^v = (X^{v\top}X^v + λI)^{-1} and b_i^v = X^{v\top} x_i^v + λ z_i.
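Under our reconstruction of this update (with M^v precomputed once), a single column z_i^v can be obtained by an unconstrained ridge solve followed by a rank-one Lagrange correction that zeroes its ith entry. The sketch below is illustrative; the function and variable names are ours:

```python
import numpy as np

def update_ziv(X, z_cons, i, lam=1.0, M=None):
    """One column of the view-specific representation:
    argmin ||x_i - X z||^2 + lam * ||z_cons - z||^2  s.t.  z[i] = 0."""
    d, n = X.shape
    if M is None:                                  # cache across columns/iterations
        M = np.linalg.inv(X.T @ X + lam * np.eye(n))
    b = X.T @ X[:, i] + lam * z_cons               # right-hand side b_i
    z = M @ b                                      # unconstrained minimizer
    z = z - (z[i] / M[i, i]) * M[:, i]             # rank-one correction: z[i] = 0
    return z

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 8))
z = update_ziv(X, np.zeros(8), i=3, lam=0.5)       # consensus fixed at zero here
```

Note that the positive scale w_i^v drops out of the minimizer, which is why the inverse M^v can be shared across all samples of a view.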
Z-step: update Z while fixing W and {Z^v}. Problem (2) becomes

\min_{Z} \lambda \sum_{v=1}^{m} \sum_{i=1}^{n} w_i^v \|z_i - z_i^v\|_2^2 + \beta \|Z\|_1. \qquad (11)

It admits a unique solution, given column-wise by

z_i = \mathrm{sign}(\bar{z}_i) \odot \max\left( |\bar{z}_i| - \tau_i, 0 \right), \qquad (12)

where \bar{z}_i = \sum_{v=1}^{m} w_i^v z_i^v / \sum_{v=1}^{m} w_i^v and \tau_i = \beta / (2\lambda \sum_{v=1}^{m} w_i^v).
As seen from eqs (11) and (12), the optimal Z is a weighted average over each view's self-representation followed by a sparsity-inducing shrinkage. As discussed in Section 3.3, outliers or noisy samples get smaller weights in the W-step. For the ith sample, let w_i = [w_i^1, ..., w_i^m]^T denote its weights under the views; we can then make the following observations: 1) if the ith sample has good local structures and can be reconstructed very well under each view, all entries of w_i will be large and the self-representations under all views (i.e., the z_i^v) will have a large influence when calculating the consensus representation z_i; 2) if the ith sample behaves abnormally (e.g., as an outlier) only on some views, the corresponding entries of w_i will be very small, and z_i is thus mainly determined by the self-representations learned from the other views with high confidence levels; 3) if the ith sample behaves as an outlier on all views, all entries of w_i will be very small, and the learned z_i will be close to zero due to the sparse constraint in eq (11). Therefore, we can obtain a robust consensus representation by fusing the noiseless structures among views and samples.
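The consensus step — a weight-averaged representation followed by entry-wise soft-thresholding — can be sketched as follows (the variable names and the exact per-column threshold follow our reconstruction of eq (12), so treat them as assumptions):

```python
import numpy as np

def update_consensus(Z_views, W, lam=1.0, beta=0.1):
    """Z_views: list of m (n, n) per-view representations; W: (m, n) weights.
    Returns the soft-thresholded, weight-averaged consensus representation."""
    m, n = W.shape
    s = W.sum(axis=0)                              # sum_v w_i^v, one value per sample
    Zbar = sum(W[v][None, :] * Z_views[v] for v in range(m)) / s[None, :]
    tau = beta / (2 * lam * s)                     # per-column soft threshold
    return np.sign(Zbar) * np.maximum(np.abs(Zbar) - tau[None, :], 0.0)

rng = np.random.default_rng(2)
Z_views = [rng.standard_normal((6, 6)) for _ in range(3)]
W = rng.uniform(0.1, 1.0, size=(3, 6))
Z = update_consensus(Z_views, W, lam=1.0, beta=0.5)
```

A sample with tiny weights on all views gets a large threshold τ_i, so its consensus column shrinks toward zero, matching observation 3) above.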
4.2 Convergence and Computational Complexity
We first analyze the convergence property of Algorithm 1. During each iteration, a convex minimization problem is solved in each of the W-step, Z^v-step and Z-step, and the globally optimal solution is obtained for each substep. Thus the overall objective is nonincreasing over the iterations, and Algorithm 1 is guaranteed to converge.
For a given multi-view dataset and fixed parameters in Algorithm 1, the matrix inverses M^v = (X^{v⊤}X^v + λI)^{-1} in eq (10) only need to be computed once during the optimization, with a time cost of O(m n^3 + m d n^2), where d = max_v d_v. In each iteration, W is computed with a cost of O(m d n^2). For the Z^v-step, there is an extra matrix multiplication in eq (10), which costs O(m n^3) per iteration. The cost of computing Z is O(m n^2). Thus the total time cost of Algorithm 1 is O(T m (n^3 + d n^2)), where T is the total number of iterations. In our experiments, the objective value decreases very quickly and convergence is reached within a small number of iterations.
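Putting the three steps together, the block coordinate descent loop can be sketched end to end. Everything below (the entropy-like regularizer, the closed-form updates, the objective being tracked) follows our reconstruction of model (2) and is illustrative rather than the authors' exact implementation:

```python
import numpy as np

def rmsc(X_views, lam=1.0, beta=0.1, gamma=1.0, n_iter=10):
    """Block coordinate descent sketch: alternate the W-step (closed-form
    weights), Z^v-step (per-view solve) and Z-step (soft-thresholded
    weighted average), tracking the objective to check convergence."""
    m, n = len(X_views), X_views[0].shape[1]
    # precompute (X^T X + lam I)^{-1} once per view -- O(n^3) each
    Ms = [np.linalg.inv(X.T @ X + lam * np.eye(n)) for X in X_views]
    Z = np.zeros((n, n))
    Z_views = [np.zeros((n, n)) for _ in range(m)]

    def losses():  # l_i^v = ||x_i - X z_i^v||^2 + lam ||z_i - z_i^v||^2
        return np.array([np.sum((X - X @ Zv) ** 2, axis=0)
                         + lam * np.sum((Z - Zv) ** 2, axis=0)
                         for X, Zv in zip(X_views, Z_views)])

    objs = []
    for _ in range(n_iter):
        W = np.exp(-losses() / gamma)              # W-step, eq (6)
        for v, (X, M) in enumerate(zip(X_views, Ms)):
            B = M @ (X.T @ X + lam * Z)            # all unconstrained columns at once
            dcorr = np.diag(B) / np.diag(M)
            Z_views[v] = B - M * dcorr[None, :]    # rank-one fix: zero diagonal
        s = W.sum(axis=0)                          # Z-step, eq (12)
        Zbar = sum(W[v][None, :] * Z_views[v] for v in range(m)) / s[None, :]
        tau = beta / (2 * lam * s)
        Z = np.sign(Zbar) * np.maximum(np.abs(Zbar) - tau[None, :], 0.0)
        L = losses()                               # reconstructed objective value
        obj = np.sum(W * L) + gamma * np.sum(W * np.log(np.maximum(W, 1e-12)) - W + 1) \
              + beta * np.abs(Z).sum()
        objs.append(obj)
    return Z, W, objs

rng = np.random.default_rng(3)
X_views = [rng.standard_normal((4, 10)) for _ in range(2)]
Z, W, objs = rmsc(X_views, lam=1.0, beta=0.05, gamma=5.0, n_iter=8)
```

On such toy data the tracked objective is nonincreasing across iterations, consistent with the convergence argument above.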
5 Experiments
In this section, we evaluate the proposed model against several state-of-the-art clustering methods on four real-world datasets. The experimental results demonstrate the correctness and effectiveness of the proposed model.
Table 1: Clustering performance (mean ± standard deviation, %) of the compared methods on the four datasets.

Methods      | Accuracy (%)                                     | Normalized Mutual Information (%)
             | Digit        Reuter       Animal      3Sources   | Digit        Reuter       Animal      3Sources
SC-BSV       | 94.52±3.48   33.88±0.43   25.85±1.06  61.89±0.68 | 91.24±0.87   18.60±0.58   15.90±0.61  61.04±1.02
MSC          | 94.68±2.80   30.49±3.34   28.39±1.04  65.59±3.62 | 90.34±1.15   21.87±1.18   19.06±0.84  68.23±3.69
Co-Pairwise  | 96.16±0.02   31.14±2.75   25.96±0.67  61.48±0.54 | 91.89±0.03   21.69±0.96   16.08±0.50  60.14±1.54
Co-Centroid  | 96.65±0.00   29.42±2.83   29.52±1.69  65.44±2.60 | 93.71±0.00   21.44±0.85   19.28±0.89  64.99±1.98
Co-Training  | 85.39±0.12   32.91±1.91   29.63±1.10  61.95±5.30 | 85.44±0.11   23.13±0.75   18.92±0.61  64.30±2.81
SSC-BSV      | 93.02±3.42   49.36±1.15   27.36±0.87  67.16±1.53 | 88.05±1.02   33.27±0.55   15.49±0.62  57.84±1.72
SSC-AVG      | 85.87±4.35   50.18±1.69   30.10±0.91  69.82±1.89 | 87.89±0.93   31.60±0.70   17.52±0.99  67.95±1.13
DiMSC        | 90.79±3.71   53.01±1.24   31.83±0.68  65.47±0.98 | 84.46±1.33   39.20±0.21   19.29±0.59  60.23±2.30
RMSC-WV      | 95.35±0.00   55.05±1.06   31.87±0.66  74.56±1.80 | 90.65±0.00   37.70±0.43   19.70±0.61  67.17±1.95
RMSC         | 97.91±0.03   57.50±0.39   33.14±1.01  78.37±0.52 | 94.98±0.08   40.80±0.20   20.02±0.64  70.53±0.71
5.1 Databases
Four widely used realworld benchmarks are considered in the experiments. Their statistical information is summarized in Table 2.
Table 2: Statistics of the four datasets.

Dataset   | #Instances | #Views | #Clusters
Digit     | 2000       | 6      | 10
Reuter    | 1200       | 5      | 6
Animal    | 500        | 6      | 10
3Sources  | 169        | 3      | 6
UCI Handwritten Digit dataset (https://archive.ics.uci.edu/ml/datasets/Multiple+Features): This dataset is taken from the UCI repository. It consists of 2000 handwritten digits classified into ten categories (0-9), each with 200 instances. Samples are represented by six kinds of features: pixel averages in 2×3 windows (PIX), Fourier coefficients of the character shapes (FOU), profile correlations (FAC), Zernike moments (ZER), Karhunen-Loève coefficients (KAR), and morphological features (MOR).
Reuter Multilingual dataset (http://multilingreuters.iit.nrc.ca): It contains features of documents written in five different languages (English, French, German, Spanish and Italian); documents in the different languages share the same 6 categories. We use the documents originally written in English as the first view and their French, German, Spanish and Italian translations as the other four views. We randomly sample 1200 documents in a balanced manner, with 200 documents per category.
Animal (http://attributes.kyb.tuebingen.mpg.de/): It consists of 50 kinds of animals, with 30475 images in total. The six pre-extracted features used are Color Histogram, Local Self-Similarity, Pyramid HOG (PHOG), SIFT, colorSIFT and SURF. Similar to [Yin et al., 2015], we select the first ten categories and randomly sample 50 instances from each as a subset for evaluation.
3Sources (http://mlg.ucd.ie/datasets/3sources.html): This dataset is collected from three well-known online news sources: BBC, Reuters and The Guardian. There are 416 distinct news stories, manually divided into six classes. Among them, 169 stories are reported by all three sources and are used in our experiments, as in [Liu et al., 2013b].
5.2 Baseline Algorithms and Experimental Setting
To better demonstrate the performance of the proposed model, we compare it with several state-of-the-art methods.
SC-BSV: Perform standard spectral clustering [Shi and Malik, 2000] on each single view, and report the best result.
SSC-BSV: Run SSC [Elhamifar and Vidal, 2013] on each view independently, and report the best result.
SSC-AVG: Run SSC [Elhamifar and Vidal, 2013] on each view independently to obtain each view's subspace representation, then perform spectral clustering on the averaged representation.
MSC: A weighted multi-view spectral clustering model [Xia et al., 2010].
Co-Pairwise: A co-regularization scheme that regularizes the Laplacian embeddings to have high pairwise similarity [Kumar et al., 2011].
Co-Centroid: Another co-regularization scheme [Kumar et al., 2011] that regularizes the view-specific Laplacian embeddings to be similar to a common consensus.
Co-Training: A co-training approach that alternately modifies one view's graph structure using the other views' information [Kumar and Daumé, 2011].
DiMSC: A diversity-induced multi-view subspace clustering method that aims to reduce the redundancy between multi-view representations [Cao et al., 2015]. Since we focus on the influence of different strategies for combining multi-view representations, we use a sparse constraint on each view's self-representation instead of the original smoothness regularizer for a fairer comparison.
RMSC: The proposed robust localized multi-view subspace clustering model (2). To better demonstrate its correctness and effectiveness, we further design a variation of model (2) that considers a unified weight per view, named RMSC-WV. RMSC-WV only distinguishes the confidence levels of views, so samples in the same view get the same weight (details of its implementation are given in the appendix).
All samples are normalized to have unit ℓ2 norm. The parameter γ in eq (4) is set to the same value for all datasets. As k-means is applied in all the methods, we run it 20 times with random initialization and report both mean values and standard deviations. The Gaussian kernel and k-nearest-neighbor graphs [Von Luxburg, 2007] are used for methods that need to construct the Laplacian matrix of each view; k is empirically set to 5. Two commonly used metrics, i.e., clustering accuracy and normalized mutual information (NMI) [Chen et al., 2011], are used as evaluation measures in this paper.

5.3 Results and Parameter Analysis
Table 1 shows the numerical results of the different methods on all four datasets. The proposed RMSC model outperforms all the compared algorithms and improves the performance of multi-view clustering. From the clustering results of SSC-AVG and SSC-BSV, we can see that naïvely treating each view equally and averaging over each view's representation cannot always boost the performance of single-view clustering (e.g., SSC-AVG is worse than SSC-BSV on Digit). This is because views with low confidence levels can have a large negative influence on the averaged representation. By taking each view's confidence level into consideration and pursuing a weighted average among views, RMSC-WV obtains consistent improvements over SSC-BSV on all datasets. Moreover, the proposed RMSC model further boosts the performance of RMSC-WV by simultaneously considering the confidence levels of both samples and views. This corroborates our analysis that naïvely treating each view equally or assigning a unified weight to a view can both lead to suboptimal solutions, and it demonstrates the effectiveness of the proposed RMSC model.
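For reference, the two evaluation metrics used above can be computed with standard tools: NMI via scikit-learn, and clustering accuracy via the usual best label matching with the Hungarian algorithm. This snippet is a generic implementation of the metrics, not the authors' code:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Best-match accuracy: permute predicted cluster labels (via the
    Hungarian algorithm) to maximize agreement with the ground truth."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=int)             # contingency counts
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1
    rows, cols = linear_sum_assignment(-cost)      # negate to maximize matches
    return cost[rows, cols].sum() / len(y_true)

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]        # same partition, permuted labels
acc = clustering_accuracy(y_true, y_pred)                  # 1.0
nmi = normalized_mutual_info_score(y_true, y_pred)         # 1.0
```

Both metrics are invariant to label permutation, which is why the permuted prediction above scores perfectly.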
To better analyze the properties of the proposed model, we further report its clustering performance on Digit and Reuter with respect to different numbers of views in Figure 2; SSC-AVG and RMSC-WV are also included for comparison. For Digit, the views are added in the order PIX, FOU, FAC, ZER, KAR, MOR; for example, {PIX, FOU} and {PIX, FOU, FAC} are used when two and three views are considered, respectively. Similarly, the views are added in the order English, French, German, Spanish, Italian for Reuter. As seen from Figure 2, RMSC performs consistently better than SSC-AVG and RMSC-WV for different numbers of views on both datasets. More specifically, on Digit, as the number of views increases, the performance of RMSC is more robust than that of SSC-AVG and RMSC-WV. For example, the performance of both RMSC-WV and RMSC increases when the third view (FAC) is incorporated; however, when the fourth view (ZER) is added, the performance of SSC-AVG and RMSC-WV decreases considerably, while that of RMSC is more stable. On Reuter, the performance of all three algorithms increases as more views are included (except for the fifth view), and RMSC always achieves the best performance. Thus RMSC is able to obtain a robust consensus representation by considering the confidence levels of both views and samples, and it can improve the performance of multi-view clustering.
Figure 3 shows the convergence behavior of Algorithm 1 on Digit and Reuter. We can see that the overall objective value decreases very quickly and convergence is reached within 5 iterations. To investigate the influence of the tradeoff parameters in RMSC, we further report its clustering results on Digit with respect to them; the results are shown in Figure 4. We observe that RMSC is not very sensitive to its parameters, and there exists a large region of the parameter space in which it achieves promising results.
6 Conclusion
In this paper, we have proposed a novel robust localized multi-view subspace clustering (RMSC) model that considers the confidence levels of both samples and views. RMSC aims to learn a consensus self-representation for multi-view data, and the proposed weighting strategy can reflect the samples' confidence levels to some extent. We further developed an iterative optimization method for RMSC, which converges within a few iterations. Comprehensive experimental results on four benchmark datasets show that RMSC obtains a robust consensus representation and outperforms state-of-the-art multi-view clustering algorithms.
7 Appendix
RMSC-WV is a variation of model (2) that considers a unified weight for each view (i.e., samples in the same view are assigned the same weight). The formulation of RMSC-WV defined in Section 5.2 is

\min_{\{Z^v\}, Z, w} \sum_{v=1}^{m} w^v \sum_{i=1}^{n} \left( \|x_i^v - X^v z_i^v\|_2^2 + \lambda \|z_i - z_i^v\|_2^2 \right) + \sum_{v=1}^{m} \Psi(w^v) + \beta \|Z\|_1, \quad \mathrm{s.t.}\ w^v \ge 0,\ \mathrm{diag}(Z^v) = 0, \qquad (13)

where w^v denotes the weight of the vth view and Ψ is the regularizer on w = [w^1, ..., w^m]^T defined in eq (4). Similar to model (2), a block coordinate descent strategy is used for its optimization.
References
[Boyd and Vandenberghe, 2004] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[Cao et al., 2015] Xiaochun Cao, Changqing Zhang, Huazhu Fu, Si Liu, and Hua Zhang. Diversity-induced multi-view subspace clustering. In CVPR, pages 586–594, 2015.
[Chen et al., 2011] Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, and Edward Y. Chang. Parallel spectral clustering in distributed systems. TPAMI, 33(3):568–586, 2011.
[Elhamifar and Vidal, 2013] Ehsan Elhamifar and René Vidal. Sparse subspace clustering: Algorithm, theory, and applications. TPAMI, 35(11):2765–2781, 2013.
[Feng et al., 2014] Jiashi Feng, Zhouchen Lin, Huan Xu, and Shuicheng Yan. Robust subspace segmentation with block-diagonal prior. In CVPR, pages 3818–3825, 2014.
[Gao et al., 2015] Hongchang Gao, Feiping Nie, Xuelong Li, and Heng Huang. Multi-view subspace clustering. In ICCV, pages 4238–4246, 2015.
[Gönen and Margolin, 2014] Mehmet Gönen and Adam A. Margolin. Localized data fusion for kernel k-means clustering with application to cancer biology. In NIPS, pages 1305–1313, 2014.
[He et al., 2014] Ran He, Tieniu Tan, and Liang Wang. Robust recovery of corrupted low-rank matrix by implicit regularizers. TPAMI, 36(4):770–783, 2014.
[Hu et al., 2014] Han Hu, Zhouchen Lin, Jianjiang Feng, and Jie Zhou. Smooth representation clustering. In CVPR, pages 3834–3841, 2014.
[Kumar and Daumé, 2011] Abhishek Kumar and Hal Daumé. A co-training approach for multi-view spectral clustering. In ICML, pages 393–400, 2011.
[Kumar et al., 2011] Abhishek Kumar, Piyush Rai, and Hal Daumé. Co-regularized multi-view spectral clustering. In NIPS, pages 1413–1421, 2011.
[Li et al., 2016] Miaomiao Li, Xinwang Liu, Lei Wang, Yong Dou, Jianping Yin, and En Zhu. Multiple kernel clustering with local kernel alignment maximization. In IJCAI, 2016.
[Liu et al., 2013a] Guangcan Liu, Zhouchen Lin, Shuicheng Yan, Ju Sun, Yong Yu, and Yi Ma. Robust recovery of subspace structures by low-rank representation. TPAMI, 35(1):171–184, 2013.
[Liu et al., 2013b] Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. Multi-view clustering via joint nonnegative matrix factorization. In SDM, pages 252–260, 2013.
[Liu et al., 2016] Xinwang Liu, Yong Dou, Jianping Yin, Lei Wang, and En Zhu. Multiple kernel k-means clustering with matrix-induced regularization. In AAAI, pages 1888–1894, 2016.
[Lu et al., 2016] Canyi Lu, Shuicheng Yan, and Zhouchen Lin. Convex sparse spectral clustering: Single-view to multi-view. TIP, 25(6):2833–2843, 2016.
[Shi and Malik, 2000] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. TPAMI, 22(8):888–905, 2000.
[Sun et al., 2015] Jiangwen Sun, Jin Lu, Tingyang Xu, and Jinbo Bi. Multi-view sparse co-clustering via proximal alternating linearized minimization. In ICML, pages 757–766, 2015.
[Von Luxburg, 2007] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
[Wang et al., 2013a] Hua Wang, Feiping Nie, and Heng Huang. Multi-view clustering and feature learning via structured sparsity. In ICML, pages 352–360, 2013.
[Wang et al., 2013b] Xinchao Wang, Wei Bian, and Dacheng Tao. Grassmannian regularized structured multi-view embedding for image classification. TIP, 22(7):2646–2660, 2013.
[Wang et al., 2014] Hongxing Wang, Chaoqun Weng, and Junsong Yuan. Multi-feature spectral clustering with minimax optimization. In CVPR, pages 4106–4113, 2014.
[Xia et al., 2010] Tian Xia, Dacheng Tao, Tao Mei, and Yongdong Zhang. Multiview spectral embedding. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 40(6):1438–1446, 2010.
[Xu et al., 2013] Chang Xu, Dacheng Tao, and Chao Xu. A survey on multi-view learning. arXiv preprint arXiv:1304.5634, 2013.
[Xu et al., 2015] Chang Xu, Dacheng Tao, and Chao Xu. Multi-view self-paced learning for clustering. In IJCAI, pages 3974–3980, 2015.
[Xu et al., 2016] Jinglin Xu, Junwei Han, and Feiping Nie. Discriminatively embedded k-means for multi-view clustering. In CVPR, pages 5356–5364, 2016.
[Yin et al., 2015] Qiyue Yin, Shu Wu, Ran He, and Liang Wang. Multi-view clustering via pairwise sparse subspace representation. Neurocomputing, 156:12–21, 2015.
[Zhang et al., 2013] Yingya Zhang, Zhenan Sun, Ran He, and Tieniu Tan. Robust subspace clustering via half-quadratic minimization. In ICCV, pages 3096–3103, 2013.
[Zhao and Fu, 2015] Handong Zhao and Yun Fu. Dual-regularized multi-view outlier detection. In IJCAI, pages 4077–4083, 2015.