Robust Localized Multi-view Subspace Clustering

05/22/2017 ∙ by Yanbo Fan, et al. ∙ University at Albany

In multi-view clustering, different views may have different confidence levels when learning a consensus representation. Existing methods usually address this by assigning distinctive weights to different views. However, due to the noisy nature of real-world applications, the confidence levels of samples in the same view may also vary. Thus considering a unified weight for a view may lead to suboptimal solutions. In this paper, we propose a novel localized multi-view subspace clustering model that considers the confidence levels of both views and samples. By assigning a proper weight to each sample under each view, we can obtain a robust consensus representation via fusing the noiseless structures among views and samples. We further develop a regularizer on the weight parameters based on convex conjugacy theory, so that the sample weights are determined in an adaptive manner. An efficient iterative algorithm is developed with a convergence guarantee. Experimental results on four benchmarks demonstrate the correctness and effectiveness of the proposed model.


1 Introduction

Many real-world data have representations in multiple feature types or modalities. For example, images can be represented by different types of features such as SIFT or HOG, and web pages usually consist of hyperlinks and texts. Each such representation is usually referred to as a view of a multi-view dataset. As different views are often collected from diverse domains using different feature extractors, they may contain information complementary to other views. As such, multi-view data can provide richer and more relevant information compared to single-view data. Accordingly, multi-view data processing has become a popular topic in machine learning and pattern recognition [Xu et al.2016, Sun et al.2015, Cao et al.2015, Zhao and Fu2015].

Multi-view clustering is an unsupervised multi-view data analysis problem, which aims to group a multi-view dataset into distinct clusters. One common strategy for multi-view clustering is to learn a consensus representation that can reflect the latent clustering structure shared by different views [Liu et al.2016, Gao et al.2015, Xu et al.2015, Xu et al.2013, Liu et al.2013b]. As multi-view data are often collected from diverse domains and the statistical properties among views may vary, different views may have different confidence levels when pursuing a consensus representation (here the confidence level of a view or a sample refers to its ability to uncover the underlying clustering structure). For example, a view that potentially has good clustering structure should have a higher confidence level, and vice versa. Existing works usually address this by assigning distinctive weights to different views (e.g., assigning larger weights to views with higher confidence levels), either manually or automatically [Xu et al.2016, Wang et al.2014, Wang et al.2013b, Xia et al.2010]. However, due to the complexity and noisy nature of multi-view data in real-world applications, the confidence levels of samples in the same view may also vary (e.g., outliers or noisy samples should have lower confidence levels compared with the uncorrupted ones in the same view). Thus considering a unified weight for a view may result in suboptimal solutions.

In this paper, we propose a novel robust localized multi-view subspace clustering model that considers the confidence levels of both samples and views when learning a consensus representation. Specifically, we introduce a nonnegative weight parameter for each sample under each view. By properly assigning weights to samples under each view (e.g., assigning smaller weights to samples with lower confidence levels), we can obtain a robust consensus representation via fusing the noiseless structures among different views and samples. Meanwhile, as we usually do not have prior information about the confidence levels of samples and views in the unsupervised setting, it may be impractical to manually assign a weight to each sample, especially for large datasets. We thus design a regularizer on the weight parameters based on convex conjugacy theory, and the sample weights are adapted during the optimization. We then develop an efficient iterative algorithm to solve the resulting problem, and perform extensive experiments on four benchmarks to demonstrate the correctness and effectiveness of the proposed method.

The main contributions of our work are: (1) We propose to consider the confidence levels of both views and samples when learning a consensus representation for multi-view clustering. (2) We design a regularizer on the sample weights so that they are adaptively assigned during the optimization process. The learned weights can reflect the samples' confidence levels to some extent. The whole formulation is easy to optimize and its convergence is guaranteed. (3) The proposed model outperforms related state-of-the-art multi-view clustering methods on four real-world multi-view datasets.

2 Related Work

Recently, many spectral-based subspace clustering methods have been proposed [Hu et al.2014, Feng et al.2014, He et al.2014, Elhamifar and Vidal2013, Liu et al.2013a, Zhang et al.2013]. These approaches first learn a similarity matrix based on the data's self-representation. They then obtain clustering results by applying a spectral clustering algorithm such as normalized cuts [Shi and Malik2000] to the learned similarity matrix. For example, Sparse Subspace Clustering (SSC) [Elhamifar and Vidal2013] constructs the similarity matrix by finding a sparse linear representation of each sample. Low-Rank Representation (LRR) [Liu et al.2013a] imposes a low-rank property on the data's self-representation during the optimization. Though proven effective in many applications, these methods mainly focus on single-view data.

To explore the complementary information in multi-view data, several multi-view clustering algorithms have been proposed [Lu et al.2016, Li et al.2016, Wang et al.2013a, Kumar et al.2011, Kumar and Daumé2011]. For example, [Kumar et al.2011] propose a co-regularization framework that regularizes the difference between the views' Laplacian embeddings. [Liu et al.2013b] propose a multi-view nonnegative matrix factorization framework to pursue a consensus representation. To extend single-view subspace clustering algorithms to the multi-view setting, [Cao et al.2015] propose a diversity-induced multi-view subspace model that explicitly enforces the diversity of different representations. [Gao et al.2015] propose to learn a consistent clustering structure and the subspace representation of each view simultaneously. Instead of treating each view equally, [Xia et al.2010] propose to learn an optimal weighting to linearly combine different views' representations. [Wang et al.2014] develop a minimax optimization framework to minimize the loss of the worst case with maximum disagreement. However, these methods usually learn a unified weight for a view and do not consider the confidence level variations of samples in the same view. Recently, [Gönen and Margolin2014] propose a localized multiple kernel learning framework that assigns sample-specific weights to different kernels. However, the kernels are pre-specified in their work and the weights are further constrained to sum to one for each sample. In contrast, we propose to learn the sample weights and the consensus representation jointly, and develop a novel weighting strategy based on convex conjugacy theory.

3 The Proposed Method

3.1 Terms and Notations

We use $\|\cdot\|_F$, $\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_{2,1}$ to denote the Frobenius norm, the $\ell_1$-norm (sum of absolute values), the $\ell_2$-norm and the $\ell_{2,1}$-norm (sum of the $\ell_2$-norms of the columns of a matrix), respectively. $|\cdot|$ takes the absolute value of the elements in a matrix. For a vector $a$, $\mathrm{Diag}(a)$ denotes a square diagonal matrix with the elements of $a$ on its main diagonal, and $a_i$ is its $i$-th entry. For a matrix $A$, $\mathrm{diag}(A)$ is the vector formed from its main diagonal elements. We use $A_{:,i}$ and $A_{i,:}$ to denote the $i$-th column and $i$-th row of matrix $A$, respectively. $A_{ij}$ denotes the entry of $A$ at the $i$-th row and $j$-th column.

3.2 Proposed Formulation

In this section, we introduce a novel approach that extends the single-view subspace clustering model to the multi-view setting. Specifically, we demonstrate our main ideas based on the SSC model due to its good interpretability and effectiveness [Elhamifar and Vidal2013]. Given a multi-view dataset $\{X^v\}_{v=1}^{V}$ with $V$ views sampled from $c$ clusters, let $n$ be the total number of samples. $X^v = [x_1^v, \ldots, x_n^v]$ is the feature matrix corresponding to the $v$-th view and $x_i^v$ is its $i$-th feature, $v = 1, \ldots, V$. The single-view SSC model for the $v$-th view can be described as

$$\min_{Z^v} \; \|X^v - X^v Z^v\|_F^2 + \lambda \|Z^v\|_1, \quad \text{s.t. } \mathrm{diag}(Z^v) = 0, \qquad (1)$$

where $Z^v = [z_1^v, \ldots, z_n^v]$ is the learned sparse self-representation of the $v$-th view and $z_i^v$ is the self-representation of the $i$-th sample under the $v$-th view, and $\lambda$ is a nonnegative parameter that trades off the reconstruction error and the sparsity term $\|Z^v\|_1$. After obtaining the optimal $Z^v$, $|Z^v| + |Z^v|^\top$ is used to build a similarity matrix, and the final clustering result of the $v$-th view is obtained by applying spectral clustering on it.
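As a concrete illustration of this last step, the sketch below builds the symmetric similarity matrix $|Z| + |Z|^\top$ from a learned self-representation and feeds it to spectral clustering. The function name is ours and it assumes $Z$ has already been obtained by some SSC solver; scikit-learn's SpectralClustering with a precomputed affinity is used for the spectral step.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_representation(Z, n_clusters):
    """Build the similarity |Z| + |Z|^T and apply spectral clustering to it."""
    S = np.abs(Z) + np.abs(Z).T           # symmetric, nonnegative affinity
    model = SpectralClustering(n_clusters=n_clusters, affinity='precomputed')
    return model.fit_predict(S)           # cluster label for each sample
```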

To explore the complementary information contained in multi-view data, a direct approach is to perform the single-view subspace clustering model (1) on each view separately and then build a similarity matrix from the learned $\{Z^v\}$. However, this naïve extension cannot make good use of the complementary information in multi-view data, as 1) it considers each view independently and 2) it simply treats all views and samples in an equal manner. As multi-view data are often collected from diverse domains by different types of feature extractors, and noise is inevitable in real-world applications, different views usually have different confidence levels, and the confidence levels of samples in the same view may also vary (e.g., outliers or noisy samples should have low confidence levels). Thus naïvely treating each view equally or considering a unified weight for a view may both lead to suboptimal solutions. To better explore the complementary information in multi-view data, we propose a robust localized multi-view subspace clustering model that considers the confidence levels of both views and samples:

(2)

where $Z$ is the learned sparse consensus representation, and $\lambda$ and $\beta$ are two nonnegative trade-off parameters. In model (2), we explicitly introduce a nonnegative weight parameter for each sample under each view to reflect its confidence level. $W$ denotes the weight matrix and $w_i^v$ is the weight of the $i$-th sample under the $v$-th view. By properly assigning a weight to each sample under each view (e.g., assigning small weights to outliers and noisy samples), the consensus representation $Z$ in model (2) can better explore the complementary information contained in each view. As we usually do not have prior information about the confidence levels of samples and views in the unsupervised setting, it is impractical to set the sample weights manually. Thus we further incorporate the updating of $W$ into the model optimization through a regularizer $f(\cdot)$ on $W$, and $W$ is adapted during the optimization.

Figure 1: Graphical representations of the latent loss function (a) and its corresponding minimizer function (b).

3.3 Discussion of the Regularizer $f(\cdot)$

One key issue in model (2) is to design a proper regularizer $f(\cdot)$, which determines the updating of the weight parameters. In order to obtain a robust consensus representation among views and samples, outliers or noisy samples under each view should be assigned smaller weights, while samples with good local structures should obtain relatively larger weights. Denote the loss of the $i$-th sample under the $v$-th view as $\ell_i^v$. Model (2) with respect to $w_i^v$ then becomes

$$\min_{w_i^v \ge 0} \; w_i^v \ell_i^v + f(w_i^v). \qquad (3)$$

Based on the theory of convex conjugacy [Boyd and Vandenberghe2004], we have the following result.

Lemma 1. Problem (3) is related to a certain latent loss function $g(\cdot)$, where $f$ is determined by the convex conjugate of $-g$. (The convex conjugate of a function $h$ is defined as $h^*(y) = \sup_x \, (\langle y, x \rangle - h(x))$ [Boyd and Vandenberghe2004].)

Proof. Based on convex conjugacy, minimizing problem (3) over $w_i^v$ gives $g(\ell_i^v) = \min_{w_i^v \ge 0} \{ w_i^v \ell_i^v + f(w_i^v) \} = -f^*(-\ell_i^v)$ (with $f$ extended by $+\infty$ for negative arguments). $g$ is concave due to the fact that the convex conjugate $f^*$ is convex.

This, in turn, encourages us to design the regularizer $f$ based on the latent loss function. Given a concave loss function $g$, we can define $f$ accordingly. For instance, we can define $f$ as

(4)

where $\gamma$ is a nonnegative hyper-parameter and $f$ is a convex function of $w_i^v$ for $w_i^v > 0$; its corresponding latent loss function $g$ is concave in $\ell_i^v$. By substituting eq (4) into problem (3), the optimal weight of the $i$-th sample under the $v$-th view in (3) is calculated as

(5)

where $\delta(\cdot)$ is named the minimizer function. Figure 1 gives a graphical illustration of $g$ and $\delta$. We can see that $\delta$ is monotonically decreasing with respect to the loss $\ell_i^v$, so samples with larger losses get smaller weights. As outliers or noisy samples usually deviate from the majority of normal samples, they tend to cause larger reconstruction losses and will thus receive smaller weights. Meanwhile, samples with good local structures can usually be reconstructed very well and will consequently receive larger weights. Through this, the learned weight matrix $W$ can reflect the confidence levels of samples under each view. Its correctness and effectiveness are further demonstrated in the experiments. By substituting eq (4) into model (2), we obtain the proposed robust multi-view subspace clustering model.
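To make the weighting step concrete, the following sketch computes per-sample losses under one view and maps them to adaptive weights. Since the exact form of eq (4) is not reproduced above, the regularizer $f(w) = \gamma / w$ used here is an assumed illustrative instance with the stated properties: it is convex for $w > 0$, its latent loss $g(\ell) = 2\sqrt{\gamma \ell}$ is concave, and the resulting minimizer $w = \sqrt{\gamma / \ell}$ decreases monotonically with the loss.

```python
import numpy as np

def sample_losses(X, Z):
    """Per-sample reconstruction loss l_i = ||x_i - X z_i||_2^2 for one view.

    X: (d, n) feature matrix of the view; Z: (n, n) self-representation.
    """
    R = X - X @ Z                     # column-wise reconstruction residuals
    return np.sum(R ** 2, axis=0)     # length-n vector of losses

def adaptive_weights(losses, gamma=1.0, eps=1e-8):
    """Minimizer function for the assumed regularizer f(w) = gamma / w.

    Solving min_{w >= 0} w * l + gamma / w gives w = sqrt(gamma / l),
    which decreases monotonically with the loss, as described in Section 3.3.
    """
    return np.sqrt(gamma / (losses + eps))

# Toy usage on one view: samples with larger losses receive smaller weights.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))
Z = 0.01 * rng.standard_normal((100, 100))
w = adaptive_weights(sample_losses(X, Z), gamma=0.5)
```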

4 Optimization

4.1 Optimization Algorithm

Although problem (2) is not jointly convex in all variables, it is convex with respect to each variable when the others are fixed. Thus we can develop an efficient block coordinate descent algorithm for it. The overall algorithm is summarized in Algorithm 1. Specifically, the variables $W$, $\{Z^v\}$ and $Z$ are updated as follows:

W-step: Update the weight matrix $W$ with $Z$ and $\{Z^v\}$ fixed. According to the discussion in Section 3.3, the optimization of each $w_i^v$ reduces to the convex problem (3) and has the closed-form solution

(6)

where $\ell_i^v$ denotes the loss of the $i$-th sample under the $v$-th view computed with the current $Z$ and $Z^v$.

$Z^v$-step: Update each $Z^v$ while fixing $W$ and $Z$. The optimization of $Z^v$ is equivalent to

(7)

Its Lagrange function is

(8)

where $\mu$ is the Lagrange multiplier and $\langle \cdot, \cdot \rangle$ denotes the inner product of two vectors. By setting the derivatives of (8) with respect to $Z^v$ and $\mu$ to zero, we have

(9)

Therefore, $Z^v$ can be obtained in closed form as

(10)

where $I$ is an $n \times n$ identity matrix.

$Z$-step: Update $Z$ while fixing $W$ and $\{Z^v\}$. Problem (2) becomes

(11)

It admits a unique solution

(12)

Seen from eqs (11) and (12), the optimal $z_i$ is a weighted average over each view's self-representation combined with a sparse constraint. As discussed in Section 3.3, outliers or noisy samples get smaller weights at the W-step. For the $i$-th sample, let $w_i$ denote the vector of its weights under all views. We then have the following observations: 1) if the $i$-th sample has good local structures and can be reconstructed very well under every view, all entries of $w_i$ will be large and the self-representations under all views (i.e., the $z_i^v$) will all have a large influence when computing the consensus representation $z_i$; 2) if the $i$-th sample behaves abnormally (e.g., as an outlier) only on some views, the corresponding entries of $w_i$ for those views will be very small and $z_i$ is thus mainly determined by the self-representations learned from the remaining, high-confidence views; 3) if the $i$-th sample behaves as an outlier on all views, all entries of $w_i$ will be very small, and the learned $z_i$ will be close to zero due to the sparse constraint in eq (11). Therefore, we obtain a robust consensus representation by fusing the noiseless structures among views and samples.
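As an illustration of this behavior, here is a minimal sketch assuming the $Z$-subproblem decouples per sample into a weighted least-squares fit to the per-view representations plus an $\ell_1$ penalty, which yields a soft-thresholded weighted average. Since eqs (11)-(12) are not reproduced above, this is an assumed instance of the update described in the text rather than a transcription of the paper's exact formulas.

```python
import numpy as np

def consensus_update(Z_views, W, beta):
    """Assumed Z-step: per sample, soft-threshold the weight-averaged
    per-view self-representations.

    Z_views: list of V arrays, each (n, n); W: (V, n) sample weights;
    beta: sparsity trade-off. Per sample i this solves
        min_z  sum_v w_i^v ||z - z_i^v||_2^2 + beta ||z||_1.
    """
    Zs = np.stack(Z_views)                          # (V, n, n)
    w_sum = W.sum(axis=0) + 1e-12                   # (n,) total weight per sample
    avg = np.einsum('vi,vji->ji', W, Zs) / w_sum    # weighted average per column
    thr = beta / (2.0 * w_sum)                      # per-column soft threshold
    return np.sign(avg) * np.maximum(np.abs(avg) - thr, 0.0)
```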

0:  Multi-view data $\{X^v\}_{v=1}^{V}$, parameters $\lambda$ and $\beta$, number of clusters $c$.
1:  Initialize the sample weight matrix $W$ as an all-one matrix and initialize each $Z^v$, $v = 1, \ldots, V$.
2:  repeat
3:     Update $Z$ via eq (12).
4:     Update $W$ via eq (6).
5:     Update each $Z^v$ via eq (10), $v = 1, \ldots, V$.
6:  until convergence.
7:  Construct the data similarity matrix $S = |Z| + |Z|^\top$.
8:  Obtain clustering results by performing spectral clustering on the similarity matrix $S$.
Algorithm 1: Optimization algorithm for RMSC

4.2 Convergence and Computational Complexity

We first analyze the convergence property of Algorithm 1. In each iteration, a convex minimization problem is solved in each of the W-step, $Z^v$-step and Z-step, and the global optimum is obtained for each sub-step. Thus the overall objective is non-increasing over the iterations and Algorithm 1 is guaranteed to converge.

For a given multi-view dataset and fixed parameters in Algorithm 1, the matrix inverse and the Gram matrix operation in eq (10) only need to be computed once during the optimization. In each iteration, the W-step and Z-step have closed-form, element-wise updates, while the dominant cost comes from the matrix multiplications required by eq (10) and by evaluating the per-sample losses. The total time cost of Algorithm 1 therefore scales linearly with the total number of iterations T. In our experiments, the objective value decreases very fast and convergence is reached within a small number of iterations.
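Assuming, as eq (10) and the discussion above suggest, that each $Z^v$-update reduces to solving linear systems with the fixed matrix $X^{v\top} X^v + \lambda I$, the per-view factorization can be cached once and reused in every iteration. The sketch below illustrates this caching; the exact system matrix is an assumption, not a transcription of eq (10).

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def precompute_view_solvers(X_views, lam):
    """Factorize X^T X + lam*I once per view; reused in every iteration."""
    solvers = []
    for X in X_views:                  # X: (d_v, n) feature matrix of one view
        n = X.shape[1]
        solvers.append(cho_factor(X.T @ X + lam * np.eye(n)))
    return solvers

def solve_view_system(solver, B):
    """Solve (X^T X + lam*I) Z = B for all right-hand sides at once."""
    return cho_solve(solver, B)
```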

5 Experiments

In this section, we evaluate the proposed model in comparison with several state-of-the-art clustering methods on four real-world databases. Experimental results demonstrate the correctness and effectiveness of the proposed model.

Methods      | Accuracy (%)                                     | Normalized Mutual Information (%)
             | Digit       Reuter      Animal      3-Sources    | Digit       Reuter      Animal      3-Sources
SC-BSV       | 94.52±3.48  33.88±0.43  25.85±1.06  61.89±0.68   | 91.24±0.87  18.60±0.58  15.90±0.61  61.04±1.02
MSC          | 94.68±2.80  30.49±3.34  28.39±1.04  65.59±3.62   | 90.34±1.15  21.87±1.18  19.06±0.84  68.23±3.69
Co-Pairwise  | 96.16±0.02  31.14±2.75  25.96±0.67  61.48±0.54   | 91.89±0.03  21.69±0.96  16.08±0.50  60.14±1.54
Co-Centroid  | 96.65±0.00  29.42±2.83  29.52±1.69  65.44±2.60   | 93.71±0.00  21.44±0.85  19.28±0.89  64.99±1.98
Co-Training  | 85.39±0.12  32.91±1.91  29.63±1.10  61.95±5.30   | 85.44±0.11  23.13±0.75  18.92±0.61  64.30±2.81
SSC-BSV      | 93.02±3.42  49.36±1.15  27.36±0.87  67.16±1.53   | 88.05±1.02  33.27±0.55  15.49±0.62  57.84±1.72
SSC-AVG      | 85.87±4.35  50.18±1.69  30.10±0.91  69.82±1.89   | 87.89±0.93  31.60±0.70  17.52±0.99  67.95±1.13
DiMSC        | 90.79±3.71  53.01±1.24  31.83±0.68  65.47±0.98   | 84.46±1.33  39.20±0.21  19.29±0.59  60.23±2.30
RMSC-WV      | 95.35±0.00  55.05±1.06  31.87±0.66  74.56±1.80   | 90.65±0.00  37.70±0.43  19.70±0.61  67.17±1.95
RMSC         | 97.91±0.03  57.50±0.39  33.14±1.01  78.37±0.52   | 94.98±0.08  40.80±0.20  20.02±0.64  70.53±0.71
Table 1: Clustering performance on four benchmark databases. The best results are highlighted in bold.

5.1 Databases

Four widely used real-world benchmarks are considered in the experiments. Their statistical information is summarized in Table 2.

Dataset     #Instances   #Views   #Clusters
Digit       2000         6        10
Reuter      1200         5        6
Animal      500          6        10
3-Sources   169          3        6
Table 2: Statistical information of databases.

UCI Handwritten Digit dataset (https://archive.ics.uci.edu/ml/datasets/Multiple+Features): This dataset is taken from the UCI repository. It consists of 2000 handwritten digits classified into ten categories (0-9), and each category has 200 instances. Samples are represented by six kinds of features: pixel averages in 2 x 3 windows (PIX), Fourier coefficients of the character shapes (FOU), profile correlations (FAC), Zernike moments (ZER), Karhunen-Loève coefficients (KAR), and morphological features (MOR).

Reuter Multilingual dataset (http://multilingreuters.iit.nrc.ca): It contains feature characteristics of documents written in five different languages (English, French, German, Spanish and Italian), and documents in different languages share the same six categories. We use documents originally in English as the first view and their French, German, Spanish and Italian translations as the other four views. We randomly sample 1200 documents in a balanced manner, with 200 documents per category.

Animal (http://attributes.kyb.tuebingen.mpg.de/): It consists of 50 kinds of animals, with 30475 images in total. The six pre-extracted features used are Color Histogram, Local Self-Similarity, Pyramid HOG (PHOG), SIFT, colorSIFT and SURF. Similar to [Yin et al.2015], we select the first ten categories and randomly sample 50 instances from each one as a subset for evaluation.

3-Sources (http://mlg.ucd.ie/datasets/3sources.html): This dataset is collected from three well-known online news sources: BBC, Reuters and The Guardian. There are 416 distinct news stories which are manually divided into six classes. Among them, 169 stories are reported by all three sources and are used in our experiments as in [Liu et al.2013b].

5.2 Baseline Algorithms and Experimental Setting

To better demonstrate the performance of the proposed model, we compare it with several state-of-the-art methods.

• SC-BSV: Perform standard spectral clustering [Shi and Malik2000] on each single view, and report the best result.
• SSC-BSV: Run SSC [Elhamifar and Vidal2013] on each view independently, and report the best result.
• SSC-AVG: Run SSC [Elhamifar and Vidal2013] on each view independently to obtain each view's subspace representation, then perform spectral clustering on the averaged representation.
• MSC: A weighted multi-view spectral clustering model [Xia et al.2010].
• Co-Pairwise: A co-regularization scheme that regularizes the Laplacian embeddings to have high pairwise similarity [Kumar et al.2011].
• Co-Centroid: Another co-regularization scheme [Kumar et al.2011] that regularizes the view-specific Laplacian embeddings to be similar to a common consensus.
• Co-Training: A co-training approach that alternately modifies one view's graph structure using the other views' information [Kumar and Daumé2011].
• DiMSC: A diversity-induced multi-view subspace clustering method that aims to reduce the redundancy between multi-view representations [Cao et al.2015]. To focus on the influence of different strategies for combining multi-view representations, we use a sparse constraint on each view's self-representation instead of the original smoothness regularization term, for a fairer comparison.
• RMSC: The proposed robust localized multi-view subspace clustering model (2). To better demonstrate the correctness and effectiveness of the proposed RMSC model, we further design a variant of model (2) that considers a unified weight per view, named RMSC-WV. RMSC-WV only distinguishes the confidence levels of views; samples in the same view receive the same weight (details of its implementation are given in the appendix).

All samples are normalized to have unit L2 norm. The parameter $\gamma$ in eq (4) is fixed to the same value for all datasets. As k-means is applied in all the methods, we run it 20 times with random initialization and report both mean values and standard deviations. The Gaussian kernel and $k$-nearest neighbor graphs [Von Luxburg2007] are used for the methods that need to construct the Laplacian matrix of each view, with $k$ empirically set to 5. Two commonly used metrics, i.e., clustering accuracy and normalized mutual information (NMI) [Chen et al.2011], are used as evaluation measures in this paper.
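For reference, the two metrics can be computed as below. The function names are ours; clustering accuracy uses the standard optimal (Hungarian) matching between predicted cluster labels and ground-truth classes, and NMI relies on scikit-learn. Labels are assumed to be integer-coded starting from 0.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Best-match accuracy: permute predicted cluster ids to maximize agreement."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    contingency = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        contingency[t, p] += 1
    rows, cols = linear_sum_assignment(contingency.max() - contingency)
    return contingency[rows, cols].sum() / y_true.size

def clustering_nmi(y_true, y_pred):
    """Normalized mutual information between ground truth and predicted labels."""
    return normalized_mutual_info_score(y_true, y_pred)
```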

Figure 2: Clustering performance on Digit (a) and Reuter (b) w.r.t. the number of used views.
Figure 3: Convergence performance on Digit (a) and Reuter (b).

5.3 Results and Parameter Analysis

Table 1 shows the numerical results of the different methods on all four databases. The proposed RMSC model outperforms all compared algorithms and improves the performance of multi-view clustering. From the clustering results of SSC-AVG and SSC-BSV, we can see that naïvely treating each view equally and averaging each view's representation cannot always boost the performance of single-view clustering (e.g., SSC-AVG is worse than SSC-BSV on Digit). This is because views with low confidence levels can have a large negative influence on the averaged representation. By taking each view's confidence level into account and pursuing a weighted average among views, RMSC-WV obtains consistent improvements over the SSC-BSV model on all datasets. Moreover, the proposed RMSC model further boosts the performance of RMSC-WV by simultaneously considering the confidence levels of both samples and views. This corroborates our analysis that naïvely treating each view equally or assigning a unified weight to a view can both lead to suboptimal solutions, and demonstrates the effectiveness of the proposed RMSC model.

To better analyze the properties of the proposed model, we further report its clustering performance on Digit and Reuter with respect to the number of used views in Figure 2. SSC-AVG and RMSC-WV are also implemented for comparison. For Digit, the views are added in the order PIX, FOU, FAC, ZER, KAR, MOR. For example, {PIX, FOU} and {PIX, FOU, FAC} are used when two and three views are considered, respectively. Similarly, the views are added in the order English, French, German, Spanish and Italian for Reuter. As seen from Figure 2, RMSC consistently outperforms SSC-AVG and RMSC-WV for every number of used views on both datasets. More specifically, on Digit, as the number of used views increases, the performance of RMSC is more robust than that of SSC-AVG and RMSC-WV. For example, the performance of both RMSC-WV and RMSC increases when the third view (FAC) is incorporated. However, when the fourth view (ZER) is added, the performance of SSC-AVG and RMSC-WV decreases considerably while that of RMSC remains stable. On Reuter, the performance of all three algorithms increases as more views are included (except for the fifth view), and RMSC always achieves the best performance. Thus RMSC is able to obtain a robust consensus representation by considering the confidence levels of both views and samples, and can improve the performance of multi-view clustering.

Figure 3 shows the convergence behavior of Algorithm 1 on Digit and Reuter. The overall objective value decreases very quickly, and convergence is reached within 5 iterations. To investigate the influence of the trade-off parameters in RMSC, we further report its clustering results with respect to them on Digit. The results are shown in Figure 4. We observe that RMSC is not very sensitive to its parameters, and there exists a large parameter region in which it achieves promising results.

Figure 4: Clustering performance on Digit w.r.t. the trade-off parameters.

6 Conclusion

In this paper, we have proposed a novel robust localized multi-view subspace clustering (RMSC) model that considers the confidence levels of both samples and views. RMSC aims to learn a consensus self-representation for multi-view data, and the proposed weighting strategy can reflect the samples' confidence levels to some extent. We further developed an iterative optimization method for RMSC, which converges within a few iterations. Comprehensive experimental results on four benchmark datasets show that RMSC can obtain a robust consensus representation and outperforms state-of-the-art multi-view clustering algorithms.

7 Appendix

RMSC-WV is a variant of model (2) that considers a unified weight for each view (i.e., samples in the same view are assigned the same weight). The formulation of RMSC-WV defined in Section 5.2 is

(13)

where $w^v$ denotes the weight of the $v$-th view and $f(\cdot)$ is the regularizer on the view weights defined in eq (4). Similar to model (2), a block coordinate descent strategy is used for its optimization.
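Under the same assumed regularizer $f(w) = \gamma / w$ used in the earlier sketch, the view-level weights of RMSC-WV could be obtained by aggregating each view's per-sample losses before applying the minimizer function. This is only an illustrative sketch of the unified-weight-per-view idea, not the exact update derived from eq (13).

```python
import numpy as np

def view_weights(losses_per_view, gamma=1.0, eps=1e-8):
    """One weight per view: aggregate the sample losses, then apply w = sqrt(gamma/l).

    losses_per_view: (V, n) array of per-sample losses for each view.
    """
    view_loss = losses_per_view.sum(axis=1)      # total loss of each view
    return np.sqrt(gamma / (view_loss + eps))    # noisier views get smaller weights
```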

References

  • [Boyd and Vandenberghe2004] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.
  • [Cao et al.2015] Xiaochun Cao, Changqing Zhang, Huazhu Fu, Si Liu, and Hua Zhang. Diversity-induced multi-view subspace clustering. In CVPR, pages 586–594, 2015.
  • [Chen et al.2011] Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, and Edward Y Chang. Parallel spectral clustering in distributed systems. TPAMI, 33(3):568–586, 2011.
  • [Elhamifar and Vidal2013] Ehsan Elhamifar and Rene Vidal. Sparse subspace clustering: Algorithm, theory, and applications. TPAMI, 35(11):2765–2781, 2013.
  • [Feng et al.2014] Jiashi Feng, Zhouchen Lin, Huan Xu, and Shuicheng Yan. Robust subspace segmentation with block-diagonal prior. In CVPR, pages 3818–3825. IEEE, 2014.
  • [Gao et al.2015] Hongchang Gao, Feiping Nie, Xuelong Li, and Heng Huang. Multi-view subspace clustering. In ICCV, pages 4238–4246, 2015.
  • [Gönen and Margolin2014] Mehmet Gönen and Adam A Margolin. Localized data fusion for kernel k-means clustering with application to cancer biology. In NIPS, pages 1305–1313, 2014.
  • [He et al.2014] Ran He, Tieniu Tan, and Liang Wang. Robust recovery of corrupted low-rank matrix by implicit regularizers. TPAMI, 36(4):770–783, 2014.
  • [Hu et al.2014] Han Hu, Zhouchen Lin, Jianjiang Feng, and Jie Zhou. Smooth representation clustering. In CVPR, pages 3834–3841, 2014.
  • [Kumar and Daumé2011] Abhishek Kumar and Hal Daumé. A co-training approach for multi-view spectral clustering. In ICML, pages 393–400, 2011.
  • [Kumar et al.2011] Abhishek Kumar, Piyush Rai, and Hal Daumé. Co-regularized multi-view spectral clustering. In NIPS, pages 1413–1421, 2011.
  • [Li et al.2016] Miaomiao Li, Xinwang Liu, Lei Wang, Yong Dou, Jianping Yin, and En Zhu. Multiple kernel clustering with local kernel alignment maximization. In IJCAI, volume 16, 2016.
  • [Liu et al.2013a] Guangcan Liu, Zhouchen Lin, Shuicheng Yan, Ju Sun, Yong Yu, and Yi Ma. Robust recovery of subspace structures by low-rank representation. TPAMI, 35(1):171–184, 2013.
  • [Liu et al.2013b] Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. Multi-view clustering via joint nonnegative matrix factorization. In SIAM Data Mining Conference, pages 252–260, 2013.
  • [Liu et al.2016] Xinwang Liu, Yong Dou, Jianping Yin, Lei Wang, and En Zhu. Multiple kernel k-means clustering with matrix-induced regularization. In AAAI, pages 1888–1894, 2016.
  • [Lu et al.2016] Canyi Lu, Shuicheng Yan, and Zhouchen Lin. Convex sparse spectral clustering: Single-view to multi-view. TIP, 25(6):2833–2843, 2016.
  • [Shi and Malik2000] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. TPAMI, 22(8):888–905, 2000.
  • [Sun et al.2015] Jiangwen Sun, Jin Lu, Tingyang Xu, and Jinbo Bi. Multi-view sparse co-clustering via proximal alternating linearized minimization. In ICML, pages 757–766, 2015.
  • [Von Luxburg2007] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4):395–416, 2007.
  • [Wang et al.2013a] Hua Wang, Feiping Nie, and Heng Huang. Multi-view clustering and feature learning via structured sparsity. In ICML, pages 352–360, 2013.
  • [Wang et al.2013b] Xinchao Wang, Wei Bian, and Dacheng Tao. Grassmannian regularized structured multi-view embedding for image classification. TIP, 22(7):2646–2660, 2013.
  • [Wang et al.2014] Hongxing Wang, Chaoqun Weng, and Junsong Yuan. Multi-feature spectral clustering with minimax optimization. In CVPR, pages 4106–4113, 2014.
  • [Xia et al.2010] Tian Xia, Dacheng Tao, Tao Mei, and Yongdong Zhang. Multiview spectral embedding. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 40(6):1438–1446, 2010.
  • [Xu et al.2013] Chang Xu, Dacheng Tao, and Chao Xu. A survey on multi-view learning. arXiv preprint arXiv:1304.5634, 2013.
  • [Xu et al.2015] Chang Xu, Dacheng Tao, and Chao Xu. Multi-view self-paced learning for clustering. In IJCAI, pages 3974–3980, 2015.
  • [Xu et al.2016] Jinglin Xu, Junwei Han, and Feiping Nie. Discriminatively embedded k-means for multi-view clustering. In CVPR, pages 5356–5364, 2016.
  • [Yin et al.2015] Qiyue Yin, Shu Wu, Ran He, and Liang Wang. Multi-view clustering via pairwise sparse subspace representation. Neurocomputing, 156:12–21, 2015.
  • [Zhang et al.2013] Yingya Zhang, Zhenan Sun, Ran He, and Tieniu Tan. Robust subspace clustering via half-quadratic minimization. In ICCV, pages 3096–3103, 2013.
  • [Zhao and Fu2015] Handong Zhao and Yun Fu. Dual-regularized multi-view outlier detection. In IJCAI, pages 4077–4083, 2015.