Spectral clustering, which aims at exploring the local nonlinear manifold (spectral graph) structure inherently embedded in high-dimensional data to partition data into disjoint meaningful groups, is a fundamental clustering problem. Because of its elegance, efficiency and good performance, spectral clustering has become one of the most popular clustering methods. Recently, attention has shifted from the conventional single view/graph setting to multi-view spectral clustering, with the motivation of leveraging the complementary information from multi-view data sources, where the same data set can be described by different features, e.g., an image can be described by its color histogram or shape feature; a document can be represented by its page links or its text. As explicitly claimed by numerous pieces of multi-view research [Xu et al.2015, Xu et al.2013, Wang et al.2015c, Wang et al.2014a], an individual view is unlikely to be sufficiently faithful for effective multi-view learning. Therefore, the integration of multi-view information is both valuable and necessary.
Essentially, the critical issue of multi-view learning is to achieve agreement/consensus [Gui et al.2014, Wang et al.2015b, Wang et al.2014b, Wang et al.2016] among all views, given the complementary information from multiple views, so as to yield substantially superior clustering performance over the single-view paradigm. Numerous multi-view methods have been proposed for spectral clustering. The methods of [Huang et al.2012, Bickel and Scheffer.2004] incorporate multi-view information into the clustering process by optimizing a certain objective loss function. The late fusion strategy [Greene and Cunningham2009] designed for multi-view spectral clustering works by first deriving the spectral clustering result for each view, and then combining the multiple view-induced results into an optimal one. Such a strategy, however, cannot ideally achieve multi-view agreement, as the views cannot co-regularize each other during the clustering process.
Canonical Correlation Analysis (CCA) based methods [Blaschko and Lampert.2008, Chaudhuri et al.2009] for multi-view spectral clustering project the multi-view data sources into one common lower-dimensional subspace, where spectral clustering is subsequently conducted. One limitation of such methods is that a single common lower-dimensional subspace cannot flexibly characterize the local spectral graph structures from heterogeneous views, resulting in inferior multi-view spectral clustering. Kumar et al. [Kumar et al.2011] proposed a state-of-the-art co-regularized spectral clustering method for multi-view data. They regularize the eigenvectors of view-dependent graph Laplacians to achieve consensus clusters across views. Similarly, a co-training [Blum and Mitchell1998, Wang and Zhou2010] framework was proposed for multi-view spectral clustering [Kumar and Daume2011], where the similarity matrix from one view is projected into the subspaces spanned by the eigenvectors from the other views, and spectral clustering is then conducted on the projected similarity matrix. This process is performed alternately until convergence, and the final result is formed by aggregating the clustering results from the individual views.
The above co-regularized [Kumar et al.2011] and co-training [Kumar and Daume2011] based methods can effectively achieve clustering consensus when the view-dependent feature representations are free of noise corruption. However, such an assumption is hard to satisfy in practice. To address this limitation, Low-Rank Representation (LRR) [Xia et al.2014, Liu et al.2010, Liu and Yan2011, Liu et al.2013] based approaches have been proposed for multi-view spectral clustering. The basic idea is to decompose the data representation from any view into a view-dependent noise corruption term and a common low-rank representation shared by all views, which further leads to a common data affinity matrix for clustering. The typical LRR [Xia et al.2014, Liu et al.2010] model is formulated below.
where denotes the number of all views; denotes the data feature representation for the view, is the number of data objects for each view; is the feature representation dimensions for the view. represents the self-expressive linear sample correlations [Liu et al.2010] shared by all views with the assumption that the similar samples can be linearly reconstructed by each other. models the possible noise corruptions in the feature representations for the view. is the norm of representing the summation of the absolute value of all entries from ; is the balance parameter.
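For orientation, the multi-view LRR objective that matches these definitions is commonly written in the form below. The symbol names are assumptions chosen to fit the description rather than the original notation: $V$ is the number of views, $X_i \in \mathbb{R}^{d_i \times n}$ holds the features of the $i$-th view, $Z \in \mathbb{R}^{n \times n}$ is the shared self-expressive representation, $E_i$ is the view-specific noise term and $\lambda$ is the balance parameter.

```latex
\min_{Z,\,\{E_i\}} \; \|Z\|_* + \lambda \sum_{i=1}^{V} \|E_i\|_1
\quad \text{s.t.} \quad X_i = X_i Z + E_i, \qquad i = 1, \dots, V
```

Here $\|Z\|_*$ is the nuclear norm (the sum of the singular values of $Z$), the usual convex surrogate for the rank, and $\|E_i\|_1$ is the sum of the absolute values of the entries of $E_i$.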
Despite its effectiveness for multi-view spectral clustering, LRR still arguably faces the following fundamental limitations:
LRR attempts to learn a common lowest-rank representation revealing a low-dimensional subspace structure, but ignores the distinct manifold structures in each view, which turn out to be critically important to multi-view spectral clustering.
1.2 Our contributions
To address those stand-out limitations, our method delivers the following novel features:
To characterize the non-linear spectral graph structure from each view, inspired by [Yin et al.2016], we propose to couple LRR with multi-graph regularization, where each graph laplacian regularization can characterize the view-dependent non-linear local data similarity.
To achieve the view agreement while preserving the data correlations within each view, we present an iterative view agreement process in optimizing our objective function. During each iteration, the low-rank representation yielded from each view serves as the constraint to regulate the representation learning from other views. This process iteratively boosts these representations to be more agreeable.
To model the above intuitions, we formulate a novel objective function and deploy the Linearized Alternating Direction Method with Adaptive Penalty (LADMAP) [Lin et al.2011] to solve it.
2 Iterative Low-Rank based Structured Optimization Method to Multi-view Spectral Clustering
It is well known that the critical issue for spectral clustering lies in how to effectively model the local nonlinear manifold structure [Zelnik-Manor and Perona2004]. Hence, for each view, we aim at preserving such nonlinear manifold structure of original high-dimensional data set within the space spanned by the low-rank sparse representations for the view. This can be formulated as:
where is the row of representing the linear correlation representation between and in the view; is the entry of the similarity matrix , which encodes the similarity between and from the original high dimensional space for the view; is the similarity matrix for all the data objects from ; is a diagonal matrix with its diagonal entry to be the summation of the row of , and is the graph laplacian matrix for the view; thus Eq.(2) is always dubbed graph laplacian regularizer. In this paper, we choose Gaussian kernel to calculate as
where is the bandwidth parameter and denotes the norm; Eq.(3) holds if is within the nearest neighbors of or vice versa, and it is 0 otherwise. is set to 0 to avoid self-loop. Eq.(2) explicitly requires to well characterize the local manifold structure inherently embedded in original high-dimensional for the view, which is of importance to spectral clustering.
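The graph construction and the regularizer described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the column-per-object layout of `X`, the kernel scaling `exp(-d^2 / (2 * sigma^2))` and the default values of `k` and `sigma` are all choices made here for the example.

```python
import numpy as np

def knn_gaussian_graph(X, k=5, sigma=1.0):
    """Build a k-NN Gaussian similarity graph.

    X is (d, n) with one column per data object (assumed layout).
    Returns the similarity matrix W, the degree matrix D and L = D - W.
    """
    n = X.shape[1]
    # Pairwise squared Euclidean distances between columns of X.
    sq = np.sum(X**2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    W = np.exp(-dist2 / (2.0 * sigma**2))  # kernel scaling is an assumption
    np.fill_diagonal(W, 0.0)  # no self-loops
    # Keep W[j, l] only if l is among the k nearest neighbours of j, or vice versa.
    d = dist2.copy()
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nn.ravel()] = True
    mask |= mask.T
    W = np.where(mask, W, 0.0)
    D = np.diag(W.sum(axis=1))
    return W, D, D - W

def laplacian_regularizer(Z, W):
    """Return 0.5 * sum_{j,l} W[j, l] * ||z_j - z_l||^2 for the rows z_j of Z.

    For symmetric W this equals trace(Z^T L Z) with L = D - W.
    """
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(Z.T @ L @ Z))
```

A small value of `laplacian_regularizer(Z, W)` means that objects that are close in the original feature space also have similar rows in `Z`, which is the locality-preserving behaviour the regularizer is meant to enforce.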
Based on the above, we leverage the above graph laplacian regularizer with the low-rank representation. Considering the global clustering structure captured by low-rank representation may prevent us from directly imposing graph Laplacian regularizer for local manifold structure, we propose to impose the sparsity norm on , denoted as , which can discriminatively extract the local sparse representative neighborhood of each data object.
As explicitly revealed by most multi-view clustering research [Kumar et al.2011, Kumar and Daume2011, Bickel and Scheffer.2004], a data point is expected to be assigned to the same cluster irrespective of the view. In other words, the critical issue in ensuring ideal multi-view clustering performance is to achieve clustering agreement among all views. Based on that, we aim to minimize the difference between the low-rank and sparse representations from different views by proposing a consensus term that coordinates all views to reach clustering agreement.
denotes the low-rank representation revealing the global clustering structure regarding .
aims at extracting the local sparse representation of each data object in .
characterizes the local manifold structure.
characterizes the agreement among the sparse and low-rank representations from all views.
models the possible Laplacian noise contained by , we pose on for noise robustness.
is a non-negative constraint to ensure that each data object is amid its neighbors, through , so that the data correlations can be well encoded for the view.
are all trade-off parameters
Eq.(4) is a typical low-rank optimization problem, and many methods are available to solve it. Among these, the Alternating Direction Method is the typical solution, which updates each variable alternately by minimizing the augmented Lagrangian function in a Gauss-Seidel fashion. In this paper, we deploy the Linearized Alternating Direction Method with Adaptive Penalty, dubbed LADMAP [Lin et al.2011]. The underlying philosophy of LADMAP is to linearly represent the smooth component, which enables the Lagrange multipliers to be updated within the feasible approximation error.
Observing that solving all the pairs follows the same type of optimization strategy, we only present the optimization strategy for the view. To resolve this, we first introduce an auxiliary variable , then solving the Eq.(4) with respect to and can be written as follows
We then present the augmented lagrangian function of Eq.(5) below
where and are Lagrange multipliers, is the inner product and is a penalty parameter. We update each of the above variables alternately by minimizing Eq.(6) while keeping the other variables fixed. The details of optimizing Eq.(6) with respect to each variable are provided in the next section.
3 Optimization Strategy
Minimizing Eq.(6) w.r.t. is equivalent to minimizing the following
Eq.(7) does not admit a closed-form solution. Thanks to LADMAP, we can approximately reconstruct the smooth terms of via a linear manner. The smooth terms of are summarized below
where denotes the partial gradient of w.r.t. Z at , and is approximated by the linear representation w.r.t. together with a proximal term . The above replacement is valid provided that , where
denotes the largest eigenvalue of. Then the following closed form holds for Eq.(9) for each update.
represents the Singular Value Threshold (SVT) operation.
is the singular value decomposition of matrix, and is called the soft threshold operator, is 1 if it is positive and 0 otherwise.
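The two operators named above can be sketched in NumPy as follows. The function names and the argument name `eps` are assumptions for this example; the shrinkage is written in its standard form `sign(x) * max(|x| - eps, 0)`. Singular values are non-negative, so inside the SVT step this is the same as `max(s - eps, 0)`.

```python
import numpy as np

def soft_threshold(x, eps):
    """Elementwise shrinkage: sign(x) * max(|x| - eps, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def svt(A, eps):
    """Singular value thresholding: shrink the singular values of A by eps.

    This is the proximal operator of eps * (nuclear norm) evaluated at A.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale the columns of U by the shrunken singular values, then recompose.
    return (U * soft_threshold(s, eps)) @ Vt
```

Singular values below `eps` are set to zero, which is how each update lowers the rank of the iterate.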
3.1.1 Insights for Iterative Views Agreement.
We remark that the intuitions for iterative (during each iteration) views-agreement can be captured by expanding below
We expand the last term , then we re-write Eq.(11) below
where updating is explicitly influenced by the other views, i.e., , which reveals that such low-rank representations, e.g., updating from each view, e.g., the view, are formed by referring to the other views, while serving as a constraint to update the other views in each iteration, so that the complementary information from all views is leveraged towards a final agreement for clustering.
3.4 Updating and
We update Lagrange multipliers via
We remark that can be tuned using the adaptive updating strategy suggested by [Lin et al.2011] to yield faster convergence. The optimization strategy alternately updates each variable while fixing the others until the convergence condition is met.
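The multiplier and penalty updates can be sketched generically as below. This is a hedged illustration in the style of LADMAP [Lin et al.2011], not the exact update of this paper: the helper names and the defaults `rho0`, `mu_max` and `eps2` are hypothetical, `residual` stands for whatever constraint violation the multiplier is attached to, and `eta` is the linearization constant.

```python
import numpy as np

def update_multiplier(Y, residual, mu):
    """Dual ascent step: move the multiplier along the constraint residual."""
    return Y + mu * residual

def update_penalty(mu, step_norm, data_norm, eta, rho0=1.9, mu_max=1e10, eps2=1e-4):
    """Adaptive penalty update.

    The penalty mu grows by a factor rho0 only when the last primal step
    (step_norm, relative to data_norm) was small; otherwise it is kept.
    """
    small_step = mu * np.sqrt(eta) * step_norm / data_norm < eps2
    rho = rho0 if small_step else 1.0
    return min(mu_max, rho * mu)
```

Growing the penalty only after the primal variables have nearly stopped moving avoids over-penalizing the constraints early on.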
Thanks to LADMAP [Lin et al.2011], the above optimization process converges to a globally optimal solution. Besides, we may employ the Lanczos method to compute the largest singular values and vectors by only performing multiplication of with vectors, which can be efficiently computed by such successive matrix-vector multiplications.
3.5 Clustering with
Once the converged are learned for each of the views, we normalize all column vectors of and set small entries under a given threshold to 0. After that, we can calculate the similarity matrix for the view between the and data objects. The final data similarity matrix for all views is defined as
The spectral clustering is performed on calculated via Eq.(20) to yield the final multi-view spectral clustering result.
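The post-processing pipeline above can be sketched as follows. The exact per-view similarity and the fusion rule are not recoverable from the text here, so this example assumes a symmetrization `(|Z| + |Z^T|) / 2` per view and a plain average across views; the function names and the `threshold` default are likewise hypothetical.

```python
import numpy as np

def view_affinity(Z, threshold=1e-3):
    """Normalize the columns of Z, zero out small entries and symmetrize."""
    norms = np.linalg.norm(Z, axis=0, keepdims=True)
    Zn = Z / np.maximum(norms, 1e-12)
    Zn = np.where(np.abs(Zn) < threshold, 0.0, Zn)
    return 0.5 * (np.abs(Zn) + np.abs(Zn.T))  # assumed symmetrization

def fused_affinity(Zs, threshold=1e-3):
    """Average the per-view affinities (assumed fusion rule)."""
    return sum(view_affinity(Z, threshold) for Z in Zs) / len(Zs)

def spectral_embedding(W, n_clusters):
    """Row-normalized top eigenvectors of D^{-1/2} W D^{-1/2}."""
    d = np.maximum(W.sum(axis=1), 1e-12)
    S = W / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]         # eigenvectors of the largest eigenvalues
    return U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
```

Running any standard k-means on the rows of `spectral_embedding(fused_affinity(Zs), n_clusters)` then yields the cluster labels.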
We evaluate our method on the following data sets:
UCI handwritten Digit set (http://archive.ics.uci.edu/ml/datasets/Multiple+Features): It consists of features of hand-written digits (0-9). The dataset is represented by 6 features and contains 2000 samples with 200 in each category. Analogous to [Lin et al.2011], we choose 76 Fourier coefficients (FC) of the character shapes and the 216 profile correlations (PC) as two views.
Animal with Attribute (AwA) (http://attributes.kyb.tuebingen.mpg.de): It consists of 50 kinds of animals described by 6 features (views): Color histogram (CQ, 2688-dim), local self-similarity (LSS, 2000-dim), pyramid HOG (PHOG, 252-dim), SIFT (2000-dim), Color SIFT (RGSIFT, 2000-dim), and SURF (2000-dim). We randomly sample 80 images for each category and get 4000 images in total.
NUS-WIDE-Object (NUS) [Chua et al.2009]: The data set consists of 30000 images from 31 categories. We construct 5 views using 5 features as provided by the website (lms.comp.nus.edu.sg/research/NUS-WIDE.html).
These data sets are summarized in Table 1.
| View | UCI digits | AwA | NUS |
|---|---|---|---|
| 1 | FC (76) | CQ (2688) | CH (65) |
| 2 | PC (216) | LSS (2000) | CM (226) |
| # of data | 2000 | 4000 | 26315 |
| # of classes | 10 | 50 | 31 |
We compare our approach with the following state-of-the-art baselines:
MFMSC: Using the concatenation of multiple features to perform spectral clustering.
Multi-view affinity aggregation for multi-view spectral clustering (MAASC) [Huang et al.2012].
Canonical Correlation Analysis (CCA) based multi-view spectral clustering (CCAMSC) [Chaudhuri et al.2009]: It projects multi-view data into a common subspace and then performs spectral clustering.
Co-regularized multi-view spectral clustering (CoMVSC) [Kumar et al.2011]: It regularizes the eigenvectors of view-dependent graph Laplacians to achieve consensus clusters across views.
4.2 Experimental Settings and Parameters Study
For fair comparison, we implement these competitors by following their experimental setting and the parameter tuning steps in their papers. The Gaussian kernel is used throughout experiments on all data sets and in Eq.(3) is learned by self-tuning method [Zelnik-Manor and Perona2004], and
to construct -nearest neighbors for each data object to calculate Eq.(3).
To measure the clustering results, we use two standard metrics: clustering accuracy (ACC), the ratio of the number of data objects whose clustering label matches the ground-truth label to the total number of data objects, and normalized mutual information (NMI). Please refer to [Chen et al.2011] for details of these two clustering metrics. All experiments are repeated 10 times, and we report the mean values.
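The two metrics can be sketched in plain Python as below. Two choices here are assumptions rather than details taken from the text: ACC is computed after matching cluster labels to ground-truth labels by the best one-to-one mapping (found by brute force, so suitable only for a small number of clusters), and NMI is normalized by the geometric mean of the two entropies.

```python
from collections import Counter
from itertools import permutations
from math import log, sqrt

def clustering_accuracy(truth, pred):
    """Accuracy under the best one-to-one mapping of predicted to true labels."""
    t_labels, p_labels = sorted(set(truth)), sorted(set(pred))
    # Pad the targets so every predicted label can be mapped somewhere.
    k = max(len(t_labels), len(p_labels))
    targets = t_labels + [None] * (k - len(t_labels))
    counts = Counter(zip(pred, truth))
    best = 0
    for perm in permutations(targets, len(p_labels)):
        hits = sum(counts[(p, t)] for p, t in zip(p_labels, perm))
        best = max(best, hits)
    return best / len(truth)

def nmi(truth, pred):
    """Mutual information divided by sqrt(H(truth) * H(pred))."""
    n = len(truth)
    ct, cp, joint = Counter(truth), Counter(pred), Counter(zip(truth, pred))
    mi = sum(c / n * log(c * n / (ct[t] * cp[p])) for (t, p), c in joint.items())

    def entropy(cnt):
        return -sum(c / n * log(c / n) for c in cnt.values())

    h_t, h_p = entropy(ct), entropy(cp)
    if h_t == 0 or h_p == 0:
        # Degenerate case: at least one labelling has a single cluster.
        return 1.0 if h_t == h_p else 0.0
    return mi / sqrt(h_t * h_p)
```

Both functions take two equal-length sequences of labels and return a value in [0, 1]; multiply by 100 to report percentages.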
Feature noise modeling for robustness: Following [Siyahjani et al.2015], for each view-specific feature representation, 20% of the feature elements are corrupted with noise drawn from a uniform distribution over the range [-5, 5], which is consistent with the practical setting and matches the setting of RLRR and our method.
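The corruption protocol can be sketched as follows. Whether the corrupted entries are replaced by the noise or have the noise added to them is not stated, so this example assumes replacement; the function name and the `seed` parameter are hypothetical.

```python
import numpy as np

def corrupt_features(X, fraction=0.2, low=-5.0, high=5.0, seed=0):
    """Replace a random `fraction` of the entries of X with Uniform(low, high) draws.

    Returns a corrupted copy; X itself is left untouched.
    """
    rng = np.random.default_rng(seed)
    Xc = np.array(X, dtype=float, copy=True)
    n_corrupt = int(round(fraction * Xc.size))
    # Pick distinct flat positions so exactly n_corrupt entries are changed.
    idx = rng.choice(Xc.size, size=n_corrupt, replace=False)
    Xc.flat[idx] = rng.uniform(low, high, size=n_corrupt)
    return Xc
```

Applying the function to each view's feature matrix with a different `seed` gives independently corrupted views.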
4.3 Validating Multi-graph regularization and Iterative views agreement
We test for multi-graph regularization term and for iterative views agreement within the interval [0.001,10] over the AwA data set and adopt such setting for other data sets. Specifically, we test each value of one parameter while fixing the value of the other parameter, the results are then illustrated in Fig. 1.
From both Fig.1 (a) and (b), the following observations can be identified:
When fixing the value of , increasing the value of generally improves the ACC and NMI values of our method. A similar observation holds vice versa; that is, fixing the value of while increasing leads to a clustering improvement in terms of both ACC and NMI.
Both ACC and NMI increase as both and increase until the optimal pair combination is reached, and then decrease slightly.
Based on the above observations, we choose a balanced pair of values, and , for our method.
4.4 Experimental Results and Analysis
|ACC (%)||UCI digits||AwA||NUS|
|NMI (%)||UCI digits||AwA||NUS|
In most cases, the clustering performance of our method in terms of both ACC and NMI is better than that of RLRR, which further demonstrates the effectiveness of our multi-graph regularization and iterative views agreement scheme combined with the LRR scheme for multi-view spectral clustering.
First, compared with view-fusion methods like MFMSC and MAASC, our method improves the clustering performance by a notable margin on all data sets. Specifically, it improves the ACC from 43.81% (MFMSC) and 51.74% (MAASC) to 86.39% on the UCI digits data set. Such notable improvement can also be observed on the AwA and NUS data sets.
Second, CCAMSC, which learns a common low-dimensional subspace among multi-view data, is less effective in clustering because a single common subspace cannot encode the local graph structures from heterogeneous views. In contrast, our method addresses this problem with a novel iterative views-agreement scheme, as evidenced in terms of both ACC and NMI.
Third, compared with the co-regularized paradigms (CoMVSC and Co-training), our method works more effectively in the presence of noise corruption. For example, on the NUS data set, it improves the clustering accuracy from 33.63% (CoMVSC) and 34.25% (Co-training) to 41.02%. RLRR is also effective in dealing with practical noise-corrupted multi-view data. However, as mentioned above, learning only one common low-rank correlation representation shared by all views fails to flexibly capture the local nonlinear manifold structures from all views, which are crucial to spectral clustering, whereas our technique delivers better performance.
In this paper, we propose an iterative structured low-rank optimization method for multi-view spectral clustering. Unlike existing methods, our method can well encode the local data manifold structure from each view-dependent feature space and achieve multi-view agreement in an iterative fashion, while better preserving the flexible nonlinear manifold structure from all views. Its advantages are validated by extensive experiments on real-world multi-view data sets.
One future direction is to adapt the proposed iterative technique to cross-view based research [Wu et al.2013, Wang et al.2015a] by dealing with multiple data sources that correspond to the same latent semantics. We aim to develop a novel iterative technique to learn the projections of multiple data sources into a common latent space that well characterizes the shared latent semantics.
- [Bickel and Scheffer.2004] S. Bickel and T. Scheffer. Multi-view clustering. In IEEE ICDM, 2004.
- [Blaschko and Lampert.2008] M. Blaschko and C. Lampert. Correlational spectral clustering. In CVPR, 2008.
- [Blum and Mitchell1998] Avrim Blum and Tom M. Mitchell. Combining labeled and unlabeled data with co-training. In COLT, 1998.
- [Cai et al.2008] Jian-Feng Cai, Emmanuel J. Candes, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2008.
- [Chaudhuri et al.2009] K. Chaudhuri, S. Kakade, K. Livescu, and K. Sridharan. Multi-view clustering via canonical correlation analysis. In ICML, 2009.
- [Chen et al.2011] Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, and Edward Y. Chang. Parallel spectral clustering in distributed systems. IEEE Trans. Pattern Anal. Mach. Intell., 33(3):568–586, 2011.
- [Chua et al.2009] Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, and Yan-Tao Zheng. Nus-wide: A real-world web image database from national university of singapore. In ACM CIVR, 2009.
- [Greene and Cunningham2009] D. Greene and P. Cunningham. A matrix factorization approach for integrating multiple data views. In ECMLPKDD, 2009.
- [Gui et al.2014] Jie Gui, Dacheng Tao, Zhenan Sun, Yong Luo, Xinge You, and Yuan Yan Tang. Group sparse multiview patch alignment framework with view consistency for image classification. IEEE Transactions on Image Processing, 23(7):3126–3137, 2014.
- [Huang et al.2012] Hsin-Chien Huang, Yung-Yu Chuang, and Chu-Song Chen. Affinity aggregation for spectral clustering. In CVPR, 2012.
- [Kumar and Daume2011] Abhishek Kumar and Hal Daume. A co-training approach for multi-view spectral clustering. In ICML, 2011.
- [Kumar et al.2011] Abhishek Kumar, Piyush Rai, and Hal Daume. Co-regularized multi-view spectral clustering. In NIPS, 2011.
- [Lin et al.2011] Zhouchen Lin, Risheng Liu, and Zhixun Su. Linearized alternating direction method with adaptive penalty for low-rank representation. In NIPS, 2011.
- [Liu and Yan2011] Guangcan Liu and Shuicheng Yan. Latent low-rank representation for subspace segmentation and feature extraction. In ICCV, 2011.
- [Liu et al.2010] Guangcan Liu, Zhouchen Lin, and Yong Yu. Robust subspace segmentation by low-rank representation. In ICML, 2010.
- [Liu et al.2013] Guangcan Liu, Zhouchen Lin, Shuicheng Yan, Ju Sun, Yong Yu, and Yi Ma. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell., 35(1):171–184, 2013.
- [Ng et al.2001] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2001.
- [Siyahjani et al.2015] Farzad Siyahjani, Ranya Almohsen, Sinan Sabri, and Gianfranco Doretto. A supervised low-rank method for learning invariant subspace. In ICCV, 2015.
- [Wang and Zhou2010] Wei Wang and Zhi-Hua Zhou. A new analysis of co-training. In ICML, 2010.
- [Wang et al.2014a] Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, and Qing Zhang. Exploiting correlation consensus: towards subspace clustering for multi-modal data. In ACM Multimedia, 2014.
- [Wang et al.2014b] Yang Wang, Jian Pei, Xuemin Lin, and Qing Zhang. An iterative fusion approach to graph-based semi-supervised learning from multiple views. In PAKDD, 2014.
- [Wang et al.2015a] Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, and Qing Zhang. Lbmch: Learning bridging mapping for cross-modal hashing. In ACM SIGIR, 2015.
- [Wang et al.2015b] Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, Qing Zhang, and Xiaodi Huang. Robust subspace clustering for multi-view data by exploiting correlation consensus. IEEE Trans. Image Processing, 24(11):3939–3949, 2015.
- [Wang et al.2015c] Yang Wang, Wenjie Zhang, Lin Wu, Xuemin Lin, and Xiang Zhao. Unsupervised metric fusion over multiview data by graph random walk-based cross-view diffusion. IEEE Trans. Neural Netw. Learning Syst., 2015.
- [Wang et al.2016] Yang Wang, Xuemin Lin, Lin Wu, Qing Zhang, and Wenjie Zhang. Shifting multi-hypergraphs via collaborative probabilistic voting. Knowl. Inf. Syst, 46(3):515–536, 2016.
- [Wu et al.2013] Lin Wu, Yang Wang, and John Shepherd. Efficient image and tag co-ranking: a bregman divergence optimization method. ACM Multimedia, 2013.
- [Xia et al.2014] Rongkai Xia, Yan Pan, Lei Du, and Jian Yin. Robust multi-view spectral clustering via low-rank and sparse decomposition. In AAAI, 2014.
- [Xu et al.2013] Chang Xu, Dacheng Tao, and Chao Xu. A survey on multi-view learning. Corr abs/1304.5634, 2013.
- [Xu et al.2015] Chang Xu, Dacheng Tao, and Chao Xu. Multi-view intact space learning. IEEE Trans. Pattern Anal. Mach. Intell, 2015.
- [Yin et al.2016] Ming Yin, Junbin Gao, and Zhouchen Lin. Laplacian regularized low-rank representation and its applications. IEEE Trans. Pattern Anal. Mach. Intell, 38(3):504–517, 2016.
- [Zelnik-Manor and Perona2004] Lihi Zelnik-Manor and Pietro Perona. Self-tuning spectral clustering. In NIPS, 2004.