1 Introduction
Spectral clustering [Ng et al., 2001; Zelnik-Manor and Perona, 2004], which aims at exploring the local nonlinear manifold (spectral graph) structure inherently embedded in high-dimensional data to partition data into disjoint meaningful groups, is a fundamental clustering problem. Because of its elegance, efficiency, and good performance, spectral clustering has become one of the most popular clustering methods. Recently, attention has shifted from the conventional single view/graph to multi-view spectral clustering, motivated by leveraging the complementary information of multi-view data sources, where the same data set can be described by different features, e.g., an image can be described by its color histogram or shape feature, and a document can be represented by its page links or its text. As explicitly claimed by numerous pieces of multi-view research [Xu et al., 2015; Xu et al., 2013; Wang et al., 2015c; Wang et al., 2014a], an individual view is unlikely to be sufficiently faithful for effective multi-view learning. Therefore, the integration of multi-view information is both valuable and necessary.

1.1 Motivation
Essentially, the critical issue of multi-view learning is to achieve agreement/consensus [Gui et al., 2014; Wang et al., 2015b; Wang et al., 2014b; Wang et al., 2016] among all views, given the complementary information from multiple views, so as to yield substantially superior clustering performance over the single-view paradigm. Numerous multi-view methods have been proposed for spectral clustering. Some [Huang et al., 2012; Bickel and Scheffer, 2004] incorporate multi-view information into the clustering process by optimizing a certain objective loss function.
The late fusion strategy [Greene and Cunningham, 2009] designed for multi-view spectral clustering works by first deriving the spectral clustering result for each view, and then combining the multiple view-induced results into an optimal one. Such a strategy, however, cannot ideally achieve the multi-view agreement, as the views cannot co-regularize each other during the clustering process. Canonical Correlation Analysis (CCA) based methods [Blaschko and Lampert, 2008; Chaudhuri et al., 2009] for multi-view spectral clustering project the multi-view data sources into one common lower-dimensional subspace, where spectral clustering is subsequently conducted. One limitation of such methods lies in the fact that one common lower-dimensional subspace cannot flexibly characterize the local spectral graph structures of heterogeneous views, resulting in inferior multi-view spectral clustering. Kumar et al. [2011] proposed a state-of-the-art co-regularized spectral clustering for multi-view data, which regularizes the eigenvectors of view-dependent graph Laplacians to achieve consensus clusters across views. Similarly, a co-training [Blum and Mitchell, 1998; Wang and Zhou, 2010] framework was proposed for multi-view spectral clustering [Kumar and Daume, 2011], where the similarity matrix from one view is projected into the subspace spanned by the eigenvectors of the other views, and spectral clustering is then conducted on the projected similarity matrix. This process is performed alternately until convergence, and the final result is formed by aggregating the clustering results from the individual views.

The above co-regularized [Kumar et al., 2011] and co-training [Kumar and Daume, 2011] based methods can effectively achieve clustering consensus when the view-dependent feature representations are free of noise corruption. However, such an assumption is hard to satisfy in practice. To address this limitation, Low-Rank Representation (LRR) [Xia et al., 2014; Liu et al., 2010; Liu and Yan, 2011; Liu et al., 2013] based approaches have been proposed for multi-view spectral clustering. The basic idea is to decompose the data representation of each view into a view-dependent noise corruption term and a common low-rank representation shared by all views, which further leads to a common data affinity matrix for clustering. The typical LRR model [Xia et al., 2014; Liu et al., 2010] is formulated below:

$$\min_{Z, \{E^{(i)}\}} \; \|Z\|_{*} + \lambda \sum_{i=1}^{V} \|E^{(i)}\|_{1} \quad \text{s.t.} \quad X^{(i)} = X^{(i)} Z + E^{(i)}, \; i = 1, \dots, V, \qquad (1)$$
where $V$ denotes the number of views; $X^{(i)} \in \mathbb{R}^{d_i \times n}$ denotes the data feature representation for the $i$-th view, $n$ is the number of data objects for each view, and $d_i$ is the feature dimension of the $i$-th view. $Z \in \mathbb{R}^{n \times n}$ represents the self-expressive linear sample correlations [Liu et al., 2010] shared by all views, under the assumption that similar samples can be linearly reconstructed by each other. $E^{(i)}$ models the possible noise corruption in the feature representation of the $i$-th view. $\|E^{(i)}\|_1$ is the $\ell_1$ norm of $E^{(i)}$, i.e., the sum of the absolute values of all entries of $E^{(i)}$; $\lambda$ is a balance parameter.
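As a concrete reference, the shared low-rank objective of Eq. (1) can be evaluated numerically. The following is a minimal numpy sketch; the function name `lrr_objective` and the list-per-view calling convention are our own illustration, not part of the original method:

```python
import numpy as np

def lrr_objective(Xs, Z, Es, lam):
    """Objective of Eq. (1): ||Z||_* + lam * sum_i ||E^(i)||_1.

    Xs and Es are lists with one matrix per view (columns are samples);
    Z is the self-expressive representation shared by all views.  Also
    returns the per-view constraint residuals ||X^(i) - X^(i) Z - E^(i)||_F,
    which should be (near) zero for a feasible point."""
    nuclear = np.linalg.norm(Z, ord='nuc')           # sum of singular values
    sparse = sum(np.abs(E).sum() for E in Es)        # l1 noise terms
    residuals = [np.linalg.norm(X - X @ Z - E) for X, E in zip(Xs, Es)]
    return nuclear + lam * sparse, residuals
```

For instance, at the trivially feasible point $Z = I$, $E^{(i)} = 0$, the objective reduces to the nuclear norm of the identity, i.e., $n$.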
Despite the effectiveness of LRR for multi-view spectral clustering, existing methods still face the following fundamental limitations:

LRR attempts to learn a common lowest-rank representation revealing a low-dimensional subspace structure, but ignores the distinct manifold structure of each view, which turns out to be critically important to multi-view spectral clustering.

A low-rank constraint is imposed to enforce the consensus among all views in Eq. (1) [Xia et al., 2014]; however, such an enforced common representation may not flexibly preserve the local manifold structure of heterogeneous views, resulting in non-ideal multi-view clustering performance.
1.2 Our contributions
To address these limitations, our method delivers the following novel features:

To characterize the nonlinear spectral graph structure of each view, inspired by [Yin et al., 2016], we propose to couple LRR with multi-graph regularization, where each graph Laplacian regularizer characterizes the view-dependent nonlinear local data similarity.

To achieve the view agreement while preserving the data correlations within each view, we present an iterative view-agreement process for optimizing our objective function. During each iteration, the low-rank representation yielded by each view serves as a constraint to regulate the representation learning of the other views. This process iteratively drives the representations toward agreement.

To model the above intuitions, we formulate a novel objective function and deploy the Linearized Alternating Direction Method with Adaptive Penalty (LADMAP) [Lin et al., 2011] to solve it.
2 Iterative Low-Rank based Structured Optimization Method for Multi-view Spectral Clustering
It is well known that the critical issue for spectral clustering lies in how to effectively model the local nonlinear manifold structure [Zelnik-Manor and Perona, 2004]. Hence, for each view, we aim at preserving such nonlinear manifold structure of the original high-dimensional data within the space spanned by the low-rank sparse representation of that view. This can be formulated as:

$$\frac{1}{2} \sum_{j,k=1}^{n} \big\|z_j^{(i)} - z_k^{(i)}\big\|_2^2 \, W_{jk}^{(i)} = \mathrm{Tr}\Big(Z^{(i)} L^{(i)} (Z^{(i)})^{\top}\Big), \qquad (2)$$
where $z_j^{(i)}$ is the $j$-th column of $Z^{(i)}$, representing the linear correlation representation between $x_j^{(i)}$ and the other samples in the $i$-th view; $W_{jk}^{(i)}$ is the $(j,k)$-th entry of the similarity matrix $W^{(i)}$, which encodes the similarity between $x_j^{(i)}$ and $x_k^{(i)}$ in the original high-dimensional space of the $i$-th view; $W^{(i)}$ is the similarity matrix over all the data objects of $X^{(i)}$; $D^{(i)}$ is a diagonal matrix whose $j$-th diagonal entry is the sum of the $j$-th row of $W^{(i)}$, and $L^{(i)} = D^{(i)} - W^{(i)}$ is the graph Laplacian matrix for the $i$-th view; thus Eq. (2) is often dubbed the graph Laplacian regularizer. In this paper, we choose the Gaussian kernel to calculate $W_{jk}^{(i)}$ as
$$W_{jk}^{(i)} = \exp\left(-\frac{\big\|x_j^{(i)} - x_k^{(i)}\big\|_2^2}{2\sigma^2}\right), \qquad (3)$$
where $\sigma$ is the bandwidth parameter and $\|\cdot\|_2$ denotes the $\ell_2$ norm; Eq. (3) holds if $x_j^{(i)}$ is within the $K$ nearest neighbors of $x_k^{(i)}$ or vice versa, and $W_{jk}^{(i)} = 0$ otherwise. $W_{jj}^{(i)}$ is set to 0 to avoid self-loops. Eq. (2) explicitly requires $Z^{(i)}$ to well characterize the local manifold structure inherently embedded in the original high-dimensional $X^{(i)}$ for the $i$-th view, which is of central importance to spectral clustering.
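The similarity construction of Eq. (3) and the Laplacian of Eq. (2) can be sketched in numpy as follows; the helper names are our own, and samples are stored one per column to match $X^{(i)}$ in the text:

```python
import numpy as np

def knn_gaussian_affinity(X, k, sigma):
    """W[j,l] = exp(-||x_j - x_l||^2 / (2 sigma^2)) if x_j is among the k
    nearest neighbours of x_l or vice versa, and 0 otherwise (Eq. (3));
    the diagonal is zeroed to avoid self-loops."""
    n = X.shape[1]
    D2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)  # squared dists
    W = np.exp(-D2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D2_nn = D2.copy()
    np.fill_diagonal(D2_nn, np.inf)                  # exclude self from k-NN
    idx = np.argsort(D2_nn, axis=1)[:, :k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), idx.ravel()] = True
    mask |= mask.T                                   # "or vice versa"
    return W * mask

def graph_laplacian(W):
    """L = D - W with D the diagonal matrix of row sums of W."""
    return np.diag(W.sum(axis=1)) - W
```

With these helpers, the trace identity of Eq. (2) can be verified directly against the pairwise sum on small random data.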
Based on the above, we combine the graph Laplacian regularizer with the low-rank representation. Considering that the global clustering structure captured by the low-rank representation may prevent us from directly imposing the graph Laplacian regularizer for local manifold structure, we propose to additionally impose a sparsity norm on $Z^{(i)}$, denoted $\|Z^{(i)}\|_1$, which can discriminatively extract the local sparse representative neighborhood of each data object.
As explicitly revealed by most multi-view clustering research [Kumar et al., 2011; Kumar and Daume, 2011; Bickel and Scheffer, 2004], it is always anticipated that a data point should be assigned to the same cluster irrespective of the view. In other words, the critical issue in ensuring ideal multi-view clustering performance is to achieve clustering agreement among all views. Based on that, we aim to minimize the differences among the low-rank and sparse representations of the different views by proposing a consensus term that coordinates all views to reach clustering agreement:

$$\begin{aligned} \min_{\{Z^{(i)}, E^{(i)}\}} \; & \sum_{i=1}^{V} \Big( \|Z^{(i)}\|_{*} + \beta_1 \|Z^{(i)}\|_{1} + \beta_2\, \mathrm{Tr}\big(Z^{(i)} L^{(i)} (Z^{(i)})^{\top}\big) + \lambda \|E^{(i)}\|_{1} \Big) + \frac{\beta_3}{2} \sum_{i=1}^{V} \sum_{j \neq i} \|Z^{(i)} - Z^{(j)}\|_F^2 \\ \text{s.t.}\; & X^{(i)} = X^{(i)} Z^{(i)} + E^{(i)}, \; Z^{(i)} \geq 0, \; i = 1, \dots, V, \end{aligned} \qquad (4)$$
where

$\|Z^{(i)}\|_{*}$ denotes the low-rank (nuclear norm) term revealing the global clustering structure of $X^{(i)}$.

$\|Z^{(i)}\|_{1}$ aims at extracting the local sparse representation of each data object in $X^{(i)}$.

$\mathrm{Tr}\big(Z^{(i)} L^{(i)} (Z^{(i)})^{\top}\big)$ characterizes the local manifold structure, as in Eq. (2).

$\|Z^{(i)} - Z^{(j)}\|_F^2$ characterizes the agreement among the sparse and low-rank representations of all views.

$E^{(i)}$ models the possible Laplacian noise contained in $X^{(i)}$; we impose the $\ell_1$ norm on $E^{(i)}$ for noise robustness.

$Z^{(i)} \geq 0$ is a non-negative constraint to ensure that each data object is amid its neighbors, through $X^{(i)} = X^{(i)} Z^{(i)} + E^{(i)}$, so that the data correlations can be well encoded for the $i$-th view.

$\beta_1$, $\beta_2$, $\beta_3$, and $\lambda$ are all trade-off parameters.
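To make the role of each term concrete, here is a minimal numpy sketch that evaluates the objective of Eq. (4) for given per-view variables. The function name and the agreement-term convention (counting each unordered pair of views once, our reading of the $\beta_3/2$ factor) are our own assumptions:

```python
import numpy as np

def multiview_objective(Zs, Es, Ls, b1, b2, b3, lam):
    """Objective value of Eq. (4); Zs, Es, Ls are lists indexed by view."""
    V = len(Zs)
    obj = 0.0
    for i in range(V):
        obj += np.linalg.norm(Zs[i], ord='nuc')         # low-rank term
        obj += b1 * np.abs(Zs[i]).sum()                 # sparsity term
        obj += b2 * np.trace(Zs[i] @ Ls[i] @ Zs[i].T)   # manifold term (Eq. (2))
        obj += lam * np.abs(Es[i]).sum()                # Laplacian-noise term
    for i in range(V):
        for j in range(i + 1, V):                       # views-agreement term
            obj += b3 * np.linalg.norm(Zs[i] - Zs[j], 'fro') ** 2
    return obj
```

When all views share the same representation, the agreement term vanishes and only the per-view terms remain.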
Eq. (4) is a typical low-rank optimization problem, and many methods are available to solve it. Among them, the Alternating Direction Method is the typical solution, which updates each variable alternately by minimizing the augmented Lagrangian function in a Gauss-Seidel fashion. In this paper, we deploy the Linearized Alternating Direction Method with Adaptive Penalty, dubbed LADMAP [Lin et al., 2011]. The underlying philosophy of LADMAP is to linearize the smooth component, which enables the Lagrange multipliers to be updated within a feasible approximation error.
Observing that solving for all the $(Z^{(i)}, E^{(i)})$ pairs follows the same type of optimization strategy, we only present the optimization strategy for the $i$-th view. We first introduce an auxiliary variable $J^{(i)}$ to carry the sparsity term; solving Eq. (4) with respect to $Z^{(i)}$ and $E^{(i)}$ can then be written as follows:

$$\begin{aligned} \min_{Z^{(i)}, J^{(i)}, E^{(i)}} \; & \|Z^{(i)}\|_{*} + \beta_1 \|J^{(i)}\|_{1} + \beta_2\, \mathrm{Tr}\big(Z^{(i)} L^{(i)} (Z^{(i)})^{\top}\big) + \frac{\beta_3}{2} \sum_{j \neq i} \|Z^{(i)} - Z^{(j)}\|_F^2 + \lambda \|E^{(i)}\|_{1} \\ \text{s.t.}\; & X^{(i)} = X^{(i)} Z^{(i)} + E^{(i)}, \; Z^{(i)} = J^{(i)}, \; J^{(i)} \geq 0. \end{aligned} \qquad (5)$$

We then present the augmented Lagrangian function of Eq. (5) below:

$$\begin{aligned} \mathcal{L} = \; & \|Z^{(i)}\|_{*} + \beta_1 \|J^{(i)}\|_{1} + \beta_2\, \mathrm{Tr}\big(Z^{(i)} L^{(i)} (Z^{(i)})^{\top}\big) + \frac{\beta_3}{2} \sum_{j \neq i} \|Z^{(i)} - Z^{(j)}\|_F^2 + \lambda \|E^{(i)}\|_{1} \\ & + \big\langle Y_1, X^{(i)} - X^{(i)} Z^{(i)} - E^{(i)} \big\rangle + \big\langle Y_2, Z^{(i)} - J^{(i)} \big\rangle + \frac{\mu}{2} \Big( \big\|X^{(i)} - X^{(i)} Z^{(i)} - E^{(i)}\big\|_F^2 + \big\|Z^{(i)} - J^{(i)}\big\|_F^2 \Big), \end{aligned} \qquad (6)$$
where $Y_1$ and $Y_2$ are Lagrange multipliers, $\langle \cdot, \cdot \rangle$ is the matrix inner product, and $\mu$ is a penalty parameter. We update each of the above variables alternately by minimizing Eq. (6) with the other variables fixed. We provide the details of optimizing Eq. (6) with respect to each variable in the next section.
3 Optimization Strategy
3.1 Updating $Z^{(i)}$
Minimizing Eq. (6) w.r.t. $Z^{(i)}$ is equivalent to minimizing the following:

$$\min_{Z^{(i)}} \; \|Z^{(i)}\|_{*} + \beta_2\, \mathrm{Tr}\big(Z^{(i)} L^{(i)} (Z^{(i)})^{\top}\big) + \frac{\beta_3}{2} \sum_{j \neq i} \|Z^{(i)} - Z^{(j)}\|_F^2 + \frac{\mu}{2} \Big( \big\|X^{(i)} - X^{(i)} Z^{(i)} - E^{(i)} + \tfrac{Y_1}{\mu}\big\|_F^2 + \big\|Z^{(i)} - J^{(i)} + \tfrac{Y_2}{\mu}\big\|_F^2 \Big). \qquad (7)$$

Eq. (7) does not directly admit a closed-form solution. Thanks to LADMAP, we can approximate the smooth terms of Eq. (7) in a linear manner. The smooth terms of Eq. (7) are summarized as

$$q\big(Z^{(i)}\big) = \beta_2\, \mathrm{Tr}\big(Z^{(i)} L^{(i)} (Z^{(i)})^{\top}\big) + \frac{\beta_3}{2} \sum_{j \neq i} \|Z^{(i)} - Z^{(j)}\|_F^2 + \frac{\mu}{2} \Big( \big\|X^{(i)} - X^{(i)} Z^{(i)} - E^{(i)} + \tfrac{Y_1}{\mu}\big\|_F^2 + \big\|Z^{(i)} - J^{(i)} + \tfrac{Y_2}{\mu}\big\|_F^2 \Big). \qquad (8)$$

Based on Eq. (8), we convert the problem of minimizing Eq. (7) into minimizing Eq. (9) below:

$$\min_{Z^{(i)}} \; \|Z^{(i)}\|_{*} + \big\langle \nabla q\big(Z_k^{(i)}\big), Z^{(i)} - Z_k^{(i)} \big\rangle + \frac{\eta}{2} \big\|Z^{(i)} - Z_k^{(i)}\big\|_F^2, \qquad (9)$$

where $\nabla q(Z_k^{(i)})$ denotes the partial gradient of $q$ w.r.t. $Z^{(i)}$ at the current iterate $Z_k^{(i)}$, and $q$ is approximated by its linear expansion at $Z_k^{(i)}$ together with the proximal term $\frac{\eta}{2}\|Z^{(i)} - Z_k^{(i)}\|_F^2$. This replacement is valid provided that $\eta \geq 2\beta_2\,\sigma_{\max}\big(L^{(i)}\big) + (V-1)\beta_3 + \mu\big(1 + \sigma_{\max}\big((X^{(i)})^{\top} X^{(i)}\big)\big)$, where $\sigma_{\max}(\cdot)$ denotes the largest eigenvalue of a matrix. The following closed form then holds for Eq. (9) at each update:

$$Z_{k+1}^{(i)} = \Theta_{\frac{1}{\eta}}\Big( Z_k^{(i)} - \tfrac{1}{\eta} \nabla q\big(Z_k^{(i)}\big) \Big), \qquad (10)$$
where $\Theta_{\varepsilon}(A) = U S_{\varepsilon}(\Sigma) V^{\top}$ represents the Singular Value Thresholding (SVT) operator, $U \Sigma V^{\top}$ is the singular value decomposition of the matrix $A$, and $S_{\varepsilon}(x) = \mathrm{sign}(x)\max(|x| - \varepsilon, 0)$ is the soft-thresholding operator, where $\mathrm{sign}(x)$ is 1 if $x$ is positive, $-1$ if $x$ is negative, and 0 otherwise.

3.1.1 Insights for Iterative Views Agreement
We remark that the intuition behind the iterative (per-iteration) views agreement can be captured by expanding $\nabla q(Z_k^{(i)})$ below:

$$\nabla q\big(Z_k^{(i)}\big) = 2\beta_2 Z_k^{(i)} L^{(i)} + \beta_3 \sum_{j \neq i} \big( Z_k^{(i)} - Z^{(j)} \big) + \mu \big(X^{(i)}\big)^{\top} \big( X^{(i)} Z_k^{(i)} + E^{(i)} - X^{(i)} - \tfrac{Y_1}{\mu} \big) + \mu \big( Z_k^{(i)} - J^{(i)} + \tfrac{Y_2}{\mu} \big). \qquad (11)$$

Expanding the agreement term $\beta_3 \sum_{j \neq i} ( Z_k^{(i)} - Z^{(j)} )$, we rewrite Eq. (11) as

$$\nabla q\big(Z_k^{(i)}\big) = M_k^{(i)} + (V-1)\beta_3 Z_k^{(i)} - \beta_3 \sum_{j \neq i} Z^{(j)}, \qquad (12)$$

where $M_k^{(i)} = 2\beta_2 Z_k^{(i)} L^{(i)} + \mu (X^{(i)})^{\top} \big( X^{(i)} Z_k^{(i)} + E^{(i)} - X^{(i)} - \tfrac{Y_1}{\mu} \big) + \mu \big( Z_k^{(i)} - J^{(i)} + \tfrac{Y_2}{\mu} \big)$ collects the view-internal terms. Substituting Eq. (12) into Eq. (10) yields Eq. (13):

$$Z_{k+1}^{(i)} = \Theta_{\frac{1}{\eta}}\Big( Z_k^{(i)} - \tfrac{1}{\eta} \big( M_k^{(i)} + (V-1)\beta_3 Z_k^{(i)} \big) + \tfrac{\beta_3}{\eta} \sum_{j \neq i} Z^{(j)} \Big), \qquad (13)$$

where the update of $Z^{(i)}$ is explicitly influenced by the other views through $\frac{\beta_3}{\eta} \sum_{j \neq i} Z^{(j)}$. This reveals that the low-rank representation of each view, e.g., the update of $Z^{(i)}$ for the $i$-th view, is formed by referring to the other views, while it in turn serves as a constraint when the other views are updated at each iteration, so that the complementary information of all views is leveraged towards a final agreement for clustering.
3.2 Updating $J^{(i)}$
Minimizing Eq. (6) w.r.t. $J^{(i)}$ is equivalent to solving the following optimization problem:

$$\min_{J^{(i)} \geq 0} \; \beta_1 \big\|J^{(i)}\big\|_{1} + \frac{\mu}{2} \Big\| J^{(i)} - \Big( Z^{(i)} + \tfrac{Y_2}{\mu} \Big) \Big\|_F^2, \qquad (14)$$

for which the following closed-form solution holds according to [Cai et al., 2008]:

$$J^{(i)} = \max\Big( S_{\frac{\beta_1}{\mu}}\big( Z^{(i)} + \tfrac{Y_2}{\mu} \big), \, 0 \Big). \qquad (15)$$
3.3 Updating $E^{(i)}$
Minimizing Eq. (6) w.r.t. $E^{(i)}$ is equivalent to solving the following optimization problem:

$$\min_{E^{(i)}} \; \lambda \big\|E^{(i)}\big\|_{1} + \frac{\mu}{2} \Big\| E^{(i)} - \Big( X^{(i)} - X^{(i)} Z^{(i)} + \tfrac{Y_1}{\mu} \Big) \Big\|_F^2, \qquad (16)$$

for which the following closed-form solution holds according to [Cai et al., 2008]:

$$E^{(i)} = S_{\frac{\lambda}{\mu}}\Big( X^{(i)} - X^{(i)} Z^{(i)} + \tfrac{Y_1}{\mu} \Big). \qquad (17)$$
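The closed forms in Eqs. (10), (15), and (17) reduce to two elementary operators, singular value thresholding and entrywise soft thresholding. A minimal numpy sketch (helper names are ours):

```python
import numpy as np

def soft_threshold(A, eps):
    """Entrywise shrinkage S_eps(a) = sign(a) * max(|a| - eps, 0)."""
    return np.sign(A) * np.maximum(np.abs(A) - eps, 0.0)

def svt(A, eps):
    """Singular Value Thresholding Theta_eps(A): shrink the singular
    values of A by eps (the proximal operator of eps * ||.||_*)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ (np.maximum(s - eps, 0.0)[:, None] * Vt)
```

For example, thresholding a matrix with singular values 3 and 1 by 2 leaves singular values 1 and 0, which both lowers the rank and shrinks the nuclear norm.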
3.4 Updating $Y_1$ and $Y_2$
We update the Lagrange multiplier $Y_1$ via

$$Y_1 \leftarrow Y_1 + \mu \big( X^{(i)} - X^{(i)} Z^{(i)} - E^{(i)} \big), \qquad (18)$$

and $Y_2$ via

$$Y_2 \leftarrow Y_2 + \mu \big( Z^{(i)} - J^{(i)} \big). \qquad (19)$$
We remark that $\mu$ can be tuned using the adaptive updating strategy suggested by [Lin et al., 2011] to yield faster convergence. The optimization alternately updates each variable while fixing the others until the convergence condition is met.
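Putting Secs. 3.1-3.4 together, the inner loop for a single view, with the other views' representations held fixed, can be sketched as follows. This is an illustrative numpy sketch under assumed default parameters, not the authors' implementation; the step size uses the Lipschitz bound discussed after Eq. (9), and $\mu$ is kept fixed rather than adaptively updated:

```python
import numpy as np

def soft_threshold(A, eps):
    return np.sign(A) * np.maximum(np.abs(A) - eps, 0.0)

def svt(A, eps):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ (np.maximum(s - eps, 0.0)[:, None] * Vt)

def update_view(X, L, Z_others, b1=0.1, b2=0.1, b3=0.1, lam=0.1,
                mu=1.0, n_iter=300):
    """One-view inner loop: linearized proximal (SVT) step for Z (Eq. (10)),
    soft-thresholding for the sparse auxiliary J (Eq. (15)) and the noise
    term E (Eq. (17)), then multiplier ascent (Eqs. (18)-(19))."""
    n = X.shape[1]
    Z = np.zeros((n, n)); J = np.zeros((n, n)); E = np.zeros_like(X)
    Y1 = np.zeros_like(X); Y2 = np.zeros((n, n))
    # step size eta must dominate the Lipschitz constant of the smooth part q
    eta = (2 * b2 * np.linalg.norm(L, 2) + b3 * len(Z_others)
           + mu * (1 + np.linalg.norm(X, 2) ** 2) + 1.0)
    for _ in range(n_iter):
        grad = (2 * b2 * (Z @ L)                        # manifold term
                + b3 * sum(Z - Zo for Zo in Z_others)   # views agreement
                + mu * (X.T @ (X @ Z + E - X - Y1 / mu))
                + mu * (Z - J + Y2 / mu))               # gradient of q, Eq. (8)
        Z = svt(Z - grad / eta, 1.0 / eta)              # Eq. (10)
        J = np.maximum(soft_threshold(Z + Y2 / mu, b1 / mu), 0.0)  # Eq. (15)
        E = soft_threshold(X - X @ Z + Y1 / mu, lam / mu)          # Eq. (17)
        Y1 = Y1 + mu * (X - X @ Z - E)                  # Eq. (18)
        Y2 = Y2 + mu * (Z - J)                          # Eq. (19)
    return Z, E
```

In the full method this inner update is carried out for every view in turn, so each freshly updated $Z^{(i)}$ becomes part of `Z_others` for the remaining views.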
Thanks to LADMAP [Lin et al., 2011], the above optimization process enjoys convergence to a globally optimal solution. Besides, we may employ the Lanczos method to compute the largest singular values and vectors by performing only multiplications of $Z_k^{(i)} - \frac{1}{\eta}\nabla q(Z_k^{(i)})$ with vectors, which can be computed efficiently by such successive matrix-vector multiplications.

3.5 Clustering with $Z^{(i)}$
Once the converged $Z^{(i)}$ are learned for each of the $V$ views, we normalize all column vectors of $Z^{(i)}$ and set entries below a given threshold to 0. After that, we calculate the similarity matrix for the $i$-th view as $W^{(i)}(j,k) = \frac{|Z^{(i)}_{jk}| + |Z^{(i)}_{kj}|}{2}$ between the $j$-th and $k$-th data objects. The final data similarity matrix over all views is defined as
$$W = \frac{1}{V} \sum_{i=1}^{V} W^{(i)}. \qquad (20)$$

Spectral clustering is then performed on $W$ calculated via Eq. (20) to yield the final multi-view spectral clustering result.
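A minimal numpy sketch of the Eq. (20)-style fusion, paired with a two-cluster spectral step (the sign of the second eigenvector of the normalized Laplacian, a deliberate simplification of full k-way spectral clustering); helper names are ours:

```python
import numpy as np

def aggregate_affinity(Zs):
    """Symmetrise each |Z^(i)| and average over views (Eq. (20) style)."""
    Ws = [(np.abs(Z) + np.abs(Z).T) / 2.0 for Z in Zs]
    return sum(Ws) / len(Ws)

def spectral_bipartition(W):
    """Two-way spectral step: split by the sign of the second-smallest
    eigenvector of the symmetric normalized Laplacian
    I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L_sym = np.eye(len(W)) - (W * inv_sqrt[:, None]) * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)        # eigenvalues in ascending order
    return (vecs[:, 1] >= 0).astype(int)
```

On a block-structured affinity with strong within-group and weak between-group similarities, this recovers the two groups exactly.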
4 Experiments
We evaluate our method on the following data sets:

UCI handwritten Digit set (http://archive.ics.uci.edu/ml/datasets/Multiple+Features): It consists of features of handwritten digits (0-9). The data set is represented by 6 features and contains 2000 samples with 200 in each category. Analogous to [Lin et al., 2011], we choose the 76 Fourier coefficients (FC) of the character shapes and the 216 profile correlations (PC) as two views.

Animal with Attribute (AwA) (http://attributes.kyb.tuebingen.mpg.de): It consists of 50 kinds of animals described by 6 features (views): color histogram (CQ, 2688-dim), local self-similarity (LSS, 2000-dim), pyramid HOG (PHOG, 252-dim), SIFT (2000-dim), color SIFT (RGSIFT, 2000-dim), and SURF (2000-dim). We randomly sample 80 images for each category and get 4000 images in total.

NUS-WIDE-Object (NUS) [Chua et al., 2009]: The data set consists of 30000 images from 31 categories. We construct 5 views using 5 features as provided by the website (lms.comp.nus.edu.sg/research/NUSWIDE.html): 65-dimensional color histogram (CH), 226-dimensional color moments (CM), 145-dimensional color correlation (CORR), 74-dimensional edge estimation (EDH), and 129-dimensional wavelet texture (WT).
These data sets are summarized in Table 1.
Features  UCI  AwA  NUS 

1  FC (76)  CQ (2688)  CH(65) 
2  PC (216)  LSS (2000)  CM(226) 
3    PHOG (252)  CORR(145) 
4    SIFT(2000)  EDH(74) 
5    RGSIFT(2000)  WT(129) 
6    SURF(2000)   
# of data  2000  4000  26315 
# of classes  10  50  31 
4.1 Baselines
We compare our approach with the following state-of-the-art baselines:

MFMSC: Using the concatenation of multiple features to perform spectral clustering.

Multi-view affinity aggregation for multi-view spectral clustering (MAASC) [Huang et al., 2012].

Canonical Correlation Analysis (CCA) based multi-view spectral clustering (CCAMSC) [Chaudhuri et al., 2009]: Projecting multi-view data into a common subspace, then performing spectral clustering.

Co-regularized multi-view spectral clustering (CoMVSC) [Kumar et al., 2011]: It regularizes the eigenvectors of view-dependent graph Laplacians to achieve consensus clusters across views.

Co-training [Kumar and Daume, 2011]: Alternately modify one view's eigenspace of the graph Laplacian by referring to the other views' graph Laplacians and their corresponding eigenvectors, upon which spectral clustering is conducted. This process is performed until convergence.

Robust Low-Rank Representation method (RLRR) [Xia et al., 2014], as formulated in Eq. (1).
4.2 Experimental Settings and Parameter Study
For fair comparison, we implement these competitors by following the experimental settings and parameter-tuning steps in their papers. The Gaussian kernel is used throughout the experiments on all data sets; $\sigma$ in Eq. (3) is learned by the self-tuning method [Zelnik-Manor and Perona, 2004], and $K$ nearest neighbors are constructed for each data object to calculate Eq. (3).
To measure the clustering results, we use two standard metrics: clustering accuracy (ACC), the ratio of the number of data objects whose clustering label matches the ground-truth label to the total number of data objects, and normalized mutual information (NMI). Please refer to [Chen et al., 2011] for details of these two clustering metrics. All experiments are repeated 10 times, and we report the averaged mean values.
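For reference, both metrics can be computed as follows. This is a simple numpy sketch of our own; the permutation-based ACC matching assumes equally many clusters in the prediction and the ground truth (the Hungarian algorithm is the scalable alternative):

```python
import numpy as np
from itertools import permutations

def clustering_acc(y_true, y_pred):
    """ACC: best match rate over permutations of cluster labels."""
    pred_labels = sorted(set(y_pred))
    best = 0.0
    for perm in permutations(sorted(set(y_true))):
        mapping = dict(zip(pred_labels, perm))
        acc = np.mean([mapping[p] == t for p, t in zip(y_pred, y_true)])
        best = max(best, acc)
    return best

def nmi(y_true, y_pred):
    """NMI with sqrt(H(T) H(P)) normalisation."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    mi = 0.0
    for t in np.unique(y_true):
        for p in np.unique(y_pred):
            n_tp = np.sum((y_true == t) & (y_pred == p))
            if n_tp > 0:
                n_t, n_p = np.sum(y_true == t), np.sum(y_pred == p)
                mi += (n_tp / n) * np.log(n * n_tp / (n_t * n_p))
    h = lambda y: -sum((np.sum(y == v) / n) * np.log(np.sum(y == v) / n)
                       for v in np.unique(y))
    return mi / max(np.sqrt(h(y_true) * h(y_pred)), 1e-12)
```

Both metrics are invariant to relabeling of the clusters, which is why the predicted labels need not numerically match the ground-truth labels.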
Feature noise modeling for robustness: Following [Siyahjani et al., 2015], for each view-specific feature representation, 20% of the feature elements are corrupted with a uniform distribution over the range [-5, 5], which is consistent with practical settings while matching RLRR and our method.

4.3 Validating Multi-graph Regularization and Iterative Views Agreement
We test $\beta_2$ for the multi-graph regularization term and $\beta_3$ for the iterative views agreement within the interval [0.001, 10] on the AwA data set and adopt the resulting setting for the other data sets. Specifically, we test each value of one parameter while fixing the value of the other; the results are illustrated in Fig. 1.
From both Fig. 1 (a) and (b), the following observations can be identified:

When fixing the value of $\beta_3$, increasing the value of $\beta_2$ generally improves the ACC and NMI of our method. The same observation holds vice versa; that is, fixing $\beta_2$ while increasing $\beta_3$ also leads to clustering improvements in terms of both ACC and NMI.

Both ACC and NMI unsurprisingly increase as $\beta_2$ and $\beta_3$ increase, until the optimal pair combination is reached, after which they slightly decrease.

Based on the above observations, we choose a balanced pair of values of $\beta_2$ and $\beta_3$ for our method.
4.4 Experimental Results and Analysis
ACC (%)  UCI digits  AwA  NUS 

MFMSC  43.81  17.13  22.81 
MAASC  51.74  19.44  25.13 
CCAMSC  73.24  24.04  27.56 
CoMVSC  80.27  29.93  33.63 
Cotraining  79.22  29.06  34.25 
RLRR  83.67  31.49  35.27 
Ours  86.39  37.22  41.02 
NMI (%)  UCI digits  AwA  NUS 

MFMSC  41.57  11.48  12.21 
MAASC  47.85  12.93  11.86 
CCAMSC  56.51  15.62  14.56 
CoMVSC  63.82  17.30  7.07 
Cotraining  62.07  18.05  8.10 
RLRR  81.20  25.57  18.29 
Ours  85.45  31.74  20.61 
We report the compared clustering results in terms of ACC and NMI in Table 2 and Table 3, from which the following observations can be drawn:

Nearly all of our clustering results in terms of both ACC and NMI are better than those of RLRR, which further demonstrates the effectiveness of combining our multi-graph regularization and iterative views agreement scheme with the LRR scheme for multi-view spectral clustering.

First, compared with view-fusion methods like MFMSC and MAASC, our method improves the clustering performance by a notable margin on all data sets. Specifically, it improves ACC from 43.81% (MFMSC) and 51.74% (MAASC) to 86.39% on the UCI digits data set. Similar notable improvements are also observed on the AwA and NUS data sets.

Second, CCAMSC, which learns a common low-dimensional subspace among multi-view data, is less effective in clustering due to its inability to encode the local graph structures of heterogeneous views within only one common subspace. In contrast, our method addresses this problem with a novel iterative views-agreement scheme, which is notably evidenced in terms of both ACC and NMI.

Compared with co-regularized paradigms (CoMVSC and Co-training), our method works more effectively in the presence of noise corruption. For example, on the NUS data set, it improves the clustering accuracy from 33.63% (CoMVSC) and 34.25% (Co-training) to 41.02%. RLRR is also effective at dealing with practically noise-corrupted multi-view data; however, as aforementioned, learning only one common low-rank correlation representation shared by all views fails to flexibly capture the local nonlinear manifold structures of all views, which is crucial to spectral clustering, so our technique delivers better performance.
Figure 1: Clustering performance of our method on AwA, (a) ACC and (b) NMI, when varying the multi-graph regularization and views-agreement parameters.
5 Conclusions
In this paper, we propose an iterative structured lowrank optimization method to multiview spectral clustering. Unlike existing methods, Our method can well encode the local data manifold structure from each viewdependent feature space, and achieve the multiview agreement via an iterative fashion, while better preserve the flexible nonlinear manifold structure from all views. The superiorities are validated by extensive experiments over realworld multiview data sets.
One future direction is to adapt the proposed iterative fashion technique to crossview based research [Wu et al.2013, Wang et al.2015a] by dealing with multiple data source yet corresponding to the same latent semantics. We aim to develop the novel iterative technique to learn the projections for multiple data sources into the common latent space to well characterize the shared latent semantics.
References
[Bickel and Scheffer, 2004] S. Bickel and T. Scheffer. Multi-view clustering. In IEEE ICDM, 2004.
[Blaschko and Lampert, 2008] M. Blaschko and C. Lampert. Correlational spectral clustering. In CVPR, 2008.
[Blum and Mitchell, 1998] Avrim Blum and Tom M. Mitchell. Combining labeled and unlabeled data with co-training. In COLT, 1998.
[Cai et al., 2008] Jian-Feng Cai, Emmanuel J. Candes, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956-1982, 2008.
[Chaudhuri et al., 2009] K. Chaudhuri, S. Kakade, K. Livescu, and K. Sridharan. Multi-view clustering via canonical correlation analysis. In ICML, 2009.
[Chen et al., 2011] Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, and Edward Y. Chang. Parallel spectral clustering in distributed systems. IEEE Trans. Pattern Anal. Mach. Intell., 33(3):568-586, 2011.
[Chua et al., 2009] Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, and Yan-Tao Zheng. NUS-WIDE: A real-world web image database from National University of Singapore. In ACM CIVR, 2009.
[Greene and Cunningham, 2009] D. Greene and P. Cunningham. A matrix factorization approach for integrating multiple data views. In ECML PKDD, 2009.
[Gui et al., 2014] Jie Gui, Dacheng Tao, Zhenan Sun, Yong Luo, Xinge You, and Yuan Yan Tang. Group sparse multiview patch alignment framework with view consistency for image classification. IEEE Transactions on Image Processing, 23(7):3126-3137, 2014.
[Huang et al., 2012] Hsin-Chien Huang, Yung-Yu Chuang, and Chu-Song Chen. Affinity aggregation for spectral clustering. In CVPR, 2012.
[Kumar and Daume, 2011] Abhishek Kumar and Hal Daume. A co-training approach for multi-view spectral clustering. In ICML, 2011.
[Kumar et al., 2011] Abhishek Kumar, Piyush Rai, and Hal Daume. Co-regularized multi-view spectral clustering. In NIPS, 2011.
[Lin et al., 2011] Zhouchen Lin, Risheng Liu, and Zhixun Su. Linearized alternating direction method with adaptive penalty for low-rank representation. In NIPS, 2011.
[Liu and Yan, 2011] Guangcan Liu and Shuicheng Yan. Latent low-rank representation for subspace segmentation and feature extraction. In ICCV, 2011.
[Liu et al., 2010] Guangcan Liu, Zhouchen Lin, and Yong Yu. Robust subspace segmentation by low-rank representation. In ICML, 2010.
[Liu et al., 2013] Guangcan Liu, Zhouchen Lin, Shuicheng Yan, Ju Sun, Yong Yu, and Yi Ma. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell., 35(1):171-184, 2013.
[Ng et al., 2001] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2001.
[Siyahjani et al., 2015] Farzad Siyahjani, Ranya Almohsen, Sinan Sabri, and Gianfranco Doretto. A supervised low-rank method for learning invariant subspaces. In ICCV, 2015.
[Wang and Zhou, 2010] Wei Wang and Zhi-Hua Zhou. A new analysis of co-training. In ICML, 2010.
[Wang et al., 2014a] Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, and Qing Zhang. Exploiting correlation consensus: Towards subspace clustering for multi-modal data. In ACM Multimedia, 2014.
[Wang et al., 2014b] Yang Wang, Jian Pei, Xuemin Lin, and Qing Zhang. An iterative fusion approach to graph-based semi-supervised learning from multiple views. In PAKDD, 2014.
[Wang et al., 2015a] Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, and Qing Zhang. LBMCH: Learning bridging mapping for cross-modal hashing. In ACM SIGIR, 2015.
[Wang et al., 2015b] Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, Qing Zhang, and Xiaodi Huang. Robust subspace clustering for multi-view data by exploiting correlation consensus. IEEE Trans. Image Processing, 24(11):3939-3949, 2015.
[Wang et al., 2015c] Yang Wang, Wenjie Zhang, Lin Wu, Xuemin Lin, and Xiang Zhao. Unsupervised metric fusion over multiview data by graph random walk-based cross-view diffusion. IEEE Trans. Neural Netw. Learning Syst., 2015.
[Wang et al., 2016] Yang Wang, Xuemin Lin, Lin Wu, Qing Zhang, and Wenjie Zhang. Shifting multi-hypergraphs via collaborative probabilistic voting. Knowl. Inf. Syst., 46(3):515-536, 2016.
[Wu et al., 2013] Lin Wu, Yang Wang, and John Shepherd. Efficient image and tag co-ranking: A Bregman divergence optimization method. In ACM Multimedia, 2013.
[Xia et al., 2014] Rongkai Xia, Yan Pan, Lei Du, and Jian Yin. Robust multi-view spectral clustering via low-rank and sparse decomposition. In AAAI, 2014.
[Xu et al., 2013] Chang Xu, Dacheng Tao, and Chao Xu. A survey on multi-view learning. CoRR, abs/1304.5634, 2013.
[Xu et al., 2015] Chang Xu, Dacheng Tao, and Chao Xu. Multi-view intact space learning. IEEE Trans. Pattern Anal. Mach. Intell., 2015.
[Yin et al., 2016] Ming Yin, Junbin Gao, and Zhouchen Lin. Laplacian regularized low-rank representation and its applications. IEEE Trans. Pattern Anal. Mach. Intell., 38(3):504-517, 2016.
[Zelnik-Manor and Perona, 2004] Lihi Zelnik-Manor and Pietro Perona. Self-tuning spectral clustering. In NIPS, 2004.