I Introduction
Multi-view data, obtained from different measurements or collected from various sources to describe objects comprehensively, are widespread in many real-world applications [1], [2], [3]. For example, in computer vision tasks, an image can be described by multiple view representations (GIST [4], SIFT [5], LBP [6], etc.); the words on a webpage and the words in its URL are two distinct views of that webpage; and video and audio signals are two common representations that can be used for multimedia content understanding. Compared with single-view data, multi-view data contain both consensus and complementary information among multiple views. The goal of multi-view learning, which has achieved success in many applications [1], [7], [8], [9], [10], [11], is to improve generalization performance by leveraging multiple views.
As a fundamental task in unsupervised learning [12], clustering can be used either as a standalone exploratory tool to mine the intrinsic structure of data or as a preprocessing stage for other learning tasks. Many clustering approaches have been proposed. Among them, subspace clustering, which assumes that high-dimensional data lie in a union of low-dimensional subspaces and tries to group data points into clusters while simultaneously finding the corresponding subspaces, has attracted considerable research owing to its promising performance and good interpretability [13]. Various subspace clustering algorithms have been proposed under different constraints. Low-Rank Subspace Clustering (LRSC) [14] finds a low-rank linear representation of the data using the data themselves as the dictionary, and then applies spectral clustering to an adjacency matrix derived from this low-rank representation [15] to obtain the clustering results. Sparse Subspace Clustering (SSC) [16], which seeks the sparsest representation based on the ℓ1 norm, is another powerful subspace clustering algorithm. Low-Rank Sparse Subspace Clustering (LRSSC) [17] imposes low-rank and sparse constraints simultaneously via the trace norm and the ℓ1 norm, since the coefficient matrix is often both sparse and low-rank. Additionally, Smooth Representation Clustering [18] exploits the grouping effect for subspace clustering. Although these algorithms are effective in practice, they are designed for single-view rather than multi-view data.
Building on subspace clustering, many multi-view subspace clustering approaches have been proposed [19], [20], [21], [22], [23]. Most of them process multiple views separately and then either find a shared coefficient matrix or fuse the clustering results of different views to maximize the consensus among all views. Although they perform well in practice, these methods cannot fully explore the consensus and complementary information of multi-view data, because they describe the data within each view separately. To make full use of multiple views, in this paper we propose a novel multi-view subspace clustering method dubbed Feature Concatenation Multi-view Subspace Clustering (FCMSC). Concatenating the features of all views and directly running a single-view clustering algorithm is a naive solution for multi-view clustering; however, it is ineffective in most real-world applications and can even yield worse clustering results [1], [8], [24], [25], [26].
Notably, the proposed FCMSC achieves promising clustering performance when applied directly to the concatenated features, because it explores both the consensus and the complementary information. Specifically, by simultaneously reducing the sample-specific error among data points and the cluster-specific error among multiple views, our method decomposes the coefficient matrix, which is computed from the joint view representation and has the low-rank property, into a new low-rank coefficient matrix free of corrupted information and a cluster-specific noise matrix. Moreover, an optimization algorithm based on the Augmented Lagrange Multiplier (ALM) method [27] is proposed to solve the objective function of FCMSC. Extensive experiments on six real-world datasets, in comparison with several state-of-the-art multi-view clustering approaches, show the effectiveness and competitiveness of the proposed method.
In summary, the main contributions of this paper are as follows:

We propose an effective multi-view subspace clustering framework that achieves promising clustering results directly on the concatenated features. To the best of our knowledge, our approach is the first that directly processes the joint view representation with competitive clustering results.

Both the sample-specific error and the cluster-specific error are taken into account by introducing appropriate constraints and regularizers. Unlike most existing approaches, the proposed approach explores both the consensus and the complementary information among multiple views.

Experiments on publicly available datasets demonstrate the superior performance of the proposed approach for multi-view clustering.
The rest of this paper is organized as follows. Section II briefly reviews related work. Section III introduces our approach, and Section IV presents the corresponding optimization in detail. Comprehensive experimental results and discussions are provided in Section V. Finally, Section VI concludes the paper and discusses future work.
II Related Works
Recently, many approaches have been proposed to solve the multi-view clustering problem [7], [19], [20], [21], [22], [23], [24], [25], [26], [28], [29], [30], [31], [32], [33], [34], [35]. Existing methods can be roughly grouped into two main categories: generative methods and discriminative methods [29]. Generative methods try to construct a generative model for each cluster. For example, multi-view convex mixture models [31] automatically learn different weights for multiple views and build convex mixture models for them. Although most generative algorithms are robust to missing entries and some can even reach global optima, they rely on a series of assumptions and parameters, which make the optimization more difficult and time-consuming. Discriminative methods, which directly minimize the similarities of data points between clusters and maximize the similarities of data points within clusters across all views, have achieved good clustering results in many applications and attract most of the research attention in this field. Taking multi-view subspace clustering as an example, Latent Multi-view Subspace Clustering (LMSC) [19], which achieves the current state-of-the-art multi-view clustering performance, explores the underlying complementary information and seeks the latent representation simultaneously; Low-rank Tensor constrained Multi-view Subspace Clustering (LTMSC) [20] formulates the clustering problem as a tensor nuclear norm minimization problem by regarding the subspace representation matrices of multiple views as a tensor; Diversity-induced Multi-view Subspace Clustering (DiMSC) [21] enhances multi-view clustering performance by using the Hilbert-Schmidt Independence Criterion to explore the complementary information of different views; multi-view subspace clustering by learning a joint affinity graph [28] leverages a low-rank representation with diversity regularization and a rank constraint to learn a joint affinity graph for clustering; Multi-View Subspace Clustering (MVSC) [22] uses a cluster structure shared by all views to obtain clustering results by exploring the consensus information among views; and Iterative Views Agreement [23] is a multi-view subspace clustering approach that preserves the local manifold structure of each view during the clustering process. Besides, many methods based on spectral clustering have been proposed. The co-training approach for multi-view spectral clustering [24] and co-regularized multi-view spectral clustering [25] process multiple views separately and try to obtain clustering results that maximize the agreement among views; Robust Multi-view Spectral Clustering (RMSC) [26] recovers a common transition probability matrix via low-rank and sparse decomposition and applies a Markov chain method to obtain clustering results. In addition, several multi-view clustering methods based on matrix factorization [36] have been proposed to explore the consensus information among views [34], [35].
Among the various discriminative multi-view clustering methods, the essential difference lies in how they explore the consensus and complementary information of multiple views, and most existing methods process multiple views separately. It is natural to combine multiple views before clustering, and some related approaches have been proposed [37], [38], [39], [40]. Multi-view clustering via Canonical Correlation Analysis (CCA) [37] combines the views after projection, while the methods proposed in [38], [39], [40] use kernels to combine multiple views. However, during combination these methods may corrupt either the consensus or the complementary information among views to varying degrees. Since it is difficult to extract effective information from a direct combination of multiple views, applying a single-view clustering algorithm to the joint view representation yields uncompetitive multi-view clustering results [20], [21], [25], [26], [29], and few works focus on this style of combination. However, directly concatenating the features of all views preserves the original information contained in multiple views to the greatest extent. Notably, to the best of our knowledge, the proposed FCMSC is the first method to achieve promising clustering performance by directly using the concatenated features of multiple views.
III Feature Concatenation Multi-view Subspace Clustering
Symbol  Meaning 

The number of samples.  
The number of views.  
The number of clusters.  
The dimension of features in th view.  
The dimension of the concatenated features, .  
The features of th sample from th view.  
The joint view representation matrix.  
The coefficient matrix of .  
The sample-specific error.
The coefficient matrix of .  
The cluster-specific error.
The Laplacian matrix of th view.  
The norm of matrix .  
The trace norm of matrix .
The norm of matrix .  
The transpose of matrix .  
rank()  The rank function. 
abs()  The element-wise absolute value function.
For convenience, Table I lists the main symbols used throughout this paper. Consider a multi-view dataset with views and samples, i.e. , whose data points are drawn from multiple subspaces. To obtain a matrix whose columns have the same magnitude, the data of each view are normalized within the range of , and the multiple views are then concatenated into a joint view representation matrix X, defined as follows:
(1) 
where denotes the features of the th sample from the th view, so the th column of contains the features of all views of the th sample. Based on the concatenated features, Fig. 1 displays the overall framework of the proposed FCMSC.
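As a rough illustration of Eq. (1), the sketch below normalizes each view and stacks the views vertically, so that each column of the joint matrix holds all features of one sample. The normalization range is not recoverable from the text above, so per-feature min-max scaling to [0, 1] is an assumption here, as are the hypothetical function name `concat_views` and the layout of each view as a (feature dimension × number of samples) NumPy array.

```python
import numpy as np

def concat_views(views):
    """Build a joint view matrix from a list of per-view matrices.

    Each view is a (d_v, n) array: rows are features, columns are samples.
    Assumption: every feature row is min-max scaled to [0, 1] before stacking.
    """
    n = views[0].shape[1]
    scaled = []
    for Xv in views:
        if Xv.shape[1] != n:
            raise ValueError("all views must describe the same samples")
        lo = Xv.min(axis=1, keepdims=True)
        hi = Xv.max(axis=1, keepdims=True)
        span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant features
        scaled.append((Xv - lo) / span)
    # Stack vertically: column i of X holds all features of sample i, view after view.
    return np.vstack(scaled)

rng = np.random.default_rng(0)
views = [rng.normal(size=(5, 7)), rng.normal(size=(3, 7))]
X = concat_views(views)
print(X.shape)  # (8, 7): 5 + 3 features, 7 samples
```

Stacking along the feature axis keeps every original feature, which is the property the text relies on when it argues that concatenation preserves the information of all views.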
Since the statistical properties of different views are diverse and may even be incompatible, it is difficult to fully explore the mutual information of multiple views. As a preliminary exploration of this mutual information, we first consider the following objective function:
(2) 
where indicates a coefficient matrix of , denotes the sample-specific error of the data points, is a trade-off parameter, and the ℓ2,1 norm of encourages columns of to be zero [15]. This is a standard low-rank representation of the concatenated features. However, the experimental results will show that the multi-view clustering performance is not competitive if we directly apply spectral clustering to . This is because each view has specific statistical properties, which may be contradictory among views, so it is unreasonable to explore the joint view representation by directly applying a single-view clustering algorithm.
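Since Eq. (2) is the standard low-rank representation (LRR) problem, it can be illustrated with the usual inexact ALM scheme: an auxiliary variable J = Z is introduced so that the nuclear norm and the ℓ2,1 norm each get a closed-form proximal step. This is a sketch of plain LRR on a generic data matrix, not of the full FCMSC objective; the hypothetical function name `lrr_inexact_alm`, the default λ, the penalty schedule (μ0, ρ, μmax) and the stopping tolerance are illustrative assumptions.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def l21_shrink(M, tau):
    """Column-wise shrinkage: proximal operator of tau * l2,1 norm."""
    norms = np.linalg.norm(M, axis=0)
    return M * (np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12))

def lrr_inexact_alm(X, lam=1.0, mu=1e-6, rho=1.1, mu_max=1e10, tol=1e-8, max_iter=1000):
    """Plain LRR: min ||Z||_* + lam*||E||_{2,1} s.t. X = XZ + E, split as Z = J.

    Assumption: the penalty schedule and tolerance are common LRR defaults.
    """
    d, n = X.shape
    Z, J = np.zeros((n, n)), np.zeros((n, n))
    E, Y1 = np.zeros((d, n)), np.zeros((d, n))
    Y2 = np.zeros((n, n))
    XtX = X.T @ X
    inv_term = np.linalg.inv(np.eye(n) + XtX)   # constant, so computed once
    for _ in range(max_iter):
        J = svt(Z + Y2 / mu, 1.0 / mu)                              # nuclear-norm step
        Z = inv_term @ (XtX - X.T @ E + J + (X.T @ Y1 - Y2) / mu)   # least-squares step
        E = l21_shrink(X - X @ Z + Y1 / mu, lam / mu)               # l2,1 step
        r1 = X - X @ Z - E          # violation of X = XZ + E
        r2 = Z - J                  # violation of Z = J
        Y1 += mu * r1               # dual ascent on the multipliers
        Y2 += mu * r2
        mu = min(rho * mu, mu_max)  # increase the penalty up to its cap
        if max(np.abs(r1).max(), np.abs(r2).max()) < tol:
            break
    return Z, E

# Toy data: two independent 2-D subspaces in R^20, 10 samples (columns) each.
rng = np.random.default_rng(0)
bases = [np.linalg.qr(rng.normal(size=(20, 2)))[0] for _ in range(2)]
X = np.hstack([B @ rng.normal(size=(2, 10)) for B in bases])
Z, E = lrr_inexact_alm(X)
print(np.abs(Z[:10, 10:]).max())  # off-block entries should be close to zero
```

On clean data from independent subspaces, the resulting Z is expected to be close to block diagonal, which is what makes it usable as an affinity for spectral clustering.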
Usually, the coefficient matrix obtained above is contaminated by cluster-specific error, so it is far from good enough for clustering. Motivated by subspace representation with a special dictionary matrix, we introduce the following formulation:
(3) 
where indicates the dictionary matrix, denotes the coefficient matrix that contains the consensus and complementary information without cluster-specific error, and represents the cluster-specific error among multiple views. Clearly, the choice of is vital for clustering performance.
Since matrix is free of the sample-specific error, it is reasonable to use the reconstructed features obtained from the low-rank representation (2), i.e. , as the dictionary matrix. Moreover, since it is difficult to directly handle the cluster-specific error caused by multiple views, we decompose it as and reformulate Eq. (3) as follows:
(4) 
For simplicity, we can reformulate the above equation as follows:
(5) 
where also denotes the cluster-specific error.
It is straightforward to design the following objective function:
(6) 
where and are trade-off parameters. Although and are both constrained by the ℓ2,1 norm, they are fundamentally different: denotes the sample-specific error, whereas is employed to reduce the cluster-specific error caused by multiple views. In principle, both coefficient matrices and are suitable for clustering. To show their difference more intuitively, Fig. 2 visualizes and computed on the Yale Face dataset (the Yale Face database contains 165 grayscale images in GIF format of 15 individuals; more details are given in Section V). As shown in Fig. 2, the underlying structure of is more suitable for clustering than that of .
Additionally, to preserve the local nonlinear manifold structure of multiple views, a graph Laplacian regularizer is incorporated into our method:
(7) 
where denotes the transpose of , represents the graph Laplacian matrix of the th view, given by , where is the degree matrix and is the adjacency matrix of the th view [41]. Moreover, the term , which imposes a sparse constraint on , is also introduced in our method to extract the local manifold structure of the data and to discriminatively preserve the relationship between each sample and its local sparse representative neighborhood.
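To make the graph regularizer concrete, the sketch below builds one view's adjacency matrix with a Gaussian kernel restricted to k nearest neighbours, forms the Laplacian as degree minus adjacency, and checks the identity tr(C L Cᵀ) = ½ Σ_ij W_ij ||c_i − c_j||², where c_i is the i-th column of a coefficient matrix C. The kNN Gaussian construction, the bandwidth `sigma`, the value of `k`, the tr(C L Cᵀ) orientation and the helper names are assumptions for illustration; the text above does not specify how each view's adjacency matrix is built.

```python
import numpy as np

def knn_gaussian_adjacency(Xv, k=5, sigma=1.0):
    """Symmetric kNN adjacency with Gaussian weights; Xv is (d_v, n), columns are samples."""
    sq = np.sum(Xv**2, axis=0)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Xv.T @ Xv, 0.0)  # squared distances
    n = D2.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(D2[i])
        nbrs = order[order != i][:k]          # k nearest neighbours, excluding i itself
        W[i, nbrs] = np.exp(-D2[i, nbrs] / (2.0 * sigma**2))
    return np.maximum(W, W.T)                 # symmetrize

def laplacian(W):
    """Unnormalized graph Laplacian: degree matrix minus adjacency matrix."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(1)
Xv = rng.normal(size=(4, 12))
W = knn_gaussian_adjacency(Xv, k=3)
L = laplacian(W)
C = rng.normal(size=(12, 12))                 # stand-in coefficient matrix
reg = np.trace(C @ L @ C.T)
pairwise = 0.5 * sum(W[i, j] * np.sum((C[:, i] - C[:, j])**2)
                     for i in range(12) for j in range(12))
print(np.isclose(reg, pairwise))  # True
```

The identity shows why the term preserves local structure: it is small only when samples that are neighbours in a view receive similar coefficient vectors.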
Finally, the objective function of FCMSC can be formulated as follows:
(8) 
where , , , and are trade-off parameters. Once the coefficient matrix is learned, we construct the adjacency matrix for spectral clustering as follows:
(9) 
where abs() denotes the element-wise absolute value function, which returns the absolute value of each element of a matrix.
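A minimal sketch of how a learned coefficient matrix can be turned into a symmetric adjacency matrix and clustered spectrally. The form (|C| + |Cᵀ|)/2 is the common construction in subspace clustering and is assumed here for Eq. (9); the normalized spectral embedding and the small farthest-point-initialized k-means are generic choices for illustration, not necessarily the spectral clustering variant used in the experiments. The helper names are hypothetical.

```python
import numpy as np

def affinity_from_coefficients(C):
    """Symmetric, non-negative adjacency from a coefficient matrix (assumed form)."""
    return 0.5 * (np.abs(C) + np.abs(C.T))

def spectral_clustering(W, k, n_iter=100):
    """Normalized spectral clustering (Ng-Jordan-Weiss style) with a tiny k-means."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(M)                          # eigenvalues in ascending order
    U = vecs[:, -k:]                                     # top-k eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Farthest-point initialization keeps this sketch deterministic.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c)**2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Stand-in coefficient matrix: three noisy 4x4 diagonal blocks.
rng = np.random.default_rng(0)
C = np.kron(np.eye(3), np.ones((4, 4))) + 0.05 * rng.normal(size=(12, 12))
labels = spectral_clustering(affinity_from_coefficients(C), k=3)
print(labels)  # one label per block of four samples
```

Taking absolute values and symmetrizing is needed because representation coefficients can be negative and asymmetric, whereas spectral clustering expects a symmetric non-negative graph.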
To understand the proposed FCMSC intuitively, Fig. 3 illustrates the difference between the low-rank representation and our approach. As shown in Fig. 3, the proposed FCMSC explores both the consensus and the complementary information, which enables it to achieve good multi-view clustering performance.
IV Optimization
In this section, we present the optimization of the objective function in detail and then analyze its computational complexity and convergence.
IV-A Algorithm of FCMSC
We apply the alternating direction minimization strategy based on the Augmented Lagrange Multiplier (ALM) method [27] to solve the objective function. Two auxiliary variables, and , are introduced so that the objective function becomes separable and can be equivalently reformulated as follows:
(10) 
The corresponding ALM problem is:
(11) 
where , , and are Lagrange multipliers, μ is a positive adaptive penalty parameter, and denotes the trace of .
The alternating direction minimization strategy is employed to minimize the ALM problem; thus, the whole problem is decomposed into several subproblems, each of which can be optimized efficiently.
IV-A1 Updating
To update with the other variables fixed, we solve the following problem:
(12) 
which can be optimized in a closed form.
Specifically, denoting the solution of the above subproblem by , we obtain the following formulation:
(13) 
where denotes the th column of the matrix , and .
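The column-wise closed form referred to above is the standard proximal operator of the ℓ2,1 norm: for min_E τ||E||_{2,1} + ½||E − Q||_F², each column of the solution is q_i scaled by max(||q_i|| − τ, 0)/||q_i||. The sketch below implements that operator under the assumption that Eq. (13) has this form; the names `Q`, `tau`, `l21_prox` and `l21_norm` are hypothetical stand-ins for the symbols lost from the text.

```python
import numpy as np

def l21_prox(Q, tau):
    """argmin_E  tau * ||E||_{2,1} + 0.5 * ||E - Q||_F^2, solved column by column."""
    norms = np.linalg.norm(Q, axis=0)                  # l2 norm of each column q_i
    scale = np.where(norms > tau, (norms - tau) / np.maximum(norms, 1e-12), 0.0)
    return Q * scale                                   # one scaling factor per column

def l21_norm(E):
    """Sum of the l2 norms of the columns of E."""
    return np.linalg.norm(E, axis=0).sum()

Q = np.array([[3.0, 0.1, 0.0],
              [4.0, 0.2, 1.0]])
E = l21_prox(Q, tau=1.0)
print(E)            # first column scaled by 0.8; the two short columns become zero
print(l21_norm(E))  # about 4.0
```

Columns whose norm does not exceed the threshold are zeroed as a whole, which is why the ℓ2,1 norm models errors that affect entire samples rather than isolated entries.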
IV-A2 Updating
The subproblem of updating , in which other variables are all fixed, can be written as follows:
(14) 
This subproblem can be solved in the same way as the subproblem of updating .
IV-A3 Updating
With other variables being fixed, we solve the following problem to update variable :
(15) 
which can be solved by the singular value thresholding approach [27]. Specifically, by setting and performing the singular value decomposition of , i.e. , we obtain the solution for :
(16) 
where denotes the soft-thresholding operator defined below, which is extended to matrices by applying it element-wise:
(17) 
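The singular value thresholding step of Eq. (16) and the soft-thresholding operator of Eq. (17) can be sketched as follows, assuming the usual forms S_τ(x) = sign(x)·max(|x| − τ, 0) and D_τ(M) = U S_τ(Σ) Vᵀ; the threshold name `tau` is a stand-in for the symbol lost from the text. The same element-wise operator is also the standard closed-form step for an ℓ1-penalized subproblem.

```python
import numpy as np

def soft_threshold(M, tau):
    """Element-wise shrinkage: sign(m) * max(|m| - tau, 0)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svt(M, tau):
    """Singular value thresholding: argmin_Z tau*||Z||_* + 0.5*||Z - M||_F^2."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(soft_threshold(s, tau)) @ Vt   # shrink singular values only

print(svt(np.diag([3.0, 1.0, 0.2]), 0.5))             # diag(2.5, 0.5, 0) up to rounding
print(soft_threshold(np.array([-2.0, 0.3, 1.5]), 0.5))  # values: -1.5, 0.0, 1.0
```

Applying the shrinkage to singular values instead of entries is what turns an ℓ1-type step into a nuclear-norm (low-rank) step: small singular values are removed, lowering the rank.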
IV-A4 Updating
The auxiliary variable can be updated by solving the following problem:
(18) 
whose solution can be written as follows:
(19) 
IV-A5 Updating
When other variables are fixed, the subproblem with respect to can be written as
(20) 
To obtain the solution, we take the derivative of the above function with respect to and set it to zero. The solution for is
(21) 
where is an identity matrix of appropriate size. Note that the local structure of multiple views is taken into consideration when updating .
IV-A6 Updating
With other variables being fixed, the subproblem of updating can be written as follows:
(22) 
Accordingly, taking the derivative with respect to Z and setting it to zero, we obtain the following equivalent equation, whose solution solves this subproblem:
(23) 
where , , and can be written as follows:
(24) 
The above equation is a Sylvester equation and can be solved as described in [42].
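For reference, a Sylvester equation has the form AZ + ZB = Q. The small sketch below solves it through the Kronecker identity vec(AZ + ZB) = (I ⊗ A + Bᵀ ⊗ I) vec(Z) with column-major vectorization, which is only practical for small matrices; for realistic sizes, a Bartels-Stewart solver such as `scipy.linalg.solve_sylvester(a, b, q)` is the usual choice. The matrices here are random stand-ins, not the specific A, B and Q of Eq. (24), and `solve_sylvester_kron` is a hypothetical helper.

```python
import numpy as np

def solve_sylvester_kron(A, B, Q):
    """Solve A @ Z + Z @ B = Q via vec(AZ + ZB) = (I kron A + B^T kron I) vec(Z)."""
    m, n = Q.shape
    K = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
    z = np.linalg.solve(K, Q.reshape(-1, order="F"))   # column-major vec(Q)
    return z.reshape((m, n), order="F")

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 4 * np.eye(4)   # shifted so A and -B share no eigenvalues
B = rng.normal(size=(3, 3)) + 4 * np.eye(3)
Q = rng.normal(size=(4, 3))
Z = solve_sylvester_kron(A, B, Q)
print(np.allclose(A @ Z + Z @ B, Q))  # True
```

The equation has a unique solution exactly when A and −B have no eigenvalue in common, which is why the stand-in matrices are shifted apart.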
IV-A7 Updating Lagrange Multipliers and μ
According to [27], we update the Lagrange multipliers and μ as follows:
(25) 
where , and the penalty parameter μ is monotonically increased by a factor of until it reaches its maximum value .
Algorithm 1 outlines the whole optimization procedure. Note that, in practice, we initialize randomly to avoid all-zero solutions.
IV-B Computational Complexity and Convergence
As shown in Algorithm 1, the main computational burden lies in six parts, i.e., the six subproblems. The complexity of updating is , and that of updating is , both of which are dominated by matrix multiplication. For the subproblem of updating , the complexity is . The complexity of updating is . In the subproblem of updating , the complexity is , since a matrix inversion is involved. For updating , a Sylvester equation is solved, and the complexity of this subproblem is . In summary, the computational complexity of each iteration is ; since in practice is much larger than , the complexity is .
Regarding convergence, unfortunately, it is difficult to give a solid proof for the proposed algorithm. However, extensive experimental results on real-world datasets show that our algorithm converges effectively with all-zero initialization for all variables except , which is initialized randomly.
V Experiments
In this section, comprehensive experiments are conducted on several real-world datasets, and the results are presented with corresponding analyses. All code was implemented in Matlab and run on a desktop with a four-core 3.6 GHz processor and 8 GB of memory.
V-A Experimental Settings
To evaluate the performance of the proposed approach, we employ six real-world datasets: Mfeatures (http://archive.ics.uci.edu/ml/datasets/Multiple+Features), Movies 617 (http://lig-membres.imag.fr/grimal/data/movies617.tar.gz), MSRCV1 (http://research.microsoft.com/en-us/projects/objectclassrecognition/), Olympics (http://mlg.ucd.ie/aggregation/), ORL (https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html), and Yale Face (http://cvc.cs.yale.edu/cvc/projects/yalefaces/yalefaces.html). Table II lists the number of samples and clusters of each dataset, as well as the dimension of each view.
View  Mfeatures  Movies 617  MSRCV1  Olympics  ORL  Yale Face

1  FAC(216)  Keywords(1878)  CENT(1302)  Followedby(464)  Intensity(4096)  Intensity(4096)
2  FOU(76)  Actors(1398)  CMT(48)  Follows(464)  LBP(3304)  LBP(3304)
3  KAR(64)  -  GIST(512)  Listmerged(3097)  Gabor(6750)  Gabor(6750)
4  MOR(6)  -  HOG(100)  List(4942)  -  -
5  PIX(260)  -  LBP(256)  Mentionedby(464)  -  -
6  ZER(47)  -  SIFT(210)  Mentions(464)  -  -
7  -  -  -  Retweets(464)  -  -
8  -  -  -  Retweetedby(464)  -  -
9  -  -  -  Tweets(18455)  -  -
# samples  2000  617  210  464  400  165
# clusters  10  17  7  28  40  15
Additionally, three metrics commonly used in multi-view clustering are employed to evaluate clustering performance: NMI (normalized mutual information), ACC (accuracy), and F-score. For each metric, a higher value indicates better clustering performance. All parameters of each compared approach are fine-tuned, and the results of LMSC on the ORL and MSRCV1 datasets were provided by the authors of LMSC. To eliminate randomness, 30 Monte Carlo (MC) trials were conducted on each dataset for all compared approaches. Experimental results are reported as the mean values of NMI, ACC and F-score.
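The three metrics are standard, but their exact definitions vary slightly across papers. The sketch below assumes the common choices: NMI normalized by the geometric mean of the two entropies, ACC computed after the best one-to-one matching of clusters to classes (via `scipy.optimize.linear_sum_assignment`), and the pairwise F-score over sample pairs that share a group. These definitions and the helper names are assumptions, not necessarily the evaluation code behind Table III.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def contingency(y_true, y_pred):
    """Counts of samples per (true class, predicted cluster) pair."""
    _, t = np.unique(y_true, return_inverse=True)
    _, p = np.unique(y_pred, return_inverse=True)
    M = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(M, (t, p), 1)
    return M

def nmi(y_true, y_pred):
    """Mutual information normalized by the geometric mean of the entropies."""
    P = contingency(y_true, y_pred)
    P /= P.sum()
    pt, pp = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(pt, pp)[nz]))
    ht, hp = -np.sum(pt * np.log(pt)), -np.sum(pp * np.log(pp))
    if ht == 0 or hp == 0:          # degenerate single-cluster case
        return float(ht == hp)
    return mi / np.sqrt(ht * hp)

def acc(y_true, y_pred):
    """Accuracy under the best one-to-one cluster-to-class matching."""
    M = contingency(y_true, y_pred)
    rows, cols = linear_sum_assignment(-M)   # negate to maximize matched counts
    return M[rows, cols].sum() / M.sum()

def _pairs(counts):
    """Number of unordered pairs inside groups of the given sizes."""
    return (counts * (counts - 1) / 2.0).sum()

def pairwise_fscore(y_true, y_pred):
    """F-score over sample pairs placed in the same group."""
    M = contingency(y_true, y_pred)
    tp = _pairs(M)                           # same class and same cluster
    pred_pairs, true_pairs = _pairs(M.sum(axis=0)), _pairs(M.sum(axis=1))
    precision = tp / pred_pairs if pred_pairs else 0.0
    recall = tp / true_pairs if true_pairs else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]
print(nmi(y_true, y_pred), acc(y_true, y_pred), pairwise_fscore(y_true, y_pred))
```

All three scores are invariant to how cluster labels are numbered, which is required because a clustering algorithm returns arbitrary label indices.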
V-B Validation Experiments
To validate the proposed FCMSC, experiments were conducted on each single view and on the joint view representation. Specifically, FCMSC was tested on three randomly selected datasets: MSRCV1, ORL, and Yale Face. Fig. 4 displays the experimental results on these three datasets for each single view and for the joint view representation.
As shown in Fig. 4, FCMSC performs much better with the joint view representation than with any single view. Taking MSRCV1 as an example, the joint view representation improves the performance of FCMSC in terms of NMI, ACC, and F-score. Since both the sample-specific error and the cluster-specific error are considered in the optimization problem, the proposed approach can take advantage of both the consensus and the complementary information of multiple views and thus obtains promising clustering performance. Therefore, the proposed FCMSC is valid for multi-view clustering.
V-C Comparison Experiments
To demonstrate its effectiveness, the proposed FCMSC is compared with eight approaches listed as follows:

Spectral Clustering of Best Single View: Spectral clustering is conducted on each single view, and the results of the view with the best clustering performance are reported.

Spectral Clustering on Concatenated Features: The features of multiple views are concatenated, and spectral clustering is then applied to the joint view representation.

Low-Rank Representation of Best Single View: Similar to Spectral Clustering of Best Single View, the low-rank representation algorithm is applied to each view, and the results of the view with the best clustering performance are reported.

Low-Rank Representation of Concatenated Features: The low-rank representation algorithm is applied to the joint view representation to obtain multi-view clustering results.

Kernel Addition: This approach combines the information of all views by averaging their kernel matrices and then applies standard spectral clustering to obtain the clustering results.

Co-reg: Co-regularized multi-view spectral clustering [25]. This approach pursues consistency across all views during the clustering procedure.

RMSC: Robust Multi-view Spectral Clustering [26]. RMSC recovers a shared low-rank transition probability matrix via low-rank and sparse decomposition, and then applies spectral clustering via Markov chains to obtain the clustering results.

LMSC: Latent Multi-view Subspace Clustering [19]. LMSC learns a latent multi-view representation and simultaneously performs data reconstruction based on the learned representation.
Experimental results are reported as the mean and standard deviation of NMI, ACC and F-score. Table III displays the results of all compared approaches on the six real-world datasets.
As shown in Table III, the proposed FCMSC achieves the best performance among all compared approaches on all six datasets with respect to all clustering metrics. For example, on MSRCV1 with six views, FCMSC achieves relative improvements of 13.07%, 5.61%, and 12.19% in NMI, ACC, and F-score, respectively, over the corresponding second-best approach. On Yale Face with three views, FCMSC achieves relative improvements of 14.07%, 17.30%, and 30.65% in NMI, ACC and F-score, respectively, over the corresponding second-best approach.
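Relative improvements over the second-best method can be recomputed from Table III as (FCMSC − second best) / second best. The short check below does this for MSRCV1 and Yale Face, with mean scores copied from the table; the helper name `relative_gain` is illustrative.

```python
# Mean scores from Table III: (FCMSC, best competing method) for each metric.
scores = {
    "MSRCV1": {"NMI": (0.7571, 0.6696), "ACC": (0.8507, 0.8055), "F-score": (0.7420, 0.6614)},
    "Yale Face": {"NMI": (0.8068, 0.7073), "ACC": (0.7927, 0.6758), "F-score": (0.6713, 0.5138)},
}

def relative_gain(ours, second_best):
    """Relative improvement in percent."""
    return 100.0 * (ours - second_best) / second_best

gains = {ds: {metric: round(relative_gain(*pair), 2) for metric, pair in metrics.items()}
         for ds, metrics in scores.items()}
for ds, g in gains.items():
    print(ds, g)
```

Note that the relative gain differs from the absolute gap in points: on MSRCV1 the F-score gap is 0.0806, which corresponds to a relative gain of about 12.19%.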
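Dataset  Method  NMI (std)  ACC (std)  F-score (std)
Mfeatures  Spectral Clustering of Best Single View  0.6506(0.0076)  0.6969(0.0143)  0.6001(0.0108)
Mfeatures  Spectral Clustering on Concatenated Features  0.6900(0.0029)  0.7077(0.0081)  0.6373(0.0048)
Mfeatures  Low-Rank Representation of Best Single View  0.6971(0.0028)  0.7601(0.0071)  0.6626(0.0043)
Mfeatures  Low-Rank Representation of Concatenated Features  0.7381(0.0009)  0.7951(0.0006)  0.7089(0.0010)
Mfeatures  Kernel Addition  0.7730(0.0082)  0.7942(0.0185)  0.7247(0.0129)
Mfeatures  Co-reg  0.7273(0.0079)  0.7690(0.0158)  0.6919(0.0122)
Mfeatures  RMSC  0.7412(0.0062)  0.7662(0.0178)  0.6867(0.0111)
Mfeatures  LMSC  0.8106(0.0186)  0.8212(0.0550)  0.7739(0.0364)
Mfeatures  FCMSC  0.8330(0.0014)  0.8943(0.0016)  0.8120(0.0023)
Movies 617  Spectral Clustering of Best Single View  0.2606(0.0020)  0.2579(0.0035)  0.1481(0.0025)
Movies 617  Spectral Clustering on Concatenated Features  0.2668(0.0017)  0.2604(0.0033)  0.1542(0.0019)
Movies 617  Low-Rank Representation of Best Single View  0.2667(0.0059)  0.2747(0.0071)  0.1545(0.0047)
Movies 617  Low-Rank Representation of Concatenated Features  0.2839(0.0075)  0.2824(0.0135)  0.1813(0.0063)
Movies 617  Kernel Addition  0.2917(0.0026)  0.2901(0.0049)  0.1764(0.0033)
Movies 617  Co-reg  0.2454(0.0018)  0.2396(0.0017)  0.1381(0.0016)
Movies 617  RMSC  0.2957(0.0032)  0.2971(0.0040)  0.1810(0.0028)
Movies 617  LMSC  0.2813(0.0098)  0.2747(0.0094)  0.1606(0.0068)
Movies 617  FCMSC  0.3195(0.0085)  0.3108(0.0101)  0.1932(0.0064)
MSRCV1  Spectral Clustering of Best Single View  0.6047(0.0112)  0.6826(0.0171)  0.5724(0.0122)
MSRCV1  Spectral Clustering on Concatenated Features  0.4398(0.0021)  0.5073(0.0077)  0.3978(0.0032)
MSRCV1  Low-Rank Representation of Best Single View  0.5704(0.0054)  0.6732(0.0091)  0.5368(0.0076)
MSRCV1  Low-Rank Representation of Concatenated Features  0.0667(0.0035)  0.1851(0.0056)  0.2363(0.0026)
MSRCV1  Kernel Addition  0.6176(0.0087)  0.7102(0.0130)  0.5973(0.0097)
MSRCV1  Co-reg  0.6583(0.0106)  0.7674(0.0169)  0.6459(0.0128)
MSRCV1  RMSC  0.6696(0.0064)  0.7819(0.0125)  0.6614(0.0093)
MSRCV1  LMSC  0.6534(0.0105)  0.8055(0.0128)  0.6517(0.0171)
MSRCV1  FCMSC  0.7571(0.0043)  0.8507(0.0023)  0.7420(0.0029)
Olympics  Spectral Clustering of Best Single View  0.7617(0.0046)  0.6288(0.0112)  0.5178(0.0134)
Olympics  Spectral Clustering on Concatenated Features  0.5625(0.0038)  0.4610(0.0078)  0.3194(0.0085)
Olympics  Low-Rank Representation of Best Single View  0.8674(0.0038)  0.7830(0.0093)  0.7112(0.0088)
Olympics  Low-Rank Representation of Concatenated Features  0.1762(0.0133)  0.1669(0.0078)  0.1036(0.0025)
Olympics  Kernel Addition  0.7245(0.0038)  0.6093(0.0073)  0.5189(0.0071)
Olympics  Co-reg  0.8308(0.0027)  0.7341(0.0071)  0.6707(0.0079)
Olympics  RMSC  0.7573(0.0063)  0.6372(0.0108)  0.5687(0.0117)
Olympics  LMSC  0.8902(0.0065)  0.8043(0.0140)  0.7814(0.0154)
Olympics  FCMSC  0.8925(0.0056)  0.8279(0.0157)  0.7953(0.0209)
ORL  Spectral Clustering of Best Single View  0.8868(0.0069)  0.7459(0.0121)  0.6805(0.0159)
ORL  Spectral Clustering on Concatenated Features  0.8084(0.0027)  0.6323(0.0061)  0.5236(0.0069)
ORL  Low-Rank Representation of Best Single View  0.8477(0.0096)  0.7192(0.0200)  0.6114(0.0235)
ORL  Low-Rank Representation of Concatenated Features  0.8497(0.0085)  0.7178(0.0190)  0.6119(0.0218)
ORL  Kernel Addition  0.8028(0.0033)  0.6349(0.0074)  0.5224(0.0061)
ORL  Co-reg  0.8277(0.0040)  0.6653(0.0080)  0.5672(0.0092)
ORL  RMSC  0.8885(0.0056)  0.7482(0.0128)  0.6866(0.0139)
ORL  LMSC  0.9310(0.0116)  0.8194(0.0171)  0.7583(0.0091)
ORL  FCMSC  0.9329(0.0077)  0.8374(0.0307)  0.7885(0.0299)
Yale Face  Spectral Clustering of Best Single View  0.6229(0.0354)  0.5715(0.0497)  0.4319(0.0472)
Yale Face  Spectral Clustering on Concatenated Features  0.5761(0.0335)  0.5145(0.0460)  0.3653(0.0420)
Yale Face  Low-Rank Representation of Best Single View  0.6920(0.0212)  0.6634(0.0256)  0.4964(0.0337)
Yale Face  Low-Rank Representation of Concatenated Features  0.6917(0.0190)  0.6667(0.0236)  0.4941(0.0303)
Yale Face  Kernel Addition  0.5872(0.0320)  0.5352(0.0397)  0.3823(0.0390)
Yale Face  Co-reg  0.6146(0.0084)  0.5638(0.0108)  0.4208(0.0110)
Yale Face  RMSC  0.6590(0.0108)  0.6091(0.0161)  0.4773(0.0133)
Yale Face  LMSC  0.7073(0.0105)  0.6758(0.0116)  0.5138(0.0172)
Yale Face  FCMSC  0.8068(0.0129)  0.7927(0.0264)  0.6713(0.0236)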
Since most existing approaches blindly combine the consensus and complementary information of multi-view data, which may contain considerable noise, it is difficult for them to obtain promising multi-view clustering performance. For the joint view representation, directly running spectral clustering or low-rank representation may fail to yield promising clustering results and can even perform worse than a single view, as on the MSRCV1, Olympics, and Yale Face datasets. Since each view has its own specific properties, which may conflict with those of other views, it is difficult to explore and exploit the consensus and complementary information of all views by applying existing single-view clustering approaches to the concatenated features. To analyze the reason more intuitively, taking the ORL and Yale Face databases as examples, Fig. 5 visualizes the adjacency matrices computed by a Gaussian kernel for spectral clustering from each view and from the concatenated data. For a comprehensive comparison, Fig. 5 also visualizes the coefficient matrices and , which are computed from the concatenated data by the low-rank representation and by the proposed FCMSC, respectively. As shown in Fig. 5, the adjacency matrices computed by the Gaussian kernel are more likely to be corrupted by noise, whereas the coefficient matrices and are less affected. Between these two coefficient matrices, is more suitable than for multi-view clustering. By applying spectral clustering to , the proposed FCMSC obtains promising multi-view clustering performance.
V-D Parameter Sensitivity
In FCMSC, four parameters need to be fine-tuned, i.e. , , , and . To achieve good clustering performance, these parameters must be set empirically. The key question is therefore whether the performance of our approach is sensitive to these parameters.
To answer this question, experiments are conducted on the MSRCV1 dataset to observe the effect of each parameter on clustering performance. Results are reported in terms of ACC and NMI, varying one parameter while fixing the others. More specifically, we first fix , and to 0.1, 0.05 and 0.1, respectively, and then tune from 0.01 to 1; Fig. 6 shows the corresponding results. With and , Fig. 7 illustrates the clustering performance of the proposed approach for different values of and . Furthermore, Fig. 8 presents the clustering results of the proposed approach for different values of , with , , and . Clearly, when we set , the performance of our approach varies only slightly as long as , , and are chosen from a suitable range, i.e., when and are no smaller than 0.5 and is no larger than 0.05.
In summary, with , FCMSC is relatively insensitive to the other parameters , and , as long as they are chosen from a suitable range. This makes FCMSC easy to apply without much parameter-tuning effort.
VI Conclusion
This paper proposes a feature concatenation multi-view subspace clustering approach, termed FCMSC. Unlike most existing approaches, we perform clustering directly on the concatenated features while exploring both the consensus and the complementary information among multiple views. By taking both the sample-specific error and the cluster-specific error into consideration, a coefficient matrix that yields promising clustering performance can be recovered from the concatenated features. Extensive experiments on real-world datasets demonstrate the superiority of our approach over several state-of-the-art approaches.
Despite its effectiveness, FCMSC is time-consuming due to the matrix inversion and singular value decomposition (SVD) involved in the optimization, especially when the dimension of the joint view representation is high. Future work will focus on improving the proposed approach for large-scale data by employing dimensionality reduction and binary representation [43] strategies.
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grant No. 61573273.
References
[1] Chang Xu, Dacheng Tao, and Chao Xu. A survey on multi-view learning. arXiv preprint arXiv:1304.5634, 2013.
[2] Xiaoshan Yang, Tianzhu Zhang, and Changsheng Xu. Cross-domain feature learning in multimedia. IEEE Transactions on Multimedia, 17(1):64–78, 2015.
[3] Lei Zhang and David Zhang. Visual understanding via multi-feature shared learning with global consistency. IEEE Transactions on Multimedia, 18(2):247–259, 2016.
[4] Aude Oliva and Antonio Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145–175, 2001.
[5] David G Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[6] Timo Ojala, Matti Pietikainen, and Topi Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971–987, 2002.
[7] Kun Zhan, Chaoxi Niu, Changlu Chen, Feiping Nie, Changqing Zhang, and Yi Yang. Graph structure fusion for multi-view clustering. IEEE Transactions on Knowledge and Data Engineering, 2018.
[8] Jing Zhao, Xijiong Xie, Xin Xu, and Shiliang Sun. Multi-view learning overview: Recent progress and new challenges. Information Fusion, 38:43–54, 2017.
[9] Shiliang Sun and Guoqing Chao. Multi-view maximum entropy discrimination. In IJCAI, pages 1706–1712, 2013.
[10] Tao Zhou, Changqing Zhang, Chen Gong, Harish Bhaskar, and Jie Yang. Multi-view latent space learning with feature redundancy minimization. IEEE Transactions on Cybernetics, 2018.
[11] Yue Gao, Yi Zhen, Haojie Li, and Tat-Seng Chua. Filtering of brand-related microblogs using social-smooth multi-view embedding. IEEE Transactions on Multimedia, 18(10):2115–2126, 2016.
[12] Zhi-Hua Zhou. Ensemble methods: Foundations and algorithms. Chapman and Hall/CRC, 2012.
[13] René Vidal. Subspace clustering. IEEE Signal Processing Magazine, 28(2):52–68, 2011.
[14] René Vidal and Paolo Favaro. Low rank subspace clustering (LRSC). Pattern Recognition Letters, 43:47–61, 2014.
[15] Guangcan Liu, Zhouchen Lin, Shuicheng Yan, Ju Sun, Yong Yu, and Yi Ma. Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):171–184, 2013.
[16] Ehsan Elhamifar and René Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2765–2781, 2013.
[17] Vishal M Patel, Hien Van Nguyen, and René Vidal. Latent space sparse and low-rank subspace clustering. IEEE Journal of Selected Topics in Signal Processing, 9(4):691–701, 2015.
[18] Han Hu, Zhouchen Lin, Jianjiang Feng, and Jie Zhou. Smooth representation clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3834–3841, 2014.
[19] Changqing Zhang, Qinghua Hu, Huazhu Fu, Pengfei Zhu, and Xiaochun Cao. Latent multi-view subspace clustering. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4333–4341. IEEE, 2017.
[20] Changqing Zhang, Huazhu Fu, Si Liu, Guangcan Liu, and Xiaochun Cao. Low-rank tensor constrained multi-view subspace clustering. In Proceedings of the IEEE International Conference on Computer Vision, pages 1582–1590, 2015.
[21] Xiaochun Cao, Changqing Zhang, Huazhu Fu, Si Liu, and Hua Zhang. Diversity-induced multi-view subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–594, 2015.
[22] Hongchang Gao, Feiping Nie, Xuelong Li, and Heng Huang. Multi-view subspace clustering. In Proceedings of the IEEE International Conference on Computer Vision, pages 4238–4246, 2015.
[23] Yang Wang, Wenjie Zhang, Lin Wu, Xuemin Lin, Meng Fang, and Shirui Pan. Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering. arXiv preprint arXiv:1608.05560, 2016.
[24] Abhishek Kumar and Hal Daumé. A co-training approach for multi-view spectral clustering. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 393–400, 2011.
[25] Abhishek Kumar, Piyush Rai, and Hal Daumé. Co-regularized multi-view spectral clustering. In Advances in Neural Information Processing Systems, pages 1413–1421, 2011.
[26] Rongkai Xia, Yan Pan, Lei Du, and Jian Yin. Robust multi-view spectral clustering via low-rank and sparse decomposition. In AAAI, pages 2149–2155, 2014.
[27] Zhouchen Lin, Risheng Liu, and Zhixun Su. Linearized alternating direction method with adaptive penalty for low-rank representation. In Advances in Neural Information Processing Systems, pages 612–620, 2011.
[28] Chang Tang, Xinzhong Zhu, Xinwang Liu, Miaomiao Li, Pichao Wang, Changqing Zhang, and Lizhe Wang. Learning joint affinity graph for multi-view subspace clustering. IEEE Transactions on Multimedia, 2018.
[29] Guoqing Chao, Shiliang Sun, and Jinbo Bi. A survey on multi-view clustering. arXiv preprint arXiv:1712.06246, 2017.
[30] Xing Yi, Yunpeng Xu, and Changshui Zhang. Multi-view EM algorithm for finite mixture models. In International Conference on Pattern Recognition and Image Analysis, pages 420–425. Springer, 2005.
[31] Grigorios F Tzortzis and Aristidis C Likas. Multiple view clustering using a weighted combination of exemplar-based mixture models. IEEE Transactions on Neural Networks, 21(12):1925–1938, 2010.
[32] Yu-Meng Xu, Chang-Dong Wang, and Jian-Huang Lai. Weighted multi-view clustering with feature selection. Pattern Recognition, 53:25–35, 2016.
[33] Maria Brbić and Ivica Kopriva. Multi-view low-rank sparse subspace clustering. Pattern Recognition, 73:247–258, 2018.
[34] Handong Zhao, Zhengming Ding, and Yun Fu. Multi-view clustering via deep matrix factorization. In AAAI, pages 2921–2927, 2017.
[35] Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. Multi-view clustering via joint nonnegative matrix factorization. In Proceedings of the 2013 SIAM International Conference on Data Mining, pages 252–260. SIAM, 2013.
[36] Daniel D Lee and H Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788, 1999.
[37] Kamalika Chaudhuri, Sham M Kakade, Karen Livescu, and Karthik Sridharan. Multi-view clustering via canonical correlation analysis. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 129–136. ACM, 2009.
[38] Guoqing Chao and Shiliang Sun. Multi-kernel maximum entropy discrimination for multi-view learning. Intelligent Data Analysis, 20(3):481–493, 2016.
[39] Tong Zhang, Alexandrin Popescul, and Byron Dom. Linear prediction models with graph regularization for webpage categorization. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 821–826. ACM, 2006.
[40] Dongyan Guo, Jian Zhang, Xinwang Liu, Ying Cui, and Chunxia Zhao. Multiple kernel learning based multi-view spectral clustering. In International Conference on Pattern Recognition, 2014.
[41] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
[42] Richard H Bartels and G W Stewart. Solution of the matrix equation AX + XB = C [F4] (Algorithm 432). Communications of the ACM, 15(9):820–826, 1972.
[43] Zheng Zhang, Li Liu, Fumin Shen, Heng Tao Shen, and Ling Shao. Binary multi-view clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP(99):1–1, 2018.