Feature Concatenation Multi-view Subspace Clustering

01/30/2019 ∙ by Qinghai Zheng, et al. ∙ Xi'an Jiaotong University 14

Many multi-view clustering methods have been proposed as multi-view data have become common in various applications. The consensus and complementary information of multi-view data underpin the success of multi-view clustering. Most existing methods process multiple views separately, exploring either consensus information or complementary information, and few methods cluster multi-view data directly on concatenated features, since the statistical properties of different views are diverse and even incompatible. This paper proposes a novel multi-view subspace clustering method dubbed Feature Concatenation Multi-view Subspace Clustering (FCMSC), which clusters directly on the joint view representation of multi-view data and leverages both the consensus and complementary information. Specifically, multiple views are first concatenated; then a special coefficient matrix with the low-rank property is derived, and the spectral clustering algorithm is applied to an affinity matrix calculated from the coefficient matrix. Notably, the coefficient matrix obtained during clustering is not derived by simply applying Low-Rank Representation (LRR) to the joint view representation. Furthermore, the l_{2,1}-norm and sparse constraints are introduced to deal with the sample-specific and cluster-specific corruptions of multiple views, which benefits the clustering performance. A novel algorithm based on the Augmented Lagrangian Multiplier (ALM) is designed to optimize the proposed method. Comprehensive experiments against several effective multi-view clustering methods on six real-world datasets show the superiority of the proposed work.




I Introduction

Multi-view data, obtained from different measurements or collected from various fields to describe objects comprehensively, are widespread in many real-world applications[1],[2],[3]. For example, in computer vision tasks, an image can be described by multiple view representations (GIST[4], SIFT[5], LBP[6], etc.); the words presented on a webpage and the words presented in its URL are two distinct views of the webpage; video signals and audio signals are two common representations that can be used for multimedia content understanding. Compared with single-view data, multi-view data contain both consensus and complementary information among multiple views. The goal of multi-view learning, which has achieved success in many applications[1],[7],[8],[9],[10],[11], is to improve generalization performance by leveraging multiple views.

As a fundamental task in unsupervised learning, clustering can be used as a stand-alone exploratory tool to mine the intrinsic structure of data or as a preprocessing stage to assist other learning tasks. Many clustering approaches have been proposed. Subspace clustering, which assumes that high-dimensional data lie in a union of low-dimensional subspaces and tries to group data points into clusters while finding the corresponding subspaces simultaneously, has attracted much research owing to its promising performance and good interpretability[13]. Various clustering algorithms based on subspace clustering have been proposed under different constraints. Low-Rank Subspace Clustering (LRSC)[14] finds a low-rank linear representation of the data using the data themselves as the dictionary and then applies spectral clustering to an adjacency matrix derived from the low-rank representation[15] to obtain clustering results. Sparse Subspace Clustering (SSC)[16], which tries to find the sparsest representation based on the l_1-norm, is a powerful subspace clustering algorithm as well. Low-Rank Sparse Subspace Clustering (LRSSC)[17] applies low-rank and sparse constraints simultaneously, based on the trace norm and the l_1-norm, exploiting the fact that the coefficient matrix is often sparse and low-rank at the same time. Additionally, Smooth Representation Clustering[18] explores the grouping effect for subspace clustering. Although these algorithms are effective in practice, they are designed for single-view data rather than multi-view data.

Building on subspace clustering, many multi-view subspace clustering approaches have been proposed[19],[20],[21],[22],[23]. Most of them process multiple views separately and then either find a common shared coefficient matrix or fuse the clustering results of different views to maximize the consensus among all views. Although good performance has been achieved in practice, the consensus and complementary information of multi-view data cannot be explored effectively, since these methods describe the data within each view separately and therefore insufficiently. In order to make full use of multiple views, we propose a novel multi-view subspace clustering method dubbed Feature Concatenation Multi-view Subspace Clustering (FCMSC) in this paper. Concatenating the features of all views and then running a single-view clustering algorithm directly is a naive solution for multi-view clustering; however, it is ineffective in most real-world applications and can even produce worse clustering results[1],[8],[24],[25],[26]. It is noteworthy that the proposed FCMSC achieves promising clustering performance directly on the concatenated features by exploring both the consensus and complementary information. Specifically, by reducing the sample-specific error among data points and the cluster-specific error among multiple views simultaneously, our method decomposes the coefficient matrix, which is calculated from the joint view representation and enjoys the low-rank property, into a new low-rank coefficient matrix without corrupted information and a cluster-specific noise matrix. Moreover, an optimization algorithm based on the Augmented Lagrangian Multipliers (ALM)[27] is proposed for the objective function of FCMSC. Extensive experiments on six real-world datasets, compared with several state-of-the-art multi-view clustering approaches, show the effectiveness and competitiveness of the proposed method.

In summary, the main contributions of this paper are as follows:

  • We propose an effective multi-view subspace clustering framework that achieves promising clustering results directly on the concatenated features. To the best of our knowledge, our approach is the first that can directly process the joint view representation with competitive clustering results.

  • Both sample-specific error and cluster-specific error are taken into account by introducing suitable constraints and regularizers. Unlike most existing approaches, the proposed approach can explore both the consensus and the complementary information among multiple views.

  • Experiments are conducted on publicly available datasets to demonstrate the superior performance of the proposed approach for multi-view clustering.

The rest of this paper is organized as follows. The next section reviews related works briefly. Section III introduces our approach and Section IV displays the related optimization in detail. Comprehensive experimental results and discussions are provided in Section V. Finally, we will give the conclusions and future works in Section VI.

II Related Works

Recently, many approaches have been proposed to solve the multi-view clustering problem[7],[19],[20],[21],[22],[23],[24],[25],[26],[28],[29],[30],[31],[32],[33],[34],[35]. Existing methods can be roughly grouped into two main categories: generative methods and discriminative methods[29]. Generative methods try to construct generative models for the different clusters. For example, multi-view convex mixture models[31] learn different weights for multiple views automatically and build convex mixture models for multiple views. Although most generative algorithms are robust to missing entries and some even admit global optima, they come with a series of hypotheses and parameters, which make the optimization more difficult and time consuming. Discriminative methods, which try to minimize the similarities of data points between clusters and maximize the similarities of data points within clusters directly across all views, have achieved good clustering results in many applications and attract most of the research attention in this field. Taking multi-view subspace clustering as an example, Latent Multi-view Subspace Clustering (LMSC)[19], which achieves the current state-of-the-art multi-view clustering performance, explores underlying complementary information and seeks the latent representation simultaneously; Low-rank Tensor constrained Multi-view Subspace Clustering (LT-MSC) formulates the clustering problem as a tensor nuclear norm minimization problem by regarding the subspace representation matrices of multiple views as a tensor; Diversity-induced Multi-view Subspace Clustering (DiMSC)[21] enhances the multi-view clustering performance by utilizing the Hilbert Schmidt Independence Criterion to explore the complementary information of different views; multi-view subspace clustering by learning a joint affinity graph[28] leverages a low-rank representation with diversity regularization and a rank constraint to learn a joint affinity graph for clustering; Multi-View Subspace Clustering (MVSC)[22] uses a common cluster structure shared by all views to obtain clustering results by exploring the consensus information among views; and Iterative Views Agreement[23] is a multi-view subspace clustering approach that preserves the local manifold structures of each view during the multi-view clustering process. Besides, many spectral clustering based methods have been proposed as well. The co-training approach for multi-view spectral clustering[24] and the co-regularized multi-view spectral clustering[25] process multiple views separately and try to obtain clustering results that maximize the agreement among views; Robust Multi-view Spectral Clustering (RMSC)[26] recovers a common transition probability matrix via low-rank and sparse decomposition and applies a Markov chain method to obtain clustering results. In addition, several multi-view clustering methods based on the matrix factorization approach[36] have been proposed by exploring the consensus information among views[34],[35].

Among the various discriminative multi-view clustering methods, the essential difference is the way they explore the consensus information and the complementary information of multiple views, and most existing multi-view clustering methods process multiple views separately. It is natural to combine multiple views before the clustering operation, and some related approaches have been proposed[37],[38],[39],[40]. Multi-view clustering via Canonical Correlation Analysis (CCA)[37] obtains the combination after projection; the methods proposed in[38],[39],[40] use a kernel to combine multiple views. However, these methods may corrupt either the consensus information or the complementary information among views to varying degrees during combination. Since it is difficult to explore effective information among views based on the direct combination of multiple views, multi-view clustering results obtained by applying a single-view clustering algorithm to the joint view representation are uncompetitive[20],[21],[25],[26],[29], and few works focus on this kind of combination. However, the original information contained in multiple views is maximally preserved by concatenating the features of all views directly. Notably, the proposed FCMSC is the first method that achieves promising clustering performance by using the concatenated features of multiple views directly.

III Feature Concatenation Multi-view Subspace Clustering

Symbol Meaning
The number of samples.
The number of views.
The number of clusters.
The dimension of features in -th view.
The dimension of the concatenated features, .
The features of -th sample from -th view.
The joint view representation matrix.
The coefficient matrix of .
The sample-specific error.
The coefficient matrix of .
The cluster-specific error.
The Laplacian matrix of -th view.
The -norm of matrix .
The trace-norm of matrix .
The -norm of matrix .
The transpose of matrix .
rank() The rank function.
abs() The absolute function.
TABLE I: Main Symbols
Fig. 1: Illustration of the proposed FCMSC, in which multi-view subspace clustering is implemented on the concatenated features. Both sample-specific error and cluster-specific error are taken into consideration simultaneously during clustering.

For convenience, Table I lists the main symbols used throughout this paper. Consider a multi-view dataset with views and samples, i.e. , whose data points are drawn from multiple subspaces. So that every column of the resulting matrix has the same magnitude, the data of each view are normalized within the range of , and then the views are concatenated into a joint view representation matrix X, which is defined as follows,


where denotes the features of -th sample from -th view, and -th column of contains features of all views of -th sample. Based on concatenated features, Fig. 1 displays the whole framework of the proposed FCMSC.
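The normalization and concatenation step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes min-max scaling of each view to [0, 1] (the exact range is an assumption here), stores one sample per column, and uses the hypothetical helper name `concatenate_views`.

```python
import numpy as np

def concatenate_views(views):
    """Min-max scale each view to [0, 1] and stack the views feature-wise.

    views: list of arrays, each of shape (d_v, n), one column per sample.
    Returns an array of shape (sum of d_v, n).
    """
    scaled = []
    for V in views:
        V = np.asarray(V, dtype=float)
        lo, hi = V.min(), V.max()
        # Guard against a constant view, which would give a zero range.
        span = hi - lo if hi > lo else 1.0
        scaled.append((V - lo) / span)
    return np.vstack(scaled)

rng = np.random.default_rng(0)
view_a = rng.normal(size=(5, 8)) * 100.0   # hypothetical view with large values
view_b = rng.uniform(size=(3, 8))          # hypothetical view with small values
X = concatenate_views([view_a, view_b])
print(X.shape)  # (8, 8): 5 + 3 features, 8 samples
```

Scaling each view before stacking keeps a view with large raw values from dominating the joint representation.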

Since the statistical properties of different views are diverse and may even be incompatible, it is difficult to fully explore the mutual information of multiple views. As a preliminary step toward exploring this mutual information, we first consider the following objective function:


where indicates a coefficient matrix of and denotes the sample-specific error of data points, is a trade-off parameter, and the l_{2,1}-norm of encourages columns of to be zero[15]. This is a standard low-rank representation of the concatenated features. However, experimental results will show that the multi-view clustering performance is not competitive if we directly apply the spectral clustering algorithm to . This is because each view has specific statistical properties, which may be contradictory among views, so it is unreasonable to explore the joint view representation by directly applying a single-view clustering algorithm.
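For reference, the standard low-rank representation of[15], written for a joint matrix $X$, takes the form below. The symbol names $Z_0$, $E_0$ and $\lambda$ are illustrative placeholders and are not necessarily those of Table I.

```latex
\min_{Z_0,\,E_0}\; \|Z_0\|_{*} + \lambda \|E_0\|_{2,1}
\quad \text{s.t.} \quad X = X Z_0 + E_0,
\qquad
\|E_0\|_{2,1} = \sum_{j=1}^{n} \sqrt{\sum_{i} \big(E_0\big)_{ij}^{2}}
```

Because the $l_{2,1}$-norm sums the Euclidean norms of the columns, penalizing it drives whole columns of the error matrix to zero, which matches a sample-specific corruption model.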

Usually, the coefficient matrix obtained above is accompanied with the cluster-specific error. Besides, it is far from good enough for clustering. Motivated by the subspace representation with a special dictionary matrix, we introduce the following formulation:


where indicates the dictionary matrix, denotes the coefficient matrix which contains the consensus and complementary information without cluster-specific error, and represents the cluster-specific error among multiple views. Obviously, the choice of is vital for the performance of clustering.

Fig. 2: Visualization of coefficient matrices obtained from the Yale Face dataset. is presented for matrix , and the same to matrix .

Since matrix is free of the sample-specific error, it is reasonable to view the reconstructed features obtained from the low-rank representation (2), i.e. , as the dictionary matrix. Besides, it is difficult to directly process the cluster-specific error caused by multiple views. Therefore, we can decompose it as and reformulate Eq. (3) as follows:


For simplicity, we can reformulate the above equation as follows:


where denotes the cluster-specific error as well.

It is straightforward to design the following objective function:


where and are trade-off parameters. Although and are both imposed by the -norm constraint, they are totally different in essence. More specifically, denotes the sample-specific error, and is employed to decrease the cluster-specific error caused by multiple views. Theoretically, both the coefficient matrix and are suitable for clustering. To view their difference in a more intuitive way, Fig. 2 displays the visualization of and calculated from the Yale Face dataset (the Yale Face database contains 165 grayscale images in GIF format of 15 individuals; more details are presented in the experiments section). As shown in Fig. 2, the underlying structure of is more suitable than that of for clustering.

Fig. 3: Comparison of the low-rank representation and our proposed FCMSC. The black rectangle indicates the ground-truth clustering, and the red rectangle indicates the clustering achieved by the corresponding method. For concatenated features, the low-rank constraint cannot fully explore the complementary information and is unable to achieve good clustering results. In contrast, by introducing both sample-specific error and cluster-specific error , the proposed FCMSC can explore both the consensus and complementary information effectively and obtain promising clustering performance.

Additionally, to preserve the local nonlinear manifold of multiple views, a graph Laplacian regularizer should be considered in our method:


where denotes the transpose of , represents the graph Laplacian matrix of -th view, , and is the degree matrix of -th view, is the adjacency matrix of -th view[41]. Moreover, the term , denoted as the sparse constraint of , is also introduced in our method, so as to extract the local manifold of the data and preserve the relationship of the local sparse representative neighborhood of each sample discriminatively.
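A per-view graph Laplacian can be built as sketched below. The sketch assumes the unnormalized form (degree matrix minus adjacency matrix) and a dense Gaussian-kernel adjacency; the kernel choice, the bandwidth `sigma`, and the function names are assumptions made for illustration.

```python
import numpy as np

def gaussian_adjacency(V, sigma=1.0):
    """Dense Gaussian-kernel adjacency for one view; V has shape (d_v, n)."""
    sq_norms = np.sum(V ** 2, axis=0)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (V.T @ V)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def graph_laplacian(W):
    """Unnormalized graph Laplacian: degree matrix minus adjacency matrix."""
    D = np.diag(W.sum(axis=1))
    return D - W

rng = np.random.default_rng(0)
V = rng.normal(size=(4, 6))          # hypothetical view: 4 features, 6 samples
L = graph_laplacian(gaussian_adjacency(V))
```

Each row of such a Laplacian sums to zero and the matrix is positive semidefinite, which is what makes a trace-form regularizer built on it a smoothness penalty over the graph.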

Finally, the objective function of FCMSC can be formulated as follows:


where , , , and are trade-off parameters. Once the coefficient matrix is learned, we construct the adjacency matrix for spectral clustering as follows:


where abs() denotes the absolute value function, which operates on a matrix and returns the absolute value of each of its elements.
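A common way to turn a learned coefficient matrix into a symmetric, non-negative affinity is to average its element-wise absolute value with the transpose of that absolute value. The sketch below assumes this form, which is consistent with the "(abs()+abs())/2" expression used in the caption of Fig. 5; the function name and the toy matrix are hypothetical.

```python
import numpy as np

def affinity_from_coefficients(Z):
    """Symmetric, non-negative affinity: (|Z| + |Z|^T) / 2."""
    A = np.abs(Z)
    return (A + A.T) / 2.0

# Toy coefficient matrix, for illustration only.
Z = np.array([[0.0, -0.8, 0.1],
              [0.6,  0.0, 0.0],
              [0.0,  0.2, 0.0]])
W = affinity_from_coefficients(Z)
```

The resulting matrix can be passed to any spectral clustering routine that accepts a precomputed affinity.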

To understand the proposed FCMSC intuitively, Fig. 3 illustrates the difference between the low-rank representation and this approach. As shown in Fig. 3, both the consensus and the complementary information are explored by the proposed FCMSC, which can achieve multi-view clustering with good performance.

IV Optimization

In this section, we present the optimization of the objective function in detail, and then analyze the computational complexity and convergence.

IV-A Algorithm of the FCMSC

We apply the Alternating Direction Minimization strategy based on the Augmented Lagrangian Multiplier (ALM)[27] to solve the objective function. Moreover, two auxiliary variables, and , are introduced, so that the objective function can be separable and reformulated equivalently as follows.


The corresponding ALM problem is:


where , , and are Lagrange multipliers, is a positive adaptive penalty parameter, and denotes the trace of .

The Alternating Direction Minimization strategy is applied to the ALM problem, so the whole problem is decomposed into several subproblems, each of which can be optimized effectively.

IV-A1 Updating

To update with the other variables fixed, we solve the following problem:


which can be optimized in a closed form.

Specifically, denoting the solution of the above subproblem as , we obtain the following formulation:


where denotes the -th column of the matrix , and .
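The closed-form solution of an l_{2,1}-regularized least-squares subproblem is the column-wise shrinkage operator known from the LRR literature. The sketch below assumes the generic problem form stated in the docstring; the names `Q`, `tau`, and `l21_shrink` are placeholders and not the paper's notation.

```python
import numpy as np

def l21_shrink(Q, tau):
    """Column-wise shrinkage: solves min_E tau*||E||_{2,1} + 0.5*||E - Q||_F^2."""
    norms = np.linalg.norm(Q, axis=0)
    # Columns with norm <= tau are set to zero; the others are scaled toward zero.
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return Q * scale

Q = np.array([[3.0, 0.1],
              [4.0, 0.1]])
E = l21_shrink(Q, tau=1.0)
# First column has norm 5 and is scaled by 4/5; second column is zeroed.
```

Zeroing whole columns is what lets this operator isolate a small number of corrupted samples.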

IV-A2 Updating

The subproblem of updating , in which other variables are all fixed, can be written as follows:


The optimization is similar to the subproblem of updating .

IV-A3 Updating

With other variables being fixed, we solve the following problem to update variable :


which can be optimized by applying the singular value thresholding approach[27]. To be specific, by setting and performing singular value decomposition on , i.e. , we can obtain the solution for :


where denotes the soft-thresholding operator, defined as follows, which extends to matrices by element-wise application.
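Soft-thresholding and singular value thresholding can be sketched as follows, assuming the generic nuclear-norm proximal problem given in the docstring. The names `M`, `tau`, and both function names are illustrative placeholders.

```python
import numpy as np

def soft_threshold(x, tau):
    """Element-wise soft-thresholding: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def singular_value_threshold(M, tau):
    """Solves min_J tau*||J||_* + 0.5*||J - M||_F^2 by shrinking singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

M = np.diag([3.0, 1.0, 0.2])
J = singular_value_threshold(M, tau=0.5)
# Singular values 3.0, 1.0, 0.2 become 2.5, 0.5, 0.0, so the rank drops to 2.
```

The same `soft_threshold` applied element-wise to a matrix is the proximal operator of the l_1-norm, which is the usual way a sparse constraint on an auxiliary variable is handled.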


IV-A4 Updating

The auxiliary variable can be updated by solving the following problem:


whose solution can be written as follows:


IV-A5 Updating

When other variables are fixed, the subproblem with respect to can be written as


To obtain the solution, we take the derivative of the above function with respect to and set it to zero. The solution for is



where is an identity matrix of proper size. It is noteworthy that the local structure of multiple views is taken into consideration when updating .


IV-A6 Updating

With other variables being fixed, the subproblem of updating can be written as follows:


Accordingly, taking the derivative with respect to Z and setting it to zero, we obtain the following equation, whose solution is the optimum of this subproblem:


where , , and can be written as follows:


The above equation is a Sylvester equation and can be solved as described in[42].
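A Sylvester equation has the generic form A Z + Z B = C. The sketch below solves it through the Kronecker (vectorized) formulation with NumPy only; the matrices `A`, `B`, and `C` are random placeholders rather than the paper's coefficient matrices, and the helper name is hypothetical. For larger problems, a dedicated routine such as `scipy.linalg.solve_sylvester` would be the more practical choice.

```python
import numpy as np

def solve_sylvester_kron(A, B, C):
    """Solve A Z + Z B = C for Z via the Kronecker (vectorized) form.

    Uses vec(A Z + Z B) = (I kron A + B^T kron I) vec(Z) with column-major vec.
    The linear system has (m*n) unknowns, so this suits small problems only.
    """
    m, n = C.shape
    K = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
    z = np.linalg.solve(K, C.flatten(order="F"))
    return z.reshape((m, n), order="F")

rng = np.random.default_rng(0)
M1, M2 = rng.normal(size=(4, 4)), rng.normal(size=(3, 3))
A = M1 @ M1.T + np.eye(4)   # symmetric positive definite
B = M2 @ M2.T + np.eye(3)   # symmetric positive definite
C = rng.normal(size=(4, 3))
Z = solve_sylvester_kron(A, B, C)
```

A unique solution exists when A and -B share no eigenvalue, which holds here because both A and B are positive definite.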

IV-A7 Updating Lagrange multipliers and

According to[27], we update the Lagrange multipliers and μ as follows:


where and the parameter is monotonically increased by until reaching the maximum, .
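The usual ALM bookkeeping is a dual ascent step on each multiplier followed by a geometric increase of the penalty parameter up to a cap. The sketch below assumes that generic scheme; the values of `rho`, `mu_max`, and the initial `mu` are hypothetical and chosen only for illustration.

```python
import numpy as np

def update_multiplier(Y, residual, mu):
    """Dual ascent step: Y <- Y + mu * (constraint residual)."""
    return Y + mu * residual

def update_penalty(mu, rho, mu_max):
    """Grow the penalty geometrically, capped at mu_max."""
    return min(rho * mu, mu_max)

# Hypothetical values, chosen only for illustration.
mu, rho, mu_max = 1e-2, 1.5, 1e6
history = []
for _ in range(60):
    mu = update_penalty(mu, rho, mu_max)
    history.append(mu)
# After enough iterations the penalty saturates at mu_max.
```

Capping the penalty keeps the subproblems from becoming ill-conditioned once the constraints are nearly satisfied.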

Algorithm 1 outlines the whole optimization procedure. It should be noted that we randomly initialize in practice to avoid all-zero solutions.

   Multi-view data ;
   , , , , , , ,
   , , , , ,
   Initialize Z with random values;
   , , , ;

   Update variables , , , , , according to the subproblem 1-6;
   Update multipliers , , , and according to the subproblem 7;

Algorithm 1 Optimization of the proposed FCMSC

IV-B Computational Complexity and Convergence

As shown in Algorithm 1, the main computational burden consists of six parts, i.e. the six subproblems. The complexity of updating is , and the complexity of updating is , both of which are dominated by matrix multiplication. As for the subproblem of updating , the complexity is . The complexity of updating is . In the subproblem of updating , the complexity is , since a matrix inversion is involved in the optimization process. For updating , a Sylvester equation is solved, and the complexity of this subproblem is . To sum up, the computational complexity of each iteration is , and in practice, is much larger than , so the complexity is .

As for convergence analysis, unfortunately, it is difficult to give a rigorous proof of convergence for the proposed algorithm. However, extensive experimental results on real-world datasets show that our algorithm converges effectively with all-zero initialization except for variable , which is initialized with random values.

V Experiments

In this section, comprehensive experiments are conducted on several real datasets. Accordingly, experimental results are presented with corresponding analyses. All codes were implemented in Matlab on a desktop with four-core 3.6GHz processor and 8GB of memory.

V-A Experimental Settings

To evaluate the performance of the proposed approach, we employ six real-world datasets in experiments: Mfeatures (http://archive.ics.uci.edu/ml/datasets/Multiple+Features), Movies 617 (http://lig-membres.imag.fr/grimal/data/movies617.tar.gz), MSRCV1 (http://research.microsoft.com/en-us/projects/objectclassrecognition/), Olympics (http://mlg.ucd.ie/aggregation/), ORL (https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html), and Yale Face (http://cvc.cs.yale.edu/cvc/projects/yalefaces/yalefaces.html). Specifically, Table II lists the number of samples and clusters, as well as the dimension of each view.

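| View | Mfeatures | Movies 617 | MSRCV1 | Olympics | ORL | Yale Face |
|---|---|---|---|---|---|---|
| 1 | FAC(216) | Keywords(1878) | CENT(1302) | Followedby(464) | Intensity(4096) | Intensity(4096) |
| 2 | FOU(76) | Actors(1398) | CMT(48) | Follows(464) | LBP(3304) | LBP(3304) |
| 3 | KAR(64) | - | GIST(512) | Listmerged(3097) | Gabor(6750) | Gabor(6750) |
| 4 | MOR(6) | - | HOG(100) | List(4942) | - | - |
| 5 | PIX(260) | - | LBP(256) | Mentionedby(464) | - | - |
| 6 | ZER(47) | - | SIFT(210) | Mentions(464) | - | - |
| 7 | - | - | - | Retweets(464) | - | - |
| 8 | - | - | - | Retweetedby(464) | - | - |
| 9 | - | - | - | Tweets(18455) | - | - |
| # samples | 2000 | 617 | 210 | 464 | 400 | 165 |
| # clusters | 10 | 17 | 7 | 28 | 40 | 15 |

TABLE II: Details of benchmark datasets (feature dimension of each view in parentheses)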
Fig. 4: Clustering results of FCMSC with each single view and the joint view representation. It can be seen that the clustering results with the joint view representation are much better than those of each single view.

Additionally, three metrics are employed in this section to evaluate the clustering performance: NMI (normalized mutual information), ACC (accuracy), and F-score, which are commonly used metrics in multi-view clustering. For each metric, a higher value corresponds to better clustering performance. All parameters of each compared approach are fine-tuned, and the results of LMSC on the ORL and MSRCV1 datasets are provided by the authors of LMSC. To reduce the effect of randomness, 30 Monte Carlo (MC) trials were conducted on each dataset for all compared approaches. Experimental results are reported as mean values of NMI, ACC and F-score.
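Two of these metrics can be computed as sketched below in pure Python. The sketch makes two assumptions that are not stated in the text: ACC is taken as the best accuracy over one-to-one relabelings of the predicted clusters, and NMI is normalized by the geometric mean of the two entropies (other normalizations exist). The brute-force matching is only practical for a handful of clusters; for problems with tens of clusters an assignment solver such as `scipy.optimize.linear_sum_assignment` is the usual choice.

```python
import math
from collections import Counter
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """Best-match accuracy over all one-to-one relabelings (brute force).

    Assumes y_pred uses no more distinct labels than y_true.
    """
    true_labels = sorted(set(y_true))
    pred_labels = sorted(set(y_pred))
    best = 0
    for perm in permutations(true_labels, len(pred_labels)):
        mapping = dict(zip(pred_labels, perm))
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, hits)
    return best / len(y_true)

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(y_true, y_pred):
    """Mutual information normalized by sqrt(H(true) * H(pred))."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    ct, cp = Counter(y_true), Counter(y_pred)
    mi = sum((c / n) * math.log(c * n / (ct[t] * cp[p]))
             for (t, p), c in joint.items())
    denom = math.sqrt(entropy(y_true) * entropy(y_pred))
    return mi / denom if denom > 0 else 0.0

truth = [0, 0, 1, 1, 2, 2]
perfect = [1, 1, 2, 2, 0, 0]   # same partition, different label names
noisy = [0, 0, 0, 1, 2, 2]     # one sample placed in the wrong cluster
```

Both metrics are invariant to how the clusters are named, which is why `perfect` scores 1.0 on each despite sharing no label values with `truth` position by position.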

V-B Validation Experiments

To validate the proposed FCMSC, experiments were conducted on each single view and on the joint view representation of each dataset. More specifically, FCMSC was tested on three randomly selected datasets: MSRCV1, ORL, and Yale Face. Fig. 4 displays the experimental results on these three datasets for each single view and the joint view representation.

As shown in Fig. 4, the performance of FCMSC with the joint view representation is much better than that with any single view. Taking MSRCV1 as an example, the joint view representation improves the performance of FCMSC with respect to NMI, ACC, and F-score. Since both sample-specific error and cluster-specific error are considered in the optimization problem, the proposed approach can take advantage of both the consensus and complementary information of multiple views and thus obtains promising clustering performance. Therefore, the proposed FCMSC is valid for multi-view clustering.

V-C Comparison Experiments

Fig. 5: Visualization of adjacency matrices (a, b, c, d, g, h, i, and j) and coefficient matrices (e, f, k, and l), where and are both displayed in the form of (abs()+abs())/2 and (abs()+abs())/2 .

To demonstrate its effectiveness, the proposed FCMSC is compared with eight approaches listed as follows:

  • : Spectral Clustering of Best Single View. Spectral clustering is conducted on each single view and we report the results of the view with the best clustering performance.

  • : Spectral Clustering on Concatenated Features. Features of multiple views are concatenated, and then spectral clustering algorithm is applied on the joint view representation.

  • : Low-Rank Representation of Best Single View. Similar to the , low-rank representation algorithm is conducted on each view, and the results of the view with best clustering performance are reported.

  • : Low-Rank Representation of Concatenated Features. We apply the low-rank representation algorithm to the joint view representation to get multi-view clustering results.

  • Kernel Addition: This approach combines the information of all views by averaging the kernel matrices of all views; the standard spectral clustering is then used to obtain the clustering results.

  • Co-reg: Co-regularized multi-view spectral clustering[25]. This approach pursues consistency across all views during the clustering procedure.

  • RMSC: Robust Multi-view Spectral Clustering[26]. RMSC recovers a shared low-rank transition probability matrix via low-rank and sparse decomposition, and then applies spectral clustering via Markov chains to obtain clustering results.

  • LMSC: Latent Multi-view Subspace Clustering[19]. LMSC learns a latent multi-view representation and performs data reconstruction based on the learned representation simultaneously.
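The Kernel Addition baseline in the list above can be sketched as follows. The Gaussian kernel, the shared bandwidth `sigma`, and the function names are assumptions made for illustration; the text only states that the per-view kernel matrices are averaged.

```python
import numpy as np

def gaussian_kernel(V, sigma=1.0):
    """Gaussian kernel matrix for one view; V has shape (d_v, n)."""
    sq = np.sum(V ** 2, axis=0)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (V.T @ V), 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_addition(views, sigma=1.0):
    """Average the per-view kernel matrices into one combined kernel."""
    return sum(gaussian_kernel(V, sigma) for V in views) / len(views)

rng = np.random.default_rng(0)
views = [rng.normal(size=(5, 6)), rng.normal(size=(2, 6))]  # hypothetical views
K = kernel_addition(views)
```

Because every view contributes an n-by-n kernel regardless of its feature dimension, this combination sidesteps the dimension mismatch between views, at the cost of weighting all views equally.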

Experimental results are reported as the mean score together with the standard deviation for NMI, ACC and F-score. Table III displays the results of all compared approaches on the six real datasets.

As shown in Table III, on all six datasets the proposed FCMSC achieves the best performance among all compared approaches with respect to all clustering metrics. Some statistics follow. On MSRCV1 with six views, FCMSC shows relative increases of 13.07%, 5.61%, and 12.19% w.r.t. NMI, ACC, and F-score, respectively, over the corresponding second-best approach (the F-score gain is 8.06 points in absolute terms, from 0.6614 to 0.7420). On Yale Face with three views, FCMSC shows relative improvements of 14.07%, 17.30%, and 30.65% w.r.t. NMI, ACC and F-score, respectively, over the corresponding second-best approach.
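Relative improvements of this kind can be recomputed directly from the mean scores in Table III. The short sketch below does so for MSRCV1 and Yale Face, taking the second-best value per metric from the table; the helper name `relative_gain` is hypothetical.

```python
def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return 100.0 * (ours - baseline) / baseline

# (FCMSC, second-best) mean scores read from Table III.
msrcv1 = {"NMI": (0.7571, 0.6696), "ACC": (0.8507, 0.8055), "F-score": (0.7420, 0.6614)}
yale = {"NMI": (0.8068, 0.7073), "ACC": (0.7927, 0.6758), "F-score": (0.6713, 0.5138)}

for name, table in (("MSRCV1", msrcv1), ("Yale Face", yale)):
    for metric, (ours, second) in table.items():
        print(f"{name} {metric}: {relative_gain(ours, second):.2f}%")
```

Note that the second-best method differs per metric (for example RMSC for NMI and LMSC for ACC on MSRCV1), so each pair is looked up separately.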

| Dataset | Method | NMI | ACC | F-score |
|---|---|---|---|---|
| Mfeatures | SC, best single view | 0.6506(0.0076) | 0.6969(0.0143) | 0.6001(0.0108) |
| | SC, concatenated features | 0.6900(0.0029) | 0.7077(0.0081) | 0.6373(0.0048) |
| | LRR, best single view | 0.6971(0.0028) | 0.7601(0.0071) | 0.6626(0.0043) |
| | LRR, concatenated features | 0.7381(0.0009) | 0.7951(0.0006) | 0.7089(0.0010) |
| | Kernel Addition | 0.7730(0.0082) | 0.7942(0.0185) | 0.7247(0.0129) |
| | Co-reg | 0.7273(0.0079) | 0.7690(0.0158) | 0.6919(0.0122) |
| | RMSC | 0.7412(0.0062) | 0.7662(0.0178) | 0.6867(0.0111) |
| | LMSC | 0.8106(0.0186) | 0.8212(0.0550) | 0.7739(0.0364) |
| | FCMSC | 0.8330(0.0014) | 0.8943(0.0016) | 0.8120(0.0023) |
| Movies 617 | SC, best single view | 0.2606(0.0020) | 0.2579(0.0035) | 0.1481(0.0025) |
| | SC, concatenated features | 0.2668(0.0017) | 0.2604(0.0033) | 0.1542(0.0019) |
| | LRR, best single view | 0.2667(0.0059) | 0.2747(0.0071) | 0.1545(0.0047) |
| | LRR, concatenated features | 0.2839(0.0075) | 0.2824(0.0135) | 0.1813(0.0063) |
| | Kernel Addition | 0.2917(0.0026) | 0.2901(0.0049) | 0.1764(0.0033) |
| | Co-reg | 0.2454(0.0018) | 0.2396(0.0017) | 0.1381(0.0016) |
| | RMSC | 0.2957(0.0032) | 0.2971(0.0040) | 0.1810(0.0028) |
| | LMSC | 0.2813(0.0098) | 0.2747(0.0094) | 0.1606(0.0068) |
| | FCMSC | 0.3195(0.0085) | 0.3108(0.0101) | 0.1932(0.0064) |
| MSRCV1 | SC, best single view | 0.6047(0.0112) | 0.6826(0.0171) | 0.5724(0.0122) |
| | SC, concatenated features | 0.4398(0.0021) | 0.5073(0.0077) | 0.3978(0.0032) |
| | LRR, best single view | 0.5704(0.0054) | 0.6732(0.0091) | 0.5368(0.0076) |
| | LRR, concatenated features | 0.0667(0.0035) | 0.1851(0.0056) | 0.2363(0.0026) |
| | Kernel Addition | 0.6176(0.0087) | 0.7102(0.0130) | 0.5973(0.0097) |
| | Co-reg | 0.6583(0.0106) | 0.7674(0.0169) | 0.6459(0.0128) |
| | RMSC | 0.6696(0.0064) | 0.7819(0.0125) | 0.6614(0.0093) |
| | LMSC | 0.6534(0.0105) | 0.8055(0.0128) | 0.6517(0.0171) |
| | FCMSC | 0.7571(0.0043) | 0.8507(0.0023) | 0.7420(0.0029) |
| Olympics | SC, best single view | 0.7617(0.0046) | 0.6288(0.0112) | 0.5178(0.0134) |
| | SC, concatenated features | 0.5625(0.0038) | 0.4610(0.0078) | 0.3194(0.0085) |
| | LRR, best single view | 0.8674(0.0038) | 0.7830(0.0093) | 0.7112(0.0088) |
| | LRR, concatenated features | 0.1762(0.0133) | 0.1669(0.0078) | 0.1036(0.0025) |
| | Kernel Addition | 0.7245(0.0038) | 0.6093(0.0073) | 0.5189(0.0071) |
| | Co-reg | 0.8308(0.0027) | 0.7341(0.0071) | 0.6707(0.0079) |
| | RMSC | 0.7573(0.0063) | 0.6372(0.0108) | 0.5687(0.0117) |
| | LMSC | 0.8902(0.0065) | 0.8043(0.0140) | 0.7814(0.0154) |
| | FCMSC | 0.8925(0.0056) | 0.8279(0.0157) | 0.7953(0.0209) |
| ORL | SC, best single view | 0.8868(0.0069) | 0.7459(0.0121) | 0.6805(0.0159) |
| | SC, concatenated features | 0.8084(0.0027) | 0.6323(0.0061) | 0.5236(0.0069) |
| | LRR, best single view | 0.8477(0.0096) | 0.7192(0.0200) | 0.6114(0.0235) |
| | LRR, concatenated features | 0.8497(0.0085) | 0.7178(0.0190) | 0.6119(0.0218) |
| | Kernel Addition | 0.8028(0.0033) | 0.6349(0.0074) | 0.5224(0.0061) |
| | Co-reg | 0.8277(0.0040) | 0.6653(0.0080) | 0.5672(0.0092) |
| | RMSC | 0.8885(0.0056) | 0.7482(0.0128) | 0.6866(0.0139) |
| | LMSC | 0.9310(0.0116) | 0.8194(0.0171) | 0.7583(0.0091) |
| | FCMSC | 0.9329(0.0077) | 0.8374(0.0307) | 0.7885(0.0299) |
| Yale Face | SC, best single view | 0.6229(0.0354) | 0.5715(0.0497) | 0.4319(0.0472) |
| | SC, concatenated features | 0.5761(0.0335) | 0.5145(0.0460) | 0.3653(0.0420) |
| | LRR, best single view | 0.6920(0.0212) | 0.6634(0.0256) | 0.4964(0.0337) |
| | LRR, concatenated features | 0.6917(0.0190) | 0.6667(0.0236) | 0.4941(0.0303) |
| | Kernel Addition | 0.5872(0.0320) | 0.5352(0.0397) | 0.3823(0.0390) |
| | Co-reg | 0.6146(0.0084) | 0.5638(0.0108) | 0.4208(0.0110) |
| | RMSC | 0.6590(0.0108) | 0.6091(0.0161) | 0.4773(0.0133) |
| | LMSC | 0.7073(0.0105) | 0.6758(0.0116) | 0.5138(0.0172) |
| | FCMSC | 0.8068(0.0129) | 0.7927(0.0264) | 0.6713(0.0236) |

TABLE III: Comparison results of different methods on six benchmark datasets

Since most existing approaches blindly combine the consensus and complementary information from multi-view data, which may contain considerable noise, it is difficult for them to obtain promising performance for multi-view clustering. For the joint view representation, it may be difficult to obtain promising clustering results by directly running the spectral clustering algorithm or the low-rank representation method; the results may even be worse than those of a single view, as on the MSRCV1, Olympics, and Yale Face datasets. Since each view has its own specific properties that may be contrary to those of other views, it is difficult to explore and utilize the consensus and complementary information of all views by applying existing single-view clustering approaches to the concatenated features. To analyze the reason in a more intuitive manner, taking the ORL and Yale Face databases as examples, Fig. 5 illustrates the adjacency matrices calculated from each view or from the concatenated data by a Gaussian kernel for spectral clustering. For a comprehensive comparison, Fig. 5 also displays the visualization of coefficient matrices and , which are calculated from concatenated data by the low-rank representation and the proposed FCMSC, respectively. As shown in Fig. 5, adjacency matrices calculated by the Gaussian kernel are more likely to be corrupted by noise, whereas the coefficient matrices and are less likely to be corrupted by noise. Between these two coefficient matrices, is more suitable than for multi-view clustering. By applying spectral clustering to , the proposed FCMSC can obtain promising performance for multi-view clustering.

V-D Parameter Sensitivity

In FCMSC, four parameters need to be tuned. To achieve good clustering performance, these parameters have to be set empirically. The key question is whether the performance of our approach is sensitive to them.

To answer this question, experiments are conducted on the MSRCV1 dataset to observe how the clustering performance changes with different values of each parameter. Experimental results are reported in terms of ACC and NMI, with one parameter varied and the other parameters fixed. More specifically, we first fix three of the parameters to 0.1, 0.05 and 0.1, respectively, and then tune the remaining one from 0.01 to 1. Fig. 6 shows the corresponding results. With two parameters fixed, Fig. 7 illustrates the clustering performance of the proposed approach for different values of the other two. Moreover, Fig. 8 presents the clustering results of the proposed approach for varying values of a single parameter, with the other three fixed. When one parameter is fixed at its chosen value, the performance of our approach varies only slightly as long as the other three are chosen in a suitable range, i.e., two of them no smaller than 0.5 and the third no larger than 0.05.

In summary, with one parameter fixed, FCMSC is relatively insensitive to the other three parameters as long as they are chosen from a suitable range. This makes FCMSC easy to apply without much parameter-tuning effort.
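The one-parameter-at-a-time protocol described in this subsection can be sketched as follows. This is a hedged illustration, not the authors' experimental code: the parameter names `p1` to `p4` are placeholders rather than the paper's notation, and `placeholder_score` is a made-up stand-in for running FCMSC and measuring ACC or NMI, so its numbers carry no experimental meaning.

```python
def one_at_a_time_sweep(evaluate, defaults, name, values):
    """Vary one parameter over `values` while keeping the others at `defaults`.

    evaluate: callable taking the parameters as keyword arguments and
              returning a score (e.g. ACC or NMI).
    Returns a list of (value, score) pairs.
    """
    results = []
    for v in values:
        params = dict(defaults)  # copy, so the defaults stay untouched
        params[name] = v
        results.append((v, evaluate(**params)))
    return results


def relative_spread(results):
    """(max - min) / max of the scores: a crude sensitivity indicator."""
    scores = [s for _, s in results]
    return (max(scores) - min(scores)) / max(scores)


def placeholder_score(p1, p2, p3, p4):
    """Hypothetical stand-in for 'run the clustering and compute a metric'."""
    return 0.8 - 0.05 * abs(p1 - 0.1)


# Three parameters fixed to 0.1, 0.05 and 0.1 as in the text; which symbol
# receives which value is an assumption made for this sketch.
defaults = {"p1": 0.1, "p2": 0.1, "p3": 0.05, "p4": 0.1}
grid = [0.01, 0.05, 0.1, 0.5, 1.0]  # the tuned parameter ranges over 0.01 to 1

results = one_at_a_time_sweep(placeholder_score, defaults, "p1", grid)
spread = relative_spread(results)
```

A small `spread` across the grid would correspond to the flat curves that a sensitivity plot is meant to reveal. Replacing `placeholder_score` with a real evaluation routine is left to the reader.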

Fig. 6: Results of the proposed FCMSC with different values of a single parameter on the MSRCV1 dataset.
Fig. 7: Results of the proposed FCMSC with different values of two parameters on the MSRCV1 dataset.
Fig. 8: Results of the proposed FCMSC with different values of a single parameter on the MSRCV1 dataset.

VI Conclusion

This paper proposes a feature concatenation multi-view subspace clustering approach, termed FCMSC. Unlike most existing approaches, we perform a novel clustering strategy on the concatenated features to explore both the consensus and complementary information among multiple views. By taking both sample-specific and cluster-specific errors into consideration, the coefficient matrix can be recovered from the concatenated features and used for clustering with promising performance. Extensive experiments on real-world datasets demonstrate the superiority of our approach over several state-of-the-art approaches.
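As an illustration of the last step of the pipeline, the sketch below turns a coefficient matrix into a symmetric, non-negative affinity matrix suitable as input to spectral clustering. The symmetrization (|Z| + |Z^T|) / 2 is a common choice in low-rank subspace clustering and is assumed here; the toy matrix `Z` and the function names are hypothetical.

```python
def affinity_from_coefficients(Z):
    """Return A with A[i][j] = (|Z[i][j]| + |Z[j][i]|) / 2.

    Z: square coefficient matrix given as a list of rows.
    The result is symmetric and non-negative.
    """
    n = len(Z)
    if any(len(row) != n for row in Z):
        raise ValueError("coefficient matrix must be square")
    return [[(abs(Z[i][j]) + abs(Z[j][i])) / 2.0 for j in range(n)]
            for i in range(n)]


def degrees(A):
    """Row sums of the affinity matrix, as used in a graph Laplacian."""
    return [sum(row) for row in A]


# Toy coefficient matrix (hypothetical): samples 0 and 1 represent each other
# strongly, sample 2 is only weakly related to both.
Z = [[0.0, 0.8, -0.1],
     [0.6, 0.0, 0.0],
     [0.1, -0.2, 0.0]]

A = affinity_from_coefficients(Z)
d = degrees(A)
```

The matrix `A` could then be passed to any spectral clustering routine that accepts a precomputed affinity.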

Although FCMSC is effective, it is time-consuming because of the matrix inversion and SVD operations involved in the optimization, especially when the dimension of the joint view representation is high. Future work will focus on improving the proposed approach for large-scale data by employing dimensionality reduction and binary representation [43] strategies.


This work is supported by the National Natural Science Foundation of China under Grant No. 61573273.


  • [1] Chang Xu, Dacheng Tao, and Chao Xu. A survey on multi-view learning. arXiv preprint arXiv:1304.5634, 2013.
  • [2] Xiaoshan Yang, Tianzhu Zhang, and Changsheng Xu. Cross-domain feature learning in multimedia. IEEE Transactions on Multimedia, 17(1):64–78, 2015.
  • [3] Lei Zhang and David Zhang. Visual understanding via multi-feature shared learning with global consistency. IEEE Transactions on Multimedia, 18(2):247–259, 2016.
  • [4] Aude Oliva and Antonio Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International journal of computer vision, 42(3):145–175, 2001.
  • [5] David G Lowe. Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2):91–110, 2004.
  • [6] Timo Ojala, Matti Pietikainen, and Topi Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on pattern analysis and machine intelligence, 24(7):971–987, 2002.
  • [7] Kun Zhan, Chaoxi Niu, Changlu Chen, Feiping Nie, Changqing Zhang, and Yi Yang. Graph structure fusion for multiview clustering. IEEE Transactions on Knowledge and Data Engineering, 2018.
  • [8] Jing Zhao, Xijiong Xie, Xin Xu, and Shiliang Sun. Multi-view learning overview: Recent progress and new challenges. Information Fusion, 38:43–54, 2017.
  • [9] Shiliang Sun and Guoqing Chao. Multi-view maximum entropy discrimination. In IJCAI, pages 1706–1712, 2013.
  • [10] Tao Zhou, Changqing Zhang, Chen Gong, Harish Bhaskar, and Jie Yang. Multiview latent space learning with feature redundancy minimization. IEEE Transactions on Cybernetics, 2018.
  • [11] Yue Gao, Yi Zhen, Haojie Li, and Tat-Seng Chua. Filtering of brand-related microblogs using social-smooth multiview embedding. IEEE Trans. Multimedia, 18(10):2115–2126, 2016.
  • [12] Zhi-Hua Zhou. Ensemble methods: foundations and algorithms. Chapman and Hall/CRC, 2012.
  • [13] René Vidal. Subspace clustering. IEEE Signal Processing Magazine, 28(2):52–68, 2011.
  • [14] René Vidal and Paolo Favaro. Low rank subspace clustering (lrsc). Pattern Recognition Letters, 43:47–61, 2014.
  • [15] Guangcan Liu, Zhouchen Lin, Shuicheng Yan, Ju Sun, Yong Yu, and Yi Ma. Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):171–184, 2013.
  • [16] Ehsan Elhamifar and Rene Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2765–2781, 2013.
  • [17] Vishal M Patel, Hien Van Nguyen, and René Vidal. Latent space sparse and low-rank subspace clustering. IEEE Journal of Selected Topics in Signal Processing, 9(4):691–701, 2015.
  • [18] Han Hu, Zhouchen Lin, Jianjiang Feng, and Jie Zhou. Smooth representation clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3834–3841, 2014.
  • [19] Changqing Zhang, Qinghua Hu, Huazhu Fu, Pengfei Zhu, and Xiaochun Cao. Latent multi-view subspace clustering. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4333–4341. IEEE, 2017.
  • [20] Changqing Zhang, Huazhu Fu, Si Liu, Guangcan Liu, and Xiaochun Cao. Low-rank tensor constrained multiview subspace clustering. In Proceedings of the IEEE international conference on computer vision, pages 1582–1590, 2015.
  • [21] Xiaochun Cao, Changqing Zhang, Huazhu Fu, Si Liu, and Hua Zhang. Diversity-induced multi-view subspace clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–594, 2015.
  • [22] Hongchang Gao, Feiping Nie, Xuelong Li, and Heng Huang. Multi-view subspace clustering. In Proceedings of the IEEE international conference on computer vision, pages 4238–4246, 2015.
  • [23] Yang Wang, Wenjie Zhang, Lin Wu, Xuemin Lin, Meng Fang, and Shirui Pan. Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering. arXiv preprint arXiv:1608.05560, 2016.
  • [24] Abhishek Kumar and Hal Daumé. A co-training approach for multi-view spectral clustering. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 393–400, 2011.
  • [25] Abhishek Kumar, Piyush Rai, and Hal Daumé. Co-regularized multi-view spectral clustering. In Advances in neural information processing systems, pages 1413–1421, 2011.
  • [26] Rongkai Xia, Yan Pan, Lei Du, and Jian Yin. Robust multi-view spectral clustering via low-rank and sparse decomposition. In AAAI, pages 2149–2155, 2014.
  • [27] Zhouchen Lin, Risheng Liu, and Zhixun Su. Linearized alternating direction method with adaptive penalty for low-rank representation. In Advances in neural information processing systems, pages 612–620, 2011.
  • [28] Chang Tang, Xinzhong Zhu, Xinwang Liu, Miaomiao Li, Pichao Wang, Changqing Zhang, and Lizhe Wang. Learning joint affinity graph for multi-view subspace clustering. IEEE Transactions on Multimedia, 2018.
  • [29] Guoqing Chao, Shiliang Sun, and Jinbo Bi. A survey on multi-view clustering. arXiv preprint arXiv:1712.06246, 2017.
  • [30] Xing Yi, Yunpeng Xu, and Changshui Zhang. Multi-view em algorithm for finite mixture models. In International Conference on Pattern Recognition and Image Analysis, pages 420–425. Springer, 2005.
  • [31] Grigorios F Tzortzis and Aristidis C Likas. Multiple view clustering using a weighted combination of exemplar-based mixture models. IEEE Transactions on neural networks, 21(12):1925–1938, 2010.
  • [32] Yu-Meng Xu, Chang-Dong Wang, and Jian-Huang Lai. Weighted multi-view clustering with feature selection. Pattern Recognition, 53:25–35, 2016.
  • [33] Maria Brbić and Ivica Kopriva. Multi-view low-rank sparse subspace clustering. Pattern Recognition, 73:247–258, 2018.
  • [34] Handong Zhao, Zhengming Ding, and Yun Fu. Multi-view clustering via deep matrix factorization. In AAAI, pages 2921–2927, 2017.
  • [35] Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. Multi-view clustering via joint nonnegative matrix factorization. In Proceedings of the 2013 SIAM International Conference on Data Mining, pages 252–260. SIAM, 2013.
  • [36] Daniel D Lee and H Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788, 1999.
  • [37] Kamalika Chaudhuri, Sham M Kakade, Karen Livescu, and Karthik Sridharan. Multi-view clustering via canonical correlation analysis. In Proceedings of the 26th annual international conference on machine learning, pages 129–136. ACM, 2009.
  • [38] Guoqing Chao and Shiliang Sun. Multi-kernel maximum entropy discrimination for multi-view learning. Intelligent Data Analysis, 20(3):481–493, 2016.
  • [39] Tong Zhang, Alexandrin Popescul, and Byron Dom. Linear prediction models with graph regularization for web-page categorization. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 821–826. ACM, 2006.
  • [40] Dongyan Guo, Jian Zhang, Xinwang Liu, Ying Cui, and Chunxia Zhao. Multiple kernel learning based multi-view spectral clustering. In International Conference on Pattern Recognition, 2014.
  • [41] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
  • [42] Richard H Bartels and G. W Stewart. Solution of the matrix equation AX+XB=C [F4] (algorithm 432). Communications of the ACM, 15(9):820–826, 1972.
  • [43] Zheng Zhang, Li Liu, Fumin Shen, Heng Tao Shen, and Ling Shao. Binary multi-view clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP(99):1–1, 2018.