Multi-graph Fusion for Multi-view Spectral Clustering

09/16/2019, by Zhao Kang, et al.

A panoply of multi-view clustering algorithms has been developed to deal with prevalent multi-view data. Among them, spectral clustering-based methods have drawn much attention and demonstrated promising results recently. Despite this progress, two fundamental questions remain unanswered. First, how to fuse different views into one graph? More often than not, the similarities between samples are manifested differently by different views. Many existing algorithms either simply take the average of multiple views or learn only a common graph. These simple approaches fail to consider the flexible local manifold structures of all views, so the rich heterogeneous information is not fully exploited. Second, how to learn an explicit cluster structure? Most existing methods do not pay attention to the quality of the graphs and perform graph learning and spectral clustering separately; such unreliable graphs might lead to suboptimal clustering results. To fill these gaps, in this paper, we propose a novel multi-view spectral clustering model which performs graph fusion and spectral clustering simultaneously. The fusion graph approximates the original graph of each individual view but maintains an explicit cluster structure. Experiments on four widely used data sets confirm the superiority of the proposed method.


1 Introduction

With the increasing popularity of sensors and multi-camera surveillance systems, an object is often represented by multiple views chao2019semi ; zhu2018multi ; tang2018consensus ; ding2019multiway . For example, a person can be uniquely identified in terms of face, fingerprint, iris, and signature; an image can be described by different kinds of descriptors, such as SIFT, HOG, and LBP, where SIFT is robust to image illumination, noise, and rotation, HOG is sensitive to edge information, and LBP is a powerful texture feature; the same document can be represented in different languages. Different views capture distinct perspectives of the data. Numerous real-world applications have benefited from multi-view data by leveraging the complementary information li2017multi ; chao2017survey ; zhang2018generalized ; liu2018late ; kang2019multiple . Thus, multi-view learning has become an important research field chen2013twkm ; huang2019auto .

As an important ingredient of multi-view learning, multi-view clustering has been widely investigated to identify the underlying structure of multi-view data in an unsupervised way huang2018self ; wang2019study . Although each view only contains partial information, the views together admit the same clustering structure. Simply concatenating all features into a single view and then employing a clustering algorithm on the concatenated data might not obtain better performance than traditional methods that use each single view separately zhan2018adaptive ; huang2019auto .

In the past decade, plenty of advanced multi-view clustering algorithms have been proposed and they perform effectively by considering the diversity and complementarity of different views. According to the mechanisms on which those methods are based, we can roughly divide them into five categories: co-training style methods kumar2011cotrain ; kumar2011co ; tao2018reliable ; multi-kernel learning liu2017multiple ; tzortzis2012kernel ; guo2014multiple ; multi-view graph clustering wang2016iterative ; cao2015diversity ; gao2015multi ; zhan2017graph ; wang2017exclusivity ; zhang2019multitask ; multi-view subspace clustering liu2013multi ; guo2013convex ; xu2017re ; liu2018consensus ; multi-task multi-view clustering zhang2015multi ; gu2009learning .

Among these methods, spectral clustering based multi-view algorithms often report satisfying results. Kumar et al. kumar2011cotrain proposed a co-training approach to search for clusterings that agree across the views. In this approach, the eigenvectors obtained from one view are used to update the graph of the other view.

Kumar et al. kumar2011co further developed a co-regularized method to look for clusterings that are consistent across the views, where the eigenvectors of all views are regularized. Despite their popularity, a common drawback shared by these two methods is that their performance heavily depends on the input graph. It is well known that small perturbations in the entries of the graph may lead to large perturbations in the eigenvectors, thus leading to inferior clustering accuracy hunter2010performance ; zhao2015automatic ; robust2019kang ; ding2018semi . Therefore, constructing an accurate graph is highly desirable.

To this end, a number of graph learning based clustering methods have been proposed recently kang2017twin ; zhang2013graph ; kang2019low . They seek to learn the graph from data dynamically. This approach enjoys several nice properties, such as robustness to noise and outliers and independence of similarity metrics. For example, Nie et al. nie2014clustering constructed the graph based on adaptive neighbors, i.e., the probability of one data point being the neighbor of another point is treated as a measure of the similarity between them. Afterwards, many researchers extended this idea to deal with multi-view data.

Nie et al. nie2016parameter reformulated the standard spectral clustering model and put forth a parameter-free multi-view clustering method. This algorithm assumes that all graphs share a common eigenvector matrix. Additionally, it treats graph construction and spectral clustering as two separate procedures, so they are not jointly optimized. To solve this problem, Nie et al. nie2017multi further developed a unified framework which performs graph learning and spectral clustering simultaneously. However, in this approach, only a common graph is learned based on adaptive neighbors. Consequently, it fails to preserve the flexible local manifold structures of all views, which leads to suboptimal clustering performance wang2016iterative . In addition, one significant limitation of adaptive neighbors-based graph learning is that it can only capture the intrinsic local structure information of the data.

On the other hand, subspace clustering methods have the capability to explore the global low-dimensional manifold structure encoded by the data correlations embedded in high-dimensional space peng2017deep ; chen2012fgkm ; kang2017kernel ; zhang2016joint ; li2015robust . They are based on the self-expressiveness property, which assumes that each sample can be linearly represented by the other samples. The representation coefficient matrix behaves like a similarity graph matrix zhang2017latent ; xia2014robust ; kang2019Clustering ; zhang2019robust . Two widely used assumptions about this coefficient matrix are low-rankness liu2013robust and sparsity elhamifar2013sparse . After obtaining the graph, the final clustering result is generated by the spectral clustering algorithm ng2002spectral ; chen2018dnc . Based on this strategy, a variety of multi-view clustering methods have been proposed.

Gao et al. gao2015multi proposed a multi-view subspace clustering algorithm. It learns a graph for each view and enforces a common cluster indicator matrix for all graphs, so the clustering result is consistent across views. However, this assumption is too strong since the common cluster indicator matrix must negotiate with all graphs; consequently, the resulting solution might not be optimal. Cao et al. cao2015diversity focused on boosting multi-view clustering by exploring the complementarity of multi-view representations. Specifically, they utilize the Hilbert-Schmidt Independence Criterion (HSIC) to capture the diversity information. As a result, multiple graphs are built and their average is used as the input for spectral clustering. This simple post-processing strategy treats all views equally, which might result in inferior performance. Wang et al. wang2016iterative developed a low-rank based multi-view spectral clustering method. Though they added a term to characterize the agreement among the graphs, they still used the average of the graphs for spectral clustering. This two-step approach might cause unsatisfactory results since the averaged graph might not be optimal for the subsequent clustering task.

Figure 1: Illustration of our GFSC approach. GFSC integrates graph learning, graph fusion, and spectral clustering into a unified framework. The clustering result is further utilized to guide the graph construction and fusion, which in turn contributes to a better clustering.

Despite this progress, multi-view spectral clustering still arguably faces the following fundamental limitations. First, how to effectively fuse the graphs from all views? Integrating graphs is not trivial since exploring the complementary information of multiple views is the core of multi-view learning gao2015multi . Simply taking their average fails to consider the discriminative property of the views. In many situations, the similarities between samples may be manifested differently by different views. For instance, for two video clips that present the same content in different languages, the audio content will be different. Second, how to consider the explicit cluster structure? It is widely accepted that clustering results highly depend on the quality of the affinity graph. Many existing methods implement graph construction and spectral clustering separately. Thus, the learned graph might not be ideal for the subsequent clustering task.

To solve the above challenging problems, we propose a novel multi-view spectral clustering method which performs graph fusion and spectral clustering simultaneously. Fig. 1 shows the idea of our approach. The fusion graph approximates the original graph of each individual view but maintains an explicit cluster structure. Experiments on four widely used data sets confirm the superiority of the proposed method. The contributions of this paper are summarized in the following two aspects:

  • A novel graph fusion mechanism is proposed to integrate the multi-view information. It is based on two basic principles: 1) the graph of each view is a perturbation of the consensus graph, and 2) a graph that is close to the consensus graph should be assigned a large weight. The graphs are weighted dynamically during the fusion process so that the adverse effect of noisy graphs is reduced effectively.

  • The cluster structure of the consensus graph is further considered. As a result, an optimal graph, which has exactly $c$ connected components if there are $c$ clusters, can be readily achieved for clustering. The experimental results confirm its superiority compared to state-of-the-art methods.

Notation.

In this paper, matrices are represented by capital letters and vectors are denoted by lower-case letters. For an arbitrary matrix $A$, its Frobenius norm is $\|A\|_F$. The $\ell_2$-norm of a vector $x$ is represented by $\|x\|_2=\sqrt{x^\top x}$, where $^\top$ denotes the transpose. $\operatorname{Tr}(A)$ denotes the trace of $A$. $A\ge 0$ means that all elements of $A$ are nonnegative. $I$ is the identity matrix with a proper size.

2 Multi-view Spectral Clustering Revisited

Let $\{X^v\in\mathbb{R}^{d_v\times n}\}_{v=1}^{V}$ denote the multi-view data with $V$ views, where $X^v$ is the data matrix of view $v$, $d_v$ is the dimension of features in the $v$-th view, and $n$ is the number of samples. Given the adjacency matrix $Z^v$ of each view, the graph Laplacian matrix is $L^v=D^v-Z^v$, where the diagonal matrix $D^v$ is the degree matrix with $d_{ii}^v=\sum_j z_{ij}^v$. Assuming that the cluster indicator matrix $F\in\mathbb{R}^{n\times c}$ is the same across all the views, we can formulate the multi-view spectral clustering problem as gao2015multi

$\min_{F^\top F=I}\ \sum_{v=1}^{V}\operatorname{Tr}(F^\top L^v F),$    (1)

where each graph contributes equally to the final result $F$. In the above equation, we ignore the details of graph construction. Instead of enforcing multiple graphs to share the same $F$, several other works simply take the average of the graphs and then implement spectral clustering separately cao2015diversity ; wang2016iterative . Consequently, the complementary information is not fully exploited since each view is not distinguished from the others. Furthermore, the graphs from different views might differ a lot, so it is unrealistic for them to reach an agreement on a single $F$. Thus, these approaches lead to inferior clustering results. Some researchers try a linear combination of those graphs li2015large . However, the complementary information from multi-view data is not necessarily linearly related. In addition, such a linear combination is also sensitive to the weights assigned to each graph. To fill this gap, in this paper, we propose a strategy to integrate the graphs.
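
To make the formulation concrete, the following sketch builds an unnormalized graph Laplacian from a learned similarity matrix and recovers the shared embedding of Eq. (1) from the summed per-view Laplacians. The function names and the use of a plain NumPy eigendecomposition are illustrative choices, not part of the original method.

```python
import numpy as np

def graph_laplacian(Z):
    """Unnormalized Laplacian L = D - W of a (possibly asymmetric) similarity graph Z."""
    W = (Z + Z.T) / 2.0            # symmetrize the learned graph
    D = np.diag(W.sum(axis=1))     # degree matrix
    return D - W

def shared_embedding(graphs, c):
    """Minimal sketch of Eq. (1): one indicator matrix F shared by all views,
    given by the c smallest eigenvectors of the summed Laplacians."""
    L_sum = sum(graph_laplacian(Z) for Z in graphs)
    vals, vecs = np.linalg.eigh(L_sum)   # ascending eigenvalues
    return vecs[:, :c]                   # n x c spectral embedding
```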

Even if we can obtain a high-quality graph based on our graph fusion principle, we are still unsure whether the graph is suitable for the subsequent clustering task at hand. Ideally, the optimal graph should have exactly $c$ connected components so that the vertices in each connected component are grouped into the same cluster. Hence, we go further and incorporate the cluster structure of the consensus graph.

3 Proposed Multi-graph Fusion for Multi-view Spectral Clustering

3.1 Self-expressiveness based Graph Learning

The self-expressiveness property states that each data sample can be expressed as a linear combination of the other samples, and the combination coefficients indicate the similarities between samples vidal2011subspace ; liu2013robust . This similarity graph $Z$ can be obtained by solving

$\min_{Z}\ \|X-XZ\|_F^2+\alpha\|Z\|_F^2,$    (2)

where $\alpha$ is a trade-off parameter. It can be easily extended to multi-view data, i.e.,

$\min_{\{Z^v\}}\ \sum_{v=1}^{V}\|X^v-X^vZ^v\|_F^2+\alpha\|Z^v\|_F^2,$    (3)

where the same trade-off parameter $\alpha$ is often adopted for simplicity. Different graphs capture different aspects of the multi-view data. Then the average of these graphs is often used to achieve the final clustering result cao2015diversity ; wang2016iterative . That is to say, the consensus graph $S$,

$S=\frac{1}{V}\sum_{v=1}^{V}Z^v,$    (4)

is taken as the input for the spectral clustering algorithm ng2002spectral . It is obvious that this approach fails to distinguish the different contributions of different views. More often than not, some views containing irrelevant or noisy representations might severely damage the graphs and lead to degraded performance. To fully exploit the complementary nature of multi-view data, we propose a way to aggregate these basic graphs into a consensus graph $S$.
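
As a rough illustration, the sketch below instantiates Eqs. (2)-(4) under the Frobenius-norm form written above: each view gets a closed-form self-expressive graph, and the naive consensus is their plain average. The nonnegativity clipping and the zeroed diagonal are common post-processing assumptions rather than parts of Eq. (2).

```python
import numpy as np

def self_expressive_graph(X, alpha):
    """One instance of Eq. (2): min_Z ||X - X Z||_F^2 + alpha ||Z||_F^2 with the
    closed form Z = (X^T X + alpha I)^{-1} X^T X.  X is d x n."""
    G = X.T @ X                                    # n x n Gram matrix
    n = G.shape[0]
    Z = np.linalg.solve(G + alpha * np.eye(n), G)
    np.fill_diagonal(Z, 0.0)                       # optional: remove self-links
    return np.maximum(Z, 0.0)                      # optional: keep similarities nonnegative

def average_graph(graphs):
    """Naive consensus of Eq. (4): the plain average of all view graphs."""
    return sum(graphs) / len(graphs)
```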

3.2 Graph Fusion

Our proposed graph fusion method is based on two intuitive assumptions: 1) the graph $Z^v$ of each view is a perturbation of the consensus graph $S$, and 2) a graph that is close to the consensus graph should be assigned a large weight. The consensus graph $S$ is supposed to capture the ground-truth sample similarity hidden in the multi-view data. To avoid the influence of low-quality (noisy) views, we assign different weights to different graphs. As a result, we can reach a better clustering performance based on $S$ than that obtained from the plain average of the graphs.

Based on the above principles, our graph fusion mechanism can be formulated as

$\min_{S}\ \sum_{v=1}^{V}w_v\|S-Z^v\|_F^2,$    (5)

where the weight $w_v$ characterizes the importance of view $v$. We can simply adopt the inverse distance weighting scheme nie2016parameter ; nie2017self , i.e.,

$w_v=\frac{1}{2\|S-Z^v\|_F}.$    (6)

Since $w_v$ is unknown beforehand, we calculate it approximately with an iterative approach. Combining Eq. (5) and Eq. (3) yields

$\min_{\{Z^v\},S}\ \sum_{v=1}^{V}\|X^v-X^vZ^v\|_F^2+\alpha\|Z^v\|_F^2+\beta w_v\|S-Z^v\|_F^2.$    (7)

Through solving this problem, we can obtain both the graph $Z^v$ for each view and the consensus graph $S$ adaptively. Additionally, the graphs are weighted dynamically during the fusion process so that the adverse effect of noisy graphs is reduced effectively. Although we could directly implement spectral clustering based on $S$, we move forward and consider its cluster structure since the current graph might not be optimal for the subsequent clustering task.
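
A minimal sketch of the fusion step in Eqs. (5)-(6), assuming the view graphs are held fixed: the consensus graph is the weight-averaged graph, and the weights follow the inverse-distance rule. The iteration count and the small epsilon guarding against division by zero are implementation assumptions.

```python
import numpy as np

def fuse_graphs(graphs, n_iter=20, eps=1e-8):
    """Alternate between the consensus graph S (weighted average of the view
    graphs, the minimizer of Eq. (5) for fixed weights) and the inverse-distance
    weights w_v = 1 / (2 ||S - Z_v||_F) of Eq. (6)."""
    V = len(graphs)
    w = np.full(V, 1.0 / V)                        # start from equal weights
    for _ in range(n_iter):
        S = sum(wv * Z for wv, Z in zip(w, graphs)) / w.sum()
        dist = np.array([np.linalg.norm(S - Z) for Z in graphs])
        w = 1.0 / (2.0 * dist + eps)               # closer graphs get larger weights
    return S, w
```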

3.3 Structured Graph Learning

Ideally, the solution $S$ of problem (7) should have exactly $c$ connected components, i.e., the data points are already clustered into $c$ clusters. However, the current solution can hardly satisfy such a condition. This requirement can be fulfilled based on the following theorem mohar1991laplacian :

Theorem 1.

The number of connected components of the graph $S$ is equal to the multiplicity of the zero eigenvalue of its Laplacian matrix $L_S$.

Since $L_S$ is a positive semi-definite matrix, its eigenvalues satisfy $0\le\sigma_1\le\sigma_2\le\dots\le\sigma_n$. Theorem 1 means that if $\sum_{i=1}^{c}\sigma_i=0$, then our expectation can be approximately satisfied. Hence, we can minimize $\sum_{i=1}^{c}\sigma_i$ instead to satisfy the requirement. According to Ky Fan's theorem fan1949theorem , we can obtain an objective function

$\sum_{i=1}^{c}\sigma_i(L_S)=\min_{F\in\mathbb{R}^{n\times c},\,F^\top F=I}\ \operatorname{Tr}(F^\top L_S F).$    (8)

The right part of this equation is nothing but the objective function of spectral clustering. Hence, Eq. (8) establishes the connection between our requirement for the graph structure and spectral clustering.
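
The Ky Fan connection can be checked numerically: the minimum of $\operatorname{Tr}(F^\top L_S F)$ over orthonormal $F$ equals the sum of the $c$ smallest eigenvalues of $L_S$ and is attained by the corresponding eigenvectors. A short sketch, assuming a symmetric Laplacian as input:

```python
import numpy as np

def spectral_embedding(L, c):
    """Ky Fan view of Eq. (8): the minimizer F is formed by the eigenvectors of L
    belonging to its c smallest eigenvalues; the optimal value is their sum."""
    vals, vecs = np.linalg.eigh(L)      # ascending eigenvalues of a symmetric L
    return vecs[:, :c], vals[:c].sum()
```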

By minimizing Eq. (8), we can approximately guarantee the desired structure of the graph $S$. Therefore, we can combine Eqs. (8) and (7) into a single objective function, which fulfills the tasks of graph learning, graph fusion, and spectral clustering. Consequently, our proposed multi-Graph Fusion for multi-view Spectral Clustering (GFSC) model can be formulated as

$\min_{\{Z^v\},S,F}\ \sum_{v=1}^{V}\Big(\|X^v-X^vZ^v\|_F^2+\alpha\|Z^v\|_F^2+\beta w_v\|S-Z^v\|_F^2\Big)+\gamma\operatorname{Tr}(F^\top L_S F)\quad \text{s.t.}\ F^\top F=I,$    (9)

where $\alpha$, $\beta$, and $\gamma$ are regularization parameters. The objective function (9) enjoys the following properties:

  • The last term in Eq. (9) functions as a regularizer on the graph $S$. We tune the structure of $S$ adaptively so as to achieve the desired cluster structure. At the same time, it seamlessly integrates the graph construction and spectral clustering processes.

  • For this multi-view spectral clustering method, the graph is automatically learned from the data rather than pre-defined as in most existing spectral clustering methods. This results in a reliable and robust graph.

  • The graph fusion term seeks to find the underlying relationships between samples. Rather than treating each view equally, the weight $w_v$ can well distinguish the different contributions of different views. Consequently, the complementary information of heterogeneous data is explored more effectively.

  • In this joint framework, the high-quality clustering result is utilized to guide the graph construction, which is then used to obtain a new clustering. This mutually improving approach can boost the final clustering result.

4 Optimization of Problem (9)

The variables in Eq. (9) are coupled to each other. We can solve them utilizing an alternating iterative strategy.

Solving $Z^v$ when $S$ and $F$ are fixed. Problem (9) becomes

$\min_{Z^v}\ \|X^v-X^vZ^v\|_F^2+\alpha\|Z^v\|_F^2+\beta w_v\|S-Z^v\|_F^2.$    (10)

We can observe that Eq. (10) is independent for each view, so we can update $Z^v$ separately for each view. Taking the derivative of Eq. (10) w.r.t. $Z^v$, we have

$2X^{v\top}X^vZ^v-2X^{v\top}X^v+2\alpha Z^v+2\beta w_v(Z^v-S).$

Setting the above formula to zero, we obtain

$Z^v=\big(X^{v\top}X^v+\alpha I+\beta w_v I\big)^{-1}\big(X^{v\top}X^v+\beta w_v S\big).$    (11)

Solving $S$ when $Z^v$ and $F$ are fixed. Remembering that $L_S$ is a function of $S$, we obtain

$\min_{S}\ \sum_{v=1}^{V}\beta w_v\|S-Z^v\|_F^2+\gamma\operatorname{Tr}(F^\top L_S F).$    (12)

To solve this subproblem, we use the equality $\operatorname{Tr}(F^\top L_S F)=\frac{1}{2}\sum_{i,j}\|f_i-f_j\|_2^2\,s_{ij}$

and define $P$ with the $(i,j)$-th entry $p_{ij}=\|f_i-f_j\|_2^2$. Then problem (12) can be solved column-wise:

$\min_{s_i}\ \sum_{v=1}^{V}\beta w_v\|s_i-z_i^v\|_2^2+\frac{\gamma}{2}p_i^\top s_i.$    (13)

Its derivative w.r.t. $s_i$ is $\sum_{v=1}^{V}2\beta w_v(s_i-z_i^v)+\frac{\gamma}{2}p_i$, which should be zero. It yields

$s_i=\frac{\sum_{v=1}^{V}w_vz_i^v}{\sum_{v=1}^{V}w_v}-\frac{\gamma p_i}{4\beta\sum_{v=1}^{V}w_v}.$    (14)

Solving $F$ when $Z^v$ and $S$ are fixed. It yields

$\min_{F^\top F=I}\ \operatorname{Tr}(F^\top L_S F).$    (15)

The optimal solution of $F$ is formed by the eigenvectors of $L_S$ corresponding to the $c$ smallest eigenvalues.

The details of solving the problem in Eq. (9) are summarized in Algorithm 1. We stop the algorithm when the maximum iteration number of 200 is reached or the relative change of $S$ falls below a preset tolerance. The complete implementation package is available at https://github.com/sckangz/GFSC.

4.1 Computational Analysis

The main computational demand of Algorithm 1 comes from the updates of $Z^v$ and $F$. Specifically, updating $Z^v$ costs about $\mathcal{O}(n^3)$ due to the matrix inversion and multiplication. The complexity of updating $F$ is also $\mathcal{O}(n^3)$ due to the SVD operation. To make our algorithm more efficient, several off-the-shelf acceleration algorithms could be utilized, e.g., skinny SVD zhang2014fast and sampling-based methods zhang2016sampling ; xu2018improved ; jia2017nystrom . In our experiments, we do not apply these acceleration techniques.

Input: Data matrices $\{X^v\}_{v=1}^{V}$, parameters $\alpha$, $\beta$, $\gamma$.
Output: $Z^v$, $S$, $F$.
Initialize: Random matrices $S$ and $F$, $w_v=1/V$.
REPEAT
1:  Update $Z^v$ according to Eq. (11) for each view.
2:  For each element .
3:  Update $S$ according to Eq. (14).
4:  Update $F$ by solving the problem (15).
5:  Update $w_v$ according to Eq. (6).
UNTIL stopping criterion is met.
Algorithm 1 The algorithm of GFSC
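
For illustration, the following Python sketch mirrors Algorithm 1 using the update rules written in Eqs. (11), (14), (15), and (6). The random initialization, the nonnegativity clipping of $S$, and the convergence tolerance are assumptions of this sketch; the implementation in the linked repository is the authoritative reference.

```python
import numpy as np

def gfsc(Xs, c, alpha, beta, gamma, max_iter=200, tol=1e-5):
    """Minimal sketch of Algorithm 1. Xs is a list of d_v x n view matrices,
    c is the number of clusters."""
    V, n = len(Xs), Xs[0].shape[1]
    w = np.full(V, 1.0 / V)
    S = np.random.rand(n, n)
    F = np.linalg.qr(np.random.randn(n, c))[0]
    for _ in range(max_iter):
        S_old = S.copy()
        # Step 1: update each Z^v (Eq. (11))
        Zs = []
        for v in range(V):
            G = Xs[v].T @ Xs[v]
            A = G + (alpha + beta * w[v]) * np.eye(n)
            Zs.append(np.linalg.solve(A, G + beta * w[v] * S))
        # Step 3: update S column-wise (Eq. (14)); the clipping is an assumed post-processing
        P = np.square(F[:, None, :] - F[None, :, :]).sum(-1)   # p_ij = ||f_i - f_j||^2
        S = sum(w[v] * Zs[v] for v in range(V)) / w.sum()
        S -= gamma * P / (4 * beta * w.sum())
        S = np.maximum(S, 0)
        # Step 4: update F (Eq. (15)) from the c smallest eigenvectors of the Laplacian of S
        W = (S + S.T) / 2
        L = np.diag(W.sum(1)) - W
        F = np.linalg.eigh(L)[1][:, :c]
        # Step 5: update the weights (Eq. (6))
        w = 1.0 / (2 * np.linalg.norm(S - np.array(Zs), axis=(1, 2)) + 1e-8)
        # Stopping criterion: relative change of S below a small tolerance
        if np.linalg.norm(S - S_old) / (np.linalg.norm(S_old) + 1e-8) < tol:
            break
    return Zs, S, F
```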
#View    BBC               Reuters          Digits                        Caltech20
1        Segment1 (4659)   English (2000)   Profile correlations (216)    Gabor (48)
2        Segment2 (4633)   French (2000)    Fourier coefficients (76)     Wavelet moments (40)
3        Segment3 (4665)   German (2000)    Karhunen coefficients (64)    CENTRIST (254)
4        Segment4 (4684)   Spanish (2000)   Morphological (6)             HOG (1984)
5        -                 Italian (2000)   Pixel averages (240)          GIST (512)
6        -                 -                Zernike moments (47)          LBP (928)
#Sample  145               1200             2000                          2386
#Class   2                 6                10                            20
Table 1: Information of the data sets (#Feature).

5 Experiments

5.1 Data Set Descriptions

We employ four widely used multi-view data sets for performance evaluation, namely BBC, Reuters (http://archive.ics.uci.edu/ml/datasets.html), Digits, and Caltech20 (http://www.vision.caltech.edu/Image Datasets/Caltech101/). Among them, BBC and Reuters are text data sets; Digits and Caltech20 are image data. In these cases, the graph encodes the similarities between different documents or images. Table 1 shows the concrete information of the data sets. Following cai2013multi , we normalize the data sets so that all the values of each view are in the range [-1, 1].
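
A small sketch of this preprocessing step, assuming each view matrix is $d_v\times n$ and that the scaling is applied per feature; the exact normalization used by cai2013multi may differ in detail.

```python
import numpy as np

def scale_view(X):
    """Assumed preprocessing: linearly rescale each feature (row) of a d_v x n
    view matrix so that its values lie in [-1, 1]."""
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    return 2.0 * (X - lo) / np.maximum(hi - lo, 1e-12) - 1.0
```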

5.2 Evaluation Metrics

We evaluate the performance using three popular metrics: accuracy (Acc), normalized mutual information (NMI), and purity peng2018integrate . A short computational sketch of these metrics follows their definitions below.

  • Accuracy (Acc). Accuracy finds the best one-to-one relationship between clusters and classes and evaluates how many data points in each cluster come from the corresponding class. It is the sum of the matching degrees over all matched cluster/class pairs:

    $Acc=\frac{1}{n}\max_{m}\sum_{k=1}^{c}T\big(C_k,L_{m(k)}\big),$    (16)

    where $C_k$ represents the $k$-th cluster, $L_j$ denotes the $j$-th class, $T(C_k,L_j)$ denotes the number of points that are assigned to cluster $C_k$ but belong to class $L_j$, and the maximum is taken over all one-to-one mappings $m$ between clusters and classes.

  • Normalized Mutual Information (NMI). Let $X$ and $Y$ be two random variables, and let $H(X)$ and $H(Y)$ be their corresponding entropies. Then the NMI is defined as

    $NMI(X,Y)=\frac{I(X,Y)}{\sqrt{H(X)H(Y)}},$    (17)

    where $I(X,Y)$ denotes the mutual information between $X$ and $Y$. A higher value indicates better performance.

  • Purity. Purity is defined as the percentage of the total number of points that are classified correctly:

    $Purity=\frac{1}{n}\sum_{i=1}^{c}\max_{j}\big|C_i\cap L_j\big|,$    (18)

    where $C_i$ denotes a cluster and $L_j$ represents the class that has the maximum count for cluster $C_i$.
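
The sketch below computes the three metrics, assuming the ground-truth and predicted labels are nonnegative integer arrays; Acc uses the Hungarian algorithm to realize the maximization in Eq. (16), and NMI is delegated to scikit-learn with the geometric normalization of Eq. (17).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Acc of Eq. (16): best one-to-one match between clusters and classes,
    found via the Hungarian algorithm on the contingency table."""
    D = max(y_pred.max(), y_true.max()) + 1
    T = np.zeros((D, D), dtype=np.int64)
    for p, t in zip(y_pred, y_true):
        T[p, t] += 1
    row, col = linear_sum_assignment(-T)     # maximize the matched counts
    return T[row, col].sum() / y_true.size

def purity(y_true, y_pred):
    """Purity of Eq. (18): fraction of points falling in the majority class of their cluster."""
    total = sum(np.bincount(y_true[y_pred == k]).max() for k in np.unique(y_pred))
    return total / y_true.size

def nmi(y_true, y_pred):
    """NMI of Eq. (17), using the geometric-mean normalization."""
    return normalized_mutual_info_score(y_true, y_pred, average_method="geometric")
```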

5.3 Comparison Algorithms

We compare with both single view and multi-view clustering algorithms.

  • Spectral clustering (SC) ng2002spectral : We include the classic SC method as a baseline and apply SC to each view of features. SC(1) means the implementation of SC on the 1st view; SC(Ave) means that the result is based on the average graph of all views. Note that all graphs are learned from data according to Eq. (2).

  • K-means clustering (KM): We conduct KM on the concatenated features. That is to say, we assume that all the views are of the same importance to the clustering task.

  • Co-training multi-view spectral clustering (Co-train) kumar2011cotrain : It utilizes the eigenvectors from one view to guide the graph construction in another view. Consequently, the clusterings of multiple views tend towards consensus.

  • Co-regularized multi-view spectral clustering (Co-reg) kumar2011co : This method employs a co-regularization technique to make the clusterings in different views agree with each other.

  • Multi-view kernel K-means (MVKKM) tzortzis2012kernel : This method transforms each view into a kernel matrix and learns a weighted combination of kernels. At the same time, the kernel k-means algorithm is applied to obtain the final result.

  • Robust multi-view K-means clustering (RMKMC) cai2013multi : It adopts the $\ell_{2,1}$-norm in the traditional k-means algorithm to deal with data outliers. In addition, a weight factor is introduced for each view.

  • Multi-view clustering with self-paced learning (MSPL) xu2015multi : This method applies the self-paced learning strategy to multi-view clustering. Hence the multi-view model is learned from easy to complex examples/views, which are determined by a probabilistic smoothed weighting scheme.

  • Auto-weighted multiple graph learning (AMGL) nie2016parameter : It extends the spectral clustering method to the multi-view setting. Different from our approach, the graphs are learned by the adaptive neighbors approach.

  • Multi-view subspace clustering (MVSC) gao2015multi : Multiple graphs are learned and they share the same cluster indicator matrix. Unlike our approach, there is no graph fusion process.

  • Diversity-induced multi-view subspace clustering (DiMSC) cao2015diversity : Multiple graphs are learned and their average is inputted to the spectral clustering algorithm. Moreover, the Hilbert Schmidt Independence Criterion (HSIC) is incorporated as a diversity regularizer to explore the complementarity of multiple views.

  • Iterative based multi-view spectral clustering (IMVSC) wang2016iterative : This method learns multiple graphs and each one is assumed to be low-rank and sparse. In addition, Laplacian regularization and views agreement are imposed on the graphs. Finally, the average of learned graphs is used for spectral clustering.

  • Our proposed GFSC. Both graph fusion and graph structure are considered in our approach. After obtaining the embedding $F$, we implement K-means on it to obtain the final discrete cluster labels. Furthermore, to see the effect of the graph structure term, we also compare with the approach based on problem (7), referred to as GF. Unlike GFSC in problem (9), GF implements the spectral clustering method separately after obtaining $S$.

5.4 Results

For those methods with parameters, we tune them to achieve the best performance. For example, the parameter ranges for our method are displayed in Figures 2-5. We repeat each algorithm 10 times and report the mean and standard deviation (std) values in Tables 2-5. The best results are marked in boldface. From these results, we can draw the following conclusions.

  • Comparing the SC performance on different views, we can see that different views indeed produce different results. This confirms the heterogeneity of multiple views. Therefore, it is essential to differentiate views when we build a multi-view learning model, just as we do in this paper.

  • Comparing SC(Ave) with the results on each individual view, we can see that naively taking the average of the graphs might deteriorate the performance. To obtain reliable results, it is imperative to design a graph fusion mechanism.

  • Compared with the single-view SC results and SC(Ave), our proposed GFSC method often shows better performance. This is largely because a more accurate graph is learned in our approach; recall that we employ both graph fusion and a weighting strategy in our model.

  • GFSC always performs better than GF. This fully demonstrates the importance of considering the graph structure. Additionally, GF often outperforms SC(Ave). This shows the advantage of graph fusion.

  • Our GFSC method consistently outperforms k-means based multi-view methods, i.e., KM, MVKKM, RMKMC, and MSPL. This validates the superiority of the spectral clustering approach; it is well known that spectral clustering often performs better than the k-means technique.

  • In addition, GFSC consistently performs better than AMGL. AMGL is based on adaptive neighbors which captures the local structure of data. By contrast, our graph learning is based on self-expressiveness which is supposed to grasp the global structure of data.

  • Our method significantly outperforms classic multi-view methods Co-train and Co-reg. Co-train and Co-reg methods construct graphs manually and they mainly regularize the multiple partitions.

  • Compared to state-of-the-art multi-view subspace clustering algorithms, i.e., DiMSC, MVSC, and IMVSC, our method beats them in most cases in terms of Acc, NMI, and Purity. Though they build the graphs in a similar way as ours, they don’t use any graph fusion strategy. This fully demonstrates the efficacy of our graph fusion.

In summary, these observations validate the efficacy of our graph fusion and graph structure learning strategies.

Method Acc Purity NMI
SC(1) 91.72(0.00) 99.31(0.00) 0.20(0.00)
SC(2) 93.79(0.00) 98.62(0.00) 13.71(0.00)
SC(3) 91.17(1.74) 98.62(2.18) 0.18(0.05)
SC(4) 91.72(0.00) 99.31(0.00) 0.20(0.00)
SC(Ave) 91.72(0.00) 99.31(0.00) 0.20(0.00)
KM 91.59(0.31) 90.24(0.24) 14.10(1.30)
Co-train 91.27(0.00) 87.57(1.20) 3.50(0.00)
Co-reg 90.90(0.76) 90.78(1.40) 6.8(0.30)
MVKKM 84.00(6.13) 89.01(2.35) 8.3(0.64)
RMKMC 91.31(0.62) 89.67(1.80) 8.00(0.74)
MSPL 80.41(13.24) 90.41(0.00) 10.11(9.48)
AMGL 89.66(0.00) 91.00(0.67) 11.2(0.00)
DiMSC 93.79(0.00) 94.62(0.00) 13.71(0.00)
MVSC 91.03(0.00) 95.62(0.00) 0.41(0.00)
IMVSC 87.59(0.00) 91.03(0.67) 7.90(0.00)
GF 91.72(0.00) 99.31(0.00) 0.20(0.00)
GFSC 93.85(8.22) 99.42(7.29) 15.13(8.45)
Table 2: Performance comparison on BBC (%)
Method Acc Purity NMI
SC(1) 42.98(3.82) 60.09(4.49) 23.48(2.74)
SC(2) 42.67(2.22) 65.79(6.09) 25.06(2.07)
SC(3) 40.76(3.84) 59.29(5.63) 21.53(2.60)
SC(4) 43.43(2.43) 65.33(6.72) 25.04(1.05)
SC(5) 40.98(3.45) 60.39(6.02) 21.95(2.51)
SC(Ave) 44.44(4.01) 60.35(5.52) 25.19(2.48)
KM 24.57(4.52) 25.48(4.37) 11.78(5.01)
Co-train 17.00(0.10) 17.15(0.07) 9.40(0.11)
Co-reg 20.62(1.24) 20.95(1.32) 2.33(0.34)
MVKKM 20.48(3.82) 20.65(3.83) 5.77(3.66)
RMKMC 22.42(6.54) 22.55(6.57) 7.21(7.29)
MSPL 24.87(5.98) 28.12(4.97) 11.50(4.28)
AMGL 18.35(0.15) 20.08(0.54) 6.38(1.00)
DiMSC 39.60(1.32) 46.28(1.74) 18.17(0.64)
MVSC 25.08(0.39) 80.11(5.50) 6.60(0.68)
IMVSC 30.23(0.40) 35.73(1.16) 9.26(0.22)
GF 44.28(2.60) 58.36(3.23) 25.42(1.63)
GFSC 44.92(2.68) 59.40(2.50) 25.73(2.52)
Table 3: Clustering performance on Reuters (%)
Method Acc Purity NMI
SC(1) 62.54(4.56) 70.94(3.77) 62.65(2.39)
SC(2) 59.30(4.08) 64.21(1.24) 57.35(1.23)
SC(3) 53.01(5.57) 75.5(2.12) 55.55(3.78)
SC(4) 23.17(4.22) 89.61(2.58) 23.83(5.18)
SC(5) 30.61(4.43) 81.13(2.85) 29.39(5.32)
SC(6) 55.94(2.65) 57.77(1.53) 48.16(0.99)
SC(Ave) 77.40(6.63) 86.22(2.45) 79.28(2.85)
KM 54.46(5.60) 58.64(2.92) 58.25(0.85)
Co-train 71.42(4.21) 74.86(2.62) 71.06(1.07)
Co-reg 83.38(7.35) 85.17(4.98) 77.97(2.92)
MVKKM 58.81(3.50) 62.40(3.40) 62.91(2.60)
RMKMC 63.04(3.36) 65.74(2.16) 66.57(1.18)
MSPL 68.00(1.12) 68.99(1.17) 70.42(1.95)
AMGL 73.61(10.29) 76.48(8.54) 81.86(4.53)
DiMSC 42.72(1.94) 45.65(0.97) 37.89(0.87)
MVSC 79.60(2.54) 87.19(1.48) 73.89(1.93)
IMVSC 71.03(0.65) 73.95(4.24) 67.20(2.88)
GF 87.76(5.32) 89.44(2.21) 83.28(2.47)
GFSC 89.45(5.10) 91.38(1.03) 85.37(1.96)
Table 4: Performance comparison on Digits (%)
Method Acc Purity NMI
SC(1) 33.82(0.00) 99.20(0.00) 12.89(0.00)
SC(2) 34.18(2.54) 97.91(3.76) 2.34(3.40)
SC(3) 49.80(5.61) 85.28(3.48) 19.71(4.50)
SC(4) 53.13(4.77) 66.05(4.81) 61.03(2.13)
SC(5) 33.65(0.03) 99.20(0.01) 1.14(0.00)
SC(6) 57.36(1.02) 80.72(4.33) 31.22(1.37)
SC(Ave) 65.19(1.17) 86.97(0.58) 45.28(6.19)
KM 31.40(1.30) 60.06(0.38) 37.05(0.41)
Co-train 38.94(2.10) 69.77(1.42) 50.90(1.12)
Co-reg 34.38(0.79) 65.59(1.03) 46.42(0.96)
MVKKM 44.87(2.49) 72.84(0.72) 54.06(1.23)
RMKMC 33.35(1.47) 64.22(0.89) 42.44(0.67)
MSPL 33.49(0.00) 34.24(0.00) 35.80(0.00)
AMGL 52.28(2.91) 67.60(2.31) 56.61(1.93)
DiMSC 33.89(1.45) 37.78(1.35) 39.33(1.16)
MVSC 44.96(2.06) 50.87(2.35) 45.36(0.88)
IMVSC 42.07(1.95) 46.19(1.81) 51.18(0.90)
GF 66.95(1.90) 79.50(4.28) 56.19(3.07)
GFSC 70.24(2.94) 81.49(1.88) 63.09(2.49)
Table 5: Performance comparison on Caltech20 (%)
Figure 2: Acc w.r.t. $\alpha$, $\beta$, and $\gamma$ on BBC data.
Figure 3: Acc w.r.t. $\alpha$, $\beta$, and $\gamma$ on Reuters data.
Figure 4: Acc w.r.t. $\alpha$, $\beta$, and $\gamma$ on Digits data.
Figure 5: Acc w.r.t. $\alpha$, $\beta$, and $\gamma$ on Caltech20 data.

5.5 Parameter Analysis

In our proposed model, there are three parameters, $\alpha$, $\beta$, and $\gamma$, that need to be set properly. We choose their values by grid search. Figures 2-5 show the search range for each data set and the sensitivity of the accuracy with regard to the parameters. As can be seen, the optimal parameter combinations differ across BBC, Reuters, Digits, and Caltech20 and can be read from the corresponding figures. Overall, our method performs reasonably stably over a wide range of parameter values.
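
A hedged sketch of this grid search, reusing the gfsc and clustering_accuracy sketches given earlier; the candidate grid, the data variables Xs and y_true, and the number of clusters c are placeholders rather than the settings actually used in the paper.

```python
import itertools
import numpy as np
from sklearn.cluster import KMeans

# Placeholder grids: the actual candidate values used in the paper are not reproduced here.
grid = [1e-3, 1e-2, 1e-1, 1, 10, 100]

best_acc, best_params = -np.inf, None
for alpha, beta, gamma in itertools.product(grid, grid, grid):
    # gfsc(...) and clustering_accuracy(...) refer to the sketches given earlier.
    _, _, F = gfsc(Xs, c, alpha, beta, gamma)
    y_pred = KMeans(n_clusters=c, n_init=10).fit_predict(F)   # discretize the embedding
    acc = clustering_accuracy(y_true, y_pred)
    if acc > best_acc:
        best_acc, best_params = acc, (alpha, beta, gamma)
print("best Acc %.4f at (alpha, beta, gamma) = %s" % (best_acc, best_params))
```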

6 Conclusion

In this paper, we proposed a novel multi-view spectral clustering method. Unlike many existing methods, which often use an averaged graph to perform spectral clustering, we propose a way to fuse the graphs into a consensus graph. A parameter-free weighting scheme is introduced to distinguish the contributions of different graphs. Moreover, the cluster structure of the consensus graph is also considered in the proposed method. Consequently, the proposed approach integrates graph learning, fusion, and spectral clustering into a unified framework. These three subtasks are mutually boosted through an alternating iterative optimization strategy. Experiments on benchmark data sets verify the effectiveness of the proposed method. The results show that both the consensus graph and the graph structure term help improve the clustering quality.

7 Acknowledgement

This paper was in part supported by Grants from the Natural Science Foundation of China (Nos. 61806045, 61572111, and 61772115), two Fundamental Research Fund for the Central Universities of China (Nos. ZYGX2017KYQD177 and A03017023701012), and a 985 Project of UESTC (No. A1098531023601041).

8 References


  • (1) G. Chao, S. Sun, Semi-supervised multi-view maximum entropy discrimination with expectation laplacian regularization, Information Fusion 45 (2019) 296–306.
  • (2) P. Zhu, Q. Hu, Q. Hu, C. Zhang, Z. Feng, Multi-view label embedding, Pattern Recognition 84 (2018) 126–135.
  • (3) C. Tang, J. Chen, X. Liu, M. Li, P. Wang, M. Wang, P. Lu, Consensus learning guided multi-view unsupervised feature selection, Knowledge-Based Systems 160 (2018) 49–60.
  • (4) S. Ding, L. Cong, Q. Hu, H. Jia, Z. Shi, A multiway p-spectral clustering algorithm, Knowledge-Based Systems 164 (2019) 371–377.
  • (5) S. Li, H. Liu, Z. Tao, Y. Fu, Multi-view graph learning with adaptive label propagation, in: Big Data (Big Data), 2017 IEEE International Conference on, IEEE, 2017, pp. 110–115.
  • (6) G. Chao, S. Sun, J. Bi, A survey on multi-view clustering, arXiv preprint arXiv:1712.06246.
  • (7) C. Zhang, H. Fu, Q. Hu, X. Cao, Y. Xie, D. Tao, D. Xu, Generalized latent multi-view subspace clustering, IEEE transactions on pattern analysis and machine intelligence.
  • (8) X. Liu, X. Zhu, M. Li, L. Wang, C. Tang, J. Yin, D. Shen, H. Wang, W. Gao, Late fusion incomplete multi-view clustering, IEEE transactions on pattern analysis and machine intelligence.
  • (9) Z. Kang, Z. Guo, S. Huang, S. Wang, W. Chen, Y. Su, Z. Xu, Multiple partitions aligned clustering, in: IJCAI, 2019, pp. 2701–2707.
  • (10) X. Chen, X. Xu, Y. Ye, J. Z. Huang, TW-k-means: Automated Two-level Variable Weighting Clustering Algorithm for Multi-view Data, IEEE Transactions on Knowledge and Data Engineering 25 (4) (2013) 932–944.
  • (11) S. Huang, Z. Kang, I. W. Tsang, Z. Xu, Auto-weighted multi-view clustering via kernelized graph learning, Pattern Recognition 88 (2019) 174–184.
  • (12) S. Huang, Z. Kang, Z. Xu, Self-weighted multi-view clustering with soft capped norm, Knowledge-Based Systems.
  • (13) H. Wang, Y. Yang, B. Liu, H. Fujita, A study of graph-based system for multi-view clustering, Knowledge-Based Systems 163 (2019) 1009–1019.
  • (14) K. Zhan, J. Shi, J. Wang, H. Wang, Y. Xie, Adaptive structure concept factorization for multiview clustering, Neural computation 30 (4) (2018) 1080–1103.
  • (15) A. Kumar, H. Daumé, A co-training approach for multi-view spectral clustering, in: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 393–400.
  • (16) A. Kumar, P. Rai, H. Daume, Co-regularized multi-view spectral clustering, in: Advances in neural information processing systems, 2011, pp. 1413–1421.
  • (17) H. Tao, C. Hou, X. Liu, D. Yi, J. Zhu, Reliable multi-view clustering., in: AAAI, 2018.
  • (18) X. Liu, M. Li, L. Wang, Y. Dou, J. Yin, E. Zhu, Multiple kernel k-means with incomplete kernels., in: AAAI, 2017, pp. 2259–2265.
  • (19) G. Tzortzis, A. Likas, Kernel-based weighted multi-view clustering, in: Data Mining (ICDM), 2012 IEEE 12th International Conference on, IEEE, 2012, pp. 675–684.
  • (20) D. Guo, J. Zhang, X. Liu, Y. Cui, C. Zhao, Multiple kernel learning based multi-view spectral clustering, in: Pattern recognition (ICPR), 2014 22nd international conference on, IEEE, 2014, pp. 3774–3779.
  • (21) Y. Wang, W. Zhang, L. Wu, X. Lin, M. Fang, S. Pan, Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering, arXiv preprint arXiv:1608.05560.
  • (22) X. Cao, C. Zhang, H. Fu, S. Liu, H. Zhang, Diversity-induced multi-view subspace clustering, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 586–594.
  • (23) H. Gao, F. Nie, X. Li, H. Huang, Multi-view subspace clustering, in: Proceedings of the IEEE international conference on computer vision, 2015, pp. 4238–4246.
  • (24) K. Zhan, C. Zhang, J. Guan, J. Wang, Graph learning for multiview clustering, IEEE transactions on cybernetics (99) (2017) 1–9.
  • (25) X. Wang, X. Guo, Z. Lei, C. Zhang, S. Z. Li, Exclusivity-consistency regularized multi-view subspace clustering, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 923–931.
  • (26) Y. Zhang, Y. Yang, T. Li, H. Fujita, A multitask multiview clustering algorithm in heterogeneous situations based on lle and le, Knowledge-Based Systems 163 (2019) 776–786.
  • (27) J. Liu, C. Wang, J. Gao, J. Han, Multi-view clustering via joint nonnegative matrix factorization, in: Proceedings of the 2013 SIAM International Conference on Data Mining, SIAM, 2013, pp. 252–260.
  • (28) Y. Guo, Convex subspace representation learning from multi-view data., in: AAAI, Vol. 1, 2013, p. 2.
  • (29) J. Xu, J. Han, F. Nie, X. Li, Re-weighted discriminatively embedded k-means for multi-view clustering, IEEE Transactions on Image Processing 26 (6) (2017) 3016–3027.
  • (30) H. Liu, Y. Fu, Consensus guided multi-view clustering, ACM Transactions on Knowledge Discovery from Data (TKDD) 12 (4) (2018) 42.
  • (31) X. Zhang, X. Zhang, H. Liu, Multi-task multi-view clustering for non-negative data., in: IJCAI, 2015, pp. 4055–4061.
  • (32) Q. Gu, J. Zhou, Learning the shared subspace for multi-task clustering and transductive transfer classification, in: Data Mining, 2009. ICDM’09. Ninth IEEE International Conference on, IEEE, 2009, pp. 159–168.
  • (33) B. Hunter, T. Strohmer, Performance analysis of spectral clustering on compressed, incomplete and inaccurate measurements, arXiv preprint arXiv:1011.0997.
  • (34) M. Zhao, T. W. Chow, Z. Zhang, B. Li, Automatic image annotation via compact graph based semi-supervised learning, Knowledge-Based Systems 76 (2015) 148–165.
  • (35) Z. Kang, H. Pan, S. C. H. Hoi, Z. Xu, Robust graph learning from noisy data, IEEE Transactions on Cybernetics, doi:10.1109/TCYB.2018.2887094.
  • (36) S. Ding, H. Jia, M. Du, Y. Xue, A semi-supervised approximate spectral clustering algorithm based on hmrf model, Information Sciences 429 (2018) 215–228.
  • (37) Z. Kang, C. Peng, Q. Cheng, Twin learning for similarity and clustering: A unified kernel approach, in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), AAAI Press, 2017.
  • (38) Z. Zhang, M. Zhao, T. W. Chow, Graph based constrained semi-supervised learning framework via label propagation over adaptive neighborhood, IEEE Transactions on Knowledge and Data Engineering 27 (9) (2013) 2362–2376.
  • (39) Z. Kang, L. Wen, W. Chen, Z. Xu, Low-rank kernel learning for graph-based clustering, Knowledge-Based Systems 163 (2019) 510–517.
  • (40) F. Nie, X. Wang, H. Huang, Clustering and projected clustering with adaptive neighbors, in: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2014, pp. 977–986.
  • (41) F. Nie, J. Li, X. Li, et al., Parameter-free auto-weighted multiple graph learning: A framework for multiview clustering and semi-supervised classification., in: IJCAI, 2016, pp. 1881–1887.
  • (42) F. Nie, G. Cai, X. Li, Multi-view clustering and semi-supervised classification with adaptive neighbours., in: AAAI, 2017, pp. 2408–2414.
  • (43) X. Peng, J. Feng, S. Xiao, J. Lu, Z. Yi, S. Yan, Deep sparse subspace clustering, arXiv preprint arXiv:1709.08374.
  • (44) X. Chen, Y. Ye, X. Xu, J. Z. Huang, A feature group weighting method for subspace clustering of high-dimensional data, Pattern Recognition 45 (1) (2012) 434–446.
  • (45) Z. Kang, C. Peng, Q. Cheng, Kernel-driven similarity learning, Neurocomputing 267 (2017) 210–219.
  • (46) Z. Zhang, F. Li, M. Zhao, L. Zhang, S. Yan, Joint low-rank and sparse principal feature coding for enhanced robust representation and visual classification, IEEE Transactions on Image Processing 25 (6) (2016) 2429–2443.
  • (47) Z. Li, J. Liu, J. Tang, H. Lu, Robust structured subspace learning for data representation, IEEE transactions on pattern analysis and machine intelligence 37 (10) (2015) 2085–2098.
  • (48) C. Zhang, Q. Hu, H. Fu, P. Zhu, X. Cao, Latent multi-view subspace clustering, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4279–4287.
  • (49) R. Xia, Y. Pan, L. Du, J. Yin, Robust multi-view spectral clustering via low-rank and sparse decomposition., in: AAAI, 2014, pp. 2149–2155.
  • (50) Z. Kang, H. Xu, B. Wang, H. Zhu, Z. Xu, Clustering with similarity preserving, Neurocomputing, doi:10.1016/j.neucom.2019.07.086.
  • (51) Z. Zhang, J. Ren, S. Li, R. Hong, Z. Zha, M. Wang, Robust subspace discovery by block-diagonal adaptive locality-constrained representation, arXiv preprint arXiv:1908.01266.
  • (52) G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, Y. Ma, Robust recovery of subspace structures by low-rank representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 171–184.
  • (53) E. Elhamifar, R. Vidal, Sparse subspace clustering: Algorithm, theory, and applications, IEEE transactions on pattern analysis and machine intelligence 35 (11) (2013) 2765–2781.
  • (54) A. Y. Ng, M. I. Jordan, Y. Weiss, et al., On spectral clustering: Analysis and an algorithm, Advances in neural information processing systems 2 (2002) 849–856.
  • (55) X. Chen, W. Hong, F. Nie, D. He, M. Yang, J. Z. Huang, Directly minimizing normalized cut for large scale data, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD-18, 2018, pp. 1206–1215.
  • (56) Y. Li, F. Nie, H. Huang, J. Huang, Large-scale multi-view spectral clustering via bipartite graph., in: AAAI, 2015, pp. 2750–2756.
  • (57) R. Vidal, Subspace clustering, IEEE Signal Processing Magazine 28 (2) (2011) 52–68.
  • (58) F. Nie, J. Li, X. Li, Self-weighted multiview clustering with multiple graphs, in: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017, pp. 2564–2570.
  • (59) B. Mohar, Y. Alavi, G. Chartrand, O. Oellermann, The laplacian spectrum of graphs, Graph theory, combinatorics, and applications 2 (871-898) (1991) 12.
  • (60) K. Fan, On a theorem of Weyl concerning eigenvalues of linear transformations, Proceedings of the National Academy of Sciences 35 (11) (1949) 652–655.
  • (61) X. Zhang, F. Sun, G. Liu, Y. Ma, Fast low-rank subspace segmentation, IEEE Transactions on Knowledge and Data Engineering 26 (5) (2014) 1293–1297.
  • (62) X. Zhang, L. Zong, Q. You, X. Yong, Sampling for nyström extension-based spectral clustering: Incremental perspective and novel analysis, ACM Transactions on Knowledge Discovery from Data (TKDD) 11 (1) (2016) 7.
  • (63) X. Xu, S. Ding, Z. Shi, An improved density peaks clustering algorithm with fast finding cluster centers, Knowledge-Based Systems 158 (2018) 65–74.
  • (64) H. Jia, S. Ding, M. Du, A nyström spectral clustering algorithm based on probability incremental sampling, Soft Computing 21 (19) (2017) 5815–5827.
  • (65) X. Cai, F. Nie, H. Huang, Multi-view k-means clustering on big data., in: IJCAI, 2013, pp. 2598–2604.
  • (66) C. Peng, Z. Kang, S. Cai, Q. Cheng, Integrate and conquer: Double-sided two-dimensional k-means via integrating of projection and manifold construction, ACM Transactions on Intelligent Systems and Technology (TIST) 9 (5) (2018) 57.
  • (67) C. Xu, D. Tao, C. Xu, Multi-view self-paced learning for clustering., in: IJCAI, 2015, pp. 3974–3980.