Multi-View Multiple Clusterings using Deep Matrix Factorization

Multi-view clustering aims at integrating complementary information from multiple heterogeneous views to improve clustering results. Existing multi-view clustering solutions can only output a single clustering of the data. Due to their multiplicity, multi-view data can have different groupings that are reasonable and interesting from different perspectives. However, how to find multiple, meaningful, and diverse clustering results from multi-view data is still a rarely studied and challenging topic in multi-view clustering and multiple clusterings. In this paper, we introduce a deep matrix factorization based solution (DMClusts) to discover multiple clusterings. DMClusts gradually factorizes multi-view data matrices into representational subspaces layer-by-layer and generates one clustering in each layer. To enforce the diversity between generated clusterings, it minimizes a new redundancy quantification term derived from the proximity between samples in these subspaces. We further introduce an iterative optimization procedure to simultaneously seek multiple clusterings with quality and diversity. Experimental results on benchmark datasets confirm that DMClusts outperforms state-of-the-art multiple clustering solutions.







Many real-world data include diverse types of feature views. For example, web images have both visual and textual features; a protein has structure and interactome features. The various feature views embody consistent and complementary information about the same objects, and have spurred intensive research in multi-view learning [Bickel and Scheffer2004, Zhao et al.2017]. The fusion of feature views not only enables a comprehensive composite view of the objects, but also facilitates the associated learning task [Nie, Cai, and Li2017, Tan et al.2018].

Various efforts have been focused on the development of effective multi-view clustering (MVC) algorithms. Some methods achieve clustering by co-regularization [Kumar and Daumé2011, Cheng et al.2013], correlation analysis [Chaudhuri et al.2009], or multiple kernel learning [Gönen and Alpaydın2011, Liu et al.2019]; other approaches learn the shared subspace to extract complementary and shared information of multi-view data, and perform clustering therein [Li, Jiang, and Zhou2014, Gao et al.2015, Zhao, Ding, and Fu2017, Zong et al.2017, Kang et al.2019].

Figure 1: An example of grouping the same objects with three views via deep matrix factorization and diversity control layer-by-layer. The Shape clustering is generated from all three views, while the Color and Texture clusterings are generated from the first two views and the last two views, respectively.

Existing MVC solutions focus on generating a single clustering; they fail to present different but meaningful clusterings of the same multi-view data [Fanaee-T and Thoresen2018]. For example, the three-view objects in Figure 1 have different shapes, colors, and textures. The aforementioned MVC solutions group these objects mainly by shape. But they can also be clustered according to the shared color and texture. These groupings are meaningful but different. In other words, multiple clustering is concerned with both the quality and diversity of alternative clusterings. Although multiple clusterings can present alternative and overlooked meaningful clusterings of the same objects, it is a known dilemma to balance diversity and quality [Bailey2013]. Given this challenge, a number of solutions have been introduced to generate alternative clusterings in different subspaces [Cui, Fern, and Dy2007, Mautz et al.2018, Wang et al.2019], by meta clustering of base clusterings [Caruana et al.2006], by referring to already explored clusterings [Bae and Bailey2006, Yang and Zhang2017], or by simultaneously reducing the redundancy between clusterings [Wang et al.2018, Yao et al.2019a]. However, they still focus on single-view data. One naive extension is to concatenate the diverse feature vectors of the same objects into longer ones, and then directly apply off-the-shelf multiple clustering solutions on the concatenated vectors. However, this concatenation overrides the intrinsic nature of multi-view data, and thus reduces the quality and increases the redundancy of the explored clusterings, as our experiments will show.

To find multiple clusterings on multi-view data, [Yao et al.2019b] recently proposed a solution called multi-view multiple clustering (MVMC). MVMC extracts the individual and shared similarity matrices of multi-view data based on adapted self-representation learning [Luo et al.2018], and then applies semi-nonnegative matrix factorization [Ding, Li, and Jordan2010] on each combination of the individual and common similarity data matrices to generate alternative clusterings, where the quality is pursued by the commonality matrix and the diversity is obtained by the individuality matrix. However, MVMC: (a) does not differentiate the relevance of different views and suffers from low-quality (irrelevant) data views; (b) does not maintain well the quality and diversity of multiple clusterings; (c) cannot be applied to datasets with a large number of samples, since it has to factorize a combined similarity matrix whose size equals the number of samples.

In this paper, we introduce a deep matrix factorization based solution (DMClusts, as illustrated in Figure 1) to generate multiple diverse clusterings of good quality in a layer-wise fashion. DMClusts collaboratively factorizes the multi-view data matrices into multiple representational subspaces layer-by-layer, and seeks an alternative clustering of quality per layer. To achieve diversity among the clusterings, it reduces their redundancy by means of a new balanced redundancy quantification term, which jointly considers the case when two objects are often grouped together and the case when they are in different clusters of the subspaces. We further introduce an iterative optimization procedure to simultaneously seek multiple clusterings in a layer-wise fashion. The main contributions of our work are:

  1. We introduce a deep matrix factorization based solution (DMClusts) to seek multiple clusterings by fusing the consensus and complementary information of multi-view data, and by enforcing the diversity between the clusterings layer-by-layer. DMClusts can credit different degrees of relevance to different views; as such, it is less sensitive to noisy (or low-quality) ones.

  2. DMClusts introduces a balanced redundancy quantification term, which jointly considers the case that two samples are often nearby in the representational subspace per layer, and the reverse case that they are often faraway per layer, to comprehensively quantify the redundancy of multiple clusterings, whereas existing similar quantifications overlook the latter case. Extensive experiments on benchmark datasets show that DMClusts significantly outperforms other related competitive multiple clustering solutions [Yao et al.2019b, Wang et al.2019, Yang and Zhang2017, Ye et al.2016, Jain, Meka, and Dhillon2008, Cui, Fern, and Dy2007] and the deep matrix factorization [Trigeorgis et al.2017] in finding multiple clusterings with quality and diversity.

Our Method

Overview of deep matrix factorization

Matrix factorization techniques have been extensively adopted for data analysis and representation learning in various domains [Tang et al.2017, Fu et al.2018, Li, Tang, and Mei2019]. For example, NMF (nonnegative matrix factorization) [Lee and Seung2001] decomposes a nonnegative data matrix X into two factor matrices, X ≈ ZH; the nonnegative constraints imposed on the factors allow for better interpretability and have led to a significantly growing application of NMF and its variants [Ding, Li, and Jordan2010, Cai et al.2011, Žitnik and Zupan2014]. By taking Z as cluster centroids in the feature space, and H as the soft membership indicators of samples to these centroids, semi-NMF [Ding, Li, and Jordan2010] is equivalent to a soft version of k-means clustering. To absorb a mixed-sign X, semi-NMF only imposes the nonnegative constraints on H.
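As a concrete illustration of the semi-NMF described above, the following sketch implements the published multiplicative updates of Ding, Li, and Jordan (2010) in numpy: Z has a closed-form least-squares solution given H, and H is updated multiplicatively so it stays nonnegative. The function name and initialization scheme are ours, not from the paper.

```python
import numpy as np

def pos(A): return (np.abs(A) + A) / 2   # positive part of a matrix
def neg(A): return (np.abs(A) - A) / 2   # negative part of a matrix

def semi_nmf(X, k, n_iter=200, seed=0):
    """Semi-NMF (Ding, Li & Jordan, 2010): X ~ Z @ H with H >= 0, Z mixed-sign."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    H = rng.random((k, n)) + 1e-3          # nonnegative initialization
    for _ in range(n_iter):
        # Z has a closed-form least-squares solution given H
        Z = X @ H.T @ np.linalg.pinv(H @ H.T)
        ZtX, ZtZ = Z.T @ X, Z.T @ Z
        # multiplicative update keeps H elementwise nonnegative
        H *= np.sqrt((pos(ZtX) + neg(ZtZ) @ H) / (neg(ZtX) + pos(ZtZ) @ H + 1e-9))
    return Z, H
```

Each column of H can then be read as a soft cluster membership, mirroring the soft k-means interpretation above.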

To explore the complex hierarchical structure and to eliminate noise in data matrices with different modalities, and motivated by the idea and robustness of deep representation learning [Hinton and Salakhutdinov2006, Bengio2009], [Trigeorgis et al.2017] extends semi-NMF to deep semi-NMF (DMF) as follows:

X ≈ Z_1 Z_2 ⋯ Z_m H_m,  s.t. H_m ≥ 0    (1)

where Z_i is the i-th (1 ≤ i ≤ m) layer basis matrix, and H_i is the i-th layer representation matrix. By taking Z_i as the cluster centroids and H_i as the cluster indicators, or by separately clustering on H_i, we can obtain m clusterings from a deep factorization network with m layers. However, these clusterings may have high redundancy, since the overlap between them is ignored.

Multi-view data often embody different distributions, which enable different groupings of the same dataset from diverse perspectives. Therefore, it is promising to apply DMF on multi-view data to discover multiple clusterings. One simple solution is to concatenate multiple feature views into a single view, and then directly apply DMF on the concatenated view. However, this concatenation does not differentiate the relevance of these views, and results in information override and redundant clusterings. Given that, we propose the multi-view multiple clusterings using deep matrix factorization solution.
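The greedy layer-wise decomposition described above (X ≈ Z_1 H_1, then H_1 ≈ Z_2 H_2, and so on, one representation per layer) can be sketched as follows. This is a simplified stand-in, not the paper's algorithm: each layer is fitted by alternating least squares with clipping instead of the multiplicative semi-NMF updates, and the function names are hypothetical.

```python
import numpy as np

def factorize_layer(X, k, n_iter=100, seed=0):
    """One layer X ~ Z @ H with H >= 0 (ALS with clipping, a simplified
    stand-in for the semi-NMF updates used in deep matrix factorization)."""
    rng = np.random.default_rng(seed)
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        Z = X @ np.linalg.pinv(H)                    # least squares for Z
        H = np.clip(np.linalg.pinv(Z) @ X, 0, None)  # least squares, kept >= 0
    return Z, H

def deep_factorize(X, layer_sizes):
    """Greedy layer-wise factorization: X ~ Z1 H1, H1 ~ Z2 H2, ...
    Each layer's H can be clustered to yield one of the m clusterings."""
    Hs, cur = [], X
    for k in layer_sizes:
        Z, H = factorize_layer(cur, k)
        Hs.append(H)
        cur = H
    return Hs  # one representation (hence one candidate clustering) per layer
```

Note that nothing here discourages consecutive layers from encoding nearly the same grouping, which is exactly the redundancy issue the proposed method addresses.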

The proposed method

Suppose X = {X^1, X^2, …, X^V} is a dataset with V different feature views of n objects, X^v ∈ R^{d_v×n}. To make use of the complementary information and to explore hierarchical representations of multi-view data, we formulate our model by extending DMF as follows:

min_{Z, H} Σ_{l=1}^{M} Σ_{v=1}^{V} ||X^v − Z_1^v Z_2^v ⋯ Z_l^v H_l||_F^2 + λ Σ_{1 ≤ l1 < l2 ≤ M} D(H_{l1}, H_{l2}),  s.t. H_l ≥ 0    (2)

where M is the user-specified target number of clusterings, Z_l^v is the l-th (1 ≤ l ≤ M) layer mapping for view v, and D(H_{l1}, H_{l2}) quantifies the redundancy between two clusterings and will be discussed later. λ > 0 is introduced to balance the quality and redundancy of clusterings. Since H_l is shared across all the data views, we can expect that H_l fuses the complementary information of multiple data views to generate a high-quality representational subspace in the l-th layer with respect to the l-th clustering. In addition, because of the hierarchical representation and the redundancy control term, alternative clusterings with diversity can also be pursued.

Our formulation has a close connection with multi-view clustering via deep matrix factorization [Zhao, Ding, and Fu2017], which also factorizes multiple data views layer-by-layer to extract the complementary information, but it can only generate a single clustering in the final layer. Our task is different from subspace clustering [Domeniconi et al.2007, Luo et al.2018], which seeks only one clustering with different clusters in different subspaces. Our formulation is also different from non-redundant multiple clustering by nonnegative matrix factorization (MNMF) [Yang and Zhang2017], which performs only one layer factorization to find a new clustering by reducing the redundancy between the clustering and already explored ones. As such, MNMF may generate low quality alternative clusterings due to its one-layer representation of data and the heavy dependence on the reference clustering.

Different data views may have different relevance toward different clusterings. Eq. (2) and MVMC [Yao et al.2019b] assume all the data views have the same relevance toward these clusterings. As such, noisy or irrelevant data views may compromise the quality of the alternative clusterings. To account for the different levels of relevance of the data views toward the alternative clusterings, and to reduce the impact of noisy views, we further assign weights to these views for each clustering as follows:

min_{Z, H, w} Σ_{l=1}^{M} Σ_{v=1}^{V} (w_v^l)^γ ||X^v − Z_1^v Z_2^v ⋯ Z_l^v H_l||_F^2 + λ Σ_{1 ≤ l1 < l2 ≤ M} D(H_{l1}, H_{l2}),  s.t. H_l ≥ 0, Σ_{v=1}^{V} w_v^l = 1, w_v^l ≥ 0    (3)

where w_v^l is the weight coefficient of the v-th data view for generating the l-th clustering, and γ > 1 is the parameter that controls the weights distribution. In this way, multiple data views are selectively fused to generate diverse clusterings with quality. For example, in Figure 1, three alternative clusterings (shape, color, texture) can be obtained by different weight assignments of the three views.

As we stated, it is important to control the redundancy (or overlap) between alternative clusterings. Most subspace based multiple clustering solutions reduce the redundancy between clusterings by seeking orthogonal (non-redundant or independent) subspaces [Cui, Fern, and Dy2007, Ye et al.2016, Mautz et al.2018, Wang et al.2019]. DMClusts also has such a flavor and seeks a clustering based on each layer's representation H_l. However, a set of objects may be nearby even in orthogonally projected subspaces, and thus yield similar clusters in these subspaces. For this reason, we additionally quantify the redundancy between clusterings. A co-association matrix C^l can reflect whether two objects are grouped into the same cluster or not in the l-th clustering [Fred and Jain2005]. Particularly, if objects i and j are grouped into the same cluster, then C^l_{ij} = 1, otherwise C^l_{ij} = 0. So if two clusterings (l1 and l2) have a large Σ_{ij} C^{l1}_{ij} C^{l2}_{ij}, there is a high redundancy (or overlap) between them. Since the normalized representation H_l often cannot be an exact binary cluster-indicator matrix, here we approximate C^l by (H_l)^T H_l, which softly quantifies the degree of two objects being grouped into the same cluster in the l-th layer (or clustering). Based on this approximation, we quantify the overlap between two clusterings in different layers as:

D(H_{l1}, H_{l2}) = tr((H_{l1})^T H_{l1} (H_{l2})^T H_{l2})    (4)

where tr(·) is the matrix trace operator. A large value means pairs of objects are nearby in both representation subspaces; such pairs will be grouped into the same clusters of two different clusterings, which increases the overlap.
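The co-association construction and the trace-based overlap can be computed directly in numpy. The helper names are ours; the column normalization of H is an assumption made so that H^T H lands in a co-association-like range.

```python
import numpy as np

def coassociation(labels):
    """C[i, j] = 1 if samples i and j share a cluster, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def overlap(H1, H2):
    """Soft overlap tr(C1 @ C2) between two layers' representations.
    H is k x n; columns are l2-normalized so H.T @ H approximates the
    co-association matrix of that layer's clustering."""
    def soft_C(H):
        Hn = H / (np.linalg.norm(H, axis=0, keepdims=True) + 1e-12)
        return Hn.T @ Hn
    return np.trace(soft_C(H1) @ soft_C(H2))
```

With hard one-hot indicator matrices, `overlap` reduces exactly to the pair-counting quantity Σ_{ij} C^{l1}_{ij} C^{l2}_{ij} described above.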

However, Eq. (4) only accounts for the case that two objects are often projected nearby (grouped into the same clusters) in different representation subspaces, but overlooks the case that two objects are frequently placed faraway (grouped into different clusters) in these subspaces. We want to remark that other multiple clustering solutions [Yang and Zhang2017, Wang et al.2018, Yao et al.2019a] also adopt the idea in Eq. (4) to quantify the redundancy between clusterings, and thus they also overlook the latter case, which emerges especially when the number of clusters is large. To remedy this oversight, we introduce a balanced redundancy quantification term as follows:

D(H_{l1}, H_{l2}) = tr( β C^{l1} C^{l2} + (1 − β)(E − C^{l1})(E − C^{l2}) )    (5)

where C^l = (H_l)^T H_l, E is the all-ones matrix, and β ∈ [0, 1] is the balance coefficient. Eq. (5) considers two extreme cases: (i) many pairwise objects are always nearby in the two subspaces; (ii) many pairwise objects are always faraway in these subspaces. Both cases increase the overlap of two clusterings. In other words, if many pairwise objects are placed into the same clusters in one clustering, but not so in the other clustering, then the redundancy between them is low.
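One plausible instantiation of such a balanced term, consistent with the description above (the exact form used in the paper may differ), weighs the "co-clustered in both layers" overlap against the "separated in both layers" overlap with a coefficient beta:

```python
import numpy as np

def balanced_redundancy(C1, C2, beta=0.5):
    """Balanced overlap between two co-association matrices C1, C2.
    beta weighs pairs co-clustered in both layers against pairs
    separated in both layers (E is the all-ones matrix)."""
    E = np.ones_like(C1)
    same = np.trace(C1 @ C2)               # nearby in both subspaces
    diff = np.trace((E - C1) @ (E - C2))   # faraway in both subspaces
    return beta * same + (1.0 - beta) * diff
```

Setting beta = 1 recovers the one-sided quantification of Eq. (4), so beta interpolates between penalizing only shared groupings and only shared separations.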

To this end, we can reformulate the objective function of DMClusts as follows:

min_{Z, H, w} Σ_{l=1}^{M} Σ_{v=1}^{V} (w_v^l)^γ ||X^v − Z_1^v ⋯ Z_l^v H_l||_F^2 + λ Σ_{1 ≤ l1 < l2 ≤ M} tr( β C^{l1} C^{l2} + (1 − β)(E − C^{l1})(E − C^{l2}) ),  s.t. H_l ≥ 0, Σ_{v=1}^{V} w_v^l = 1, w_v^l ≥ 0    (6)

By minimizing the above objective, we can gradually find M clusterings: the quality of these clusterings is pursued by the per-layer representation H_l shared across all the views, and the diversity is pursued by reducing the cases in which too many objects are always nearby (or faraway) in these representation subspaces. Our experiments will confirm the advantage of these factors.


The minimization objective in Eq. (6) is defined with respect to Z_l^v, H_l, and w_v^l. Since a closed-form solution cannot be given, we alternately optimize one variable while keeping the other two fixed. The alternating process is detailed below.

Update rule for Z_l^v: The optimization of Eq. (6) with respect to Z_l^v is:


where the basis matrices of the preceding layers are collected as Φ_l^v = Z_1^v ⋯ Z_{l−1}^v. Setting the partial derivative with respect to Z_l^v to 0, we can obtain


Update rule for H_l: Optimizing Eq. (6) with respect to H_l is equivalent to minimizing the following:


For the constraint H_l ≥ 0, we introduce a Lagrangian multiplier matrix as follows:


Setting the partial derivative with respect to H_l to zero and applying the KKT complementary condition, we can get


where (·)^+ and (·)^− denote the positive and negative parts of a matrix, i.e., A^+ = (|A| + A)/2 and A^− = (|A| − A)/2.

Update rule for w_v^l: We denote R_v^l = ||X^v − Z_1^v ⋯ Z_l^v H_l||_F^2. Eq. (6) with respect to w_v^l is written as:


The Lagrangian function of Eq. (12) is:


where η is the Lagrangian multiplier. By taking the derivative of Eq. (13) with respect to w_v^l and setting it to zero, we have w_v^l = (−η / (γ R_v^l))^{1/(γ−1)}. Since Σ_{v=1}^{V} w_v^l = 1, we can obtain:

w_v^l = (1 / R_v^l)^{1/(γ−1)} / Σ_{v'=1}^{V} (1 / R_{v'}^l)^{1/(γ−1)}    (14)

To this end, we have all the iterative update rules for optimizing the three variables of DMClusts. We repeat these updates iteratively until convergence. After that, we run k-means clustering on each H_l and obtain M clusterings.
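The view-weight step admits a simple closed form of the kind sketched above: weights proportional to inverse reconstruction residuals, raised to 1/(γ−1) and normalized to sum to one. The following is a hedged sketch under that assumption (function name ours):

```python
import numpy as np

def update_view_weights(residuals, gamma=2.0):
    """Closed-form weights for one layer: w_v proportional to
    (1 / R_v)^(1 / (gamma - 1)), normalized to sum to 1, where R_v is
    view v's reconstruction error for this layer. gamma > 1 flattens or
    sharpens the distribution of weights across views."""
    R = np.asarray(residuals, dtype=float)
    w = (1.0 / (R + 1e-12)) ** (1.0 / (gamma - 1.0))
    return w / w.sum()
```

A view with a huge residual (e.g., a noisy view) receives a near-zero weight, which is exactly the robustness behavior examined in the experiments below.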

Time complexity

The time complexity of DMClusts is composed of three parts: updating Z_l^v, H_l, and w_v^l in each iteration. For simplicity, we assume all the layers have the same size k. Each of these updates costs time linear in the number of samples n, so the overall complexity of DMClusts for generating M clusterings on V views is linear in n, multiplied by the number of iterations t needed to reach convergence. Generally, k, M, and V are much smaller than n, and on our used datasets DMClusts converges within a small number of iterations. On the other hand, the time complexity of MVMC [Yao et al.2019b] is quadratic in n, since it has to factorize combined n × n similarity matrices (k is the number of clusters). Clearly, the complexity of DMClusts is linear in n, but that of MVMC is quadratic in n. As a result, our DMClusts can scale to larger datasets than MVMC.

Experimental Results and Analysis

Experimental Setup

In this section, we evaluate the effectiveness and efficiency of our proposed DMClusts on seven widely-used multi-view datasets, as described in Table 1. The adopted datasets are from different domains, with different numbers of views and objects. More details on the data are given in the Supplementary file.

Multiple clustering approaches aim to achieve diverse clusterings of high quality. To measure quality, we use Silhouette Coefficient (SC) and the Dunn Index (DI) as internal indexes to quantify the compactness and separation of clusters. To measure redundancy, we use Normalized Mutual Information (NMI) and Jaccard Coefficient (JC) as external indexes to quantify the similarity of clusters between two clusterings. We want to emphasize that a higher value of SC and DI means a clustering with higher quality, but a smaller value of NMI and JC implies that two clusterings have a smaller redundancy. These metrics have been widely adopted for evaluating multiple clusterings [Bailey2013, Yang and Zhang2017]. Their formal definitions are given in the Supplementary file.
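The quality/diversity protocol above can be reproduced with standard tools: SC and NMI come from scikit-learn, while the pair-counting Jaccard coefficient between two clusterings is easy to write by hand (the helper below is ours; the Dunn Index is omitted for brevity).

```python
import numpy as np
from sklearn.metrics import silhouette_score, normalized_mutual_info_score

def pair_jaccard(a, b):
    """Pair-counting Jaccard coefficient between two clusterings:
    |pairs co-clustered in both| / |pairs co-clustered in at least one|."""
    a, b = np.asarray(a), np.asarray(b)
    i, j = np.triu_indices(len(a), k=1)
    sa, sb = a[i] == a[j], b[i] == b[j]
    return (sa & sb).sum() / max((sa | sb).sum(), 1)

# toy example: quality of one clustering, redundancy between two clusterings
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
c1 = np.array([0, 0, 1, 1])   # one grouping of the four points
c2 = np.array([0, 1, 0, 1])   # an alternative grouping
quality = silhouette_score(X, c1)                   # higher = better clusters
redundancy = normalized_mutual_info_score(c1, c2)   # lower = more diverse
```

Note that all four measures can be computed without ground-truth labels of the intended clusterings, which is why they suit the multiple-clusterings setting.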

Datasets n, k, V d_v
Caltech7 1474, 7, 6 [40, 48, 254, 1984, 512, 928]
Handwritten 2000, 10, 6 [216, 76, 64, 6, 240, 47]
Reuters 1200, 6, 5 [21531, 24892, 34251, 15506, 11547]
BBCSport 145, 2, 4 [4659, 4633, 4665, 4684]
MSRCv1 210, 7, 6 [1302, 48, 512, 100, 256, 210]
Yale 165, 15, 3 [4096, 3304, 6750]
Mirflickr 16738, 24, 2 [150, 500]
Table 1: Statistics of multi-view datasets. n, k, V are the numbers of objects, clusters and views; d_v are the dimensions of the views.

Discovering multiple clusterings

To comparatively study the performance of DMClusts, we consider Dec-kmeans [Jain, Meka, and Dhillon2008], MVMC [Yao et al.2019b], OSC [Cui, Fern, and Dy2007], ISAAC [Ye et al.2016], MNMF [Yang and Zhang2017], and MISC [Wang et al.2019] as comparing methods. The last four methods use different techniques to seek clusterings in subspaces. The input parameters of the comparing methods are fixed (or optimized) as the authors suggested in their papers or shared code. The input parameters λ, γ, and β of DMClusts are selected from pre-specified ranges. We fix the number of clusters for each clustering to the number of classes of each dataset, as reported in Table 1. Existing multiple clustering algorithms (except MVMC and DMClusts) cannot work on multi-view data. Following the solution in [Yao et al.2019b], we concatenate the feature vectors of multi-view data and then run them on the concatenated vectors to seek alternative clusterings. For reference, we also apply DMF [Trigeorgis et al.2017] on the concatenated vectors to gradually explore multiple clusterings layer by layer.

MNMF requires a reference clustering as input to find an alternative clustering. Here we use k-means to generate the reference clustering. For the other comparing methods, we directly use their respective solutions to generate two alternative clusterings (C_1, C_2). Following the evaluation protocol used by the comparing methods, we measure clustering quality with the average (SC or DI) of C_1 and C_2, and we measure the diversity (NMI or JC) between C_1 and C_2. Table 2 gives the average results of ten independent runs and standard deviations of each method on generating two alternative clusterings. The results of ISAAC and MISC on Reuters and Mirflickr are not reported due to their high complexity on large-scale datasets.

From Table 2, we make the following observations:
(i) Multi-view vs. Concatenated view: Both DMClusts and MVMC directly operate on multi-view data, and their two generated clusterings have a significantly lower redundancy than those generated by the other comparing methods. In addition, DMClusts frequently obtains a better quality than the other comparing methods, which can only work on the concatenated view. This shows that concatenating feature vectors overrides the intrinsic nature of multi-view data, which otherwise helps to generate multiple clusterings with diversity. This also expresses the capability of our tailored deep matrix factorization in exploring multiple clusterings with quality.
(ii) DMClusts vs. MVMC: DMClusts generally obtains a significantly better quality (SC and DI) than MVMC, and holds a comparable diversity (NMI and JC). In other words, our DMClusts maintains a better balance of quality and diversity than MVMC. A possible factor is that DMClusts differentiates the relevance of multiple views, whereas MVMC does not. As a result, DMClusts is less sensitive to the noisy views than MVMC. Another factor is that our balanced redundancy term is more comprehensive by considering two types of redundancy, but MVMC considers only one type.
(iii) DMClusts vs. DMF: DMClusts always gives a better performance (both quality and diversity) than DMF, although they both can explore alternative clusterings in a layer-wise fashion. The advantage of DMClusts is two-fold: it accounts for the different relevance of data views, and can selectively fuse them to generate alternative clusterings with quality, while DMF can only operate on the concatenated features without differentiating these views; it also explicitly controls the diversity between alternative clusterings, while DMF does not.

Caltech7 SC 0.049±0.002 0.234±0.000 0.266±0.000 0.153±0.010 0.201±0.003 0.140±0.002 0.065±0.011 0.301±0.006
DI 0.042±0.006 0.037±0.000 0.054±0.000 0.027±0.001 0.048±0.002 0.062±0.000 0.065±0.004 0.090±0.003
NMI 0.021±0.003 0.022±0.000 0.693±0.015 0.645±0.035 0.516±0.015 0.006±0.000 0.310±0.035 0.009±0.000
JC 0.127±0.009 0.092±0.000 0.383±0.000 0.358±0.022 0.349±0.018 0.076±0.000 0.235±0.004 0.087±0.002
BBCSport SC 0.088±0.007 0.014±0.000 0.144±0.000 -0.039±0.002 0.089±0.002 0.269±0.000 0.204±0.003 0.284±0.006
DI 0.487±0.001 0.434±0.000 0.520±0.000 0.411±0.016 0.335±0.009 0.014±0.000 0.255±0.001 0.468±0.010
NMI 0.002±0.000 0.086±0.000 0.001±0.000 0.010±0.001 0.009±0.001 0.000±0.000 0.101±0.009 0.000±0.000
JC 0.431±0.030 0.392±0.000 0.605±0.000 0.495±0.015 0.520±0.018 0.347±0.000 0.418±0.001 0.401±0.003
Handwritten SC 0.050±0.006 0.014±0.000 0.352±0.000 0.235±0.007 0.251±0.009 0.062±0.000 0.034±0.001 0.377±0.012
DI 0.051±0.011 0.009±0.000 0.107±0.000 0.056±0.002 0.052±0.003 0.083±0.000 0.240±0.009 0.159±0.004
NMI 0.070±0.012 0.089±0.000 0.778±0.000 0.712±0.018 0.645±0.014 0.009±0.000 0.212±0.006 0.019±0.001
JC 0.073±0.003 0.078±0.003 0.570±0.000 0.484±0.016 0.414±0.019 0.073±0.000 0.114±0.003 0.066±0.000
MSRCv1 SC -0.062±0.003 -0.193±0.002 0.382±0.011 0.166±0.003 0.331±0.008 0.113±0.007 0.022±0.001 0.556±0.012
DI 0.043±0.007 0.027±0.001 0.071±0.007 0.012±0.002 0.013±0.001 0.098±0.003 0.277±0.010 0.336±0.008
NMI 0.054±0.006 0.063±0.005 0.736±0.054 0.549±0.030 0.665±0.017 0.053±0.006 0.150±0.002 0.038±0.001
JC 0.109±0.005 0.124±0.003 0.519±0.025 0.357±0.009 0.471±0.020 0.078±0.002 0.127±0.005 0.087±0.003
Yale SC 0.033±0.002 -0.011±0.001 0.221±0.005 -0.020±0.002 -0.066±0.008 -0.045±0.007 0.021±0.001 0.303±0.019
DI 0.205±0.014 0.114±0.004 0.331±0.020 0.076±0.004 0.073±0.003 0.232±0.012 0.285±0.004 0.292±0.015
NMI 0.241±0.021 0.240±0.007 0.812±0.063 0.369±0.007 0.314±0.009 0.251±0.006 0.319±0.006 0.205±0.004
JC 0.043±0.002 0.066±0.004 0.357±0.034 0.098±0.003 0.091±0.002 0.055±0.001 0.098±0.005 0.038±0.002
Reuters SC -0.002±0.000 -0.107±0.009 0.065±0.000 0.180±0.000 0.314±0.004 0.344±0.006
DI 0.157±0.008 0.070±0.003 0.210±0.000 0.038±0.000 0.028±0.001 0.136±0.005
NMI 0.041±0.004 0.033±0.010 0.491±0.000 0.004±0.000 0.508±0.005 0.018±0.000
JC 0.199±0.005 0.148±0.002 0.454±0.000 0.091±0.000 0.590±0.011 0.132±0.003
Mirflickr SC -0.004±0.000 -0.058±0.000 0.017±0.000 -0.038±0.000 0.005±0.000 0.336±0.008
DI 0.061±0.002 0.053±0.001 0.059±0.002 0.173±0.005 0.027±0.001 0.076±0.001
NMI 0.427±0.012 0.014±0.000 0.575±0.011 0.005±0.000 0.108±0.003 0.043±0.001
JC 0.878±0.022 0.023±0.000 0.368±0.011 0.022±0.000 0.049±0.001 0.033±0.001
Table 2: Quality (SC, DI; higher is better) and diversity (NMI, JC; lower means less redundancy) of the various comparing methods on generating multiple clusterings. Markers indicate whether our DMClusts is statistically (according to a pairwise t-test at the 95% significance level) superior/inferior to the other method.

To investigate the robustness of DMClusts to noisy views, we constructed a synthetic dataset on Reuters by injecting a noisy view following the standard Gaussian distribution. We then applied DMF and DMClusts on this synthetic dataset with fixed input parameters λ, γ, and β. Next, we visualize the weights assigned to the six views for the first and second clusterings in Figure 2. DMClusts indeed assigns different sets of weights to these views for generating two clusterings with a low overlap (NMI: 0.019, JC: 0.161), and it manifests robustness to the noisy view by assigning it a zero weight. As a result, DMClusts holds a similar quality and diversity as on the original Reuters. In contrast, DMF has a nearly 50% reduced quality (SC: 0.158, DI: 0.015) and an about 25% increased diversity (NMI: 0.471, JC: 0.375). The increase in diversity is obtained at the expense of a reduced quality. Nevertheless, DMClusts still gives a better diversity than DMF. This investigation corroborates the benefit of weighting views.

Figure 2: DMClusts assigns two sets of weights to six views for generating two clusterings. The 6-th view is a noisy view.

To further study whether DMClusts can generate more than two clusterings, we fix the number of target clusterings to four and fix the number of clusters for each clustering. Next, we apply DMClusts, DMF, and MVMC on the Handwritten dataset of images of 10 digits and visualize their clusterings in Figure 3. Each row of a subfigure represents a clustering and each image corresponds to the mean of a cluster. The numbers under each image are the dominant digits (not all) in the cluster. It is well known that the 10 handwritten digits are ambiguous and resemble different numbers (7 resembles 4 and 3; 9 resembles 5 and 7). As such, there is a tendency to group them together in different alternative clusterings. Due to the use of diversity control, DMClusts presents four clusterings without any completely overlapping clusters. In contrast, DMF does not account for diversity and generates some largely overlapping clusters (i.e., {0, 1, 3} and {2, 4, 7}) across its clusterings. Although MVMC also quantifies the redundancy of two objects often grouped into the same cluster of different clusterings, it still generates a heavily overlapping cluster {1, 2, 3} across its clusterings. This visual example not only confirms the effectiveness of DMClusts in generating multiple diverse clusterings, but also proves the effectiveness of our balanced redundancy quantification.

(a) DMClusts
(b) DMF
(c) MVMC
Figure 3: Four alternative clusterings (C_1 to C_4) generated by DMClusts (a), DMF (b) and MVMC (c).
Figure 4: Quality (DI) and Diversity (1-NMI) of DMClusts vs. λ, γ and β: (a) DI and (1-NMI) vs. λ; (b) DI and (1-NMI) vs. γ; (c) DI and (1-NMI) vs. β.

Parameter analysis

Several input parameters (λ, γ, and β) may affect the performance of DMClusts. λ balances the importance of the deep matrix factorization term and the diversity control term; γ controls the weight distribution assigned to the input views; β balances the redundancy of two objects placed into the same clusters against the redundancy of two objects placed into different clusters of two clusterings.

We study the impact of λ by varying it over a wide range, and plot the change of quality (DI) and diversity (1-NMI, the larger the better) of DMClusts on the Yale dataset in Figure 4(a), with γ and β fixed. We find that: (i) the diversity (1-NMI) steadily increases at first, but not when λ becomes too large; (ii) the quality (DI) gradually decreases as λ increases, and then becomes relatively stable. This pattern is explainable, since a larger λ forces DMClusts to focus more on the diversity between clusterings, and thus may drag down the quality of the respective clusterings. Overall, this observation confirms the dilemma between the diversity and quality of multiple clusterings, and shows the necessity of introducing λ to control the redundancy.

We investigate the impact of γ by varying it over a grid of values, and report the quality and diversity of DMClusts on the Yale dataset in Figure 4(b), with λ and β fixed. The quality slightly rises as γ increases, and the diversity remains stable at first; as γ increases further, the diversity steadily increases and the quality gradually decreases, due to the known trade-off between diversity and quality. This is because a too small γ gives nearly equal weights to all the views, while a moderate γ can assign different sets of weights to these views, which helps to generate diverse clusterings, as exemplified in Figure 1.

To study the benefit of our balanced redundancy quantification term, we vary β from 0 to 1 and report the results in Figure 4(c), with λ and γ fixed. We observe that the diversity (1-NMI) increases as β increases, but turns to decrease as β approaches 1. Due to the dilemma between quality and diversity, the quality shows the reverse trend. Neither β = 0 nor β = 1 gives the highest diversity. This observation proves the contribution of considering the previously overlooked redundancy due to two samples being placed in different clusters of two clusterings, and also justifies the effectiveness of our balanced redundancy quantification term. In addition, it clarifies why our DMClusts obtains a better diversity between clusterings. We also observe that β = 1 (NMI: 0.064) gives a larger diversity (by 16%) than β = 0 (NMI: 0.075). This suggests that the redundancy of two samples in the same clusters of two clusterings is more important than the redundancy of two samples in different clusters. Overall, these two types of redundancy complement each other and help to generate multiple clusterings with improved diversity.

We further study the impact of the number of clusters, the layer sizes, and different numbers of clusterings M. We also perform a runtime experiment and show that DMClusts not only outperforms the state-of-the-art methods in exploring multiple clusterings with quality and diversity, but also holds a moderate efficiency. These results and analyses, as well as the convergence analysis, can be found in the Supplementary file. Finally, we want to remark that all four metrics do not depend on the ground-truth labels of the tested dataset, so suitable values for the parameters can be chosen based on the user's preference toward quality or diversity.


In this paper, we introduced DMClusts to explore multiple clusterings from multi-view data, an interesting, practical, but overlooked clustering topic that conjoins multi-view clustering and multiple clusterings. DMClusts extends deep matrix factorization to generate one clustering in each layer, and introduces a novel balanced redundancy quantification term to seek multiple diverse clusterings of quality. DMClusts shows superior effectiveness and efficiency compared with state-of-the-art competitive solutions. In future work, we will investigate a principled way to determine a suitable number of layers (clusterings).


This work is supported by NSFC (61872300 and 61873214), the Fundamental Research Funds for the Central Universities (XDJK2019B024), the Natural Science Foundation of CQ CSTC (cstc2018jcyjAX0228), and by the King Abdullah University of Science and Technology (KAUST), Saudi Arabia. The code and Supplementary file of DMClusts are available at

