Introduction
Many real-world datasets include diverse types of feature views. For example, web images have both visual and textual features; a protein has structure and interactome features. The various feature views embody consistent and complementary information about the same objects, and have spurred intensive research in multi-view learning [Bickel and Scheffer 2004, Zhao et al. 2017]. The fusion of feature views not only yields a comprehensive composite view of the objects, but also facilitates the associated learning task [Nie, Cai, and Li 2017, Tan et al. 2018].

Various efforts have focused on developing effective multi-view clustering (MVC) algorithms. Some methods achieve clustering by co-regularization [Kumar and Daumé 2011, Cheng et al. 2013], correlation analysis [Chaudhuri et al. 2009], or multiple kernel learning [Gönen and Alpaydın 2011, Liu et al. 2019]; other approaches learn a shared subspace to extract the complementary and shared information of multi-view data, and perform clustering therein [Li, Jiang, and Zhou 2014, Gao et al. 2015, Zhao, Ding, and Fu 2017, Zong et al. 2017, Kang et al. 2019].
Existing MVC solutions focus on generating a single clustering; they fail to present different but meaningful clusterings of the same multi-view data [Fanaee-T and Thoresen 2018]. For example, the three-view objects in Figure 1 have different shapes, colors, and textures. The aforementioned MVC solutions group these objects mainly by shape, but the objects can also be clustered according to the shared color or texture. These groupings are meaningful but different. In other words, multiple clustering is concerned with both the quality and the diversity of alternative clusterings. Although multiple clusterings can reveal alternative and otherwise overlooked groupings of the same objects, balancing diversity and quality is a known dilemma [Bailey 2013]. Given this challenge, a number of solutions have been introduced to generate alternative clusterings in different subspaces [Cui, Fern, and Dy 2007, Mautz et al. 2018, Wang et al. 2019], by meta clustering of base clusterings [Caruana et al. 2006], by referring to already explored clusterings [Bae and Bailey 2006, Yang and Zhang 2017], or by simultaneously reducing the redundancy between clusterings [Wang et al. 2018, Yao et al. 2019a]. However, they still focus on single-view data. One naive extension is to concatenate the diverse feature vectors of the same objects into a longer vector, and then directly apply off-the-shelf multiple clustering solutions to the concatenated vectors. However, this concatenation overrides the intrinsic nature of multi-view data, and thus reduces the quality and increases the redundancy of the explored clusterings, as our experiments will show.
To find multiple clusterings on multi-view data, [Yao et al. 2019b] recently proposed a solution called multi-view multiple clustering (MVMC). MVMC extracts the individual and shared similarity matrices of multi-view data based on adapted self-representation learning [Luo et al. 2018], and then applies semi-nonnegative matrix factorization [Ding, Li, and Jordan 2010] to each combination of the individual and common similarity matrices to generate alternative clusterings, where quality is pursued via the commonality matrix and diversity via the individuality matrix. However, MVMC: (a) does not differentiate the relevance of different views and suffers from low-quality (irrelevant) data views; (b) does not maintain well the quality and diversity of multiple clusterings; (c) cannot be applied to datasets with a large number of samples, since it has to factorize combined similarity matrices whose size grows quadratically with the number of samples.
In this paper, we introduce a deep matrix factorization based solution (DMClusts, illustrated in Figure 1) to generate multiple diverse clusterings of good quality in a layer-wise fashion. DMClusts collaboratively factorizes the multi-view data matrices into multiple representational subspaces layer-by-layer, and seeks one quality clustering per layer. To achieve diversity among the clusterings, it reduces their redundancy by means of a new balanced redundancy quantification term, which jointly considers the case where two objects are often grouped together and the case where they often fall in different clusters across the subspaces. We further introduce an iterative optimization procedure to simultaneously seek multiple clusterings in a layer-wise fashion. The main contributions of our work are:

We introduce a deep matrix factorization based solution (DMClusts) to seek multiple clusterings by fusing the consensus and complementary information of multi-view data, and by enforcing diversity between the clusterings layer-by-layer. DMClusts can credit different degrees of relevance to different views; as such, it is less sensitive to noisy (or low-quality) ones.

DMClusts introduces a balanced redundancy quantification term, which jointly considers the case that two samples are often nearby in the representational subspace of each layer, and the reverse case that they are often far away in each layer, to comprehensively quantify the redundancy of multiple clusterings, whereas existing quantifications overlook the latter case. Extensive experiments on benchmark datasets show that DMClusts significantly outperforms related competitive multiple clustering solutions [Yao et al. 2019b, Wang et al. 2019, Yang and Zhang 2017, Ye et al. 2016, Jain, Meka, and Dhillon 2008, Cui, Fern, and Dy 2007] and deep matrix factorization [Trigeorgis et al. 2017] in finding multiple clusterings with quality and diversity.
Our Method
Overview of deep matrix factorization
Matrix factorization techniques have been extensively adopted for data analysis and representation learning in various domains [Tang et al. 2017, Fu et al. 2018, Li, Tang, and Mei 2019]. For example, NMF (nonnegative matrix factorization) [Lee and Seung 2001] can decompose a nonnegative data matrix $\mathbf{X} \in \mathbb{R}^{d \times n}$ into two factor matrices $\mathbf{Z} \in \mathbb{R}^{d \times k}$ and $\mathbf{H} \in \mathbb{R}^{k \times n}$; the nonnegative constraints imposed on the factors allow for better interpretability and have led to a growing number of applications of NMF and its variants [Ding, Li, and Jordan 2010, Cai et al. 2011, Žitnik and Zupan 2014]. By taking $\mathbf{Z}$ as $k$ cluster centroids in the $d$-dimensional feature space, and $\mathbf{H}$ as the soft membership indicators of the $n$ samples with respect to these centroids, semi-NMF [Ding, Li, and Jordan 2010] is equivalent to a soft version of $k$-means clustering. To absorb a mixed-sign $\mathbf{X}$, semi-NMF only imposes the nonnegative constraints on $\mathbf{H}$.
To explore the complex hierarchical structure and to eliminate noise in data matrices with different modalities, and motivated by the idea and robustness of deep representation learning [Hinton and Salakhutdinov 2006, Bengio 2009], [Trigeorgis et al. 2017] extends semi-NMF to deep semi-NMF (DMF) as follows:

$\min_{\{\mathbf{Z}_i\},\{\mathbf{H}_i\}} \|\mathbf{X} - \mathbf{Z}_1\mathbf{Z}_2\cdots\mathbf{Z}_m\mathbf{H}_m\|_F^2, \quad \text{s.t. } \mathbf{H}_i \geq 0, \ i = 1, \ldots, m$  (1)

where $\mathbf{Z}_i$ is the $i$-th ($1 \leq i \leq m$) layer basis matrix, and $\mathbf{H}_i$ is the $i$-th layer representation matrix (with $\mathbf{H}_{i-1} \approx \mathbf{Z}_i\mathbf{H}_i$). By taking $\mathbf{Z}_i$ as the cluster centroids and $\mathbf{H}_i$ as the cluster indicators, or by separately clustering on $\mathbf{H}_i$, we can obtain $m$ clusterings from a deep factorization network with $m$ layers. However, these clusterings may have high redundancy, since the overlap between them is ignored.
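As a concrete illustration, the layer-wise factorization described above can be sketched in a few lines of numpy. This is a minimal sketch, not the implementation of [Trigeorgis et al. 2017]: it greedily applies the standard semi-NMF multiplicative updates of [Ding, Li, and Jordan 2010] layer by layer; the function names and layer sizes are illustrative.

```python
import numpy as np

def semi_nmf(X, k, n_iter=100, seed=0):
    """One semi-NMF layer: X ~= Z @ H with H >= 0 (Ding, Li, Jordan 2010 updates)."""
    rng = np.random.default_rng(seed)
    H = rng.random((k, X.shape[1])) + 0.1     # nonnegative init
    pos = lambda A: (np.abs(A) + A) / 2
    neg = lambda A: (np.abs(A) - A) / 2
    for _ in range(n_iter):
        # Z has a closed-form least-squares solution (mixed sign allowed)
        Z = X @ H.T @ np.linalg.pinv(H @ H.T)
        ZtX, ZtZ = Z.T @ X, Z.T @ Z
        # multiplicative update keeps H nonnegative
        H *= np.sqrt((pos(ZtX) + neg(ZtZ) @ H) / (neg(ZtX) + pos(ZtZ) @ H + 1e-12))
    return Z, H

def deep_semi_nmf(X, layer_sizes):
    """Greedy layer-wise factorization X ~= Z1 Z2 ... Zm Hm; each H_t is a
    candidate clustering representation for layer t."""
    Zs, Hs, cur = [], [], X
    for k in layer_sizes:
        Z, H = semi_nmf(cur, k)
        Zs.append(Z); Hs.append(H)
        cur = H                               # factorize the representation further
    return Zs, Hs

X = np.random.default_rng(1).random((20, 50))
Zs, Hs = deep_semi_nmf(X, [8, 4])
```

Clustering each `Hs[t]` (e.g., with $k$-means) would yield one clustering per layer, which is exactly where the redundancy issue noted above arises.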
Multi-view data often embody different distributions, which enable different groupings of the same dataset from diverse perspectives. Therefore, it is promising to apply DMF to multi-view data to discover multiple clusterings. One simple solution is to concatenate the multiple feature views into a single view, and then directly apply DMF to the concatenated view. However, this concatenation does not differentiate the relevance of the views, and results in information override and redundant clusterings. Given this, we propose our solution for multi-view multiple clusterings using deep matrix factorization.
The proposed method
Suppose $\mathcal{X} = \{\mathbf{X}^v \in \mathbb{R}^{d_v \times n}\}_{v=1}^{V}$ is a dataset with $V$ different feature views of $n$ objects. To make use of the complementary information and to explore hierarchical representations of multi-view data, we formulate our model by extending DMF as follows:

$\min_{\{\mathbf{Z}_t^v\},\{\mathbf{H}_t\}} \sum_{v=1}^{V}\sum_{t=1}^{m} \|\mathbf{X}^v - \mathbf{Z}_1^v\mathbf{Z}_2^v\cdots\mathbf{Z}_t^v\mathbf{H}_t\|_F^2 + \lambda \sum_{t=2}^{m}\sum_{s=1}^{t-1} R(\mathbf{H}_s, \mathbf{H}_t), \quad \text{s.t. } \mathbf{H}_t \geq 0$  (2)

where $m$ is the user-specified target number of clusterings, $\mathbf{Z}_t^v$ is the $t$-th ($1 \leq t \leq m$) layer mapping for view $v$, and $R(\mathbf{H}_s, \mathbf{H}_t)$ quantifies the redundancy between two clusterings and will be discussed later. $\lambda > 0$ is introduced to balance the quality and redundancy of the clusterings. Since $\mathbf{H}_t$ is shared across all the data views, we can expect that $\mathbf{H}_t$ fuses the complementary information of multiple data views to generate a high-quality representational subspace in the $t$-th layer. In addition, because of the hierarchical representation and the redundancy control term, alternative clusterings with diversity can also be pursued.
Our formulation has a close connection with multi-view clustering via deep matrix factorization [Zhao, Ding, and Fu 2017], which also factorizes multiple data views layer-by-layer to extract the complementary information, but can only generate a single clustering in the final layer. Our task is different from subspace clustering [Domeniconi et al. 2007, Luo et al. 2018], which seeks only one clustering with different clusters in different subspaces. Our formulation also differs from non-redundant multiple clustering by nonnegative matrix factorization (MNMF) [Yang and Zhang 2017], which performs only a one-layer factorization to find a new clustering by reducing the redundancy between that clustering and already explored ones. As such, MNMF may generate low-quality alternative clusterings, due to its one-layer representation of the data and its heavy dependence on the reference clustering.
Different data views may have different relevance toward different clusterings. Eq. (2) and MVMC [Yao et al. 2019b] assume all the data views have the same relevance toward these clusterings. As such, noisy or irrelevant data views may compromise the quality of the alternative clusterings. To account for the different levels of relevance of the data views toward the alternative clusterings, and to reduce the impact of noisy views, we further assign weights to these views for each clustering as follows:
$\min \sum_{v=1}^{V}\sum_{t=1}^{m} (w_t^v)^{\gamma} \|\mathbf{X}^v - \mathbf{Z}_1^v\cdots\mathbf{Z}_t^v\mathbf{H}_t\|_F^2 + \lambda \sum_{t=2}^{m}\sum_{s=1}^{t-1} R(\mathbf{H}_s, \mathbf{H}_t), \quad \text{s.t. } \mathbf{H}_t \geq 0, \ \sum_{v=1}^{V} w_t^v = 1, \ w_t^v \geq 0$  (3)

where $w_t^v$ is the weight coefficient of the $v$-th data view for generating the $t$-th clustering, and $\gamma > 1$ is the parameter that controls the weight distribution. In this way, multiple data views are selectively fused to generate diverse clusterings of quality. For example, in Figure 1, three alternative clusterings (shape, color, texture) can be obtained by different weight assignments over the three views.
As stated above, it is important to control the redundancy (or overlap) between alternative clusterings. Most subspace based multiple clustering solutions reduce the redundancy between clusterings by seeking orthogonal (non-redundant or independent) subspaces [Cui, Fern, and Dy 2007, Ye et al. 2016, Mautz et al. 2018, Wang et al. 2019]. DMClusts also has such a flavor, and seeks a clustering based on each layer's representation $\mathbf{H}_t$. However, a set of objects may still be nearby in orthogonally projected subspaces, and thus produce similar clusters in those subspaces. For this reason, we additionally quantify the redundancy between clusterings using $R(\mathbf{H}_s, \mathbf{H}_t)$ ($s \neq t$). A co-association matrix $\mathbf{C}^t \in \{0,1\}^{n \times n}$ can reflect whether two objects are grouped into the same cluster or not for the $t$-th clustering [Fred and Jain 2005]. Particularly, if objects $i$ and $j$ are grouped into the same cluster, then $\mathbf{C}^t_{ij} = 1$; otherwise $\mathbf{C}^t_{ij} = 0$. So if two clusterings ($s$ and $t$) have a large $\sum_{i,j}\mathbf{C}^s_{ij}\mathbf{C}^t_{ij}$, there is a high redundancy (or overlap) between them. Since the normalized representation $\tilde{\mathbf{H}}_t$ often cannot be an exact binary cluster-indicator matrix, here we approximate $\mathbf{C}^t$ by $\tilde{\mathbf{H}}_t^\top\tilde{\mathbf{H}}_t$, which softly quantifies the degree to which two objects are grouped into the same cluster in the $t$-th layer (or clustering). Based on this approximation, we quantify the overlap between two clusterings in different layers as:
$R(\mathbf{H}_s, \mathbf{H}_t) = \frac{1}{n^2}\,\mathrm{tr}(\tilde{\mathbf{H}}_s^\top\tilde{\mathbf{H}}_s\tilde{\mathbf{H}}_t^\top\tilde{\mathbf{H}}_t)$  (4)

where $\mathrm{tr}(\cdot)$ is the matrix trace operator. A large $R(\mathbf{H}_s, \mathbf{H}_t)$ means many pairs of objects are nearby in both representation subspaces; such pairs will be grouped into the same clusters of two different clusterings and increase the overlap.
However, Eq. (4) only accounts for the case that two objects are often projected nearby (grouped into the same clusters) in different representation subspaces, and overlooks the case that two objects are frequently placed far away (grouped into different clusters) in these subspaces. We remark that other multiple clustering solutions [Yang and Zhang 2017, Wang et al. 2018, Yao et al. 2019a] also adopt the idea in Eq. (4) to quantify the redundancy between clusterings, and thus they also overlook the latter case, which becomes prominent as the number of clusters grows. To remedy this oversight, we introduce a balanced redundancy quantification term as follows:
$R(\mathbf{H}_s, \mathbf{H}_t) = \frac{1}{n^2}\sum_{i,j}\left[\beta\,\mathbf{C}^s_{ij}\mathbf{C}^t_{ij} + (1-\beta)(1-\mathbf{C}^s_{ij})(1-\mathbf{C}^t_{ij})\right]$  (5)

where $\beta \in [0, 1]$ is the balance coefficient and $\mathbf{C}^t$ is approximated by $\tilde{\mathbf{H}}_t^\top\tilde{\mathbf{H}}_t$ as before. Eq. (5) considers two extreme cases: (i) many pairwise objects are always nearby in the two subspaces; (ii) many pairwise objects are always far away in these subspaces. Both cases increase the overlap of two clusterings. In other words, if many pairwise objects are placed into the same clusters by one clustering, but not so by the other clustering, then the redundancy between them is low.
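To make the quantification in Eqs. (4)-(5) concrete, the following sketch computes soft co-association matrices and the balanced redundancy between two layer representations. It is an illustrative reading of the term, not the paper's exact code: the column normalization of H and the 1/n² scaling are assumptions.

```python
import numpy as np

def coassociation(H):
    """Soft co-association: C[i, j] is the degree to which samples i and j
    fall in the same cluster, using a column-normalized H (Eq. (4) idea)."""
    Hn = H / (np.linalg.norm(H, axis=0, keepdims=True) + 1e-12)
    return Hn.T @ Hn                       # n x n, entries in [0, 1]

def balanced_redundancy(Hs, Ht, beta=0.5):
    """Balanced redundancy of two clusterings (Eq. (5) idea): beta weighs the
    'often together' overlap, (1 - beta) the 'often apart' overlap."""
    Cs, Ct = coassociation(Hs), coassociation(Ht)
    n = Cs.shape[0]
    together = np.sum(Cs * Ct) / n**2
    apart = np.sum((1 - Cs) * (1 - Ct)) / n**2
    return beta * together + (1 - beta) * apart

# hard cluster-indicator matrices for two clusterings of 4 objects:
H1 = np.array([[1., 1., 0., 0.], [0., 0., 1., 1.]])   # clusters {0,1}, {2,3}
H2 = np.array([[1., 0., 1., 0.], [0., 1., 0., 1.]])   # clusters {0,2}, {1,3}
r_same = balanced_redundancy(H1, H1)   # identical clusterings: high redundancy
r_diff = balanced_redundancy(H1, H2)   # complementary clusterings: lower
```

For these hard indicators, identical clusterings score strictly higher than the two complementary groupings, matching the intended behavior of the term.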
To this end, we can reformulate the objective function of DMClusts as follows:

$\min \sum_{v=1}^{V}\sum_{t=1}^{m} (w_t^v)^{\gamma} \|\mathbf{X}^v - \mathbf{Z}_1^v\cdots\mathbf{Z}_t^v\mathbf{H}_t\|_F^2 + \frac{\lambda}{n^2} \sum_{t=2}^{m}\sum_{s=1}^{t-1} \mathrm{tr}\left[\beta\,\mathbf{C}^s\mathbf{C}^t + (1-\beta)(\mathbf{E}-\mathbf{C}^s)(\mathbf{E}-\mathbf{C}^t)\right], \ \text{s.t. } \mathbf{H}_t \geq 0, \ \sum_{v=1}^{V} w_t^v = 1, \ w_t^v \geq 0$  (6)

where $\mathbf{E}$ is the $n \times n$ all-ones matrix and $\mathbf{C}^t = \tilde{\mathbf{H}}_t^\top\tilde{\mathbf{H}}_t$ (note that $\sum_{i,j}\mathbf{A}_{ij}\mathbf{B}_{ij} = \mathrm{tr}(\mathbf{A}\mathbf{B})$ for symmetric $\mathbf{A}$ and $\mathbf{B}$). By minimizing the above objective, we can gradually find $m$ clusterings, where the quality of these clusterings is pursued by sharing the respective layer $\mathbf{H}_t$ across all the views, and the diversity is pursued by penalizing the cases where too many objects are always nearby (or far away) in these representation subspaces. Our experiments will confirm the advantage of these factors.
Optimization
The minimization objective in Eq. (6) is defined with respect to $\{\mathbf{Z}_t^v\}$, $\{\mathbf{H}_t\}$, and $\{w_t^v\}$. Since a closed-form solution cannot be given, we alternately optimize one variable while keeping the other two fixed. The alternating process is detailed below.
Update rule for $\mathbf{Z}_t^v$: The optimization of Eq. (6) with respect to $\mathbf{Z}_t^v$ reduces to:

$\min_{\mathbf{Z}_t^v} (w_t^v)^{\gamma}\|\mathbf{X}^v - \boldsymbol{\Phi}\mathbf{Z}_t^v\mathbf{H}_t\|_F^2$  (7)

where $\boldsymbol{\Phi} = \mathbf{Z}_1^v\cdots\mathbf{Z}_{t-1}^v$ (with $\boldsymbol{\Phi} = \mathbf{I}$ for $t = 1$). Letting the partial derivative of Eq. (7) with respect to $\mathbf{Z}_t^v$ be zero, we can obtain

$\mathbf{Z}_t^v = \boldsymbol{\Phi}^{\dagger}\mathbf{X}^v\mathbf{H}_t^{\dagger}$  (8)

where $(\cdot)^{\dagger}$ denotes the Moore–Penrose pseudo-inverse.
Update rule for $\mathbf{H}_t$: Optimizing Eq. (6) with respect to $\mathbf{H}_t$ is equivalent to minimizing:

$J(\mathbf{H}_t) = \sum_{v=1}^{V}(w_t^v)^{\gamma}\|\mathbf{X}^v - \mathbf{P}_t^v\mathbf{H}_t\|_F^2 + \frac{\lambda}{n^2}\sum_{s \neq t}\mathrm{tr}\left[\beta\,\mathbf{C}^s\mathbf{C}^t + (1-\beta)(\mathbf{E}-\mathbf{C}^s)(\mathbf{E}-\mathbf{C}^t)\right], \quad \text{s.t. } \mathbf{H}_t \geq 0$  (9)

where $\mathbf{P}_t^v = \mathbf{Z}_1^v\cdots\mathbf{Z}_t^v$. For the constraint $\mathbf{H}_t \geq 0$, we introduce the Lagrangian multiplier matrix $\boldsymbol{\Theta}$ as follows:

$\mathcal{L}(\mathbf{H}_t) = J(\mathbf{H}_t) + \mathrm{tr}(\boldsymbol{\Theta}\mathbf{H}_t^\top)$  (10)

Letting the partial derivative $\partial\mathcal{L}/\partial\mathbf{H}_t = 0$ and using the KKT condition $\boldsymbol{\Theta}_{ij}(\mathbf{H}_t)_{ij} = 0$, we can get the multiplicative update

$\mathbf{H}_t \leftarrow \mathbf{H}_t \odot \sqrt{\dfrac{[\mathbf{A}]^+ + [\mathbf{B}]^-\mathbf{H}_t + \lambda[\mathbf{D}]^-}{[\mathbf{A}]^- + [\mathbf{B}]^+\mathbf{H}_t + \lambda[\mathbf{D}]^+}}$  (11)

where $\mathbf{A} = \sum_v (w_t^v)^{\gamma}(\mathbf{P}_t^v)^\top\mathbf{X}^v$, $\mathbf{B} = \sum_v (w_t^v)^{\gamma}(\mathbf{P}_t^v)^\top\mathbf{P}_t^v$, $\mathbf{D}$ collects the gradient of the redundancy term with respect to $\mathbf{H}_t$, $[\mathbf{M}]^+ = (|\mathbf{M}| + \mathbf{M})/2$, and $[\mathbf{M}]^- = (|\mathbf{M}| - \mathbf{M})/2$.
Update rule for $w_t^v$: We denote $e_t^v = \|\mathbf{X}^v - \mathbf{Z}_1^v\cdots\mathbf{Z}_t^v\mathbf{H}_t\|_F^2$. Eq. (6) with respect to $w_t^v$ is written as:

$\min_{w_t^v} \sum_{v=1}^{V}(w_t^v)^{\gamma}e_t^v, \quad \text{s.t. } \sum_{v=1}^{V} w_t^v = 1, \ w_t^v \geq 0$  (12)

The Lagrangian function of Eq. (12) is:

$\mathcal{L}(w_t^v, \eta) = \sum_{v=1}^{V}(w_t^v)^{\gamma}e_t^v - \eta\left(\sum_{v=1}^{V} w_t^v - 1\right)$  (13)

where $\eta$ is the Lagrangian multiplier. By taking the derivative of Eq. (13) with respect to $w_t^v$ and setting it to zero, we have $w_t^v = \left(\eta/(\gamma e_t^v)\right)^{1/(\gamma-1)}$. Since $\sum_{v=1}^{V} w_t^v = 1$, we can obtain:

$w_t^v = \dfrac{(e_t^v)^{1/(1-\gamma)}}{\sum_{u=1}^{V}(e_t^u)^{1/(1-\gamma)}}$  (14)
At this point, we have all the iterative update rules for optimizing the three variables of DMClusts. We repeat these updates iteratively until convergence. After that, we run $k$-means clustering on each $\mathbf{H}_t$ and obtain $m$ clusterings.
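The overall alternating scheme can be sketched as follows. This is a heavily simplified, hypothetical variant, not the paper's implementation: it greedily fuses the views into one shared nonnegative H per layer, uses closed-form view weights in the spirit of Eq. (14), omits the redundancy term from the updates for brevity, and replaces the final k-means step with an argmax over H.

```python
import numpy as np

def pos(A): return (np.abs(A) + A) / 2
def neg(A): return (np.abs(A) - A) / 2

def weighted_multiview_layer(views, k, gamma=2.0, n_iter=50, seed=0):
    """One DMClusts-style layer (sketch): per-view bases Z_v and a shared
    nonnegative H, with closed-form view weights (redundancy term omitted)."""
    rng = np.random.default_rng(seed)
    n, V = views[0].shape[1], len(views)
    w = np.full(V, 1.0 / V)
    H = rng.random((k, n)) + 0.1
    for _ in range(n_iter):
        Zs = [X @ H.T @ np.linalg.pinv(H @ H.T) for X in views]  # least-squares bases
        A = sum(w[v]**gamma * Zs[v].T @ views[v] for v in range(V))
        B = sum(w[v]**gamma * Zs[v].T @ Zs[v] for v in range(V))
        H *= np.sqrt((pos(A) + neg(B) @ H) / (neg(A) + pos(B) @ H + 1e-12))
        errs = np.array([np.linalg.norm(views[v] - Zs[v] @ H)**2 for v in range(V)])
        inv = (1.0 / (errs + 1e-12))**(1.0 / (gamma - 1.0))      # Eq. (14)-style weights
        w = inv / inv.sum()
    return Zs, H, w

def dmclusts_sketch(views, layer_sizes, gamma=2.0):
    """Greedy layer-wise pass: each layer yields a shared H_t (one clustering);
    deeper layers factorize the previous shared representation."""
    Hs, cur = [], views
    for k in layer_sizes:
        _, H, _ = weighted_multiview_layer(cur, k, gamma)
        Hs.append(H)
        cur = [H]                              # views already fused into shared H
    labels = [H.argmax(axis=0) for H in Hs]    # proxy for k-means on each H_t
    return Hs, labels

rng = np.random.default_rng(0)
views = [rng.random((10, 40)), rng.random((15, 40))]
Hs, labels = dmclusts_sketch(views, [6, 3])
```

Note how the view-weight update falls out of the per-view reconstruction errors alone, which is what makes noisy views cheap to down-weight.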
Time complexity
The time complexity of DMClusts is composed of three parts: updating $\{\mathbf{Z}_t^v\}$, $\{\mathbf{H}_t\}$, and $\{w_t^v\}$ in each iteration. For simplicity, we assume all the layers have the same size $k$. Each update is dominated by matrix products whose cost is linear in the number of samples $n$, so the overall complexity of DMClusts for generating $m$ clusterings on $V$ views scales linearly with $n$, multiplied by the number of iterations to convergence; on our datasets, DMClusts converges within a small number of iterations. In contrast, MVMC [Yao et al. 2019b] has to build and factorize $n \times n$ similarity matrices, so its time complexity is quadratic in $n$. As a result, our DMClusts can scale to larger datasets than MVMC.
Experimental Results and Analysis
Experimental Setup
In this section, we evaluate the effectiveness and efficiency of our proposed DMClusts on seven widely used multi-view datasets, described in Table 1. The adopted datasets come from different domains, with different numbers of views and objects. More details on the data are given in the Supplementary file.

Multiple clustering approaches aim to achieve diverse clusterings of high quality. To measure quality, we use the Silhouette Coefficient (SC) and the Dunn Index (DI) as internal indexes to quantify the compactness and separation of clusters. To measure redundancy, we use Normalized Mutual Information (NMI) and the Jaccard Coefficient (JC) as external indexes to quantify the similarity of clusters between two clusterings. We emphasize that a higher value of SC and DI means a clustering of higher quality, while a smaller value of NMI and JC implies that two clusterings have smaller redundancy. These metrics have been widely adopted for evaluating multiple clusterings [Bailey 2013, Yang and Zhang 2017]. Their formal definitions are given in the Supplementary file.
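For reference, the two less standard of these four measures can be computed as below; SC and NMI are available in common libraries (e.g., scikit-learn's `silhouette_score` and `normalized_mutual_info_score`). The pairwise definition of JC used here is an assumption consistent with comparing two clusterings of the same objects.

```python
import numpy as np
from itertools import combinations

def jaccard_coefficient(a, b):
    """Pairwise Jaccard between two clusterings: pairs grouped together in
    both clusterings / pairs grouped together in at least one of them."""
    same_a = np.array([x == y for x, y in combinations(a, 2)])
    same_b = np.array([x == y for x, y in combinations(b, 2)])
    both = np.sum(same_a & same_b)
    either = np.sum(same_a | same_b)
    return both / either if either else 0.0

def dunn_index(X, labels):
    """Dunn index: min inter-cluster distance / max intra-cluster diameter."""
    clusters = [X[labels == c] for c in np.unique(labels)]
    inter = min(np.min(np.linalg.norm(p[:, None] - q[None], axis=-1))
                for i, p in enumerate(clusters) for q in clusters[i + 1:])
    intra = max(np.max(np.linalg.norm(p[:, None] - p[None], axis=-1))
                for p in clusters)
    return inter / intra
```

A quick sanity check: two identical clusterings have JC = 1, two complementary clusterings of four objects have JC = 0, and two tight, well-separated clusters yield a large DI.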
Table 1: Statistics of the datasets ($n$: number of objects; $k$: number of classes; $V$: number of views; brackets: per-view feature dimensions).

Dataset  $n$, $k$, $V$  Per-view dimensions
Caltech7  1474, 7, 6  [40, 48, 254, 1984, 512, 928]
Handwritten  2000, 10, 6  [216, 76, 64, 6, 240, 47]
Reuters  1200, 6, 5  [21531, 24892, 34251, 15506, 11547]
BBCSport  145, 2, 4  [4659, 4633, 4665, 4684]
MSRCv1  210, 7, 6  [1302, 48, 512, 100, 256, 210]
Yale  165, 15, 3  [4096, 3304, 6750]
Mirflickr  16738, 24, 2  [150, 500]
Discovering multiple clusterings
To comparatively study the performance of DMClusts, we consider Dec-kmeans [Jain, Meka, and Dhillon 2008], MVMC [Yao et al. 2019b], OSC [Cui, Fern, and Dy 2007], ISAAC [Ye et al. 2016], MNMF [Yang and Zhang 2017], and MISC [Wang et al. 2019] as comparing methods. The last four methods use different techniques to seek clusterings in subspaces. The input parameters of the comparing methods are fixed (or optimized) as the authors suggested in their papers or shared code. The input parameters $\lambda$, $\beta$, and $\gamma$ of DMClusts are selected from grids of candidate values. We fix the number of clusters for each clustering to the number of classes of each dataset, as reported in Table 1. Existing multiple clustering algorithms (except MVMC and DMClusts) cannot work on multi-view data. Following the solution in [Yao et al. 2019b], we concatenate the feature vectors of the multi-view data and run them on the concatenated vectors to seek alternative clusterings. For reference, we also apply DMF [Trigeorgis et al. 2017] on the concatenated vectors to gradually explore multiple clusterings layer by layer.

MNMF requires a reference clustering as input to find an alternative clustering. Here we use $k$-means to generate the reference clustering. For the other comparing methods, we directly use their respective solutions to generate two alternative clusterings ($\mathcal{C}_1$, $\mathcal{C}_2$). Following the evaluation protocol used by the comparing methods, we measure clustering quality with the average (SC or DI) of $\mathcal{C}_1$ and $\mathcal{C}_2$, and we measure the diversity (NMI or JC) between $\mathcal{C}_1$ and $\mathcal{C}_2$. Table 2 gives the average results (and standard deviations) of ten independent runs of each method on generating two alternative clusterings. The results of ISAAC and MISC on Reuters and Mirflickr are not reported because of their high complexity on large-scale datasets.
From Table 2, we make the following observations:
(i) Multi-view vs. concatenated view: Both DMClusts and MVMC directly operate on multi-view data, and their two generated clusterings have significantly lower redundancy than those generated by the other comparing methods. In addition, DMClusts frequently obtains better quality than the other comparing methods, which can only work on the concatenated view. This shows that concatenating feature vectors overrides the intrinsic nature of multi-view data, which otherwise helps to generate multiple clusterings with diversity. This also demonstrates the capability of our tailored deep matrix factorization in exploring multiple clusterings with quality.
(ii) DMClusts vs. MVMC: DMClusts generally obtains significantly better quality (SC and DI) than MVMC, and holds comparable diversity (NMI and JC). In other words, our DMClusts maintains a better balance of quality and diversity than MVMC. A possible factor is that DMClusts differentiates the relevance of multiple views, whereas MVMC does not; as a result, DMClusts is less sensitive to noisy views than MVMC. Another factor is that our balanced redundancy term is more comprehensive, as it considers two types of redundancy, while MVMC considers only one.
(iii) DMClusts vs. DMF: DMClusts always gives better performance (both quality and diversity) than DMF, although both can explore alternative clusterings in a layer-wise fashion. The advantage of DMClusts is twofold: it accounts for the different relevance of data views and can selectively fuse them to generate alternative clusterings with quality, while DMF can only operate on the concatenated features without differentiating the views; and it explicitly controls the diversity between alternative clusterings, while DMF does not.
Table 2: Quality (SC and DI; higher is better) and diversity (NMI and JC; lower means more diverse) of each method on generating two alternative clusterings (mean±std over ten runs). '—' indicates the method was not run due to its high complexity.

Dataset  Metric  Dec-kmeans  MNMF  OSC  ISAAC  MISC  MVMC  DMF  DMClusts
Caltech7  SC  0.049±0.002  0.234±0.000  0.266±0.000  0.153±0.010  0.201±0.003  0.140±0.002  0.065±0.011  0.301±0.006
  DI  0.042±0.006  0.037±0.000  0.054±0.000  0.027±0.001  0.048±0.002  0.062±0.000  0.065±0.004  0.090±0.003
  NMI  0.021±0.003  0.022±0.000  0.693±0.015  0.645±0.035  0.516±0.015  0.006±0.000  0.310±0.035  0.009±0.000
  JC  0.127±0.009  0.092±0.000  0.383±0.000  0.358±0.022  0.349±0.018  0.076±0.000  0.235±0.004  0.087±0.002
BBCSport  SC  0.088±0.007  0.014±0.000  0.144±0.000  0.039±0.002  0.089±0.002  0.269±0.000  0.204±0.003  0.284±0.006
  DI  0.487±0.001  0.434±0.000  0.520±0.000  0.411±0.016  0.335±0.009  0.014±0.000  0.255±0.001  0.468±0.010
  NMI  0.002±0.000  0.086±0.000  0.001±0.000  0.010±0.001  0.009±0.001  0.000±0.000  0.101±0.009  0.000±0.000
  JC  0.431±0.030  0.392±0.000  0.605±0.000  0.495±0.015  0.520±0.018  0.347±0.000  0.418±0.001  0.401±0.003
Handwritten  SC  0.050±0.006  0.014±0.000  0.352±0.000  0.235±0.007  0.251±0.009  0.062±0.000  0.034±0.001  0.377±0.012
  DI  0.051±0.011  0.009±0.000  0.107±0.000  0.056±0.002  0.052±0.003  0.083±0.000  0.240±0.009  0.159±0.004
  NMI  0.070±0.012  0.089±0.000  0.778±0.000  0.712±0.018  0.645±0.014  0.009±0.000  0.212±0.006  0.019±0.001
  JC  0.073±0.003  0.078±0.003  0.570±0.000  0.484±0.016  0.414±0.019  0.073±0.000  0.114±0.003  0.066±0.000
MSRCv1  SC  0.062±0.003  0.193±0.002  0.382±0.011  0.166±0.003  0.331±0.008  0.113±0.007  0.022±0.001  0.556±0.012
  DI  0.043±0.007  0.027±0.001  0.071±0.007  0.012±0.002  0.013±0.001  0.098±0.003  0.277±0.010  0.336±0.008
  NMI  0.054±0.006  0.063±0.005  0.736±0.054  0.549±0.030  0.665±0.017  0.053±0.006  0.150±0.002  0.038±0.001
  JC  0.109±0.005  0.124±0.003  0.519±0.025  0.357±0.009  0.471±0.020  0.078±0.002  0.127±0.005  0.087±0.003
Yale  SC  0.033±0.002  0.011±0.001  0.221±0.005  0.020±0.002  0.066±0.008  0.045±0.007  0.021±0.001  0.303±0.019
  DI  0.205±0.014  0.114±0.004  0.331±0.020  0.076±0.004  0.073±0.003  0.232±0.012  0.285±0.004  0.292±0.015
  NMI  0.241±0.021  0.240±0.007  0.812±0.063  0.369±0.007  0.314±0.009  0.251±0.006  0.319±0.006  0.205±0.004
  JC  0.043±0.002  0.066±0.004  0.357±0.034  0.098±0.003  0.091±0.002  0.055±0.001  0.098±0.005  0.038±0.002
Reuters  SC  0.002±0.000  0.107±0.009  0.065±0.000  —  —  0.180±0.000  0.314±0.004  0.344±0.006
  DI  0.157±0.008  0.070±0.003  0.210±0.000  —  —  0.038±0.000  0.028±0.001  0.136±0.005
  NMI  0.041±0.004  0.033±0.010  0.491±0.000  —  —  0.004±0.000  0.508±0.005  0.018±0.000
  JC  0.199±0.005  0.148±0.002  0.454±0.000  —  —  0.091±0.000  0.590±0.011  0.132±0.003
Mirflickr  SC  0.004±0.000  0.058±0.000  0.017±0.000  —  —  0.038±0.000  0.005±0.000  0.336±0.008
  DI  0.061±0.002  0.053±0.001  0.059±0.002  —  —  0.173±0.005  0.027±0.001  0.076±0.001
  NMI  0.427±0.012  0.014±0.000  0.575±0.011  —  —  0.005±0.000  0.108±0.003  0.043±0.001
  JC  0.878±0.022  0.023±0.000  0.368±0.011  —  —  0.022±0.000  0.049±0.001  0.033±0.001
To investigate the robustness of DMClusts to noisy views, we constructed a synthetic dataset from Reuters by injecting a noisy view that follows a standard Gaussian distribution. We then applied DMF and DMClusts to this synthetic dataset with fixed input parameters, and visualize the weights assigned to the six views for the first and second clusterings in Figure 2. DMClusts indeed assigns different sets of weights to these views for generating two clusterings with a low overlap (NMI: 0.019, JC: 0.161), and it manifests robustness to the noisy view by assigning it a zero weight. As a result, DMClusts achieves similar quality and diversity as on the original Reuters. In contrast, DMF has a nearly 50% reduced quality (SC: 0.158, DI: 0.015) and an about 25% increased diversity (NMI: 0.471, JC: 0.375); its increase in diversity is obtained at the expense of reduced quality. Nevertheless, DMClusts still gives better diversity than DMF. This investigation corroborates the benefit of weighting views.

To further study whether DMClusts can generate more than two clusterings, we fix the number of target clusterings to four, with a common number of clusters per clustering. Next, we apply DMClusts, DMF, and MVMC to the Handwritten dataset, which contains images of the 10 digits, and visualize their clusterings in Figure 3. Each row of a subfigure represents a clustering, and each image corresponds to the mean of a cluster. The numbers under each image are the dominant digits (not all digits) in the cluster. It is well known that the 10 handwritten digits are ambiguous and resemble one another (7 is alike 4 and 3; 9 is alike 5 and 7). As such, there is a tendency to group them together in different alternative clusterings. Due to its diversity control, DMClusts presents four clusterings without any completely overlapping clusters. In contrast, DMF does not account for diversity and generates some largely overlapping clusters (i.e., {0, 1, 3} and {2, 4, 7} appearing in two of its clusterings). Although MVMC also quantifies the redundancy of two objects often grouped into the same cluster of different clusterings, it still generates a heavily overlapping cluster {1, 2, 3} in two of its clusterings. This visual example not only confirms the effectiveness of DMClusts in generating multiple diverse clusterings, but also demonstrates the effectiveness of our balanced redundancy quantification.
Parameter analysis
Several input parameters ($\lambda$, $\beta$, and $\gamma$) may affect the performance of DMClusts: $\lambda$ balances the importance of the deep matrix factorization term and the diversity control term, $\gamma$ controls the weight distribution assigned to the input views, and $\beta$ balances the redundancy of two objects placed into the same clusters against the redundancy of two objects placed into different clusters of two clusterings.
We study the impact of $\lambda$ by varying it across a wide range, and plot the change in quality (DI) and diversity (1$-$NMI, the larger the better) of DMClusts on the Yale dataset in Figure 3(a), with $\beta$ and $\gamma$ fixed. We find that: (i) diversity (1$-$NMI) steadily increases at first, but levels off once $\lambda$ becomes large; (ii) quality (DI) gradually decreases as $\lambda$ increases, and becomes relatively stable afterwards. This pattern is explainable, since a larger $\lambda$ forces DMClusts to focus more on the diversity between clusterings, which may drag down the quality of the respective clusterings. Overall, this observation confirms the dilemma between diversity and quality in multiple clusterings, and shows the necessity of introducing $\lambda$ to control the redundancy.
We investigate the impact of $\gamma$ by varying it over a grid of values, and report the quality and diversity of DMClusts on the Yale dataset in Figure 3(b), with $\lambda$ and $\beta$ fixed. The quality slightly rises as $\gamma$ increases, while the diversity remains stable; as $\gamma$ grows further, the diversity steadily increases and the quality gradually decreases, owing to the known trade-off between diversity and quality. This is because a too small $\gamma$ gives nearly equal weights to all the views, while a moderate $\gamma$ can assign different sets of weights to these views, which helps to generate diverse clusterings, as exemplified in Figure 1.
To study the benefit of our balanced redundancy quantification term, we vary $\beta$ from 0 to 1 and report the results in Figure 3(c), with $\lambda$ and $\gamma$ fixed. We observe that the diversity (1$-$NMI) increases as $\beta$ increases, but turns to decrease as $\beta$ approaches 1. Due to the dilemma between quality and diversity, the quality shows the reverse trend. Neither $\beta = 0$ nor $\beta = 1$ gives the highest diversity. This observation proves the contribution of the previously overlooked redundancy due to two samples being placed in different clusters of two clusterings, and justifies the effectiveness of our balanced redundancy quantification term. In addition, it clarifies why our DMClusts obtains better diversity between clusterings. We also observe that $\beta = 1$ (NMI: 0.064) gives larger diversity (by 16%) than $\beta = 0$ (NMI: 0.075). This suggests that the redundancy of two samples in the same clusters of two clusterings is more important than the redundancy of two samples in different clusters. Overall, these two types of redundancy complement each other and help to generate multiple clusterings with improved diversity.
We further study the impact of the number of clusters, the layer sizes, and different numbers of clusterings $m$. We also conduct a runtime experiment, which shows that DMClusts not only outperforms the state-of-the-art methods in exploring multiple clusterings with quality and diversity, but also holds moderate efficiency. These results, along with the convergence analysis, can be found in the Supplementary file. Finally, we remark that none of the four metrics depends on the ground-truth labels of the tested dataset, so suitable parameter values can be chosen based on the user's preference toward quality or diversity.
Conclusion
In this paper, we introduced DMClusts to explore multiple clusterings from multi-view data, an interesting, practical, but overlooked clustering topic that conjoins multi-view clustering and multiple clusterings. DMClusts adapts deep matrix factorization to fuse multi-view data in a layer-wise fashion, and introduces a novel balanced redundancy quantification term to seek multiple diverse clusterings of quality. DMClusts shows superior effectiveness and efficiency compared with state-of-the-art competitive solutions. In future work, we will investigate a principled way to determine a suitable number of layers (clusterings).
Acknowledgments
This work is supported by NSFC (61872300 and 61873214), Fundamental Research Funds for the Central Universities (XDJK2019B024), Natural Science Foundation of CQ CSTC (cstc2018jcyjAX0228), and by the King Abdullah University of Science and Technology (KAUST), Saudi Arabia. The code and Supplementary file of DMClusts are available at http://mlda.swu.edu.cn/codes.php?name=DMClusts.
References
 [Bae and Bailey2006] Bae, E., and Bailey, J. 2006. Coala: A novel approach for the extraction of an alternate clustering of high quality and high dissimilarity. In ICDM, 53–62.

[Bailey2013]
Bailey, J.
2013.
Alternative clustering analysis: A review.
In Charu, A., and Chandan, R., eds., Data Clustering: Algorithms and Applications. CRC Press. 535–550. 
[Bengio2009]
Bengio, Y.
2009.
Learning deep architectures for ai.
Foundations and Trends® in Machine Learning
2(1):1–127.  [Bickel and Scheffer2004] Bickel, S., and Scheffer, T. 2004. Multiview clustering. In ICDM, 19–26.
 [Cai et al.2011] Cai, D.; He, X.; Han, J.; and Huang, T. S. 2011. Graph regularized nonnegative matrix factorization for data representation. TPAMI 33(8):1548–1560.
 [Caruana et al.2006] Caruana, R.; Elhawary, M.; Nguyen, N.; and Smith, C. 2006. Meta clustering. In ICDM, 107–118.
 [Chaudhuri et al.2009] Chaudhuri, K.; Kakade, S. M.; Livescu, K.; and Sridharan, K. 2009. Multiview clustering via canonical correlation analysis. In ICML, 129–136.
 [Cheng et al.2013] Cheng, W.; Zhang, X.; Guo, Z.; Wu, Y.; Sullivan, P. F.; and Wang, W. 2013. Flexible and robust coregularized multidomain graph clustering. In KDD, 320–328.
 [Cui, Fern, and Dy2007] Cui, Y.; Fern, X. Z.; and Dy, J. G. 2007. Nonredundant multiview clustering via orthogonalization. In ICDM, 133–142.
 [Ding, Li, and Jordan2010] Ding, C. H.; Li, T.; and Jordan, M. I. 2010. Convex and seminonnegative matrix factorizations. TPAMI 32(1):45–55.

[Domeniconi et al.2007]
Domeniconi, C.; Gunopulos, D.; Ma, S.; Yan, B.; AlRazgan, M.; and
Papadopoulos, D.
2007.
Locally adaptive metrics for clustering high dimensional data.
DAMI 14(1):63–97. 
[FanaeeT and Thoresen2018]
FanaeeT, H., and Thoresen, M.
2018.
Multiinsight visualization of multiomics data via ensemble dimension reduction and tensor factorization.
Bioinformatics 35(10):1625–1633.  [Fred and Jain2005] Fred, A. L., and Jain, A. K. 2005. Combining multiple clusterings using evidence accumulation. TPAMI 27(6):835–850.
 [Fu et al.2018] Fu, G.; Wang, J.; Domeniconi, C.; and Yu, G. 2018. Matrix factorizationbased data fusion for the prediction of lncrna–disease associations. Bioinformatics 34(9):1529–1537.
 [Gao et al.2015] Gao, H.; Nie, F.; Li, X.; and Huang, H. 2015. Multi-view subspace clustering. In ICCV, 4238–4246.
 [Gönen and Alpaydın2011] Gönen, M., and Alpaydın, E. 2011. Multiple kernel learning algorithms. JMLR 12(7):2211–2268.

 [Hinton and Salakhutdinov2006] Hinton, G. E., and Salakhutdinov, R. R. 2006. Reducing the dimensionality of data with neural networks. Science 313(5786):504–507.
 [Jain, Meka, and Dhillon2008] Jain, P.; Meka, R.; and Dhillon, I. S. 2008. Simultaneous unsupervised learning of disparate clusterings. Statistical Analysis and Data Mining 1(3):195–210.
 [Kang et al.2019] Kang, Z.; Guo, Z.; Huang, S.; Wang, S.; Chen, W.; Su, Y.; and Xu, Z. 2019. Multiple partitions aligned clustering. In IJCAI, 2701–2707.

 [Kumar and Daumé2011] Kumar, A., and Daumé, H. 2011. A co-training approach for multi-view spectral clustering. In ICML, 393–400.
 [Lee and Seung2001] Lee, D. D., and Seung, H. S. 2001. Algorithms for non-negative matrix factorization. In NeurIPS, 556–562.
 [Li, Jiang, and Zhou2014] Li, S.-Y.; Jiang, Y.; and Zhou, Z.-H. 2014. Partial multi-view clustering. In AAAI, 1968–1974.
 [Li, Tang, and Mei2019] Li, Z.; Tang, J.; and Mei, T. 2019. Deep collaborative embedding for social image understanding. TPAMI 41(9):2070–2083.

 [Liu et al.2019] Liu, X.; Zhu, X.; Li, M.; Wang, L.; Zhu, E.; Liu, T.; Kloft, M.; Shen, D.; Yin, J.; and Gao, W. 2019. Multiple kernel k-means with incomplete kernels. TPAMI 99(1):1–14.
 [Luo et al.2018] Luo, S.; Zhang, C.; Zhang, W.; and Cao, X. 2018. Consistent and specific multi-view subspace clustering. In AAAI, 3730–3737.
 [Mautz et al.2018] Mautz, D.; Ye, W.; Plant, C.; and Böhm, C. 2018. Discovering non-redundant k-means clusterings in optimal subspaces. In KDD, 1973–1982.
 [Nie, Cai, and Li2017] Nie, F.; Cai, G.; and Li, X. 2017. Multi-view clustering and semi-supervised classification with adaptive neighbours. In AAAI, 2408–2414.
 [Tan et al.2018] Tan, Q.; Yu, G.; Domeniconi, C.; Wang, J.; and Zhang, Z. 2018. Incomplete multi-view weak-label learning. In IJCAI, 2703–2709.
 [Tang et al.2017] Tang, J.; Shu, X.; Qi, G.-J.; Li, Z.; Wang, M.; Yan, S.; and Jain, R. 2017. Tri-clustered tensor completion for social-aware image tag refinement. TPAMI 39(8):1662–1674.
 [Trigeorgis et al.2017] Trigeorgis, G.; Bousmalis, K.; Zafeiriou, S.; and Schuller, B. W. 2017. A deep matrix factorization method for learning attribute representations. TPAMI 39(3):417–429.
 [Wang et al.2018] Wang, X.; Yu, G.; Domeniconi, C.; Wang, J.; Yu, Z.; and Zhang, Z. 2018. Multiple co-clusterings. In ICDM, 1308–1313.
 [Wang et al.2019] Wang, X.; Wang, J.; Yu, G.; Domeniconi, C.; Xiao, G.; and Guo, M. 2019. Multiple independent subspace clusterings. In AAAI, 5353–5360.
 [Yang and Zhang2017] Yang, S., and Zhang, L. 2017. Non-redundant multiple clustering by nonnegative matrix factorization. Machine Learning 106(5):695–712.
 [Yao et al.2019a] Yao, S.; Yu, G.; Wang, X.; Wang, J.; Domeniconi, C.; and Guo, M. 2019a. Discovering multiple co-clusterings in subspaces. In SDM, 423–431.
 [Yao et al.2019b] Yao, S.; Yu, G.; Wang, J.; Domeniconi, C.; and Zhang, X. 2019b. Multi-view multiple clustering. In IJCAI, 4121–4127.
 [Ye et al.2016] Ye, W.; Maurus, S.; Hubig, N.; and Plant, C. 2016. Generalized independent subspace clustering. In ICDM, 569–578.
 [Zhao et al.2017] Zhao, J.; Xie, X.; Xu, X.; and Sun, S. 2017. Multi-view learning overview: recent progress and new challenges. Information Fusion 38:43–54.
 [Zhao, Ding, and Fu2017] Zhao, H.; Ding, Z.; and Fu, Y. 2017. Multi-view clustering via deep matrix factorization. In AAAI, 2921–2927.
 [Žitnik and Zupan2014] Žitnik, M., and Zupan, B. 2014. Data fusion by matrix factorization. TPAMI 37(1):41–53.
 [Zong et al.2017] Zong, L.; Zhang, X.; Zhao, L.; Yu, H.; and Zhao, Q. 2017. Multi-view clustering via multi-manifold regularized nonnegative matrix factorization. Neural Networks 88:74–89.