I Introduction
With the wide application of the Internet of Things, many collected data are naturally represented with multiple feature views. For instance, an image can be encoded by its color, texture, shape and spatial descriptors. These feature views embody the consistent and complementary information of the same image, which spurs extensive research on learning from multiview data [35, 49]. Fusing these feature views can not only form a comprehensive description of the data, but also benefit the learning tasks on them, such as classification [36], clustering [18] and metric learning [13]. This work focuses on multiview clustering (MVC), which aims to excavate complementary and consensus information across multiple views to identify the essential grouping structure of the data without requiring any labels.
Various attempts have been made to find the essential grouping structures of multiview data. Some algorithms force the clustering results of different views to be consistent with each other via correlation maximization [5], co-regularization [21], fusing multiple similarity matrices of individual views [28], or exploring common and diverse information among views [27]. Other approaches assume the clusters of a clustering are embedded in different subspaces, and try to explore these subspaces and find clusters therein [32, 29]. More recently, deep learning techniques have also been proposed to extract the high-order correlation and dependency between multiview data for effective clustering [26, 53]. Besides, some other efforts study how to work on multiview data with missing views, i.e., some objects are not available in all the views [24, 46]. These MVC algorithms account for the multiplicity of multiview data, but focus on generating a single clustering result only. In practice, such multiplicity can also support grouping the data into multiple different but meaningful clusterings [9, 51]. For example, a bunch of facial images represented with heterogeneous views can be separately grouped from the perspective of identity, of sex and of emotions. All these groupings are different but meaningful, and each of them is potentially useful for some purpose, regardless of whether or not it is optimal according to a specific clustering criterion [4]. Like traditional clustering, multiple clusterings care about the quality of each clustering, but they additionally pursue the diversity among alternative clusterings. However, it is a knotty task to balance the diversity and quality of these clusterings [3]. Previous works tried to obtain multiple clusterings in independent (or orthogonal) subspaces [6, 30, 41], by eliminating redundancy between clusterings that are generated successively [2, 48], by re-assigning clusters over a set of generated base clusterings [4], or by simultaneously seeking multiple clusterings while controlling the redundancy between them. However, they were designed for single-view data only.
A few efforts have been made toward exploring multiple clusterings on multiview data. Multiview multiple clusterings (MVMC) [51] mines the individual and shared information of multiview data by utilizing self-representation learning [29], and then decomposes the combinations of the individuality feature matrices and the commonality feature matrix by semi-nonnegative matrix factorization [7] to obtain multiple clusterings. DMClusts [43] is another multiview multiple clusterings algorithm based on deep matrix factorization. It decomposes the multiview data matrices layer by layer to obtain multiple common subspaces and generates the corresponding clusterings therein. These two efforts still ideally assume that all the data views are complete. However, this assumption is often violated in practice for some inevitable reasons [24, 36], such as the temporary failure of sensors or human-caused errors. As a result, the collected multiview data are often incomplete. A simple strategy is to remove the samples with missing feature views, but this strategy obviously may throw away too much information, especially under a high missing rate. Incomplete multiview clustering (IMC) solutions have been proposed to address this practical issue. Some of them resort to matrix factorization to extract the shared subspace [24, 47], fill in the missing information [15], or use Generative Adversarial Networks (GAN) [10] to replenish the missing data [39, 46]. However, none of the existing IMC methods can generate multiple clusterings with both high quality and diversity.

To address the drawbacks mentioned above, we propose a deep incomplete multiview multiple clusterings framework (DiMVMC, as illustrated in Figure 1). DiMVMC adopts decoder networks to generate clusterings from representational subspaces and to complete the missing features of instances. The input of the $m$-th decoder network is the $m$-th shared subspace $\mathbf{H}^{(m)}$, which is randomly initialized at first, while the output is the reconstruction of the multiview data. By alternately optimizing the decoder networks $\{\Theta^{(m)}\}$ and the subspaces $\{\mathbf{H}^{(m)}\}$, DiMVMC can achieve the completion and the individually shared subspaces simultaneously. Moreover, these decoder networks are not isolated, but additionally controlled by a redundancy term based on the Hilbert-Schmidt Independence Criterion (HSIC) [11], which further enforces the diversity among subspaces and thus reduces the redundancy between clusterings. The main contributions of our work are summarized as follows:

We study how to generate multiple clusterings on multiview data with missing samples, which is an important and practical topic, but one that is more challenging and mostly overlooked by previous solutions. To our knowledge, DiMVMC is the first deep approach to generate multiple clusterings.

DiMVMC discards the ad-hoc encoder part of the autoencoder and works in an unsupervised way; as such, DiMVMC has a lower network complexity and can more flexibly deal with data incompleteness in different views. In addition, it uses a redundancy quantification term to reduce the overlap among decoder networks, thus producing less overlapped representational subspaces and finally generating diverse clusterings in these subspaces.
II Related Works
Our work has close connections with two lines of related work: incomplete multiview clustering and multiple clusterings.
II-A Incomplete Multiview Clustering
Various multiview clustering solutions have been introduced, most of which focus on extracting the consistent/complementary information from different views to induce a consolidated clustering [5, 19, 40], while others additionally mine the individual information to achieve a more robust clustering [29]. These methods all build on the assumption that all the data views are complete, while in practice it is quite common that some samples are absent in some views. To handle this more challenging incomplete multiview clustering (IMC) setting, Li et al. [24] presented the first solution (named PVC) based on NMF (Nonnegative Matrix Factorization) [22], which learns common representations for complete instances and private latent representations for incomplete instances with the same basis matrices. Next, PVC uses the common and private representations to seek a clustering. Zhao et al. [54] further integrated PVC and manifold learning to learn the global structure of multiview data. Nevertheless, these NMF-based methods can only deal with two-view data, limiting their application scope. Weighted NMF-based approaches [14, 34] were also proposed to deal with more than two views by filling in the missing data and assigning them lower weights. Wen et al. [44] added an error matrix to compensate for the missing data, and combined the original incomplete data matrix with the error matrix to form a completed data matrix for clustering. All these solutions in essence build on NMF, which performs a shallow projection that cannot well mine the complex relationships between low-level features of multiview data.
To mine nonlinear structures and complex correlations among multiview data, Wang et al. [39] proposed the consistent GAN for the two-view IMC problem, which uses one view to generate the missing data of the other view, and then performs clustering on the generated complete data. Xu et al. [46] sought the common latent space of multiview data and performed missing-data inference by combining GAN with an autoencoder. Ngiam et al. [31] extracted shared representations by training a two-view deep autoencoder to best reconstruct the two-view inputs. Zhang et al. [52] combined an autoencoder with a Bayesian framework to fully exploit partial multiview data and produce a structured representation for classification. These shallow/deep multiview clustering solutions still focus on producing a single clustering. Given the multiplicity of multiview data, it is more desirable to find different clustering results from the same data, with each clustering giving an independent grouping of the data.
II-B Multiple Clusterings
Multiple clusterings focus on how to generate different clusterings with both high quality and diversity from the same data [3]. It is less well studied than single-/multiview clustering and ensemble clustering [16, 55], due to its requirement of generating multiple groups of results and the difficulty of guaranteeing good quality and diversity at the same time. Bae et al. [2] proposed a multiple clusterings solution based on hierarchical clustering (COALA). The main idea of COALA is that instances with higher intra-class similarity still gather in one cluster, while those with lower intra-class similarity are considered for placement into different clusters of another clustering. Jain et al. presented Dec-kmeans [17], which obtains diverse clusterings simultaneously by finding multiple groups of mutually orthogonal cluster centroids. Unlike COALA and Dec-kmeans, which directly control the diversity between clustering results, other solutions control the diversity between clustering subspaces and then generate different clusterings in these subspaces. Cui et al. [6] greedily found orthogonal projection matrices to obtain different feature representations of the original data and then found clusterings in these orthogonal subspaces. Mautz et al. [30] also tried to explore multiple mutually orthogonal subspaces, along with the optimization of the k-means objective function, to find non-redundant clusterings. However, the orthogonality constraint is too strict to generate more than two clusterings. Wang
et al. [41] generated multiple independent subspaces with semantic interpretations via independent subspace analysis, and then performed kernel-based clustering in these subspaces to explore diverse clusterings. Yang and Zhang [48] explicitly defined a regularization term to quantify and minimize the redundancy between the already generated clusterings and the to-be-generated one, and then plugged this regularization into matrix factorization based clustering [7] to find another clustering. Wang et al. [42] and Yao et al. [50] directly minimized the redundancy between all the to-be-generated clusterings to simultaneously find alternative clusterings. Besides, Caruana et al. [4] first generated a number of useful high-quality clusterings, then grouped these clusterings at the meta-level, and thus allowed the user to select a few high-quality and non-redundant clusterings for examination. However, these multiple clusterings methods are still restricted to single-view data.

Given the multiplicity of multiview data, it is desirable but more difficult to generate multiple clusterings from the same multiview data. Two approaches have been proposed to attack this challenging task. MVMC [51] first extends multiview self-representation learning [29] to explore the individuality information encoding matrices and the commonality information matrix shared across views, and then combines each individuality similarity matrix and the commonality similarity matrix to generate a distinct clustering by matrix factorization. However, given the cubic time complexity of self-representation learning, MVMC is hardly applicable to datasets with a large number of samples. To alleviate this drawback, DMClusts extends deep matrix factorization [37, 53] to collaboratively factorize the multiview data matrices into multiple representational subspaces layer by layer, and seeks a different clustering of high quality per layer.
In addition, it introduces a new balanced redundancy quantification term to guarantee the diversity among these clusterings, and thus reduces the overlap between the produced clusterings.
The above-mentioned single-/multiview multiple clusterings solutions assume all instances are complete across views, and project the data into linear and shallow subspaces. Therefore, they cannot capture the complex correlations between views, nor nonlinear clusters in subspaces, when data are incomplete. To address these issues, we introduce DiMVMC to mine multiple clusterings from multiview data with missing instances. DiMVMC captures the complex correlations among views and completes the data via multiple decoder networks, and thus generates multiple nonlinear clusterings with both quality and diversity.
III The Proposed Method
It has been empirically demonstrated that different data views are complementary to each other, and that they carry distinct information for generating diverse clusterings of quality [51, 43]. However, multiview multiple clusterings is still challenging due to the difficulty of modeling the unknown and complex correlations among different views. Moreover, data with missing views and the required diversity between clusterings further increase the difficulty of the incomplete multiview multiple clusterings problem. Autoencoders are typically used to reconstruct data with missing/noisy features [12, 46, 52]. The encoder takes the incomplete data as input and learns a compact representation, from which the decoder recovers the missing values. To avoid the ad-hoc design of the encoder for the incomplete cases in different views, we skip the encoder and take the shared subspace representation $\mathbf{H}^{(m)}$ as the input of the $m$-th decoder network, from which the observed data are reconstructed and the missing data are completed, as shown in Fig. 1. In addition, we quantify and minimize the redundancy among these subspaces for generating diverse clusterings therein. The following subsections elaborate on this procedure.
III-A Generating Multiple Representation Subspaces
Suppose a multiview dataset with $V$ views has $n$ instances. We use $\mathbf{x}_i^{(v)} \in \mathbb{R}^{d_v}$ ($1 \le i \le n$, $1 \le v \le V$) to denote the feature vector of the $v$-th view of the $i$-th instance, where $d_v$ is the feature dimension of the $v$-th view. An indicator matrix $\mathbf{S} \in \{0, 1\}^{n \times V}$ for all instances is defined as:

$$s_{iv} = \begin{cases} 1, & \text{if the } v\text{-th view of the } i\text{-th instance is observed} \\ 0, & \text{otherwise} \end{cases} \quad (1)$$

where each column of $\mathbf{S}$ gives the status (present/absent) of the instances for the corresponding view. The relation $\sum_{v=1}^{V} s_{iv} \ge 1$ holds, such that each instance is present in at least one view.
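As a concrete illustration, the indicator matrix can be assembled directly from the raw view matrices. The sketch below is not from the paper; it assumes, as a hypothetical convention, that a missing instance is marked by a row of NaNs in the corresponding view:

```python
import numpy as np

def build_indicator(views):
    """Build the n x V presence indicator S: s_iv = 1 if instance i is
    observed in view v, else 0.  `views` is a list of V arrays of shape
    (n, d_v); a missing instance is a row of NaNs (an assumed convention)."""
    S = np.column_stack([~np.isnan(X).any(axis=1) for X in views]).astype(int)
    # each instance must be present in at least one view
    assert (S.sum(axis=1) >= 1).all()
    return S
```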
The aim of incomplete multiview multiple clusterings is to integrate all the incomplete views to generate multiple clusterings. Inspired by cross partial multiview networks for classification and by multiview subspace learning [52, 45], we project instances with arbitrary view-missing patterns into shared representational subspaces in a flexible way, where the subspaces encode the information of the observed views. That is, each view can be reconstructed from the obtained shared representation. Based on this reconstruction point of view [23], we use the joint distribution to concretize the above idea as follows:

$$p(\mathbf{x}_i^{(1)}, \ldots, \mathbf{x}_i^{(V)} \mid \mathbf{h}_i) = \prod_{v=1}^{V} p(\mathbf{x}_i^{(v)} \mid \mathbf{h}_i) \quad (2)$$

where $\mathbf{h}_i \in \mathbb{R}^{d}$ is the multiview shared representation of the $i$-th instance, and $\mathbf{H} = [\mathbf{h}_1, \ldots, \mathbf{h}_n]^\top$. By maximizing $\prod_{i=1}^{n} p(\mathbf{x}_i^{(1)}, \ldots, \mathbf{x}_i^{(V)} \mid \mathbf{h}_i)$, the common subspace $\mathbf{H}$ can be obtained. However, just like typical subspace learning methods [38, 8], (2) optimizes only one subspace and a single clustering result therein. Because of its multiplicity, multiview data has a mix of diverse distributions, so multiple different subspaces and clusterings can coexist. To gain multiple ($M$) clusterings, we extend (2) as:
$$\prod_{m=1}^{M} \prod_{v=1}^{V} p(\mathbf{x}_i^{(v)} \mid \mathbf{h}_i^{(m)}) \quad (3)$$

where $\mathbf{h}_i^{(m)}$ is the shared representation of the $i$-th instance in the $m$-th shared subspace $\mathbf{H}^{(m)}$. Based on the different views in $\{\mathbf{x}_i^{(v)}\}_{v=1}^{V}$, we model the likelihood with respect to $\mathbf{h}_i^{(m)}$ given the observation $\mathbf{x}_i^{(v)}$ as:

$$p(\mathbf{x}_i^{(v)} \mid \mathbf{h}_i^{(m)}) \propto \exp\big(-\ell(\mathbf{x}_i^{(v)}, f_v(\mathbf{h}_i^{(m)}; \Theta_v^{(m)}))\big) \quad (4)$$

where $\ell(\cdot, \cdot)$ represents the reconstruction loss; here, we adopt the squared $\ell_2$ norm for this reconstruction part. $f_v(\cdot; \Theta_v^{(m)})$ is the mapping function from the common subspace to the $v$-th view, and $\Theta_v^{(m)}$ are the decoder network parameters of $f_v$.
Without loss of generality, supposing the data are independent and identically distributed, we can induce the log-likelihood function as follows:

$$\mathcal{L} = \sum_{m=1}^{M} \sum_{i=1}^{n} \sum_{v=1}^{V} \log p(\mathbf{x}_i^{(v)} \mid \mathbf{h}_i^{(m)}) \quad (5)$$

Since maximizing the likelihood is equivalent to minimizing the loss $\ell$, by considering the missing cases, we can obtain the following objective function for the decoder networks:

$$\min_{\{\mathbf{h}_i^{(m)}\}, \{\Theta_v^{(m)}\}} \sum_{m=1}^{M} \sum_{i=1}^{n} \sum_{v=1}^{V} s_{iv} \big\| \mathbf{x}_i^{(v)} - f_v(\mathbf{h}_i^{(m)}; \Theta_v^{(m)}) \big\|_2^2 \quad (6)$$
Optimizing the above equation generates $M$ shared representations $\{\mathbf{H}^{(m)}\}_{m=1}^{M}$, each of which will be used to generate a clustering result. Unlike traditional autoencoder based solutions [46, 52], DiMVMC skips the encoder networks and takes the shared subspace representation $\mathbf{H}^{(m)}$ as the input of the $m$-th decoder to complete the multiview data, as done by the indicator $s_{iv}$ in (6). As such, DiMVMC does not need to specifically handle the diverse missing cases of multiview data, while still making full use of the observed data.
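The masked objective only scores observed entries. A minimal NumPy sketch follows, assuming NaN-marked missing rows; the `decoders` here are placeholder callables standing in for the actual decoder networks, not the paper's implementation:

```python
import numpy as np

def masked_recon_loss(views, S, H_list, decoders):
    """Masked reconstruction loss: only observed entries (s_iv = 1)
    contribute.  `decoders[m][v]` maps the m-th shared representation
    H_list[m] (n x d) to a reconstruction of view v."""
    loss = 0.0
    for m, H in enumerate(H_list):
        for v, X in enumerate(views):
            R = decoders[m][v](H)                    # reconstruct view v
            mask = S[:, v:v + 1].astype(bool)        # (n, 1), broadcasts
            diff = np.where(mask, X - R, 0.0)        # zero out missing rows
            loss += np.sum(diff ** 2)                # squared l2 loss
    return loss
```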
III-B Reducing Redundancy between Subspaces
By minimizing (6), we can generate multiple common subspaces from incomplete multiview data. For multiple clusterings, besides the quality of the different clusterings, the diversity between clusterings is also important [3]. The diversity is usually approximated by minimizing the redundancy between these subspaces. Orthogonality is the most common approach, which forces two subspaces to be orthogonal to each other. However, orthogonality based methods may still generate multiple clusterings with high redundancy, since orthogonal subspaces can still produce clusters with the same structure [43]. Furthermore, orthogonality does not specify which properties of the reference clustering should or should not be retained. The Kullback-Leibler (KL) divergence was also adopted to find diverse clusterings [33], but KL divergence is not symmetric and is not applicable to high-dimensional data, due to its high time and space complexity.
The Hilbert-Schmidt Independence Criterion (HSIC) [11] measures the squared norm of the cross-covariance operator over $\mathbf{H}^{(m_1)}$ and $\mathbf{H}^{(m_2)}$ in the Hilbert kernel space to estimate their dependency. It is empirically given by:

$$\mathrm{HSIC}(\mathbf{H}^{(m_1)}, \mathbf{H}^{(m_2)}) = (n-1)^{-2} \, \mathrm{tr}(\mathbf{K}_1 \mathbf{C} \mathbf{K}_2 \mathbf{C}) \quad (7)$$

where $\mathbf{K}_1$ and $\mathbf{K}_2$ are Gram matrices, defined by an inner product between vectors in a specific kernel space, and $\mathbf{C} = [c_{ij}]$ is the centering matrix with $c_{ij} = 1 - 1/n$ if $i = j$, and $c_{ij} = -1/n$ otherwise. In this paper, we adopt the inner-product kernel to specify $\mathbf{K}$. A lower HSIC value means two subspaces are less correlated. This empirical estimate is simpler than other kernel dependence tests and requires no user-defined regularization. In addition, it has a solid theoretical foundation, a fast learning rate with guaranteed exponential convergence, and the capability of measuring both linear and nonlinear dependence between variables. For these merits, we adopt HSIC to quantify the overlap between the generated subspaces $\{\mathbf{H}^{(m)}\}_{m=1}^{M}$.
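The empirical estimator with the inner-product (linear) kernel is only a few lines of NumPy. This is a generic HSIC sketch, not code from the authors:

```python
import numpy as np

def hsic(H1, H2):
    """Empirical HSIC with linear kernels: (n-1)^{-2} tr(K1 C K2 C),
    where K = H H^T and C = I - (1/n) 11^T is the centering matrix.
    Lower values mean the two subspaces H1, H2 (n x d) are less dependent."""
    n = H1.shape[0]
    K1, K2 = H1 @ H1.T, H2 @ H2.T
    C = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K1 @ C @ K2 @ C) / (n - 1) ** 2
```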
III-C Unified Model
By integrating the reconstruction objective in (6) and the HSIC based redundancy term in (7), we define the loss function of DiMVMC as:

$$\min_{\{\mathbf{H}^{(m)}\}, \{\Theta_v^{(m)}\}} \sum_{m=1}^{M} \sum_{i=1}^{n} \sum_{v=1}^{V} s_{iv} \big\| \mathbf{x}_i^{(v)} - f_v(\mathbf{h}_i^{(m)}; \Theta_v^{(m)}) \big\|_2^2 + \lambda \sum_{m_1 \ne m_2} \mathrm{HSIC}(\mathbf{H}^{(m_1)}, \mathbf{H}^{(m_2)}) \quad (8)$$

where $\lambda$ is the hyperparameter that balances the seeking of subspaces and the diversity between them. DiMVMC can generate multiple common subspaces and complete the missing data simultaneously via minimizing (8). Since the optimal solution cannot be given analytically, we employ an optimization strategy that alternately updates $\{\mathbf{H}^{(m)}\}$ or $\{\Theta_v^{(m)}\}$ in an iterative way, while fixing the others. More specifically, $\{\mathbf{H}^{(m)}\}$ and $\{\Theta_v^{(m)}\}$ are randomly initialized at first. The detailed optimization process is given in Algorithm 1. Once the optimization is done, $k$-means is applied on each obtained subspace $\mathbf{H}^{(m)}$, and thus $M$ clusterings with quality and diversity can be accordingly generated.
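The last step applies $k$-means on each obtained subspace. Any off-the-shelf $k$-means would do; a minimal Lloyd's-iteration sketch (illustrative, not the authors' implementation) is:

```python
import numpy as np

def kmeans(H, k, iters=50, seed=0):
    """Plain Lloyd's k-means on one learned subspace H (n x d),
    turning it into one clustering of the data."""
    rng = np.random.default_rng(seed)
    centers = H[rng.choice(len(H), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        labels = np.argmin(((H[:, None, :] - centers) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centers[j] = H[labels == j].mean(axis=0)
    return labels

# one clustering per subspace:
# clusterings = [kmeans(H_m, k) for H_m in H_list]
```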
In subspace clustering, it is desirable that the subspace representation be sparse while still capturing the group-level semantic information. There are different ways to bring in sparsity; for example, we can add a dropout layer in the deep learning approach. Here, to make an intuitive implementation, we add a sparsity-induced regularization into the above loss function and define the so-called sparse DiMVMC:
$$\min_{\{\mathbf{H}^{(m)}\}, \{\Theta_v^{(m)}\}} \sum_{m=1}^{M} \sum_{i=1}^{n} \sum_{v=1}^{V} s_{iv} \big\| \mathbf{x}_i^{(v)} - f_v(\mathbf{h}_i^{(m)}; \Theta_v^{(m)}) \big\|_2^2 + \lambda \sum_{m_1 \ne m_2} \mathrm{HSIC}(\mathbf{H}^{(m_1)}, \mathbf{H}^{(m_2)}) + \lambda_1 \sum_{m=1}^{M} \big\| \mathbf{H}^{(m)} \big\|_1 \quad (9)$$

where $\|\cdot\|_1$ is the $\ell_1$ norm of the matrix. When $\lambda_1 = 0$, (9) reduces to the plain DiMVMC.
IV Experimental Results and Analysis
IV-A Experimental Setup
In this section, we evaluate the performance of our proposed DiMVMC on four benchmark multiview datasets, as described in Table I. Caltech20 [25] is a subset of Caltech101, which contains 2,386 instances from 20 categories (clusters). We utilize the 254D CENHIST vector, the 512D GIST vector and the 928D LBP vector as three views. Handwritten [25] is comprised of 2,000 data points from the digit classes 0 to 9, with 200 data points per class. Six public feature sets are available; we utilize the 240D pixel-averages feature, the 216D profile-correlations feature and the 76D LBP feature as three views. Reuters [1] is a textual dataset consisting of 111,740 documents in five different languages (English, French, German, Spanish and Italian) from 6 classes. We randomly sample 12,000 documents from this collection in a balanced manner, then perform dimensionality reduction on them following the methodology of [20] and represent each view by 256 numeric features. Mirflickr includes 25,000 samples collected from Flickr with an image view and a textual view. Here, we remove textual tags that appear fewer than 20 times, and then remove samples without textual tags or semantic labels; finally we obtain 16,738 samples for the experiments [51].
Datasets  $n$, $V$  $k$  $[d_1, \ldots, d_V]$

Caltech20  2386, 3  20  [254, 512, 928]
Handwritten  2000, 3  10  [240, 216, 76]
Reuters  12000, 5  6  [256, 256, 256, 256, 256]
Mirflickr  16738, 2  24  [150, 500]
We compare DiMVMC against six representative and recent multiple clusterings algorithms: Dec-kmeans [17], NrKmeans [30], OSC [6], MNMF [48], MVMC [51] and DMClusts [43]. Dec-kmeans is a representative multiple clusterings solution based on orthogonalizing the cluster centroids. NrKmeans [30], OSC [6] and MNMF [48] attempt different techniques to seek subspaces and multiple clusterings therein; for this reason, they have close connections with our approach and are used for experimental comparison. MVMC [51] and DMClusts [43] are the only two multiview multiple clusterings algorithms at present.
Since none of these compared multiple clusterings algorithms can directly handle missing data, we first fill the missing features (instances) with the average values of each view, and then apply these solutions. For the single-view algorithms, following [51, 43], we concatenate the feature vectors of the multiview data and apply them on the concatenated vectors to generate different clusterings. Note that we do not include the deep/incomplete multiview clustering solutions in the experiments, since they can only output a single clustering. In the following experiments, the input parameters of the compared methods are fixed (or optimized) as the authors suggested in their papers or shared code.
DiMVMC selects its input parameters (including $\lambda$ and $\lambda_1$) from pre-specified grids. The alternative clusterings are generated by applying the $k$-means algorithm on each shared subspace $\mathbf{H}^{(m)}$. The number of clusters for the alternative clusterings was fixed to the $k$ of each dataset, as listed in Table I. In this paper, DiMVMC simply adopts a two-layer network structure for each mapping $f_v$ to reconstruct the $v$-th view from the $m$-th shared subspace $\mathbf{H}^{(m)}$. The demo code of DiMVMC is available at http://mlda.swu.edu.cn/codes.php?name=DiMVMC.
Following the evaluation protocol used by the baseline methods [48, 42, 51], we measure the quality of multiple clusterings via the average SC (Silhouette Coefficient) or DI (Dunn Index), and the diversity via NMI (Normalized Mutual Information) or JC (Jaccard Coefficient). SC and DI quantify the compactness and separation of clusters within a clustering, while NMI and JC quantify the similarity between the clusters of two clusterings. We remark that, unlike in the traditional clustering problem, a lower value of NMI and JC means the two alternative clusterings are less overlapped, so a smaller value of them is preferred.
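Since NMI is used here as a diversity score (lower is better between alternative clusterings), a small self-contained estimator may make the metric concrete. This is a generic implementation, not the authors' evaluation script:

```python
import numpy as np

def nmi(a, b):
    """Normalized mutual information between two label vectors,
    normalized by sqrt(H(a) * H(b))."""
    a, b = np.asarray(a), np.asarray(b)
    ca, cb = np.unique(a), np.unique(b)
    # joint distribution over cluster pairs, and its marginals
    P = np.array([[np.mean((a == x) & (b == y)) for y in cb] for x in ca])
    pa, pb = P.sum(1), P.sum(0)
    nz = P > 0
    mi = (P[nz] * np.log(P[nz] / np.outer(pa, pb)[nz])).sum()
    ha = -(pa[pa > 0] * np.log(pa[pa > 0])).sum()
    hb = -(pb[pb > 0] * np.log(pb[pb > 0])).sum()
    return mi / np.sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0
```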
IV-B Discovering Multiple Clusterings
Dec-kmeans  NrKmeans  OSC  MNMF  MVMC  DMClusts  DiMVMC

Caltech20  SC  0.107±0.002  0.053±0.003  0.190±0.001  0.109±0.003  0.097±0.001  0.033±0.002  0.006±0.000
DI  0.032±0.000  0.042±0.000  0.045±0.002  0.026±0.000  0.009±0.000  0.115±0.003  0.265±0.006
NMI  0.053±0.002  0.465±0.011  0.667±0.007  0.070±0.001  0.025±0.003  0.065±0.002  0.024±0.001
JC  0.046±0.000  0.176±0.008  0.297±0.004  0.045±0.000  0.026±0.001  0.050±0.001  0.025±0.001
Handwritten  SC  0.043±0.001  0.126±0.004  0.371±0.003  0.020±0.001  0.024±0.000  0.020±0.001  0.007±0.000
DI  0.056±0.000  0.068±0.001  0.074±0.002  0.031±0.002  0.004±0.000  0.173±0.006  0.604±0.009
NMI  0.057±0.001  0.395±0.010  0.756±0.003  0.093±0.001  0.006±0.001  0.061±0.005  0.006±0.000
JC  0.065±0.001  0.207±0.011  0.637±0.020  0.078±0.002  0.091±0.001  0.095±0.003  0.052±0.001
Reuters  SC  0.033±0.000  0.012±0.000  0.013±0.001  0.002±0.000  0.004±0.000  0.015±0.001  0.011±0.000
DI  0.047±0.002  0.055±0.003  0.068±0.002  0.019±0.001  0.013±0.000  0.030±0.001  0.434±0.000
NMI  0.231±0.003  0.301±0.005  0.236±0.011  0.001±0.000  0.001±0.000  0.007±0.001  0.001±0.000
JC  0.290±0.005  0.284±0.012  0.339±0.009  0.094±0.000  0.093±0.000  0.114±0.003  0.091±0.000
Mirflickr  SC  0.004±0.000  0.001±0.000  0.017±0.000  0.058±0.000  0.038±0.000  0.336±0.008  0.006±0.000
DI  0.061±0.002  0.035±0.001  0.059±0.002  0.053±0.001  0.173±0.005  0.076±0.001  0.536±0.013
NMI  0.427±0.012  0.584±0.005  0.575±0.011  0.014±0.000  0.005±0.000  0.043±0.001  0.005±0.000
JC  0.878±0.022  0.363±0.007  0.368±0.011  0.023±0.000  0.022±0.000  0.033±0.001  0.021±0.000
↑ (↓) indicates the preferred direction for the corresponding evaluation metric.
•/◦ indicates whether our DiMVMC is statistically (according to a pairwise test at the 95% significance level) superior/inferior to the other method. For the first experiment, we assume the four multiview datasets are complete without any missing data. We report the average results (and standard deviations) of ten independent runs of each method on generating two alternative clusterings in Table II. From Table II, we have the following important observations:
(i) Multiview vs. single-view: DiMVMC, DMClusts and MVMC can be directly applied on multiview data, and their generated two clusterings often have a lower redundancy than those generated by the other compared methods, which either lack a redundancy control term or adopt redundancy strategies that are difficult to optimize. Our following experiment further analyzes the importance of redundancy control. DiMVMC frequently obtains a better quality than the compared methods that can only work on the concatenated single view, which suggests that concatenating the feature vectors overrides the intrinsic nature of multiview data that helps to generate multiple clusterings of quality. These comparisons prove the effectiveness of our proposed approach in fusing multiple data views to generate multiple clusterings with diversity and quality.
(ii) Shallow methods vs. DiMVMC: To our knowledge, DiMVMC is the first deep approach to generate multiple clusterings, and it often performs better on the quality metrics (SC and DI), owing to the high-level expression ability of decoder networks. Even so, DiMVMC sporadically has a lower value on SC than some of the compared methods; that is due to the widely recognized dilemma of obtaining alternative clusterings with both high diversity and quality. DiMVMC has a larger diversity. That is explainable, since it can explore diverse nonlinear representation subspaces via decoder networks, while the shallow methods can only obtain low-level feature subspaces. Therefore, DiMVMC achieves a better trade-off between quality and diversity than the compared methods. Although DiMVMC, DMClusts and MVMC can all generate diverse clusterings from the same multiview data, DiMVMC manifests a better performance than the latter two, because DiMVMC can mine the complex correlations between views and features via decoder networks, whereas these compared methods cannot.
In summary, even with complete data across views, DiMVMC outperforms compared methods across different multiview datasets in terms of quality and diversity.
IV-C Impact of Missing Data


To study the performance of DiMVMC with missing data views, we define the missing rate as the proportion of instances with missing views. The missing instances are randomly selected, and their missing views are randomly erased while guaranteeing that at least one view is observed. In this paper, the missing rate is varied from 0 to 50% with an interval of 10%.
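This erasure protocol can be sketched as follows; the subset sizes and RNG choices are illustrative assumptions, not the authors' exact script:

```python
import numpy as np

def erase_views(views, rate, seed=0):
    """Randomly mark `rate` of the instances as incomplete and erase a
    random proper subset of their views (at least one view is always
    kept), returning the n x V presence indicator matrix S."""
    n, V = views[0].shape[0], len(views)
    rng = np.random.default_rng(seed)
    S = np.ones((n, V), dtype=int)
    for i in rng.choice(n, size=int(rate * n), replace=False):
        kept = rng.integers(1, V)                      # keep 1..V-1 views
        keep_idx = rng.choice(V, size=kept, replace=False)
        S[i] = 0
        S[i, keep_idx] = 1
    return S
```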
Figure 2 shows the impact of the missing rate on the clustering performance of DiMVMC and of the compared methods. With the increase of the missing rate, the performance of all multiple clusterings methods degrades. NrKmeans [30] and OSC [6] hold a high position in terms of SC at the beginning, which indicates that they can obtain multiple clusterings of high quality under a small rate of missing instances. However, their SC values drop faster than the others as the missing rate further increases, since they do not take the missing instances/features into account. Furthermore, their diversity (1-NMI) is also at a low level, suggesting their orthogonal subspaces still have a relatively high redundancy.
The performance curves of the multiview methods (MVMC, DMClusts and DiMVMC) drop more slowly than those of the single-view methods as the missing rate increases. That is because the correlation between views helps to reduce the impact of missing data, and concatenating features cannot well capture this complementary information. In addition, although the SC curve of DiMVMC is not always at a relatively high level, DiMVMC always holds a better diversity than the compared methods, and its quality (SC) and diversity (1-NMI) curves are more stable than theirs. This observation again echoes the dilemma of balancing the quality and diversity of multiple clusterings.
Finally, DiMVMC can generate clusterings whose diversity is controlled by the HSIC term. As the first multiple clusterings algorithm that accounts for missing data views, it can reconstruct the incomplete multiview dataset to complete the missing views. As such, DiMVMC is more robust to missing data, whereas the simple data completion strategies used by the compared methods are not. As a result, the performance of the compared methods is not as stable as that of DiMVMC.
Figure 2 proves that DiMVMC achieves a better trade-off between the quality and diversity of multiple clusterings, and is more competent in dealing with incomplete data than the compared methods. That can be attributed to the adopted decoder networks and the diversity control term, which better capture the correlations among different views, handle diverse missing patterns, and enforce the diversity among subspaces.
IV-D Parameter Analysis
Parameter $\lambda$ balances the generation of multiple subspaces and the diversity control among these subspaces. We study the impact of $\lambda$ by varying it over a wide grid and plot the variation of quality (SC) and diversity (1-NMI, the larger the better) of DiMVMC on the Caltech20 dataset in Figure 2(a). We see that: (i) the diversity (1-NMI) steadily increases at first and then gradually stabilizes; (ii) the quality (SC) gradually decreases and then keeps relatively stable as $\lambda$ further increases. Overall, SC keeps relatively stable as $\lambda$ varies, but stays below the starting point ($\lambda = 0$, no diversity control). This pattern is explainable, since the promotion of diversity between clusterings often comes with a sacrifice of quality. We can conclude that $\lambda$ indeed helps to boost the diversity between clusterings.
We vary $M$ (the number of alternative clusterings) from 2 to 6 on the Caltech20 dataset to explore the variation of the average quality (SC) and diversity (1-NMI) of the multiple clusterings generated by DiMVMC. In Figure 2(b), with the increase of $M$, the average quality (SC) fluctuates in a small range while the diversity (1-NMI) decreases slowly. Overall, DiMVMC can obtain alternative clusterings of quality and diversity.
Building on the base DiMVMC, we introduce a sparsity-inducing norm for each common subspace, extending DiMVMC to a sparse DiMVMC. We apply DiMVMC and sparse DiMVMC on the Caltech20 dataset and report the results in Figure 3(a). Although the SC values of the two models are both around 0.007, sparse DiMVMC has an average NMI (the lower the better) of 0.022, which is nearly 12% lower than that of DiMVMC (average NMI 0.025). This comparison shows that sparsity helps to generate less correlated subspaces (fewer redundant features) and to better control the diversity of alternative clusterings.
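The text does not spell out the exact sparsity penalty here; assuming an l1-type norm on each common subspace representation, the corresponding proximal update is the familiar soft-thresholding operator (a sketch; the names are ours):

```python
import numpy as np

def soft_threshold(H, lam):
    """Proximal operator of lam * ||H||_1 (assumed l1 penalty).

    Shrinks every entry of the subspace representation H toward zero and
    zeroes out small entries, yielding sparser, less redundant features.
    """
    return np.sign(H) * np.maximum(np.abs(H) - lam, 0.0)
```

Applied after each gradient step on a subspace, this keeps the representation sparse without changing the rest of the optimization.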
IV-E Convergence and Complexity Analysis
From Figure 3(b), we can find that DiMVMC and sparse DiMVMC often converge within 15 epochs, while DiMVMC without diversity control converges in 30 epochs. This trend not only proves the efficiency of our proposed alternative optimization strategy, but also shows that the diversity control term and the added sparsity norm do not increase the complexity.
The memory complexity of DiMVMC can be analyzed in two parts: the space to store the data elements of all views, and the space to store the parameters of the decoder networks. Since most multiview data are typically sparse, the actual space complexity is much smaller.
The time complexity of DiMVMC can likewise be analyzed via the two subproblems, i.e., updating the decoder networks and updating the common subspaces, repeated over a number of iterations. Due to the use of the HSIC term, the time complexity of DiMVMC is quadratic in the number of samples; that of MVMC [51] is also quadratic, while that of DMClusts [43] is linear. Note that both the memory complexity and time complexity of DiMVMC can drop by an order of magnitude via a mini-batch optimization technique.
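The mini-batch reduction mentioned above can be sketched as follows: instead of forming full n x n Gram matrices, HSIC is estimated on random batches, cutting both memory and time. The estimator, batch size, and function name below are illustrative assumptions:

```python
import numpy as np

def batched_hsic(X, Y, batch_size=256, n_batches=10, seed=0):
    """Approximate HSIC with linear kernels on random mini-batches,
    avoiding the full n x n Gram matrices of the exact estimator."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    vals = []
    for _ in range(n_batches):
        idx = rng.choice(n, size=min(batch_size, n), replace=False)
        Xb, Yb = X[idx], Y[idx]
        b = Xb.shape[0]
        K, L = Xb @ Xb.T, Yb @ Yb.T
        H = np.eye(b) - np.ones((b, b)) / b   # centering matrix per batch
        vals.append(np.trace(K @ H @ L @ H) / (b - 1) ** 2)
    return float(np.mean(vals))
```

Each batch costs memory and time quadratic in the batch size rather than in n, which is where the order-of-magnitude saving comes from.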
Table III gives the runtime of all compared methods, run on a Linux server (Intel Xeon 8163, 1TB RAM, NVIDIA Tesla K80). All methods are implemented in Matlab 2014a, except Nrkmeans, DMClusts and DiMVMC, which are implemented in Python. We observe that the three fastest methods are OSC, Deckmeans and DMClusts, respectively. OSC and Deckmeans do not consider the correlations between views and work on the concatenated views, so they run faster than the others. DMClusts employs the efficient semi-NMF [7] to decompose the multiview data layer by layer and generates multiple clusterings simultaneously. Although MNMF also builds on the efficient semi-NMF, it is constrained by the reference clustering when seeking the other clustering, and thus has a larger runtime than DMClusts. Nrkmeans needs to update the clustering centers many times, so it has a longer runtime than MNMF. MVMC involves time-demanding self-representation learning and the factorization of multiple representational matrices, so it has a longer runtime than the others. Our DiMVMC has the largest runtime, since it has to capture the complex correlations between views and generate multiple clusterings with nonlinear clusters by optimizing multiple decoder networks. However, DiMVMC almost always generates multiple clusterings with better quality and diversity than these compared methods.
Dataset      | Deckmeans | Nrkmeans | OSC  | MNMF  | MVMC   | DMClusts | Ours
Caltech20    | 334       | 2664     | 180  | 2398  | 1788   | 570      | 3296
Handwriting  | 50        | 1212     | 58   | 636   | 2558   | 244      | 1106
Reuters      | 1944      | 2096     | 1912 | 10911 | 54095  | 4560     | 67583
Mirflickr    | 2386      | 8976     | 263  | 12870 | 53867  | 9240     | 78967
Total        | 4714      | 14948    | 2413 | 16470 | 112308 | 14614    | 150952
V Conclusions
In this paper, we introduced the DiMVMC model to explore alternative clusterings from ubiquitous incomplete multiview data. DiMVMC completes the missing data via a group of decoder networks, and seeks multiple shared but diverse subspaces (and the clusterings therein) by reducing the overlaps between subspaces. Experimental results on benchmark datasets confirm the effectiveness of DiMVMC. In the future, we will explore deep alternative clusterings by merging prior knowledge of different perspectives.
References
 [1] (2009) Learning from multiple partially observed views: an application to multilingual text categorization. In NeurIPS, pp. 28–36.
 [2] (2006) COALA: a novel approach for the extraction of an alternate clustering of high quality and high dissimilarity. In ICDM, pp. 53–62.
 [3] (2013) Alternative clustering analysis: a review. In Data Clustering: Algorithms and Applications, pp. 535–550.
 [4] (2006) Meta clustering. In ICDM, pp. 107–118.
 [5] (2009) Multiview clustering via canonical correlation analysis. In ICML, pp. 129–136.
 [6] (2007) Nonredundant multiview clustering via orthogonalization. In ICDM, pp. 133–142.
 [7] (2010) Convex and semi-nonnegative matrix factorizations. TPAMI 32(1), pp. 45–55.
 [8] (2013) Sparse subspace clustering: algorithm, theory, and applications. TPAMI 35(11), pp. 2765–2781.
 [9] (2018) Multi-insight visualization of multi-omics data via ensemble dimension reduction and tensor factorization. Bioinformatics 35(10), pp. 1625–1633.
 [10] (2014) Generative adversarial nets. In NeurIPS, pp. 2672–2680.
 [11] (2005) Measuring statistical dependence with Hilbert-Schmidt norms. In ALT, pp. 63–77.
 [12] (1994) Autoencoders, minimum description length and Helmholtz free energy. In NeurIPS, pp. 3–10.
 [13] (2017) Sharable and individual multiview metric learning. TPAMI 40(9), pp. 2281–2288.
 [14] (2018) Doubly aligned incomplete multiview clustering. In IJCAI, pp. 2262–2268.
 [15] (2019) One-pass incomplete multiview clustering. In AAAI, pp. 3838–3845.
 [16] (2010) Data clustering: 50 years beyond k-means. PRL 31(8), pp. 651–666.
 [17] (2008) Simultaneous unsupervised learning of disparate clusterings. Statistical Analysis and Data Mining 1(3), pp. 195–210.
 [18] (2019) Multiple partitions aligned clustering. In IJCAI, pp. 2701–2707.
 [19] (2011) A co-training approach for multiview spectral clustering. In ICML, pp. 393–400.
 [20] (2011) A co-training approach for multiview spectral clustering. In ICML, pp. 393–400.
 [21] (2011) Co-regularized multiview spectral clustering. In NeurIPS, pp. 1413–1421.
 [22] (2001) Algorithms for non-negative matrix factorization. In NeurIPS, pp. 556–562.
 [23] (1996) Image representation using 2D Gabor wavelets. TPAMI 18(10), pp. 959–971.
 [24] (2014) Partial multiview clustering. In AAAI, pp. 1968–1974.
 [25] (2015) Large-scale multiview spectral clustering via bipartite graph. In AAAI, pp. 2750–2756.
 [26] (2019) Deep adversarial multiview clustering network. In IJCAI, pp. 2952–2958.
 [27] (2015) Partially shared latent factor learning with multiview data. TNNLS 26(6), pp. 1233–1246.
 [28] (2016) Multiple kernel k-means clustering with matrix-induced regularization. In AAAI, pp. 1888–1894.
 [29] (2018) Consistent and specific multiview subspace clustering. In AAAI, pp. 3730–3737.
 [30] (2018) Discovering non-redundant k-means clusterings in optimal subspaces. In KDD, pp. 1973–1982.
 [31] (2011) Multimodal deep learning. In ICML, pp. 689–696.
 [32] (2004) Subspace clustering for high dimensional data: a review. SIGKDD 6(1), pp. 90–105.
 [33] (2009) A principled and flexible framework for finding alternative clusterings. In KDD, pp. 717–726.
 [34] (2015) Multiple incomplete views clustering via weighted nonnegative matrix factorization with L2,1 regularization. In ECML/PKDD, pp. 318–334.
 [35] (2013) A survey of multiview machine learning. NPA 23(7–8), pp. 2031–2038.
 [36] (2018) Incomplete multiview weak-label learning. In IJCAI, pp. 2703–2709.
 [37] (2016) A deep matrix factorization method for learning attribute representations. TPAMI 39(3), pp. 417–429.
 [38] (2011) Subspace clustering. IEEE Signal Processing Magazine 28(2), pp. 52–68.
 [39] (2018) Partial multiview clustering via consistent GAN. In ICDM, pp. 1290–1295.
 [40] (2015) On deep multiview representation learning. In ICML, pp. 1083–1092.
 [41] (2019) Multiple independent subspace clusterings. In AAAI, pp. 5353–5360.
 [42] (2018) Multiple co-clusterings. In ICDM, pp. 1308–1313.
 [43] (2020) Multiview multiple clusterings using deep matrix factorization. In AAAI, pp. 6348–6355.
 [44] (2019) Unified embedding alignment with missing views inferring for incomplete multiview clustering. In AAAI, pp. 5393–5400.
 [45] (2012) Convex multiview subspace learning. In NeurIPS, pp. 1673–1681.
 [46] (2019) Adversarial incomplete multiview clustering. In IJCAI, pp. 3933–3939.
 [47] (2018) Partial multiview subspace clustering. In ACM MM, pp. 1794–1801.
 [48] (2017) Non-redundant multiple clustering by nonnegative matrix factorization. Machine Learning 106(5), pp. 695–712.
 [49] (2018) Multiview clustering: a survey. Big Data Mining and Analytics 1(2), pp. 83–107.
 [50] (2019) Discovering multiple co-clusterings in subspaces. In SDM, pp. 423–431.
 [51] (2019) Multiview multiple clustering. In IJCAI, pp. 4121–4127.
 [52] (2019) CPM-Nets: cross partial multiview networks. In NeurIPS, pp. 557–567.
 [53] (2017) Multiview clustering via deep matrix factorization. In AAAI, pp. 2921–2927.
 [54] (2016) Incomplete multimodal visual data grouping. In IJCAI, pp. 2392–2398.
 [55] (2012) Ensemble Methods: Foundations and Algorithms. CRC Press.