I Introduction
Multiview clustering (MVC) is a fundamental learning task in data mining, image segmentation and pattern recognition [29, 6, 36, 21]. The key to MVC is to exploit the consistent and complementary information across views, each of which describes the data from a different aspect, and the problem has attracted enormous attention. According to the mechanisms and principles involved, existing multiview clustering approaches can be grouped into four categories: co-training, multi-kernel clustering, graph clustering and subspace clustering [17, 2, 11, 43, 30]. Co-training algorithms bootstrap the clustering result of each view with prior or learned knowledge from the other views [12, 13, 20]. Multi-kernel clustering first maps the data into high-dimensional spaces through kernel functions and then combines these kernels linearly or nonlinearly to improve clustering performance [14, 8, 28, 4]. Multiview graph clustering algorithms construct a graph similarity matrix for each view, and the main challenge is how to approximate a good fused graph [15, 33, 32, 39, 37, 18, 35, 42]. Multiview subspace clustering can be further divided into subspace-based methods [34, 3, 19, 23, 41, 45, 27, 22] and matrix factorization methods [35, 46]; both aim to learn a low-dimensional representation shared by all views. Our method belongs to the nonnegative matrix factorization family.

In recent years, nonnegative matrix factorization (NMF) has been increasingly used for multiview subspace clustering. The work in [16] proposes an NMF-based multiview clustering algorithm that searches for a factorization giving compatible clustering solutions across multiple views. The work in [47] proposes a multi-manifold regularized nonnegative matrix factorization framework (MMNMF) that preserves the local geometric structure of the manifolds for multiview clustering. The method in [31] addresses multiview feature selection and fusion by matrix factorization. In [15], an NMF model with co-orthogonal constraints is designed for the MVC problem. However, most matrix-factorization-based algorithms follow a single-layer strategy. Only a few, such as DMVC [44], use the deep semi-NMF framework introduced in [24]. DMVC focuses on the intrinsic geometric structure of each view and introduces graph regularization to couple the output representations of the deep structures. However, DMVC introduces hyperparameters whose values must be tuned. The method in [10] was proposed to remove this requirement and further improved performance. Although these methods have been successful, they can be improved in two respects. First, different views describe different attributes of the data items, yet existing methods force the representations to be consistent across views and thereby discard view-specific features. Second, according to [24], existing deep factorization structures still leave a large gap in fully discovering the rich hidden information of the original data.

In this paper, we propose a multiview clustering method based on deep semi-NMF to address these problems. We jointly optimize the representation learning of each view and the late fusion stage in a unified framework, which we term multiview clustering with deep semi-NMF and global graph refinement (MVC-DMF-GGR). First, we learn a low-dimensional and more compact representation for each view through the deep semi-NMF framework. Since these representations come from different views, view-specific information can be well captured. Second, we use the learned representations to reconstruct the graph structure of each view and then merge these graphs to approximate a common graph structure. Although the representations of different views may differ, their graph structures tend to be similar, because they all describe the same set of samples. Therefore, following traditional graph-based methods, we optimize representation learning and common graph structure learning jointly in order to obtain an optimal graph structure for clustering. Extensive experiments on six benchmark datasets show that the proposed method achieves better clustering performance than several state-of-the-art methods.
The contributions of this paper are summarized as follows:

We propose a multiview clustering method with deep semi-NMF and global graph refinement (MVC-DMF-GGR), which unifies representation learning and graph structure learning in one framework, so that the two can promote and guide each other and reach the best consensus for clustering.

By introducing the deep semi-NMF framework, we decompose the feature matrix of each view through multiple layers and capture its underlying information. In the fusion stage, a graph regularization term is introduced to learn a graph structure shared by all views; this common graph unifies the intrinsic geometric structures of the data across views.

Extensive experiments on six multiview datasets show that our method clearly outperforms other state-of-the-art (SOTA) methods.
The rest of the paper is organized as follows. Section II reviews related work on multiview clustering via NMF. Section III presents the proposed method and the alternating algorithm that solves its optimization problem, together with convergence and computational complexity analyses. Section IV describes the datasets and compared methods and reports and analyzes the experimental results. Section V concludes the paper.
II Related Work
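We first introduce some notation. Matrices are written in bold capitals; for a matrix $\mathbf{A}$, $\mathbf{A}_{i:}$, $\mathbf{A}_{:j}$ and $A_{ij}$ denote its $i$th row, $j$th column and $(i,j)$th element. A subscript $i$ (as in $\mathbf{A}_i$) denotes the $i$th layer and a superscript $(v)$ (as in $\mathbf{A}^{(v)}$) denotes the $v$th view. The Frobenius norm of $\mathbf{A}$ is denoted by $\|\mathbf{A}\|_F$ and its trace by $\mathrm{tr}(\mathbf{A})$. $\mathbf{A}^{\top}$ and $\mathbf{A}^{\dagger}$ denote the transpose and the Moore-Penrose generalized inverse of $\mathbf{A}$, respectively. We separate the positive and negative parts of $\mathbf{A}$ as $[\mathbf{A}]^{+}=(|\mathbf{A}|+\mathbf{A})/2$ and $[\mathbf{A}]^{-}=(|\mathbf{A}|-\mathbf{A})/2$, so that $\mathbf{A}=[\mathbf{A}]^{+}-[\mathbf{A}]^{-}$.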
In this section, we briefly review the most closely related work: semi-NMF, deep semi-NMF, and multiview clustering via deep matrix factorization (DMVC).
II-A Semi-NMF
Nonnegative matrix factorization is an important topic in matrix factorization and can be used for clustering, spectral decomposition and subspace identification. In practice, however, the source data may have mixed signs. The work in [5] extends traditional NMF to semi-NMF and gives an alternating algorithm for updating the variables. NMF can be written as:
$$\min_{\mathbf{Z}\ge 0,\ \mathbf{H}\ge 0}\ \|\mathbf{X}_{+}-\mathbf{Z}\mathbf{H}\|_F^2 \tag{1}$$
Semi-NMF can be written as:
$$\min_{\mathbf{Z},\ \mathbf{H}\ge 0}\ \|\mathbf{X}_{\pm}-\mathbf{Z}\mathbf{H}\|_F^2 \tag{2}$$
where $\mathbf{X}\in\mathbb{R}^{d\times n}$ denotes the input data with $n$ samples, each described by a $d$-dimensional feature vector. $\mathbf{X}_{+}$ in Eq. (1) indicates that the elements of the original data are nonnegative, and $\mathbf{X}_{\pm}$ in Eq. (2) indicates that they have mixed signs. When NMF or semi-NMF is used for clustering, $\mathbf{Z}\in\mathbb{R}^{d\times k}$ is the cluster centroid matrix and $\mathbf{H}\in\mathbb{R}^{k\times n}$ is the soft cluster assignment matrix, i.e., a $k$-dimensional representation of the samples. The difference between the two is that NMF requires the elements of $\mathbf{X}$ and $\mathbf{Z}$ to be nonnegative, whereas semi-NMF allows them to have mixed signs; in both cases $\mathbf{H}$ is nonnegative.
The optimization problem of semi-NMF in Eq. (2) can be solved by alternately updating $\mathbf{Z}$ and $\mathbf{H}$:
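i) Optimizing $\mathbf{Z}$ with $\mathbf{H}$ fixed. With the soft cluster assignment matrix $\mathbf{H}$ fixed, Eq. (2) becomes the unconstrained problem $\min_{\mathbf{Z}}\|\mathbf{X}-\mathbf{Z}\mathbf{H}\|_F^2$. Setting its derivative $-2(\mathbf{X}-\mathbf{Z}\mathbf{H})\mathbf{H}^{\top}$ to zero gives the solution $\mathbf{Z}=\mathbf{X}\mathbf{H}^{\dagger}$.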
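ii) Optimizing $\mathbf{H}$ with $\mathbf{Z}$ fixed. $\mathbf{H}$ is obtained by solving $\min_{\mathbf{H}}\|\mathbf{X}-\mathbf{Z}\mathbf{H}\|_F^2$ subject to $\mathbf{H}\ge 0$. Using the Lagrange method, we obtain the following update rule for $\mathbf{H}$, which satisfies the KKT conditions:
$$\mathbf{H}\leftarrow\mathbf{H}\odot\sqrt{\frac{[\mathbf{Z}^{\top}\mathbf{X}]^{+}+[\mathbf{Z}^{\top}\mathbf{Z}]^{-}\mathbf{H}}{[\mathbf{Z}^{\top}\mathbf{X}]^{-}+[\mathbf{Z}^{\top}\mathbf{Z}]^{+}\mathbf{H}}} \tag{3}$$
where $\odot$, the division and the square root are element-wise.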
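The two alternating steps of semi-NMF can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `semi_nmf`, the random nonnegative initialization of H, the fixed iteration count and the small constant `eps` guarding the division are all choices made for the example.

```python
import numpy as np


def pos(A):
    """Positive part [A]^+ = (|A| + A) / 2."""
    return (np.abs(A) + A) / 2


def neg(A):
    """Negative part [A]^- = (|A| - A) / 2, so that A = [A]^+ - [A]^-."""
    return (np.abs(A) - A) / 2


def semi_nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Semi-NMF X (d x n, mixed signs) ~ Z H with H >= 0 (k x n).

    Assumptions (illustrative only): random nonnegative initialisation,
    fixed number of iterations, eps added to the denominator.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((k, X.shape[1]))            # nonnegative initial representation
    errors = []
    for _ in range(n_iter):
        Z = X @ np.linalg.pinv(H)              # step i): closed form Z = X H^dagger
        ZtX, ZtZ = Z.T @ X, Z.T @ Z
        num = pos(ZtX) + neg(ZtZ) @ H
        den = neg(ZtX) + pos(ZtZ) @ H
        H = H * np.sqrt(num / (den + eps))     # step ii): multiplicative rule, keeps H >= 0
        errors.append(np.linalg.norm(X - Z @ H, "fro") ** 2)
    return Z, H, errors


if __name__ == "__main__":
    X = np.random.default_rng(0).standard_normal((20, 50))
    Z, H, errors = semi_nmf(X, k=4)
    print(errors[0], errors[-1])   # the reconstruction error should not increase
```

Because the Z step is an exact least-squares minimizer and the H step does not increase the objective, the recorded reconstruction errors form a non-increasing sequence up to floating-point effects.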
II-B Deep Semi-NMF for Representation Learning
The low-dimensional one-layer representation obtained by semi-NMF cannot preserve the original features well because of its limited representation ability. Therefore, a deep semi-NMF framework for a single view was proposed in [24], which is able to learn lower-dimensional hidden representations. This method extends the applications of semi-NMF and offers an interpretable way to improve clustering performance. Deep semi-NMF can be written as:
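$$\min_{\mathbf{Z}_i,\ \mathbf{H}_m}\ \big\|\mathbf{X}-\mathbf{Z}_1\mathbf{Z}_2\cdots\mathbf{Z}_m\mathbf{H}_m\big\|_F^2\quad\text{s.t.}\ \mathbf{H}_i\ge 0,\ i=1,\dots,m \tag{4}$$
where $\mathbf{Z}_1$ denotes the mapping between the feature matrix $\mathbf{X}$ and the first representation $\mathbf{H}_1$, and $\mathbf{Z}_i$ denotes the mapping between the $(i-1)$th representation $\mathbf{H}_{i-1}$ and the $i$th representation $\mathbf{H}_i$; in other words, $\mathbf{H}_{i-1}\approx\mathbf{Z}_i\mathbf{H}_i$ with $\mathbf{H}_0=\mathbf{X}$. $m$ denotes the depth of semi-NMF. Following the work in [5], we use the positive and negative parts $[\cdot]^{+}$ and $[\cdot]^{-}$, and we denote $\boldsymbol{\Psi}_i=\mathbf{Z}_1\mathbf{Z}_2\cdots\mathbf{Z}_i$ (with $\boldsymbol{\Psi}_0=\mathbf{I}$) and $\tilde{\mathbf{H}}_i=\mathbf{Z}_{i+1}\cdots\mathbf{Z}_m\mathbf{H}_m$, the reconstruction of the $i$th representation. The optimization problem can be solved by alternately updating $\mathbf{Z}_i$ and $\mathbf{H}_i$:
i) Optimizing $\mathbf{Z}_i$ with the other variables fixed. Eq. (4) becomes the unconstrained problem $\min_{\mathbf{Z}_i}\|\mathbf{X}-\boldsymbol{\Psi}_{i-1}\mathbf{Z}_i\tilde{\mathbf{H}}_i\|_F^2$. Setting its derivative with respect to $\mathbf{Z}_i$ to zero gives $\mathbf{Z}_i=\boldsymbol{\Psi}_{i-1}^{\dagger}\mathbf{X}\tilde{\mathbf{H}}_i^{\dagger}$.
ii) Optimizing $\mathbf{H}_i$ with the other variables fixed. $\mathbf{H}_i$ is obtained by solving $\min_{\mathbf{H}_i}\|\mathbf{X}-\boldsymbol{\Psi}_i\mathbf{H}_i\|_F^2$ subject to $\mathbf{H}_i\ge 0$. Using the Lagrange method, we obtain the following update rule for $\mathbf{H}_i$, which satisfies the KKT conditions:
$$\mathbf{H}_i\leftarrow\mathbf{H}_i\odot\sqrt{\frac{[\boldsymbol{\Psi}_i^{\top}\mathbf{X}]^{+}+[\boldsymbol{\Psi}_i^{\top}\boldsymbol{\Psi}_i]^{-}\mathbf{H}_i}{[\boldsymbol{\Psi}_i^{\top}\mathbf{X}]^{-}+[\boldsymbol{\Psi}_i^{\top}\boldsymbol{\Psi}_i]^{+}\mathbf{H}_i}} \tag{5}$$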
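One fine-tuning sweep of deep semi-NMF can be sketched as follows. Writing Psi_i = Z_1 ... Z_i for the product of the first i mappings (the identity for i = 0) and H_tilde_i = Z_{i+1} ... Z_m H_m for the reconstruction of the i-th representation, the sweep sets Z_i = pinv(Psi_{i-1}) X pinv(H_tilde_i) and then applies the multiplicative rule to H_i with Psi_i. This is a sketch under these assumptions, not the implementation of [24]; `deep_semi_nmf_sweep` and `chain` are hypothetical names.

```python
import numpy as np
from functools import reduce


def pos(A):
    return (np.abs(A) + A) / 2   # [A]^+


def neg(A):
    return (np.abs(A) - A) / 2   # [A]^-


def chain(mats, size):
    """Product of the matrices in `mats`; the size x size identity if `mats` is empty."""
    return reduce(np.matmul, mats, np.eye(size))


def deep_semi_nmf_sweep(X, Zs, Hs, eps=1e-9):
    """One fine-tuning sweep for X ~ Z_1 Z_2 ... Z_m H_m.

    Zs[i] and Hs[i] (0-based) hold Z_{i+1} and H_{i+1}; both lists are updated in place.
    Assumption: eps guards the division, as in the single-layer sketch.
    """
    d = X.shape[0]
    for i in range(len(Zs)):
        Psi_prev = chain(Zs[:i], d)                           # Psi_{i-1}
        H_tilde = chain(Zs[i + 1:], Zs[i].shape[1]) @ Hs[-1]  # Z_{i+1} ... Z_m H_m
        Zs[i] = np.linalg.pinv(Psi_prev) @ X @ np.linalg.pinv(H_tilde)
        Psi = Psi_prev @ Zs[i]                                # Psi_i
        PtX, PtP = Psi.T @ X, Psi.T @ Psi
        Hs[i] = Hs[i] * np.sqrt((pos(PtX) + neg(PtP) @ Hs[i])
                                / (neg(PtX) + pos(PtP) @ Hs[i] + eps))
    return Zs, Hs
```

Only the last representation H_m enters the reconstruction X ~ Z_1 ... Z_m H_m, so within a sweep each Z_i step is an exact least-squares minimizer and the final H_m step is non-increasing; the overall reconstruction error therefore does not grow from sweep to sweep.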
II-C DMVC
The work in [44] combines deep semi-NMF with multiview clustering; the resulting method is called DMVC. It simultaneously learns multilayer representations and preserves the geometric structure of the data for clustering. Formally, multiview clustering with deep semi-NMF can be written as:
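$$\min_{\mathbf{Z}_i^{(v)},\,\mathbf{H}_i^{(v)},\,\mathbf{H}_m,\,\alpha^{(v)}}\ \sum_{v=1}^{V}\big(\alpha^{(v)}\big)^{\gamma}\Big(\big\|\mathbf{X}^{(v)}-\mathbf{Z}_1^{(v)}\mathbf{Z}_2^{(v)}\cdots\mathbf{Z}_m^{(v)}\mathbf{H}_m\big\|_F^2+\beta\,\mathrm{tr}\big(\mathbf{H}_m\mathbf{L}^{(v)}\mathbf{H}_m^{\top}\big)\Big)\quad\text{s.t.}\ \mathbf{H}_i^{(v)}\ge 0\ (i=1,\dots,m-1),\ \mathbf{H}_m\ge 0,\ \sum_{v=1}^{V}\alpha^{(v)}=1,\ \alpha^{(v)}\ge 0 \tag{6}$$
We denote $\mathcal{X}=\{\mathbf{X}^{(1)},\dots,\mathbf{X}^{(V)}\}$, where $\mathbf{X}^{(v)}\in\mathbb{R}^{d_v\times n}$ is the feature matrix of the $v$th view, with $n$ samples each described by a $d_v$-dimensional feature. Similar to Section II-B, $\mathbf{Z}_1^{(v)}$ denotes the mapping between the feature matrix and the first representation of the $v$th view, and $\mathbf{Z}_i^{(v)}$ denotes the mapping between the $(i-1)$th and the $i$th representations of the $v$th view. $V$ is the number of views and $m$ is the number of layers, i.e., the depth of semi-NMF. $\mathbf{H}_m$ is the consensus latent representation shared by all views. $\alpha^{(v)}$ is the weight of the $v$th view, $\gamma$ controls the distribution of the weights, and $\beta$ weights the graph regularization term. $\mathbf{L}^{(v)}=\mathbf{D}^{(v)}-\mathbf{A}^{(v)}$ is the graph Laplacian of the $v$th view, where $\mathbf{A}^{(v)}$ is a nearest-neighbor graph constructed from $\mathbf{X}^{(v)}$ and $\mathbf{D}^{(v)}$ is the diagonal degree matrix with $D^{(v)}_{jj}=\sum_{l}A^{(v)}_{jl}$. The problem in Eq. (6) can be solved by alternately updating $\mathbf{Z}_i^{(v)}$, $\mathbf{H}_i^{(v)}$, $\mathbf{H}_m$ and $\alpha^{(v)}$. The update rules for $\mathbf{Z}_i^{(v)}$, $\mathbf{H}_i^{(v)}$ and $\mathbf{H}_m$ are similar to those of deep semi-NMF; $\alpha^{(v)}$ is updated with the Lagrange method by taking the derivative of the Lagrangian with respect to $\alpha^{(v)}$.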
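The graph term in DMVC relies on a nearest-neighbor graph of each view and its Laplacian L = D - A. The sketch below builds a symmetric k-nearest-neighbor graph with 0/1 edge weights (binary weights and the neighborhood size are assumptions; DMVC may weight edges differently), forms the unnormalized Laplacian, and evaluates tr(H L H^T). For a symmetric A this trace equals one half of the sum over pairs of A_jl times the squared distance between columns j and l of H, so it penalizes neighboring samples whose representations differ. All function names are hypothetical.

```python
import numpy as np


def knn_graph(X, k=5):
    """Symmetric 0/1 kNN adjacency A for the columns (samples) of X (d x n).

    Assumption: binary weights; an edge is kept if either endpoint selects the other.
    """
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2 * X.T @ X   # pairwise squared distances
    np.fill_diagonal(dist, np.inf)                   # exclude self-loops
    nn = np.argsort(dist, axis=1)[:, :k]             # k nearest neighbours per sample
    n = X.shape[1]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    return np.maximum(A, A.T)                        # symmetrise


def laplacian(A):
    """Unnormalised graph Laplacian L = D - A with D the diagonal degree matrix."""
    return np.diag(A.sum(axis=1)) - A


def graph_penalty(H, L):
    """Graph regularisation tr(H L H^T) for a k x n representation H."""
    return np.trace(H @ L @ H.T)
```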
III The Proposed Method
Notations  Meaning 

Feature matrix of the th view  
th layer cluster centroid matrix of the th view  
th layer cluster centroid matrix of the th view  
th layer feature representation of the th view  
the similarity matrix of the th view  
Consensus similarity matrix  
Table I summarizes the basic notation of our method; each symbol is also explained where it first appears.
As mentioned above, the last-layer representations of different views should in principle differ, whereas the global graph structure, which encodes the relationships between samples, should be consistent across views. Therefore, unlike DMVC, we allow the last-layer feature representations of different views to differ and fuse the individual graph structures into a consensus local structure matrix. This idea can be expressed mathematically as follows:
(7)  
The meanings of , and are the same as in Eq. (6) (see Table I). denotes the th layer of the th view. constructs the similarity matrix in different layers. is the weight coefficient of the th view for . denotes the consensus similarity matrix. denotes the similarity score between the th and th samples, so we add the constraints and for . The larger this score, the more likely the two samples belong to the same cluster. To obtain a normalized solution, we also add the constraint .
Iiia Initialization
Inspired by the layer-wise initialization in [7], we pretrain all layers to initialize the variables and by decomposing the data layer by layer. First, we decompose the feature matrix of the th view as , where and . We then decompose the new feature matrix , where and . We repeat these steps until all layers have been pretrained. Pretraining each layer provides initial approximations of the matrices and , which greatly reduces the time needed for the subsequent fine-tuning. We then use the pretrained and to initialize and by setting and . Initially, we assume that every view contributes equally, so we initialize by combining the with equal weights.
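Greedy layer-wise pretraining can be sketched as repeated single-layer semi-NMF, where each layer factorizes the representation produced by the previous one; in a multiview setting it would be run once per view. The helper names, iteration counts and random initialization below are assumptions for illustration, since the section does not specify them.

```python
import numpy as np


def pos(A):
    return (np.abs(A) + A) / 2


def neg(A):
    return (np.abs(A) - A) / 2


def semi_nmf(X, k, n_iter=100, eps=1e-9, seed=0):
    """Single-layer semi-NMF X ~ Z H with H >= 0 (alternating updates of Section II-A).

    Assumptions: random nonnegative initialisation and a fixed iteration budget.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        Z = X @ np.linalg.pinv(H)
        ZtX, ZtZ = Z.T @ X, Z.T @ Z
        H = H * np.sqrt((pos(ZtX) + neg(ZtZ) @ H) / (neg(ZtX) + pos(ZtZ) @ H + eps))
    return X @ np.linalg.pinv(H), H          # refit Z to the final H


def pretrain(X, layer_sizes):
    """Greedy layer-wise initialisation: X ~ Z_1 H_1, then H_1 ~ Z_2 H_2, and so on.

    Hypothetical helper; apply it separately to each view's feature matrix.
    """
    Zs, Hs, current = [], [], X
    for k in layer_sizes:
        Z, H = semi_nmf(current, k)
        Zs.append(Z)
        Hs.append(H)
        current = H                          # the next layer factorises this representation
    return Zs, Hs
```

The returned lists can seed a joint fine-tuning stage such as the deep semi-NMF sweep of Section II-B.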
III-B Optimization
Since the problem in Eq. (7) is nonconvex, it is unlikely to be solvable in one step. We therefore propose a five-step alternating optimization method. To reduce the total reconstruction error of the model, we also alternately minimize and in each layer.
III-B1 Update rule for matrix
By fixing , , and (), we can update by solving the following unconstrained problem:
(8) 
where . By setting , we obtain the solution:
(9) 
where and . and denotes the reconstruction of the th layer’s representation for the th view.
III-B2 Update rule for matrix
By fixing , , and , we can update by solving the following problem,
(10) 
where . Following the update rule in [5], the update rule for can be written as,
(11) 
We also update here for faster convergence and a simpler implementation.
III-B3 Update rule for matrix
By fixing , , and , we can update by solving the following problem,
(12) 
where the variables are defined as follows,
(13) 
We first give the update rule for and then prove it.
(14) 
Theorem 1
The limiting solution of the update rule in Eq. (14) satisfies the KKT condition.
Proof 1
We introduce the Lagrangian function as
(15) 
To enforce the constraint , we introduce the Lagrangian multiplier . Setting , we obtain:
(16) 
From the complementary slackness condition, we can obtain,
(17) 
We multiply both sides by and obtain:
(18) 
Both equations require that at least one of the two factors equals zero, so Eq. (17) and Eq. (18) are equivalent. Eq. (18) is a fixed-point equation. By noting that , etc., it is easy to derive the update rule in Eq. (14) for and to verify that it satisfies the fixed-point equation. At convergence, .
III-B4 Update rule for matrix
By fixing , and , we can update by solving the following problem,
(19) 
where . This problem has a closed-form solution:
(20) 
where is the th row of and is the th row of .
Proof 2
The problem in Eq. (19) can be decomposed into independent row-wise optimization problems:
(21) 
The Lagrangian function of Eq. (21) is,
(22) 
where and are the Lagrangian multipliers for the constraints and , respectively. The KKT conditions are:
(23) 
Solving these conditions yields Eq. (20).
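The closed form of Eq. (20) is not reproduced here, but row-wise problems of this type, a sum-to-one constraint plus nonnegativity handled through two Lagrange multipliers, commonly reduce to the Euclidean projection of a vector onto the probability simplex. Assuming that form, the standard sort-based projection can be sketched as follows; `project_to_simplex` and `update_rows` are hypothetical names, and the input rows stand in for whatever target vectors Eq. (20) prescribes.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of the vector v onto {s : s >= 0, sum(s) = 1} (sort-based)."""
    u = np.sort(v)[::-1]                          # entries in decreasing order
    css = np.cumsum(u)
    j = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1) / j > 0)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)            # shift that makes the result sum to one
    return np.maximum(v - theta, 0.0)


def update_rows(A):
    """Project every row of A independently, mirroring the row-wise split of Eq. (21).

    Assumption: each row problem is min ||s - a||^2 s.t. s >= 0, sum(s) = 1.
    """
    return np.vstack([project_to_simplex(a) for a in A])
```

The projection costs O(n log n) per row because of the sort, so updating all rows is O(n^2 log n) for n samples.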
III-B5 Update rule for coefficient
By fixing , and , we can update by solving the following problem,
(24) 
Letting , we have:
(25) 
Since =, we have =. Substituting these into Eq. (25), the optimization problem can be written as:
(26) 
where the variables are defined as follows,
(27)  
For every , we have . Hence the matrix is positive semidefinite, and Eq. (26) is a convex quadratic program that can be solved with standard quadratic programming.
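Because the matrix in Eq. (26) is positive semidefinite, the weight update is a convex quadratic program. As an illustration only, the sketch below assumes the objective has the form a^T M a - 2 f^T a with the view weights a constrained to the probability simplex, where `M` and `f` are hypothetical stand-ins for the quantities in Eq. (27). It uses projected gradient descent in place of a generic QP solver, and it also shows that a matrix of pairwise Frobenius inner products is a Gram matrix and therefore positive semidefinite.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - 1) / np.arange(1, v.size + 1) > 0)[0][-1]
    return np.maximum(v - (css[rho] - 1) / (rho + 1), 0.0)


def gram(mats):
    """M[p, q] = <K_p, K_q>_F; a Gram matrix, hence positive semidefinite."""
    return np.array([[np.sum(P * Q) for Q in mats] for P in mats])


def solve_view_weights(M, f, n_iter=500):
    """Projected gradient for min_a a^T M a - 2 f^T a  s.t.  a >= 0, sum(a) = 1.

    Assumption: this objective form and simplex constraint are illustrative;
    M must be symmetric positive semidefinite.
    """
    a = np.full(f.size, 1.0 / f.size)                          # start from equal weights
    step = 1.0 / (2.0 * np.linalg.eigvalsh(M).max() + 1e-12)   # 1 / Lipschitz constant
    for _ in range(n_iter):
        a = project_to_simplex(a - step * 2.0 * (M @ a - f))   # gradient is 2(M a - f)
    return a
```

A step size of one over the Lipschitz constant of the gradient guarantees that projected gradient descent does not increase this convex objective.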
The entire approach is outlined in Algorithm 1. We run the algorithm for at least 150 iterations, until convergence, and then perform spectral clustering on the learned consensus similarity matrix to obtain the clustering results.
III-C Analysis and discussions
Computational complexity: Pretraining and fine-tuning are the two main stages of the proposed method, and we analyze them separately. For clarity, we assume that all layers have the same dimension, denoted , and that the original features of all views have the same dimension, denoted . denotes the number of iterations needed for convergence in pretraining, and denotes the number of iterations needed for convergence in fine-tuning. The complexities of the pretraining and fine-tuning stages are then and , respectively, where normally. Overall, the time complexity of our algorithm is .
Convergence: The objective function is bounded below by 0. When one variable is optimized with the others fixed, each of the four subproblems (optimizing and is treated as one subproblem) is strictly convex, and the objective of Algorithm 1 decreases monotonically at each iteration. Since the objective values are monotonically nonincreasing and bounded below, they are guaranteed to converge.
Dataset  Views  Samples  Classes  Data type

HW  2  2000  10  Image 
BBCSport  2  544  5  Text 
3Sources  3  169  6  Text 
BBC  4  685  5  Text 
CiteSeer  2  3312  6  Text 
ORL  3  400  40  Image 
IV Experiments
In this part, we evaluate the clustering performance, the parameter sensitivity, and the convergence of Algorithm 1 on six benchmark datasets.
Iva Benchmark Datasets
We select six datasets of two types, image and text. Their key statistics are listed in Table II, and sample images from the two image datasets are shown in Figure 2. The details of these datasets are as follows:

HW (http://archive.ics.uci.edu/ml/datasets/Multiple+Features) contains 2000 images of the ten digit classes 0-9, with 200 images per class, described by six views: profile correlations (216), Fourier coefficients (76), Karhunen-Loève coefficients (64), morphological features (6), pixel averages (240), and Zernike moments (47), where the number in brackets is the dimension of each view. We use only two views: profile correlations and pixel averages.

BBCSport (http://mlg.ucd.ie/datasets/segment.html) is derived from the BBC Sport section. It contains 544 documents, each split into two related segments that serve as views. The dimensions of the two views are 3183 and 3203, respectively.

BBC (http://mlg.ucd.ie/datasets/segment.html) is derived from the BBC News corpus. It contains 685 documents, each split into four related segments that serve as views, with dimensions 4659, 4633, 4665 and 4684, respectively.

3Sources (http://mlg.ucd.ie/datasets/3sources.html) is a document dataset collected from BBC, Reuters, and The Guardian. It contains 169 documents belonging to six themes: technology, health, business, politics, entertainment, and sport.

CiteSeer (http://lig-membres.imag.fr/grimal/data.html) is collected from the CiteSeer website. It contains 3312 documents, each described by its content and its citations, which fall into six classes: Agents, AI, DB, IR, ML and HCI.

ORL (http://www.cl.cam.ac.uk/research/dtg/) was created by the Olivetti Research Laboratory in Cambridge, England. It is a face dataset containing 400 images of 40 different people. For each subject, images were taken at different times and under different lighting, facial expressions (eyes open or closed, smiling or not), and facial details (with or without glasses). Three kinds of features, intensity, LBP and Gabor, are extracted from each image to form three views.
IV-B Compared Methods
We compare Algorithm 1 with ten methods: two k-means baselines (BKM and AKM) and eight multiview clustering algorithms, including four matrix-factorization-based algorithms (MultiNMF, MVCF, DMVC and AwDMVC), a co-training algorithm (Cotrain) and other state-of-the-art multiview clustering algorithms.

BKM performs k-means on every view separately and selects the best single-view result as the final result.

AKM is a baseline method that concatenates all views into a single view and performs k-means on it to obtain the final result.

Kernel-based weighted multiview clustering (MVKKM) [25] expresses all views by given kernel matrices and learns a weighted combination of the kernels in parallel with the partitioning.

Cotrain [12] is a co-training approach to multiview spectral clustering. It assumes that the true underlying clustering assigns each point to the same cluster irrespective of the view, and it has no hyperparameters.

Feature extraction via multiview nonnegative matrix factorization with local graph regularization (MultiNMF) [38] is motivated by manifold learning and multiview NMF, and takes the inner-view relatedness between data into consideration.

Adaptive structure concept factorization for multiview clustering (MVCF) [40] is a data integration method that correlates the affinity weights of all views with the inter-view correlation.

Self-weighted multiview clustering with soft capped norm (SCaMVC) [9] automatically learns an optimal weight for each view without introducing an additional parameter, and uses a soft capped norm to handle noise and outliers of different levels.

Multiview clustering via deep semi-NMF (DMVC) [44] proposes a deep matrix factorization framework for MVC, adding a graph regularization term to a deep NMF framework to preserve the inherent structure of the original data. It requires the last-layer representations of all views to be identical.

Auto-weighted multiview clustering via deep matrix decomposition (AwDMVC) [10] learns lower-level hidden attributes for the subsequent clustering task and assigns the weights of different views automatically without introducing extra hyperparameters.
ACC (%)  BBCSport  3Sources  BBC  CiteSeer  ORL  HW  Average Rank

BKM  42.93  47.97  45.37  41.43  56.00  81.95  6.33 
AKM  47.97  49.77  40.36  46.02  58.25  64.90  6.50 
RMKMC  45.93  35.39  33.82  22.77  24.50  66.10  8.83 
MVKKM  40.45  46.39  44.92  23.75  62.50  61.90  8.17 
Cotrain  39.18  33.15  32.71  26.44  72.50  80.15  7.67 
MultiNMF  57.51  50.28  48.26  40.22  23.75  78.54  5.83 
MVCF  63.24  58.21  65.75  47.21  66.50  76.75  3.17 
ScaMVC  43.67  54.23  51.95  23.47  61.75  75.20  6.33 
DMVC  43.81  44.21  49.48  24.83  77.00  38.70  6.83 
AwDMVC  70.76  55.86  62.34  48.25  N/A  N/A  5.33
Ours  91.73  70.41  71.68  53.86  79.25  84.75  1.00 
NMI (%)  BBCSport  3Sources  BBC  CiteSeer  ORL  HW  Average Rank

BKM  20.75  25.73  27.56  16.41  74.44  75.89  6.33 
AKM  27.64  30.58  22.06  20.21  77.22  62.23  6.33 
RMKMC  24.27  15.27  12.11  11.43  56.89  76.57  8.17 
MVKKM  19.09  26.71  20.96  11.85  77.97  65.64  8.17 
Cotrain  16.48  10.41  10.94  12.25  86.61  76.59  7.67 
MultiNMF  37.96  42.47  27.37  20.10  37.98  74.64  5.50 
MVCF  40.45  48.16  42.80  21.10  83.90  68.74  3.67 
ScaMVC  20.36  31.57  20.18  12.29  78.92  75.64  6.67 
DMVC  26.04  33.35  20.16  13.01  88.00  38.65  6.50 
AwDMVC  46.82  46.09  42.82  22.01  N/A  N/A  5.17
Ours  79.38  59.53  45.60  25.38  90.75  74.11  1.83 
PUR (%)  BBCSport  3Sources  BBC  CiteSeer  ORL  HW  Average Rank

BKM  44.84  51.47  46.47  42.59  62.00  81.95  6.33 
AKM  49.36  56.32  40.63  45.70  63.00  68.30  6.83 
RMKMC  45.94  36.64  33.82  21.76  24.50  70.05  9.00 
MVKKM  37.61  49.88  46.35  24.53  68.50  65.50  8.50 
Cotrain  43.68  34.92  33.15  28.64  76.68  80.92  7.50 
MultiNMF  59.23  63.05  48.25  41.92  23.75  79.81  5.33 
MVCF  63.42  60.05  65.84  48.77  70.25  76.80  3.83 
ScaMVC  44.26  60.33  52.56  23.96  66.00  75.20  6.67 
DMVC  51.36  62.80  48.38  28.14  79.75  38.60  5.50 
AwDMVC  65.99  62.45  63.80  50.00  N/A  N/A  5.50
Ours  91.73  80.47  72.12  55.98  82.25  84.90  1.00 
IV-C Experimental Setup
For the proposed method, all original feature matrices are first normalized. We set the number of clusters to the true number of classes for each dataset. The trade-off parameter is selected from . Assuming that the layer sizes should be correlated with the number of clusters, we design two schemes, one with layer size and the other with layer size , where in are chosen from and , respectively, and in are chosen from and , respectively. The reason why the third layer of is fixed to is explained in Section IV-H. For the compared methods, we obtain the papers and code from the authors' websites and follow the hyperparameter settings given in their papers.
Clustering performance is evaluated by three widely used criteria: clustering accuracy (ACC), normalized mutual information (NMI), and purity (PUR). We repeat each experiment 50 times to reduce the effect of random initialization and report the best result. All experiments are conducted on a desktop computer with an Intel i9-9900K CPU @ 3.60GHz ×16 and 64GB RAM, using MATLAB 2017a (64-bit).
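For reference, the three criteria can be computed from the contingency table between predicted clusters and true classes. The sketch below is one common way to do this, not necessarily the evaluation code used in the paper: ACC uses the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) to find the best one-to-one matching between clusters and classes, PUR assigns each cluster to its majority class, and NMI uses square-root normalization (other normalizations exist and give slightly different values).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def contingency(y_true, y_pred):
    """C[i, j] = number of samples in predicted cluster i and true class j."""
    _, t = np.unique(y_true, return_inverse=True)
    _, p = np.unique(y_pred, return_inverse=True)
    C = np.zeros((p.max() + 1, t.max() + 1))
    np.add.at(C, (p, t), 1)
    return C


def acc(y_true, y_pred):
    """Clustering accuracy under the best one-to-one cluster-to-class mapping."""
    C = contingency(y_true, y_pred)
    rows, cols = linear_sum_assignment(-C)   # maximise matched counts
    return C[rows, cols].sum() / C.sum()


def purity(y_true, y_pred):
    """Fraction of samples that belong to the majority class of their cluster."""
    C = contingency(y_true, y_pred)
    return C.max(axis=1).sum() / C.sum()


def entropy(q):
    q = q[q > 0]
    return -np.sum(q * np.log(q))


def nmi(y_true, y_pred):
    """NMI with sqrt normalisation; assumes at least two clusters and two classes."""
    P = contingency(y_true, y_pred)
    P = P / P.sum()
    pi, pj = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(pi, pj)[nz]))
    return mi / np.sqrt(entropy(pi) * entropy(pj))
```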
IV-D Experimental Results
Table III, Table IV and Table V report the clustering performance measured by ACC, NMI and PUR on the six benchmark datasets; the best result on each dataset is marked in bold. From these tables, we draw the following conclusions:

In Table III (ACC), concatenating all features and then running k-means (AKM) outperforms running k-means on each single view and keeping the best result (BKM) on four of the six datasets, which suggests that using all views usually helps more than relying on a single one. Our proposed method achieves the best ACC on every dataset, exceeding the second-best method by 20.97, 12.20, 5.93, 5.61, 2.25 and 2.80 percentage points on BBCSport, 3Sources, BBC, CiteSeer, ORL and HW, respectively. Notably, the gains on the image datasets ORL and HW are small, and Cotrain even exceeds our algorithm by 2.48 percentage points in NMI on HW (Table IV). This suggests that our algorithm is better suited to text datasets.

Compared with DMVC, which also uses a deep semi-NMF framework, our algorithm achieves better clustering performance on all six benchmark datasets. This suggests that reconstructing the global graph during optimization, instead of using a fixed graph structure, yields a better representation of the original data and a better global consensus graph.

AwDMVC also uses a deep NMF framework and automatically assigns a weight to each view when learning the representations layer by layer, without introducing other hyperparameters. We outperform AwDMVC by a large margin on all four datasets for which its results are available, which shows the benefit of combining representation learning with consensus graph reconstruction.
In addition, the NMI results in Table IV and the PUR results in Table V lead to largely the same conclusions as Table III.
In summary, the above experimental results demonstrate that the proposed method is effective compared with other state-of-the-art methods. We attribute its superiority to three factors: i) it is based on a deep matrix factorization framework, so it is more likely to find meaningful representations layer by layer; ii) it abandons the original graph structure, which leads to poor clustering, and instead uses the newly learned representations to reconstruct a consistent graph for clustering; iii) it unifies representation learning and consensus graph construction in one framework, so that the two can promote each other.
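The ACC margin of our method over the best competing method on each dataset can be recomputed directly from Table III. This small script is only a consistency check on the reported numbers; the missing AwDMVC entries (ORL and HW) are encoded as `None` and skipped, and margins are differences in percentage points.

```python
DATASETS = ["BBCSport", "3Sources", "BBC", "CiteSeer", "ORL", "HW"]

# ACC (%) copied from Table III; None marks a result that is not reported.
ACC = {
    "BKM":      [42.93, 47.97, 45.37, 41.43, 56.00, 81.95],
    "AKM":      [47.97, 49.77, 40.36, 46.02, 58.25, 64.90],
    "RMKMC":    [45.93, 35.39, 33.82, 22.77, 24.50, 66.10],
    "MVKKM":    [40.45, 46.39, 44.92, 23.75, 62.50, 61.90],
    "Cotrain":  [39.18, 33.15, 32.71, 26.44, 72.50, 80.15],
    "MultiNMF": [57.51, 50.28, 48.26, 40.22, 23.75, 78.54],
    "MVCF":     [63.24, 58.21, 65.75, 47.21, 66.50, 76.75],
    "ScaMVC":   [43.67, 54.23, 51.95, 23.47, 61.75, 75.20],
    "DMVC":     [43.81, 44.21, 49.48, 24.83, 77.00, 38.70],
    "AwDMVC":   [70.76, 55.86, 62.34, 48.25, None, None],
    "Ours":     [91.73, 70.41, 71.68, 53.86, 79.25, 84.75],
}


def margins_over_runner_up(table, ours="Ours"):
    """For each dataset, return (runner-up method, our ACC minus its ACC in points)."""
    result = {}
    for j, name in enumerate(DATASETS):
        rivals = [(row[j], method) for method, row in table.items()
                  if method != ours and row[j] is not None]
        best_score, best_method = max(rivals)
        result[name] = (best_method, round(table[ours][j] - best_score, 2))
    return result


for name, (method, gap) in margins_over_runner_up(ACC).items():
    print(f"{name}: +{gap:.2f} points over {method}")
```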
IV-E Ablation Study
Depth  BBCSport  3Sources  BBC  CiteSeer  ORL  HW

1  75.55  61.54  54.74  44.81  73.50  64.65
2  88.50  64.50  63.50  53.83  77.25  59.50
3  91.73  70.41  71.68  53.86  79.25  84.05
We conduct ablation experiments with one, two and three layers to verify whether deeper factorization makes it easier to extract hidden information and to learn more valuable representations.
We record the best parameter settings, such as , for the three-layer case. Table VI compares the results for depths of one, two and three layers. On BBCSport, 3Sources, BBC, CiteSeer and ORL, three layers always outperform two, and two layers always outperform one; on HW, the two-layer result (59.50) is lower than the one-layer result (64.65), while three layers give the best result (84.05). When the number of layers increases from one to two, ACC improves by 12.95, 2.96, 8.76, 9.02 and 3.75 percentage points on BBCSport, 3Sources, BBC, CiteSeer and ORL, respectively, and when it increases from two to three, ACC improves by 3.23, 5.91, 8.18, 0.03 and 2.00 percentage points. It is therefore necessary to choose an appropriate number of layers for each dataset.
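The depth comparison can be recomputed from Table VI as ACC differences in percentage points. The sketch assumes that the three rows of Table VI correspond to depths of one, two and three layers, in that order; `gains` is a hypothetical helper.

```python
DATASETS = ["BBCSport", "3Sources", "BBC", "CiteSeer", "ORL", "HW"]

# ACC (%) copied from Table VI; assumption: rows are depths 1, 2 and 3 in order.
ACC_BY_DEPTH = {
    1: [75.55, 61.54, 54.74, 44.81, 73.50, 64.65],
    2: [88.50, 64.50, 63.50, 53.83, 77.25, 59.50],
    3: [91.73, 70.41, 71.68, 53.86, 79.25, 84.05],
}


def gains(shallow, deep):
    """ACC change in percentage points when going from `shallow` to `deep` layers."""
    return {name: round(b - a, 2)
            for name, a, b in zip(DATASETS, ACC_BY_DEPTH[shallow], ACC_BY_DEPTH[deep])}


print(gains(1, 2))   # HW is the only negative entry
print(gains(2, 3))
```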
IV-F Visualization
We visualize the clustering results of our algorithm in Figure 3 and compare our algorithm with DMVC and AwDMVC in Figure 4. In these figures, samples of the same class share a color; the closer the points of the same color and the farther apart the points of different colors, the better the clustering performance. Figure 3 shows that, as the number of iterations increases, the difference between intra-class and inter-class structures becomes increasingly clear on BBCSport, BBC and HW. Figure 4 shows that DMVC gives the least distinct clustering: some intra-class structure is visible, but the boundaries between clusters are very vague or even absent. AwDMVC forms a ring-shaped structure for a class, but each ring also contains samples of other classes, which greatly degrades the clustering. In contrast, our algorithm shows clear cluster structures on both benchmarks.
IV-G Convergence
Theoretically, the objective value of our algorithm is guaranteed to converge. We also verify convergence experimentally. Figure 5 plots the objective value curves (in red) on 3Sources, BBC and BBCSport. The objective value decreases monotonically and usually converges within 150 iterations, which confirms the convergence of our algorithm in practice.
IV-H Parameter Sensitivity
The proposed method has two sets of parameters: the balance coefficient and the layer sizes . Below, we analyze the sensitivity to , the selection rule for the last parameter in and , and the sensitivity to and in .
IV-H1 Sensitivity of
Figure 6 shows how ACC varies with under the best layer-size setting obtained in the previous experiments. Each panel contains two curves: ours in red and the second-best algorithm in blue. Our algorithm outperforms the second-best algorithm over most of the range of on most benchmarks, although it is somewhat sensitive to .
IV-H2 Sensitivity of and in
Figure 8 shows the sensitivity to and in on ORL, 3Sources and CiteSeer. Apart from the above-mentioned rule, performance is relatively stable across most parameter combinations. Despite slight variations, our method outperforms most of the compared algorithms on most benchmarks.
V Conclusion
In this paper, we propose a novel multiview clustering framework with deep semi-NMF that simultaneously optimizes deep representation learning and consensus graph construction; in other words, the deep representations are refined by the global consensus graph and vice versa. Through multilayer projection guided by a consensus geometric structure imposed by a graph constraint, the learned representations capture more of the hidden attributes of the original features. Extensive experiments on six benchmarks, with comparisons against ten SOTA methods, demonstrate the effectiveness of the proposed algorithm. In the future, we will consider learning a consensus representation directly with a rotation matrix and constructing a more discriminative consensus graph.
Acknowledgment
This work was supported by the National Key R&D Program of China (2018YFB1003203) and the National Natural Science Foundation of China (Grant Nos. 61672528, 61773392 and 61872377).
References

[1] (2013) Multi-view k-means clustering on big data. In Twenty-Third International Joint Conference on Artificial Intelligence.
[2] (2015) Diversity-induced multi-view subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–594.
[3] (2021) Relaxed multi-view clustering in latent embedding space. Information Fusion 68, pp. 8–21.
[4] (2019) Jointly learning kernel representation tensor and affinity matrix for multi-view clustering. IEEE Transactions on Multimedia.
[5] (2010) Convex and semi-nonnegative matrix factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, pp. 45–55.
[6] (2010) Multi-view video summarization. IEEE Transactions on Multimedia 12 (7), pp. 717–729.
[7] (2006) Reducing the dimensionality of data with neural networks. Science 313, pp. 504–507.
[8] (2020) Robust visual tracking via constrained multi-kernel correlation filters. IEEE Transactions on Multimedia 22 (11), pp. 2820–2832.
[9] (2018) Self-weighted multi-view clustering with soft capped norm. Knowledge-Based Systems 158, pp. 1–8.
[10] (2020) Auto-weighted multi-view clustering via deep matrix decomposition. Pattern Recognition 97, pp. 107015.
[11] (2020) Partition level multiview subspace clustering. Neural Networks 122, pp. 279–288.
[12] (2011) A co-training approach for multi-view spectral clustering. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 393–400.
[13] (2011) Co-regularized multi-view spectral clustering. Advances in Neural Information Processing Systems 24, pp. 1413–1421.
[14] (2016) Multiple kernel clustering with local kernel alignment maximization. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pp. 1704–1710.
[15] (2020) Multi-view spectral clustering with high-order optimal neighborhood Laplacian matrix. IEEE Transactions on Knowledge and Data Engineering.
[16] (2013) Multi-view clustering via joint nonnegative matrix factorization. In Proceedings of the 2013 SIAM International Conference on Data Mining, pp. 252–260.
[17] (2020) Optimal neighborhood multiple kernel clustering with adaptive local kernels. IEEE Transactions on Knowledge and Data Engineering.
[18] (2016) Parameter-free auto-weighted multiple graph learning: a framework for multiview clustering and semi-supervised classification. In IJCAI, pp. 1881–1887.
[19] (2020) Anchor-based multi-view subspace clustering with diversity regularization. IEEE MultiMedia 27 (4), pp. 91–101.
[20] (2020) Unsupervised multi-view clustering by squeezing hybrid knowledge from cross view and each view. IEEE Transactions on Multimedia.
[21] (2019) Adaptive hypergraph embedded semi-supervised multi-label image annotation. IEEE Transactions on Multimedia 21 (11), pp. 2837–2849.
[22] (2018) Learning a joint affinity graph for multiview subspace clustering. IEEE Transactions on Multimedia 21 (7), pp. 1724–1736.
[23] (2019) Learning a joint affinity graph for multiview subspace clustering. IEEE Transactions on Multimedia 21, pp. 1724–1736.
[24] (2014) A deep semi-NMF model for learning hidden representations. In International Conference on Machine Learning, pp. 1692–1700.
[25] (2012) Kernel-based weighted multi-view clustering. In 2012 IEEE 12th International Conference on Data Mining, pp. 675–684.
[26] (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9.
[27] (2020) Learning adaptive neighborhood graph on Grassmann manifolds for video/image-set subspace clustering. IEEE Transactions on Multimedia.
[28]