1 Introduction
Multimodal and heterogeneous data are increasingly generated in a vast range of fields. Tensors, as a natural representation of high-dimensional data, have attracted growing attention in many real-world applications. In contrast to traditional methods that flatten multimodal and heterogeneous data into vectors, tensor representations naturally preserve spatial coherency and consistency and reduce the number of parameters to be learned
phan2010tensor. Most current tensor decomposition methods are employed in machine learning, signal processing, compressed sensing, and related areas. Many studies have shown that high-dimensional data, such as images and video sequences, actually reside on an underlying nonlinear geometric structure
cai2007learning; cruceru2020computationally; li2016mr. However, these tensor-decomposition-based models for dimensionality reduction implicitly assume that the processing arguments are global and multilinear; as a result, they usually fail to explore local and nonlinear structures. To learn the nonlinear geometry of high-dimensional data, manifold learning is an alternative approach that maintains topological structures during dimensionality reduction. It is based on the assumption that high-dimensional data reside on a low-dimensional manifold. Since isometric mapping (ISOMAP), the first manifold learning method, was proposed by tenenbaum290global in 2000, manifold learning has been extensively explored in many application fields, and a great number of algorithms have been proposed. For instance, roweis2000nonlinear introduced locally linear embedding (LLE) to maintain the local linearity of nearest neighbors by minimizing reconstruction errors in a low-dimensional space. LLE assumes that the local neighborhood of a point on the manifold can be well approximated by the affine subspace spanned by the point's neighbors, and it finds a low-dimensional embedding of the data based on these affine approximations. However, LLE is very sensitive to the setting of the number of nearest neighbors. Instead of local linearity, Laplacian Eigenmaps (LE) preserves the similarity between nearest neighbors by constructing the similarity matrix of a Laplacian graph belkin2003laplacian. LLE is locally linear and globally nonlinear, whereas LE is nonlinear both globally and locally. However, most of the aforementioned algorithms do not take into account the tangent space projection of manifold geometry.
It should be noted that local tangent space alignment (LTSA) is closely tied to the concept of manifold geometry zhang2004principal: each data point is projected onto the tangent space of its local neighborhood, and the local coordinates are then aligned in a low-dimensional space. Later on, its variant of locally nonlinear alignment zhang2008patch, as well as many other methods, such as manifold-regularized nonnegative tensor decomposition li2016mr,
have gained great attention in feature extraction and clustering. Preserving nonnegativity enhances the physical interpretability of the nonnegative attributes of the original data.
In this paper, we incorporate the hypergraph into nonnegative tensor factorization to preserve the high-order correlations and nonnegative attributes in tensor decomposition. The proposed method is called Hypergraph Regularized Nonnegative Tensor Factorization (HyperNTF). The main contributions of this paper include the following aspects.

HyperNTF incorporates the hypergraph into nonnegative tensor factorization (NTF) and uses the factor matrix of the last mode as the low-dimensional representation. In this way, it largely reduces the storage space and computational complexity.

The hypergraph can effectively unfold smooth curved manifolds from a three-dimensional space to a low-dimensional one. Our experimental results on multiple synthetic datasets (e.g. Punctured Sphere, Gaussian surface, Twin Peaks, and Toroidal Helix) demonstrate that the hypergraph uncovers the local neighborhoods of nonlinear geometry in dimensionality reduction.

HyperNTF achieves state-of-the-art (SOTA) performance in clustering analysis. The numerical results on multiple real datasets (e.g. COIL20, ETH80-1, Faces94 male, MNIST, USPS, Olivetti) suggest that HyperNTF reliably achieves better performance regardless of the cluster number.
The paper is organized as follows. Section 2 reviews related work on nonnegative tensor decomposition and hypergraphs. Section 3 presents the notations and basic operations, including tensor decomposition and the hypergraph. Section 4 develops the algorithmic model of HyperNTF and the solutions to its cost function. In Section 5, numerical comparisons with state-of-the-art algorithms on various synthetic and real-world benchmarks suggest the superior performance of HyperNTF. Section 6 summarizes our work and provides some prospects for future work.
2 Related Work
In this section, we briefly review methods for nonnegative tensor decomposition or factorization, and introduce the hypergraph for the purpose of dimensionality reduction.
Traditional methods usually vectorize high-dimensional data before optimizing the learning objective. They therefore risk breaking the natural structures and correlations in the high-dimensional data, and fail to encode the higher-order nonlinearity in local neighborhoods. To solve this issue, a great number of algorithms that represent data by tensors, rather than by vectors or matrices, have been proposed. A canonical method is the so-called higher-order singular value decomposition (HOSVD), an extension of the matrix SVD to higher-order arrays that is widely used in subspace learning de2000best. One application of HOSVD is the clustering of handwritten digits savas2007handwritten. However, since the orthogonality constraint of HOSVD does not yield a unique solution, it is necessary to impose other constraints that match the physical interpretation and exploit prior knowledge, such as sparsity, smoothness, or a low-rank property.
A popular constrained decomposition is nonnegative tensor decomposition cichocki2007nonnegative, which aims to handle nonnegative data such as spectra, probabilities, and energies. The family of nonnegative tensor decompositions includes nonnegative tensor factorization cichocki2007nonnegative, nonnegative Tucker decomposition (NTD) kim2007nonnegative, manifold-regularized NTD (MRNTD) li2016mr, and other variants.
It is an increasing trend to take the heterogeneity and multimodality of data into account when constructing the learning objective. Following this idea, sun2015heterogeneous proposed heterogeneous tensor decomposition (HTD-Multinomial). HTD-Multinomial uses the constraint space of the learning objective to convert the constrained optimization problem into an unconstrained one, which can then be solved by Riemannian manifold optimization using the second-order geometry of the trust-region method. However, it has two major limitations: 1) for cluster analysis, HTD-Multinomial requires the number of clusters to be known before the learning objective is optimized; 2) it suffers from tiny update steps and consequently slow convergence when the learning objective is sparse and singular.
Motivated by HTD-Multinomial, a recent work proposed low-rank regularized heterogeneous tensor decomposition (LRRHTD) for subspace clustering zhang2017low. It assumes that the factor matrix of the last mode is low-rank while the projection matrices of the other modes are orthogonal. Even though it is suitable for high-dimensional data with a low-rank property, its low-dimensional representation is a square matrix whose size equals the number of samples, resulting in particularly high storage and computation costs.
As with the aforementioned methods, most existing approaches do not take higher-order geometry into account in tensor decomposition. The hypergraph, as a natural extension of the 2-order graph to higher-order relations, has great potential to uncover the local geometry by learning local group information in tensor decomposition hu2014eigenvectors; hu2015laplacian. Many studies have shown that, compared to the traditional 2-order graph, the hypergraph provides better performance in both clustering zhou2007learning and classification sun2008hypergraph, especially for data with complex relations, such as gene expression tian2009hypergraph, social networks yang2017hypergraph, and neuronal networks feng2019hypergraph. These previous findings motivate us to incorporate the hypergraph into tensor factorization in this paper.
3 Notations and Basic Operations
This section reviews the basic notations and operations used in tensor decomposition. The terminology used in this paper is kept consistent with the previous literature as much as possible cohen2016environmental; de2000multilinear; zafeiriou2010nonnegative; zhang2008patch.
A tensor is regarded as a multi-index numerical array. The order of a tensor is its number of dimensions or modes, which may correspond to space, time, frequency, subject, or class. In this paper, matrices are indicated by bold upper-case letters, while tensors with more than two dimensions are indicated by calligraphic letters. We also use the element-wise division of two same-sized matrices, the element-wise (Hadamard) product of two same-sized matrices, and the matrix transpose. If the entries of a tensor are arranged as a matrix, the operation is termed tensor mode unfolding cichocki2015tensor. The inner product of two same-sized tensors is the sum of the products of their entries, and the norm of a tensor is the square root of the inner product of the tensor with itself. Table 1 lists the fundamental symbols used in this paper.
tensor, matrix, vector, scalar  

unfolding  
core tensor  
,  factor matrices 
tensormatrix product  
Kronecker product  
KhatriRao product  
Hadamard product  
elementwise division  
inner product  
transpose 
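To make these operations concrete, the mode unfolding, inner product, and norm can be sketched in a few lines of NumPy (an illustrative sketch; the `unfold` helper and the row-major fiber ordering are our own choices, not the paper's notation):

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: arrange the mode-n fibers of the tensor
    as rows of a matrix (row-major fiber ordering)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# A 3-order tensor of shape (2, 3, 4)
X = np.arange(24).reshape(2, 3, 4)
print(unfold(X, 0).shape)  # (2, 12)
print(unfold(X, 1).shape)  # (3, 8)

# Inner product of two same-sized tensors, and the tensor norm
Y = np.ones_like(X)
inner = np.sum(X * Y)          # <X, Y>: sum of element-wise products
norm = np.sqrt(np.sum(X * X))  # Frobenius-type tensor norm
print(inner)                   # 276
```

Note that different papers order the unfolded fibers differently (row-major vs. column-major); only the ordering convention changes, not the set of entries.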
3.1 Tensor Decomposition
Dimensionality reduction refers to transforming high-dimensional data into an equivalent low-dimensional representation while maximally maintaining the underlying structures, such as nonnegativity, topological structure, and other attributes. Tensor decomposition is widely used in dimensionality reduction. The two most prominent approaches are the Tucker and Canonical Polyadic (CP) decompositions.
For a given tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, the Tucker decomposition can be expressed as a core tensor multiplied by a series of projection matrices along each mode:

$$\mathcal{X} \approx \mathcal{G} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \cdots \times_N \mathbf{U}^{(N)}, \qquad (1)$$

where $\mathcal{G} \in \mathbb{R}^{R_1 \times R_2 \times \cdots \times R_N}$ is the so-called core tensor, and $\mathbf{U}^{(n)} \in \mathbb{R}^{I_n \times R_n}$, $n = 1, \ldots, N$, are the projection matrices.
Note that the generic Tucker decomposition is not unique, and the projection matrices are not restricted to be column-wise orthogonal. In order to obtain a unique decomposition, it is necessary to impose additional constraints, such as orthogonality, smoothness, sparsity, or a low-rank property, on the projection matrices.
The Tucker decomposition can also be expressed in an equivalent mode-$n$ unfolding form, using Kronecker products of the projection matrices:

$$\mathbf{X}_{(n)} \approx \mathbf{U}^{(n)} \mathbf{G}_{(n)} \left( \mathbf{U}^{(N)} \otimes \cdots \otimes \mathbf{U}^{(n+1)} \otimes \mathbf{U}^{(n-1)} \otimes \cdots \otimes \mathbf{U}^{(1)} \right)^{\top}. \qquad (2)$$
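As a numerical sanity check of the Tucker model and its unfolded form, the following NumPy sketch reconstructs a tensor from a core and projection matrices and verifies the unfolding identity (the `mode_dot` helper and the dimensions are illustrative assumptions; with row-major unfolding the Kronecker factors appear in forward rather than reversed order):

```python
import numpy as np

def mode_dot(tensor, matrix, mode):
    """n-mode product: contract `matrix` with the given mode of `tensor`."""
    moved = np.moveaxis(tensor, mode, 0)
    out = matrix @ moved.reshape(moved.shape[0], -1)
    return np.moveaxis(out.reshape((matrix.shape[0],) + moved.shape[1:]), 0, mode)

rng = np.random.default_rng(0)
G = rng.random((2, 3, 2))                              # core tensor
U = [rng.random((d, r)) for d, r in zip((4, 5, 6), G.shape)]

# Reconstruct X by multiplying the core along every mode
X = G
for n, Un in enumerate(U):
    X = mode_dot(X, Un, n)

# Mode-0 unfolding equals U0 G_(0) (U1 kron U2)^T under row-major unfolding
X0 = U[0] @ G.reshape(G.shape[0], -1) @ np.kron(U[1], U[2]).T
print(np.allclose(X.reshape(4, -1), X0))  # True
```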
In the case that all the projection matrices are constrained to be orthonormal, i.e., $\mathbf{U}^{(n)\top}\mathbf{U}^{(n)} = \mathbf{I}$ for all $n$, the Tucker decomposition turns into the well-known higher-order orthogonal iteration (HOOI) de2000best. More restrictively, if we constrain the projection matrices to be all column-wise orthogonal and the core tensor to be orthogonal as well, the Tucker decomposition becomes the higher-order singular value decomposition (HOSVD) de2000multilinear.

An important special case of the Tucker decomposition is the CP factorization, in which the nonzero entries of the core tensor lie only on the superdiagonal:

$$\mathcal{X} \approx \mathcal{I} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \cdots \times_N \mathbf{U}^{(N)} = \sum_{r=1}^{R} \mathbf{u}^{(1)}_r \circ \mathbf{u}^{(2)}_r \circ \cdots \circ \mathbf{u}^{(N)}_r. \qquad (3)$$
If $\mathcal{I}$ is a 3-order superdiagonal tensor, then

(4)

where $\mathcal{I}$ is an identity diagonal tensor of size $R \times R \times R$ whose nonzero (superdiagonal) elements all equal one, and the tensorization operator transforms an identity matrix into a tensor along the corresponding mode. The equivalent formulation in terms of the tensor mode unfolding is then

$$\mathbf{X}_{(n)} \approx \mathbf{U}^{(n)} \left( \mathbf{U}^{(N)} \odot \cdots \odot \mathbf{U}^{(n+1)} \odot \mathbf{U}^{(n-1)} \odot \cdots \odot \mathbf{U}^{(1)} \right)^{\top}, \qquad (5)$$

where the chain of Khatri-Rao products runs over all modes except mode $n$; a shorthand notation for this chain is used below to simplify the presentation.
Notably, the CP and Tucker decompositions are used in different situations. Specifically, CP factorizes data into a sum of rank-one tensors, which usually have an interpretable meaning as components, whereas Tucker decomposition compresses high-dimensional data into a smaller core tensor as a low-dimensional representation. Thus, CP is often used for factor analysis, whereas Tucker decomposition is usually employed for subspace learning.
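The CP model and its unfolded form of Eq. (5) can likewise be checked numerically (a minimal sketch; the dimensions and the row-major unfolding convention, which fixes the ordering of the Khatri-Rao factors, are our own assumptions):

```python
import numpy as np

# CP model: X is a sum of R rank-one tensors,
# X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r].
R = 3
rng = np.random.default_rng(0)
A, B, C = (rng.random((d, R)) for d in (4, 5, 6))

# Assemble the full tensor from the factor matrices
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Mode-0 unfolding equals A times the transposed Khatri-Rao product of the
# remaining factors (factor ordering depends on the unfolding convention).
khatri_rao = np.einsum('jr,kr->jkr', B, C).reshape(-1, R)  # column-wise Kronecker
X0 = A @ khatri_rao.T
print(np.allclose(X0, X.reshape(4, -1)))  # True
```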
3.2 Hypergraph
The main distinction between a hypergraph and an ordinary 2-order graph is that a hyperedge of a hypergraph is a subset of the vertices and can connect more than two vertices, whereas an edge of a 2-order graph connects only two vertices tian2009hypergraph.
Assume $G = (V, E)$ is a weighted hypergraph, where $V$ is the set of vertices, a finite set of objects, and $E$ is the set of hyperedges, each hyperedge $e$ being a subset of $V$. The relationship between vertices and hyperedges can be expressed by an incidence (indicator) matrix $\mathbf{H}$, whose entries are defined as

$$h(v, e) = \begin{cases} 1, & \text{if } v \in e, \\ 0, & \text{otherwise}, \end{cases} \qquad (6)$$

where a vertex $v$ and a hyperedge $e$ are called incident if $h(v, e) = 1$. Figure 1 shows an example of a hypergraph.
The degree of a hyperedge $e$, namely $\delta(e)$, is the number of vertices incident with $e$:

$$\delta(e) = \sum_{v \in V} h(v, e). \qquad (7)$$

The degree of each vertex $v$ is the sum of the weights of the hyperedges incident with $v$:

$$d(v) = \sum_{e \in E} w(e)\, h(v, e). \qquad (8)$$
The weight $w(e)$ associated with each hyperedge is defined by

(9)

where the leading coefficient is the 2-order combination number, the parameter is a scaling factor, and the sum runs over the collection of hyperedges. Let $\mathbf{F}$ be the low-dimensional representation of the sample labels to be learned. The cost function is defined to minimize the discrepancy between the sample labels of the same class:

$$\min_{\mathbf{F}} \; \operatorname{tr}\!\left(\mathbf{F}^{\top} \mathbf{L} \mathbf{F}\right), \quad \text{with} \quad \mathbf{L} = \mathbf{D}_v - \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^{\top}, \qquad (10)$$

where $\mathbf{D}_v$ and $\mathbf{D}_e$ are the diagonal matrices of the vertex degrees $d(v)$ and hyperedge degrees $\delta(e)$, respectively, and $\mathbf{W}$ is the diagonal matrix of the hyperedge weights $w(e)$. The matrix $\mathbf{L}$ is the so-called hypergraph Laplacian.
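The hypergraph quantities above are straightforward to assemble from the incidence matrix. The following sketch uses a toy hypergraph and the common unnormalized Laplacian convention $\mathbf{L} = \mathbf{D}_v - \mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^{\top}$ (assumed here; the hyperedge weights are arbitrary illustrative values, not those of Eq. (9)):

```python
import numpy as np

# Incidence matrix H (n vertices x m hyperedges): H[v, e] = 1 if v is in e.
H = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]], dtype=float)
w = np.array([0.5, 2.0])             # hyperedge weights w(e), illustrative

d_e = H.sum(axis=0)                  # hyperedge degrees delta(e), Eq. (7)
d_v = H @ w                          # vertex degrees d(v), Eq. (8)

# Unnormalized hypergraph Laplacian: L = D_v - H W D_e^{-1} H^T
L = np.diag(d_v) - H @ np.diag(w / d_e) @ H.T

# The regularizer tr(F^T L F) penalizes embeddings that differ within a hyperedge
F = np.random.default_rng(0).random((4, 2))
reg = np.trace(F.T @ L @ F)
print(reg >= 0)  # True: L is positive semidefinite
```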
4 Hypergraph Regularized Nonnegative Tensor Factorization (HyperNTF)
4.1 The cost function of HyperNTF
Here we propose a hypergraph-regularized NTF model, called HyperNTF, to factorize nonnegative tensors. In general, the CP factorization of sparse tensors is essential for the dimensionality reduction of large-scale data. Specifically, for a given nonnegative tensor, the CP factorization is approximated by an identity diagonal tensor multiplied by a chain of factor matrices, one per mode. Our goal is to use the hypergraph to learn the factor matrix of the last mode as the low-dimensional representation of the high-dimensional data. In other words, the hypergraph is incorporated into the general framework of nonnegative tensor factorization by minimizing the following cost function:

(11)

in which the cost combines the CP reconstruction error, built from a row vector of all ones and an identity diagonal tensor, with the hypergraph Laplacian regularizer that characterizes the locally geometrical structure of the tensor data; a trade-off parameter balances the two terms and prevents overfitting. In practice, the choice of this parameter is determined by the attributes of the specific dataset.
4.2 The learning algorithm for HyperNTF
To solve the learning objective in Eq. (11), we adopt the method of Multiplicative Update Rules (MUR) lee2001algorithms; lee1999learning; li2016mr.
First, we reformulate Eq. (11) as follows:

(12)

Then, by using standard matrix properties, the first term of Eq. (12) can be expanded as follows:

(13)

We introduce Lagrange multipliers corresponding to the nonnegativity constraint on each factor matrix, and the cost function of Eq. (12) becomes:

(14)
This relaxed problem can be solved by alternately updating the factor matrices. To this end, we derive the partial derivative of Eq. (14) with respect to each factor matrix. The first term of Eq. (14) is a constant and can be omitted. The second term of Eq. (14) is computed as follows:

(15)

Subsequently, the third term of Eq. (14) is computed as follows:

(16)

Therefore, by using standard matrix identities petersen2012matrix, we obtain the partial derivative of Eq. (14) with respect to the factor matrix as follows:

(17)

Moreover, by taking the KKT conditions into account, we derive from Eq. (17) the following stationarity condition:

(18)

After a few steps of computation, the multiplicative update rule for the factor matrix is given by:

(19)
In the calculation of Eq. (19), the Khatri-Rao product involved produces a very large intermediate matrix, which becomes costly in both computation and memory when the mode sizes are large. To address this problem, we adopt the matricized-tensor-times-Khatri-Rao-product (MTTKRP) approach kaya2017high for computing the mode-n vector multiplication:

(20)

The solution is computed column by column, which efficiently reduces the computational and storage consumption: the cost reduces to a sequence of products of the tensor with vectors.
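The memory saving that MTTKRP provides can be seen in a small NumPy comparison (an illustrative sketch, not the blocked high-performance algorithm of kaya2017high; the dimensions and the row-major Khatri-Rao ordering are our own choices):

```python
import numpy as np

# MTTKRP for mode 0 of a 3-order tensor: Y = X_(0) (Khatri-Rao of B and C),
# computed without materializing the (J*K x R) Khatri-Rao matrix.
I, J, K, R = 4, 5, 6, 3
rng = np.random.default_rng(0)
X = rng.random((I, J, K))
B, C = rng.random((J, R)), rng.random((K, R))

# Naive version: explicit Khatri-Rao product, O(J*K*R) extra memory
kr = np.einsum('jr,kr->jkr', B, C).reshape(-1, R)
Y_naive = X.reshape(I, -1) @ kr

# MTTKRP via a single contraction: never forms the Khatri-Rao matrix
Y = np.einsum('ijk,jr,kr->ir', X, B, C)
print(np.allclose(Y, Y_naive))  # True
```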
Finally, we derive the partial derivative of Eq. (14) with respect to the last-mode factor matrix. Here, only the regularization term of Eq. (14) additionally needs to be taken into account, which yields the following equation:

(21)

To update the last-mode factor matrix, the iterative formula is likewise expressed through tensor mode multiplications by a chain of vectors, or equivalently in the following matrix form:

(22)

This completes the derivations of the iterative updates for all factor matrices. Starting from an initialization of the factor matrices, Eqs. (19) and (22) are used to update each variable in turn until the termination criterion is met; at the end of each iteration, after all factor matrices have been updated, convergence is checked against the maximum number of iterations. The pseudocode of HyperNTF is given in Algorithm 1.
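To give a feel for the alternating MUR scheme of Algorithm 1, the following sketch implements plain nonnegative CP factorization with multiplicative updates (a simplified stand-in: the hypergraph regularizer of Eqs. (19) and (22) is omitted, the update shown is the standard NTF rule for the Frobenius objective, and all dimensions are illustrative):

```python
import numpy as np

def ntf_mur(X, R, iters=300, eps=1e-9):
    """Plain nonnegative CP factorization of a 3-order tensor by
    multiplicative updates (no hypergraph regularizer)."""
    rng = np.random.default_rng(0)
    U = [rng.random((d, R)) + 0.1 for d in X.shape]
    specs = ['ijk,jr,kr->ir', 'ijk,ir,kr->jr', 'ijk,ir,jr->kr']
    for _ in range(iters):
        for n in range(3):
            others = [U[k] for k in range(3) if k != n]
            mttkrp = np.einsum(specs[n], X, *others)   # X_(n) times Khatri-Rao
            gram = np.ones((R, R))
            for Uo in others:
                gram *= Uo.T @ Uo                      # Hadamard of Gram matrices
            U[n] *= mttkrp / (U[n] @ gram + eps)       # multiplicative update
    return U

# Factorize an exactly rank-2 nonnegative tensor
rng = np.random.default_rng(1)
A, B, C = (rng.random((d, 2)) for d in (4, 5, 6))
X = np.einsum('ir,jr,kr->ijk', A, B, C)
U = ntf_mur(X, R=2)
Xhat = np.einsum('ir,jr,kr->ijk', *U)
rel_err = np.linalg.norm(X - Xhat) / np.linalg.norm(X)
# rel_err is typically very small for an exact low-rank nonnegative tensor
```

The multiplicative form guarantees that the factors stay nonnegative as long as they are initialized nonnegative, which is the same mechanism the HyperNTF updates rely on.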
5 Experiments
We conduct two types of tests to validate the performance of the hypergraph for topological preservation and of HyperNTF for clustering, namely the manifold unfolding test and the clustering test, each consisting of a series of numerical experiments.
In the manifold unfolding test, we apply the hypergraph to unfold several common curved surfaces and then visualize the unfolded manifolds. In the clustering test, we compare HyperNTF with six existing methods: higher-order singular value decomposition (HOSVD) de2000multilinear; savas2007handwritten, nonnegative Tucker decomposition (NTD) kim2007nonnegative, nonnegative tensor factorization (NTF) cichocki2007nonnegative, heterogeneous tensor decomposition (HTD-Multinomial) sun2015heterogeneous, low-rank regularized heterogeneous tensor decomposition (LRRHTD) zhang2017low, and graph-Laplacian Tucker decomposition (GLTD) jiang2018image.
All the numerical experiments are conducted on a desktop with an Intel Core i5-5200U CPU at 2.20 GHz and 8.00 GB of RAM, and each experiment is repeated 10 times with randomly selected images each time.
5.1 Manifold unfolding test
The manifold unfolding test is based on simulated data qiao2012explicit. We first embedded the simulated manifolds (Punctured Sphere, Gaussian surface, Twin Peaks, and Toroidal Helix) in a three-dimensional ambient space using the Matlab demo (mani.m). On each manifold, 1000 data samples are randomly generated for training; the number of nearest neighbors is chosen separately for the Punctured Sphere, Gaussian surface, Twin Peaks, and Toroidal Helix, and the polynomial degree is fixed.
Then we applied the LE, LLE, and hypergraph methods to unfold these 3D data and visualize the local geometry. The unfolding results for the Punctured Sphere, Gaussian surface, Twin Peaks, and Toroidal Helix are shown in Figures 2-5, respectively.
As shown in Figure 2, the hypergraph preserves the topological structure of the Punctured Sphere during dimensionality reduction. In comparison, even though the contour obtained by LLE is generally preserved, the central data points are very sparse, suggesting that LLE loses certain important information. More severely, LE almost fails to uncover the high-dimensional structure in the low-dimensional space. Therefore, provided an appropriate parameter is chosen, the hypergraph is superior to LE and LLE in unfolding the Punctured Sphere. Likewise, for the Gaussian surface, the hypergraph, LLE, and LE show similar unfolding performance in Figure 3.
As shown in Figure 4, although all three methods largely recover the symmetric structure in the low-dimensional space, the hypergraph preserves the most information during dimensionality reduction, and LLE performs worst in unfolding the Twin Peaks.
As shown in Figure 5, the hypergraph and LE obtain nearly the same performance for the dimensionality reduction of the Toroidal Helix, whereas the contour obtained by LLE is distorted.
To summarize, with an appropriate selection of the number of nearest neighbors, the hypergraph can effectively uncover the nonlinear geometry of local neighborhoods in high-dimensional data. All the results consistently suggest that the hypergraph is superior to LE and LLE in unraveling higher-order correlations and recovering underlying structures with two degrees of freedom. The hypergraph is therefore an effective approach for dimensionality reduction.
5.2 Datasets for clustering test
Table 2 presents a general description of the datasets used in the clustering test. Specifically, six image datasets (COIL20, Faces94 male, ETH80, MNIST, Olivetti Faces, and USPS) are involved. The data are randomly shuffled, and the gray values of the pixels are normalized to the unit interval. In Table 2, the raw size refers to the original image size, while the reduced size indicates the size of the data after dimensionality reduction.
The COIL20 dataset contains 1440 grayscale images of 20 objects, each viewed from 72 equally spaced orientations; the images contain 32*32 pixels. The Faces94 male dataset consists of images of 113 male individuals; each image was resized to 50*45 pixels, and 600 images in total are used. The ETH80 dataset is a multi-view image dataset for object categorization that includes eight categories: apple, car, cow, cup, dog, horse, pear, and tomato. Each category contains ten objects, and each object is represented by 41 images from different views; we resized each image to 32*32 pixels, for a total of 3280 images. The MNIST dataset contains 60000 grayscale images of handwritten digits; for computational reasons, we randomly selected 3000 images of 28*28 pixels. The Olivetti Faces dataset consists of images of 40 individuals with small variations in viewpoint, large variations in expression, and the occasional addition of glasses; it contains 400 images (10 per individual) of size 64*64 pixels, labeled by identity. The USPS handwritten digits dataset contains a total of 2000 images of size 16*16 pixels.
Each dataset used in our clustering test has ground-truth class labels. For evaluation, we first reduce the dimension of the tensor data and then cluster the low-dimensional representations. We use 3-order tensors in our numerical experiments: empirically, the first two modes are associated with the image pixels, and the last mode indexes the images.
dataset  #samples  raw size  reduced size  #classes 

COIL20  1440  32*32  1440*32  20 
ETH80-1  328  32*32  328*32  8 
Faces94 male  600  50*45  600*45  30 
MNIST  3000  28*28  3000*28  10 
USPS  2000  16*16  2000*16  10 
Olivetti  400  64*64  400*32  40 
5.3 Experimental results of clustering
Here we present the numerical results of the cluster analysis. Since HyperNTF involves two essential parameters, the regularization parameter and the number of nearest neighbors, we test to what extent the performance depends on their selection by varying both over a range of values. We run clustering with random initialization 10 times and report the averaged results as the final clustering results, using clustering accuracy (ACC) and normalized mutual information (NMI) as the evaluation metrics. The results in Figure 6 suggest that the performance of HyperNTF is robust to the selection of the regularization parameter and the number of neighbors. Based on Figure 6, these two parameters are set individually for COIL20, Faces94 male, ETH80, MNIST, Olivetti, and USPS in the following analyses.
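For reference, the two evaluation metrics can be sketched as follows (a self-contained illustration with toy labels; the brute-force label matching and the normalization of NMI by the geometric mean of the entropies are our own implementation choices, not necessarily the exact routines used in the experiments):

```python
import itertools
import numpy as np

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one relabeling of predicted clusters (brute force,
    fine for a handful of classes; use the Hungarian algorithm at scale)."""
    labels = np.unique(y_true)
    best = 0.0
    for perm in itertools.permutations(labels):
        mapping = dict(zip(labels, perm))
        acc = np.mean([mapping[p] == t for t, p in zip(y_true, y_pred)])
        best = max(best, acc)
    return best

def nmi(y_true, y_pred):
    """Normalized mutual information: I(T; P) / sqrt(H(T) * H(P))."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n, eps = len(y_true), 1e-12
    joint = np.zeros((y_true.max() + 1, y_pred.max() + 1))
    for t, p in zip(y_true, y_pred):
        joint[t, p] += 1.0 / n
    pt, pp = joint.sum(1), joint.sum(0)
    mi = np.sum(joint * np.log((joint + eps) / (np.outer(pt, pp) + eps)))
    entropy = lambda p: -np.sum(p * np.log(p + eps))
    return mi / np.sqrt(entropy(pt) * entropy(pp))

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]  # same partition, labels permuted
print(clustering_accuracy(y_true, y_pred))  # 1.0
print(round(nmi(y_true, y_pred), 6))        # 1.0
```

Both metrics are invariant to label permutation, which is why a perfect but relabeled clustering still scores 1.0.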
Tables 3-8 compare the clustering results of HyperNTF with the traditional nonnegative Tucker decomposition (NTD) kim2007nonnegative and nonnegative tensor factorization (NTF) cichocki2007nonnegative. These results suggest that, compared to NTD and NTF, HyperNTF reliably clusters the data into the labeled classes regardless of the selected number of clusters.
Table 9 lists comparisons with state-of-the-art (SOTA) methods, including low-rank regularized heterogeneous tensor decomposition (LRRHTD) zhang2017low, heterogeneous tensor decomposition (HTD-Multinomial) sun2015heterogeneous, higher-order singular value decomposition (HOSVD) de2000multilinear; savas2007handwritten, and graph-Laplacian Tucker decomposition (GLTD) jiang2018image. The results show that HyperNTF outperforms the SOTA methods on most datasets, except Faces94 male.
HyperNTF  NTF  NTD  
ACC  NMI  ACC  NMI  ACC  NMI  
2  1±0  1±0  1±0  1±0  1±0  1±0 
4  1±0  1±0  0.9733±0.0287  0.9492±0.0540  0.9257±0.0034  0.8741±0.0035 
6  1±0  1±0  0.9942±0.0183  0.9928±0.0226  0.9407±0.0346  0.9280±0.0256 
8  0.9656±0.0725  0.9825±0.0370  0.8828±0.0454  0.8975±0.0329  0.8286±0.0670  0.8434±0.0352 
10  0.8962±0.0576  0.9381±0.0285  0.7851±0.0438  0.8035±0.0279  0.7454±0.0350  0.7804±0.0167 
12  0.8808±0.0534  0.9483±0.0213  0.7729±0.0477  0.8124±0.0230  0.7436±0.0447  0.7948±0.0239 
14  0.8354±0.0514  0.9144±0.0236  0.7554±0.0429  0.8163±0.0207  0.6849±0.0324  0.7884±0.0179 
16  0.8000±0.0252  0.8680±0.0125  0.7067±0.0481  0.7775±0.0214  0.7035±0.0346  0.7786±0.0174 
18  0.7374±0.0457  0.8505±0.0211  0.6383±0.0280  0.7381±0.0189  0.6392±0.0187  0.7347±0.0076 
20  0.7349±0.0334  0.8488±0.0124  0.6479±0.0365  0.7532±0.0187  0.6322±0.0178  0.7443±0.0094 
HyperNTF  NTF  NTD  
ACC  NMI  ACC  NMI  ACC  NMI  
10  0.8785±0.0453  0.9458±0.0190  0.8940±0.0593  0.9472±0.0291  0.8785±0.0433  0.9358±0.0183 
20  0.8543±0.0245  0.9474±0.0134  0.8090±0.0460  0.9215±0.0185  0.7975±0.0379  0.9088±0.0293 
30  0.8048±0.0490  0.9314±0.0170  0.8075±0.0426  0.9236±0.0166  0.7450±0.0355  0.8921±0.0102 
HyperNTF  NTF  NTD  
ACC  NMI  ACC  NMI  ACC  NMI  
2  1±0  1±0  0.8756±0.1261  0.6347±0.3229  0.9817±0.0132  0.8905±0.0566 
4  0.6049±0.0231  0.5372±0.0438  0.6634±0.0075  0.6171±0.0196  0.6707±0.0655  0.6254±0.0337 
6  0.6992±0  0.7414±0.0027  0.5634±0.0470  0.5910±0.0353  0.6756±0.0801  0.6460±0.0291 
8  0.6378±0.0329  0.7192±0.0148  0.5204±0.0610  0.5689±0.0286  0.5466±0.0343  0.5783±0.0226 
HyperNTF  NTF  NTD  
ACC  NMI  ACC  NMI  ACC  NMI  
2  0.9983±0.0000  0.9839±0  0.9970±0  0.9733±0.0057  0.9810±0.0046  0.8831±0.0219 
4  0.8403±0.0026  0.6552±0.0046  0.6157±0.0751  0.5033±0.0671  0.6205±0.0616  0.4592±0.0319 
6  0.7604±0.0020  0.6511±0.0024  0.5211±0.0416  0.4245±0.0316  0.6457±0.0154  0.5229±0.0065 
8  0.7781±0.0142  0.6738±0.0095  0.5721±0.0431  0.4809±0.0281  0.5892±0.0452  0.5038±0.0241 
10  0.6400±0.0342  0.5891±0.0081  0.4454±0.0126  0.4150±0.0120  0.4796±0.0110  0.4429±0.0059 
HyperNTF  NTF  NTD  
ACC  NMI  ACC  NMI  ACC  NMI  
10  0.6490±0.0321  0.7104±0.0341  0.6440±0.0510  0.6863±0.0390  0.5930±0.0604  0.6345±0.0379 
20  0.6475±0.0628  0.7758±0.0454  0.5960±0.0350  0.7268±0.0258  0.5435±0.0348  0.6598±0.0280 
30  0.6317±0.0166  0.7768±0.0153  0.5803±0.0352  0.7345±0.0172  0.5497±0.0407  0.7042±0.0159 
40  0.6073±0.0340  0.7689±0.0185  0.5485±0.0294  0.7397±0.0152  0.5140±0.0345  0.6923±0.0197 
HyperNTF  NTF  NTD  
ACC  NMI  ACC  NMI  ACC  NMI  
2  0.9858±0.0087  0.9014±0.0548  0.8845±0.0016  0.5187±0.0071  0.8945±0.0033  0.5712±0.0085 
4  0.7025±0.0196  0.6353±0.0075  0.7015±0.0709  0.4863±0.0265  0.8541±0.0049  0.6503±0.0097 
6  0.7142±0.0243  0.6646±0.0095  0.6337±0.0171  0.4907±0.0202  0.6498±0.0538  0.5249±0.0166 
8  0.6474±0.0182  0.6243±0.0232  0.5714±0.0372  0.4654±0.0258  0.4634±0.0104  0.4197±0.0080 
10  0.5284±0.0224  0.5271±0.0087  0.4597±0.0340  0.4040±0.0284  0.4341±0.0351  0.4317±0.0229 
Method  
Dataset  Metric  HyperNTF  LRRHTD  HTDMultinomial  GLTD  HOSVD 
ACC  0.7349±0.0334  0.6678±0.0200  0.6452±0.0336  0.6107±0.0279  0.6080±0.0200  
COIL20  NMI  0.8488±0.0124  0.7656±0.0137  0.7440±0.0256  0.7311±0.0221  0.7226±0.0161 
ACC  0.8048±0.0490  0.8373±0.0257  0.8053±0.0240  0.7600±0.0402  0.7683±0.0248  
Faces94 male  NMI  0.9314±0.0170  0.9411±0.0095  0.9239±0.0139  0.8996±0.0200  0.9063±0.0156 
ACC  0.6378±0.0329  0.6238±0.0424  0.5570±0.0322  0.4470±0.0080  0.4494±0.0125  
ETH80-1  NMI  0.7192±0.0148  0.6427±0.0170  0.5676±0.0378  0.4125±0.0091  0.4187±0.0140 
ACC  0.6400±0.0342  0.5165±0.0193  0.5289±0.0245  0.5120±0.0038  0.5115±0.0016  
MNIST  NMI  0.5891±0.0081  0.4699±0.0135  0.4465±0.0155  0.4490±0.0021  0.4499±0.0015 
ACC  0.6073±0.0340  0.4958±0.0198  0.5832±0.0338  0.5900±0.0297  0.5745±0.0375  
Olivetti  NMI  0.7689±0.0185  0.7131±0.0095  0.7577±0.0238  0.7613±0.0151  0.7464±0.0182 
ACC  0.5284±0.0224  0.4412±0.0252  0.4806±0.0237  0.5166±0.0315  0.5070±0.0276  
USPS  NMI  0.5271±0.0087  0.4446±0.0206  0.4453±0.0290  0.4618±0.0164  0.4586±0.0157 
6 Conclusion
To sum up, our proposed algorithm, HyperNTF, achieves state-of-the-art (SOTA) performance in both the dimensionality reduction and clustering tests. The manifold unfolding tests yield reliable two-dimensional embeddings of the three-dimensional curved manifolds under an optimal choice of the number of nearest neighbors (Figures 2-5). These results indicate that the hypergraph helps to maintain the higher-order correlations among data points. Moreover, in the clustering tests (Tables 3-9), HyperNTF is superior to the compared methods (HTD-Multinomial, LRRHTD, GLTD, and HOSVD) regardless of the number of clusters; it thus has the distinct advantage of not requiring the number of classes to be known in advance. Despite these merits, some issues of HyperNTF require further investigation, such as the optimal initialization, the choice of discrepancy measure, and the optimal stopping criterion.
Acknowledgements
This research was supported by the National Natural Science Foundation of China (No. 62001205), the Guangdong Natural Science Foundation Joint Fund (No. 2019A1515111038), and the High-level University Fund (No. G02386301, G02386401).