1 Introduction
As a generalization of pairwise graphs, hypergraph learning is widely used in computer vision, machine learning and pattern recognition, e.g. lsc ; sum ; phr ; higher ; supervised ; adaptive ; nmi ; he , since it represents the similarity relation of data by measuring the similarity between groups of points, which is a fundamental issue in these research areas. Recently, many researchers have developed hypergraph models for different tasks, and many impressive models have been proposed. Hypergraph algorithms can be roughly divided into two categories. The first category uses spectral clustering techniques to partition the vertices by constructing a simple pairwise graph from the original hypergraph. Representative methods include clique expansion expansion , star expansion expansion and clique averaging mean , etc. The second category defines a hypergraph Laplacian using analogies from the simple pairwise graph Laplacian. Its representative methods include Zhou's normalized Laplacian zhou , Bolla's Laplacian bolla , etc. However, interestingly, as shown in hol , all of the previous algorithms, despite their very different formulations, can be reduced to two graph constructions, the star expansion and the clique expansion, and they are equivalent to each other under specific conditions.

There are extensive studies on hypergraph construction, e.g. mean ; bolla ; rodriguez ; zhou ; expansion . But, to the best of our knowledge, no prior work formally discusses the importance of the hyperedge weight to hypergraph learning. In graph learning, which is the pairwise case of hypergraph learning, extensive studies have already shown that a good choice of edge weight can significantly improve graph-based algorithms. The HeatKernel and DotProduct weighting schemes are considered the two most representative edge weighting schemes lpi ; gnmf ; lpp ; glpp . Therefore, we argue that the choice of hyperedge weights should also play a crucial role in hypergraph learning. This motivated us to investigate whether there exists a representative hyperedge weighting scheme in hypergraph learning. Moreover, we believe that different hyperedge weights actually explain the hypergraph from different perspectives. In this paper, we try to fill this gap and empirically discuss the influence of the hyperedge weight on hypergraph learning by presenting and evaluating three novel hyperedge weighting schemes.
As several hypergraph algorithms have been proposed, a few hyperedge weighting schemes have been heuristically mentioned in such papers. For example, Huang et al. phr proposed a probabilistic hypergraph-based image retrieval system. In this system, each hyperedge is generated by nearest neighbour searching and its weight is the sum of the pairwise edge weights between the centroid (seed point) of the hyperedge and its neighbours. Zhang et al. nmi presented an unsupervised hypergraph-based feature selection method, which measures the high-order similarity of the vertices in a hyperedge using multidimensional interaction information (MII). For addressing a 3D object retrieval task, Gao et al. sum calculated the hyperedge weight by directly summing the weights of all pairwise edges whose end points are all in the same hyperedge. Clearly, the computation of such a hyperedge weight is actually the inverse process of the clique expansion. So, if we replace the sum operation with the mean operation, it becomes the inverse process of the clique averaging. Different from the previous three methods, Yu et al. adaptive defined the hyperedge weight as a parameter of the hypergraph model by imposing a sparsity constraint. Thus, the hyperedge weights can be adaptively learned as the graph model is optimized. The initial hyperedge weights of this method are constructed by following Huang's scheme phr , and the globally optimal weights still cannot be guaranteed. Certainly, there are also other hyperedge weighting schemes higher ; supervised , but most of them are associated with very specific tasks.

Complementary to the previously proposed hyperedge weights, we carefully design three novel hyperedge weights from the perspectives of geometry, multivariate statistical analysis and linear regression sparse ; collabrative (see Figure 1). From the perspective of geometry, a hyperedge can be deemed a high-order simplex hol . Thus, the volume of the simplex (VOLUME) is a natural hyperedge weight, which provides a reasonable dissimilarity measure for a point set. Motivated by studies from geometry det , we present three ways to compute the volume of the simplex for different situations. It is worthwhile to note that these three ways actually define the mathematical relationships between a hyperedge and its vertices, a hyperedge and its pairwise edges, and a hyperedge and its sub-hyperedges, respectively.
From the perspective of data mining and multivariate statistical analysis, a hyperedge can be naturally regarded as a cluster in the sample space, so the trace of the scatter matrix (TRACE) of the samples in the same hyperedge should be a good hyperedge weight. From the perspective of linear regression sparse ; collabrative , the linear reconstruction error (LLRE) of homogeneous samples should be smaller than that of inhomogeneous samples. So, we consider a hyperedge as a small subset of samples, and use the local linear reconstruction error of each point in the hyperedge to measure the similarity of the point set.
In order to verify the importance of the hyperedge weighting scheme in hypergraph learning, three state-of-the-art hypergraph models, including Zhou's normalized Laplacian zhou , clique expansion and star expansion expansion , are adopted to evaluate the different hyperedge weights for clustering and classification. Several representative hyperedge weighting schemes for classification and clustering are identified from our experimental results on the ORL, COIL20, Sheffield and JAFFE databases. These experimental results also demonstrate that a carefully chosen hyperedge weight can significantly improve the performance of hypergraph algorithms. Moreover, we apply simple combinations of the traditional hypergraph models and the learned representative weighting schemes to image classification and clustering on larger databases, namely the Scene15 and Caltech256 databases. The results show that such simple combinations can also achieve very promising performance in comparison with state-of-the-art algorithms.
Our work makes three main contributions:

Three novel weighting schemes, including the volume of the simplex (VOLUME), the trace of the scatter matrix (TRACE) and the local linear reconstruction error (LLRE), are proposed from three different perspectives. A desirable property of VOLUME is that it defines the relationships between a hyperedge and its vertices, its pairwise edges, and its sub-hyperedges. Extensive experiments show that our proposed weighting schemes significantly outperform the conventional weighting schemes in classification and achieve competitive performance in clustering.

We empirically verify the importance of the choice of hyperedge weight in hypergraph learning and draw researchers' attention to the design of the hyperedge weight.

Representative hyperedge weighting schemes for classification and clustering are experimentally compared, which is very instructive for hypergraph-based studies.
The rest of the paper is organized as follows: Section 2 presents the background of hypergraph learning; Section 3 describes the proposed hyperedge weights; Section 4 shows how to solve classification and clustering tasks using a hypergraph; Section 5 presents the experiments; and the conclusion is given in Section 6.
2 Background
In this section, in order to analyze the influence of the hyperedge weight on hypergraph learning, the basic notation of hypergraphs and three common hypergraph frameworks will be introduced: Clique Expansion expansion , Star Expansion expansion and Zhou's normalized hypergraph zhou . Besides these three frameworks, Bolla's Laplacian bolla , Rodriguez's Laplacian rodriguez and Clique Averaging mean are also popular hypergraph frameworks. We did not choose them, since Bolla's and Rodriguez's Laplacians are defined on unweighted hypergraphs and Clique Averaging is deemed to solve the same approximation problem as Clique Expansion.
2.1 Notations
The key difference between a hypergraph and an ordinary graph is that each edge (hyperedge) of a hypergraph can connect more than two vertices (see Figure 2). Let $G = (V, E)$ denote a hypergraph with vertex set $V$ and edge set $E$. The edges are arbitrary subsets of $V$ with weight $w(e)$ associated with edge $e$. The degree of a vertex $v \in V$ is

$$d(v) = \sum_{e \in E : v \in e} w(e). \qquad (1)$$
The degree of a hyperedge $e$ is denoted by $\delta(e) = |e|$. For uniform hypergraphs, the degrees of all hyperedges are the same, $\delta(e) = k$. In particular, for the case of ordinary graphs or 2-graphs, $\delta(e) = 2$. The vertex-edge incidence matrix $H$ is a $|V| \times |E|$-dimensional binary matrix whose $(v, e)$-th entry is $h(v, e)$. If $v \in e$, $h(v, e)$ is 1, otherwise it is 0. By these definitions, we have

$$d(v) = \sum_{e \in E} w(e) h(v, e) \qquad (2)$$

and

$$\delta(e) = \sum_{v \in V} h(v, e). \qquad (3)$$
$D_e$ and $D_v$ are the diagonal matrices of edge (hyperedge) and vertex degrees, respectively. $W$ is the diagonal matrix of edge weights, $W_{ee} = w(e)$.
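To make these definitions concrete, the incidence matrix and the two degree formulas can be sketched in a few lines of NumPy. The function name and the list-of-hyperedges encoding are our own, not from the paper:

```python
import numpy as np

def incidence_and_degrees(n_vertices, hyperedges, w):
    """Build the incidence matrix H, vertex degrees d(v) = sum_e w(e) h(v, e),
    and hyperedge degrees delta(e) = |e|, as in Eqs. (1)-(3)."""
    H = np.zeros((n_vertices, len(hyperedges)))
    for j, e in enumerate(hyperedges):
        H[list(e), j] = 1.0          # h(v, e) = 1 iff v is in e
    dv = H @ np.asarray(w)           # vertex degrees, Eq. (2)
    de = H.sum(axis=0)               # hyperedge degrees, Eq. (3)
    return H, dv, de
```

For instance, two hyperedges {0, 1} and {0, 1, 2} with weights 2 and 1 give vertex degrees (3, 3, 1).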
2.2 Clique Expansion
In the clique expansion algorithm expansion , each hyperedge is expanded to a clique. A pairwise graph $G^c = (V^c, E^c)$ is expanded from the original hypergraph using clique expansion. We have $V^c = V$ and $E^c = \{(u, v) : u, v \in e, e \in E\}$. The edge weight $w^c(u, v)$ of $G^c$ minimizes the difference between the weight of the graph edge and the weight of each hyperedge $e$ that contains both $u$ and $v$:

$$w^c(u, v) = \arg\min_{w^c(u, v)} \sum_{e : u, v \in e} \left( w^c(u, v) - w(e) \right)^2. \qquad (4)$$

The solution of this criterion is

$$w^c(u, v) = \mu \sum_{e : u, v \in e} w(e), \qquad (5)$$

where $\mu$ is a fixed scalar. The combinatorial or normalized Laplacian of the constructed graph is then used to partition the vertices.
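The accumulation in Eq. (5) can be sketched as follows, assuming hyperedges are given as lists of vertex indices and the scalar is passed in explicitly (the function name and encoding are ours):

```python
import numpy as np

def clique_expansion_adjacency(n_vertices, hyperedges, weights, mu=1.0):
    """Adjacency matrix of the clique-expanded graph: each hyperedge
    becomes a clique, and w_c(u, v) accumulates mu * w(e) over every
    hyperedge e containing both u and v (Eq. (5))."""
    A = np.zeros((n_vertices, n_vertices))
    for e, w in zip(hyperedges, weights):
        for i, u in enumerate(e):
            for v in e[i + 1:]:      # each unordered pair once
                A[u, v] += mu * w
                A[v, u] += mu * w
    return A
```

The resulting pairwise graph can then be handed to any standard (normalized) graph Laplacian routine.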
2.3 Star Expansion
In the star expansion algorithm expansion , a new vertex is introduced for each hyperedge and this new vertex is connected to every vertex in that hyperedge. More specifically, for a hypergraph $G = (V, E)$, the vertex and edge sets of the star-expanded pairwise graph $G^* = (V^*, E^*)$ are defined as $V^* = V \cup E$ and $E^* = \{(u, e) : u \in e, e \in E\}$, respectively. Thus each hyperedge in $E$ is expanded into a star in $G^*$, which is a bipartite graph. The weight of the edge $(u, e)$ in $G^*$ is given by

$$w^*(u, e) = \frac{w(e)}{\delta(e)}. \qquad (6)$$

Since $V^* = V \cup E$, we can assume that in $V^*$, all vertices from $V$ are ordered before those from $E$. Let $M$ denote the matrix of weights between the vertices constructed from $V$ and the vertices from $E$ in $G^*$. The adjacency matrix for $G^*$ can be obtained readily from $M$. Based on the adjacency matrix, the degree of every vertex can be computed. We use $D_v^*$ and $D_e^*$ to denote the diagonal matrices of vertex degrees for vertices in $V$ and in $E$ in the expanded graph $G^*$, respectively. Finally, the Laplacian of star expansion is formulated as follows,

$$\Delta^* = I - D_v^{*-1/2} M D_e^{*-1} M^T D_v^{*-1/2}, \qquad (7)$$

where $I$ is an identity matrix.
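The star-expansion edge weights of Eq. (6) amount to one value per (vertex, hyperedge) pair; a minimal sketch, with hyperedges encoded as vertex-index lists (encoding and function name ours):

```python
def star_expansion_weights(hyperedges, w):
    """Edge weights of the star-expanded bipartite graph:
    w*(u, e) = w(e) / delta(e) for each vertex u in hyperedge e (Eq. (6)).
    Returns a dict keyed by (vertex index, hyperedge index)."""
    return {(u, j): w[j] / len(e)
            for j, e in enumerate(hyperedges) for u in e}
```

Arranging these values into the matrix $M$ (vertices as rows, hyperedges as columns) yields the quantities needed for Eq. (7).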
2.4 Zhou’s Normalized Laplacian
Zhou's Normalized Laplacian zhou is a representative method that defines the hypergraph Laplacian using analogies from the graph Laplacian. Following the random walk model, Zhou et al. proposed the following normalized hypergraph Laplacian $\Delta$:

$$\Delta = I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}. \qquad (8)$$

In the random walk model, given the current position $u$, the walker first chooses a hyperedge $e$ over all hyperedges incident with $u$, with probability proportional to $w(e)$, and then chooses a vertex $v \in e$ uniformly at random.

3 Hyperedge Weight Computation
In this section, we design three novel hyperedge weighting schemes from the perspectives of geometry, multivariate statistical analysis and linear regression. The volume of the simplex, the scatter and the linear reconstruction error are respectively adopted as similarity measures for the point set.
3.1 Volume of Simplex
From the geometric perspective, each hyperedge can be deemed a simplex hol . Thus, a geometric measure of a set of points can be naturally obtained by computing the volume of the simplex, since a smaller volume indicates a closer geometric relationship between the vertices in the hyperedge, and vice versa.
There are three ways to compute the volume of the simplex. The first way is to use the vertices of the simplex. Let the vertices of the $(n-1)$-degree simplex associated with an $n$-degree hyperedge $e$ be represented as $d$-dimensional column vectors $x_1, \ldots, x_n$. According to the Gram Determinant formula det , we can define a matrix $A$ whose $i$-th column vector is $x_{i+1} - x_1$. The volume of the simplex can be computed as follows

$$V(e) = \frac{1}{(n-1)!} \sqrt{\det\!\left( A^T A \right)}, \qquad (9)$$

where $\det(\cdot)$ is the matrix determinant and $!$ is the factorial. This way defines the relationship between the hyperedge weight and its vertices.
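A direct implementation of the Gram-determinant volume, assuming the vertex coordinates are the columns of a NumPy array (a sketch; the function name is ours):

```python
import numpy as np
from math import factorial

def simplex_volume_gram(X):
    """Volume of the simplex spanned by the columns of X (d x n),
    via the Gram determinant: V = sqrt(det(A^T A)) / (n-1)!,
    where the columns of A are x_{i+1} - x_1."""
    d, n = X.shape
    A = X[:, 1:] - X[:, [0]]                  # edge vectors from x_1
    gram = A.T @ A
    # clamp tiny negative determinants caused by round-off
    return np.sqrt(max(np.linalg.det(gram), 0.0)) / factorial(n - 1)
```

As a sanity check, the unit right triangle with vertices (0,0), (1,0), (0,1) has volume (area) 1/2.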
The second way is to utilize the edges of the simplex to compute its volume. This way is crucial, since it defines the relationship between the hyperedge weight and the pairwise edge weights. Let $d_{ij}$ denote the distance between the $i$-th vertex and the $j$-th vertex (or use the pairwise edge weight instead). Then, we can construct an $(n+1) \times (n+1)$ pseudo-affinity matrix $\hat{A}$ as follows

$$\hat{A} = \begin{pmatrix} 0 & 1 & 1 & \cdots & 1 \\ 1 & 0 & d_{12}^2 & \cdots & d_{1n}^2 \\ 1 & d_{21}^2 & 0 & \cdots & d_{2n}^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & d_{n1}^2 & d_{n2}^2 & \cdots & 0 \end{pmatrix}. \qquad (10)$$

According to the Cayley-Menger Determinant formula det , the volume of the simplex associated with the hyperedge $e$ satisfies

$$V(e)^2 = \frac{(-1)^n}{2^{n-1} \left( (n-1)! \right)^2} \det(\hat{A}). \qquad (11)$$
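The Cayley-Menger computation can be sketched as follows, taking the matrix of squared pairwise distances as input (function name ours):

```python
import numpy as np
from math import factorial

def simplex_volume_cayley_menger(D):
    """Volume of the (n-1)-simplex given the n x n matrix D of
    SQUARED pairwise distances, via the Cayley-Menger determinant."""
    n = D.shape[0]
    B = np.ones((n + 1, n + 1))       # border of ones
    B[0, 0] = 0.0
    B[1:, 1:] = D                     # squared distances, zero diagonal
    coef = (-1.0) ** n / (2 ** (n - 1) * factorial(n - 1) ** 2)
    v2 = coef * np.linalg.det(B)
    return np.sqrt(max(v2, 0.0))      # clamp round-off negatives
```

For the unit right triangle (squared side lengths 1, 1, 2) this recovers the area 1/2, matching the Gram-determinant route.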
The third way is to use the hyperfaces of the simplex to compute its volume. An $(n-1)$-degree simplex has $n$ hyperfaces, where each hyperface is a hyperplane whose Cartesian equation is given by

$$a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{id} x_d = b_i, \qquad (12)$$

where $x_1, \ldots, x_d$ are variables standing for real numbers and the $a_{ij}, b_i$ are real constants. Let $A$ be the matrix with elements $a_{ij}$ and $C$ be the cofactor matrix of $A$. Then, according to the Klebaner-Sudbury-Satterson Determinant formula det , the volume of the simplex can be computed as follows

(13)

Actually, such cases, where only information about the hyperfaces is known, are very restrictive and may seldom occur in practical applications. But we still think this formulation is noteworthy, since it theoretically sets up a link between sub-hyperedges and hyperedges. The reason why it can establish such a link is that a hyperface of a simplex is also a simplex; for example, a 2-hyperface of a simplex is a 2-simplex (triangle).
After obtaining the simplex volume, the weight of the hyperedge $e$ associated with the simplex is given as follows

$$w(e) = \exp\!\left( -\frac{V(e)}{\alpha} \right), \qquad (14)$$

where $\alpha$ is a positive parameter that controls the scaling of the hyperedge weight.
The previous formulas hold for arbitrary feature dimensions and hyperedge degrees. However, the feature dimension $d$ should be equal to or greater than the degree of the hyperedge minus one, i.e., $d \geq \delta(e) - 1$, since the volume degenerates to zero when $d < \delta(e) - 1$. In computer vision applications, typically $d \gg \delta(e)$ anyway, so the degeneracy is unlikely to happen.
3.2 Trace of The Scatter Matrix
From the perspectives of multivariate statistical analysis and data mining, each hyperedge can be considered a cluster in the sample space. So it is very natural to use the scatter matrix to measure the compactness of a cluster (hyperedge). We denote this weight by TRACE. Let the $d \times n$-dimensional matrix $X$ denote the sample matrix associated with the vertices of an $n$-degree hyperedge $e$. Then, the scatter matrix $S$, which is a positive semidefinite matrix, is computed as follows

$$S = (X - M)(X - M)^T, \qquad (15)$$

where $\mu$ is the mean of the samples and $M$ is a $d \times n$-dimensional matrix whose columns are all $\mu$. Finally, we can compute the weight of the hyperedge from the trace of the scatter matrix

$$w(e) = \exp\!\left( -\frac{\operatorname{tr}(S)}{\alpha} \right), \qquad (16)$$

where $\operatorname{tr}(\cdot)$ denotes the matrix trace, $\exp(\cdot)$ is an elementwise exponential operation and $\alpha$ is a positive parameter for controlling the scale of the weight.
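A minimal sketch of the TRACE weight, assuming the samples of a hyperedge are the columns of a matrix (function name ours):

```python
import numpy as np

def trace_weight(X, alpha=1.0):
    """TRACE hyperedge weight: w(e) = exp(-tr(S)/alpha), where S is the
    scatter matrix of the samples (columns of X) in the hyperedge.
    tr(S) equals the total squared deviation from the mean sample."""
    mu = X.mean(axis=1, keepdims=True)
    S = (X - mu) @ (X - mu).T
    return np.exp(-np.trace(S) / alpha)
```

A perfectly compact hyperedge (all samples identical) gets the maximal weight 1, and the weight decays as the cluster spreads out.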
3.3 Local Linear Reconstruction Error
We can measure the similarity between a single point and a point set by the linear reconstruction error. The reconstruction error is expected to be smaller if the sample is reconstructed from homogeneous samples rather than inhomogeneous samples. More specifically, each hyperedge is considered a subset of the samples. We denote this scheme by LLRE. We follow a leave-one-out strategy to obtain the reconstruction error of each sample in such a subset via linear regression. In the case of an undirected hypergraph, each vertex gets a reconstruction error. We assume $d$-dimensional samples $x_1, \ldots, x_n$ are associated with the ordered vertices in an $n$-degree hyperedge $e$. The reconstruction coefficients $\beta_i$ of sample $x_i$, which minimize the reconstruction error, can be solved as a least-squares problem as follows

$$\beta_i = \arg\min_{\beta} \left\| x_i - X_{-i} \beta \right\|^2, \qquad (17)$$

where $X_{-i}$ is a $d \times (n-1)$-dimensional matrix whose columns are the samples $x_j$ with $j \neq i$. The solution of this problem is $\beta_i = X_{-i}^{+} x_i$, where $X_{-i}^{+}$ is the generalized inverse of $X_{-i}$. After obtaining $\beta_i$, its corresponding reconstruction error $\epsilon_i$ can be computed as follows

$$\epsilon_i = \left\| x_i - X_{-i} \beta_i \right\|^2. \qquad (18)$$

For an undirected hypergraph, the overall reconstruction error $\epsilon_e$ of a hyperedge can be flexibly assigned as the mean, the minimum or the maximum of the reconstruction errors of the samples. In the case of a directed hypergraph, each directed hyperedge gets one reconstruction error, since the vertices of the hyperedge are ordered and only the vertex corresponding to the first subscript is considered the reconstructed point. So, in that case, the overall reconstruction error of the hyperedge is directly equal to $\epsilon_1$. When the hyperedge is generated by nearest neighbour searching, the overall reconstruction error of the hyperedge can be assigned as the reconstruction error of the sample corresponding to the seed point, to save time. Finally, we use a positive $\alpha$ to scale the hyperedge weight with the reconstruction error $\epsilon_e$,

$$w(e) = \exp\!\left( -\frac{\epsilon_e}{\alpha} \right). \qquad (19)$$
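The leave-one-out LLRE computation can be sketched with the pseudo-inverse, pooling the per-vertex errors with a caller-chosen reducer (mean by default; the function name and the `reduce` parameter are ours):

```python
import numpy as np

def llre_weight(X, alpha=1.0, reduce=np.mean):
    """LLRE hyperedge weight via leave-one-out linear reconstruction.

    Each column x_i of X (d x n) is reconstructed from the remaining
    columns by least squares (generalized inverse, Eq. (17)); the squared
    residuals (Eq. (18)) are pooled by `reduce` (mean/min/max) and
    mapped to w(e) = exp(-eps / alpha) as in Eq. (19)."""
    d, n = X.shape
    errors = []
    for i in range(n):
        Xi = np.delete(X, i, axis=1)            # d x (n-1), leave x_i out
        beta = np.linalg.pinv(Xi) @ X[:, i]     # least-squares coefficients
        errors.append(np.sum((X[:, i] - Xi @ beta) ** 2))
    return np.exp(-reduce(errors) / alpha)
```

When every sample lies in the span of the others (a highly homogeneous hyperedge), all residuals vanish and the weight approaches 1.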
Some reasonable constraints can be imposed on the coefficients $\beta$ to further optimize this model. For example, a sparsity constraint may make sense when the degree of the hyperedge is extremely high; in that case, the coefficients can be solved as a sparse representation task sparse ; sgraph or a collaborative representation task collabrative .
4 Clustering and Classification
After obtaining the hyperedge weights, the aforementioned three hypergraph learning frameworks (in Section 2) are utilized to learn the corresponding hypergraph Laplacians, which can be used for clustering or classification tasks. According to Zhou's work zhou , the hypergraph-based clustering and embedding problem is formulated as the following standard Normalized cut (Ncut) problem:

$$\arg\min_{f} f^T \Delta f \quad \text{s.t.} \quad f^T f = 1, \qquad (20)$$
where the $|V| \times |V|$-dimensional matrix $\Delta$ is the learned hypergraph Laplacian. According to Zhou's work, $D_v^{1/2} \mathbf{1}$ is an eigenvector of $\Delta$ corresponding to the smallest eigenvalue, which should be equal to zero. Clearly, this problem can be solved as an eigenvalue problem, and the solution of the $k$-way partition is given by the eigenvectors of $\Delta$ corresponding to the $k$ smallest nonzero eigenvalues.

With regard to hypergraph-based classification, the $|V|$-dimensional vector $f$ is deemed a classification function over $V$, which classifies each vertex $v$ as the sign of $f(v)$. On one hand, in order to assign the same labels to vertices which share many incident hyperedges, a functional should be defined to minimize the sum of the changes of a function over the hyperedges of the hypergraph. According to Zhou's work zhou , such a functional is exactly $\Omega(f) = f^T \Delta f$. On the other hand, the initial label assignment should be changed as little as possible. Let the $|V|$-dimensional vector $y_j$ be the label function of the $j$-th class, where $y_j(v) = 1$ or $-1$ if the vertex $v$ belongs to the $j$-th class or to other classes respectively, and 0 if the vertex is unlabeled. Thus, hypergraph-based classification can be formulated as the following optimization problem

$$F^* = \arg\min_{F} \operatorname{tr}\!\left( F^T \Delta F \right) + \xi \left\| F - Y \right\|_F^2, \qquad (21)$$
where $c$ is the number of classes and the matrix $F$ is the collection of the vectors $f_j$. $\xi$ is the parameter specifying the trade-off between the two competing terms. According to Yu's work adaptive , the solution of this problem is as follows

$$F = \left( I + \frac{1}{\xi} \Delta \right)^{-1} Y, \qquad (22)$$

where the matrix $Y$ is the label matrix whose $j$-th column is $y_j$. After obtaining $F$, the classification of the $i$-th sample can be accomplished by assigning it to the $j$-th class that satisfies $j = \arg\max_{c} F_{ic}$.
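Assuming the closed-form minimizer $F = (I + \Delta/\xi)^{-1} Y$, the classification step reduces to one linear solve plus a row-wise argmax; a sketch (function name ours):

```python
import numpy as np

def hypergraph_classify(Delta, Y, xi=1.0):
    """Closed-form semi-supervised classification: solve
    (I + Delta/xi) F = Y, then assign each sample to the class
    with the largest score in its row of F."""
    n = Delta.shape[0]
    F = np.linalg.solve(np.eye(n) + Delta / xi, Y)  # avoids explicit inverse
    return F.argmax(axis=1)
```

Using `solve` instead of forming the matrix inverse is the standard numerically preferable choice.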
5 Experiments
In order to evaluate the influence of the hyperedge weight choice on hypergraph learning, three classical hypergraph frameworks, namely Zhou's Normalized Laplacian zhou , Clique Expansion expansion and Star Expansion expansion , are used to address clustering and classification tasks on six databases: ORL orl , COIL20 coil20 , Sheffield umist , JAFFE jaffe , Scene15 sence15 and Caltech256 caltech256 . In these experiments, our proposed hyperedge weighting schemes, as well as three other commonly adopted hyperedge weighting schemes, are applied to the previous hypergraph frameworks.
5.1 Data Sets and Experimental Configurations
Six datasets, including ORL, JAFFE, COIL20, Sheffield, Scene15 and Caltech256-2000, are used in our experiments and their details are reported in Table 1. Among them, the Caltech256-2000 dataset adaptive is a subset of Caltech256 caltech256 . We use the first four databases to experimentally study the impact of the hyperedge weight on the performance of hypergraph learning, since these four datasets possess manifold structures, and hypergraph learning is a manifold learning technique.
Database Name  Classes  Total Samples  Feature  Dimension  Manifold 

ORL orl  40  400  Grayscale  10304  Pose 
COIL20 coil20  20  1440  Grayscale  1024  View 
JAFFE jaffe  10  213  Grayscale  4096  Expression 
Sheffield umist  20  564  Grayscale  10304  Pose 
Scene15 sence15  15  1500  PiCoDes picodes  2048  unknown 
Caltech256-2000 adaptive  20  2000  PiCoDes picodes  2048  unknown 
Three very commonly used hyperedge weighting schemes are implemented to compare with our proposed hyperedge weights. The first is the 0-1 weighting scheme zhou ; bolla , which is also commonly adopted in the regular graph case gnmf . The hypergraph with this weighting scheme can be regarded as an unweighted hypergraph, since all hyperedge weights in this case are equal to 1. We name this weighting scheme BINARY in our experiments. The second hyperedge weight is the sum of the weights of the pairwise edges in the hyperedge sum ; expansion ,

$$w(e) = \sum_{u, v \in e} a(u, v), \qquad (23)$$

where $a(u, v)$ denotes the pairwise edge weight.
This hyperedge weight computation is the inverse process of Clique Expansion, and this weight can be deemed the perimeter of the simplex. For convenience of discussion, we name this weight SUM in the experiment section. Another frequently used hyperedge weight is the mean of the weights of the pairwise edges in the hyperedge,

$$w(e) = \frac{1}{\delta(e)} \sum_{u, v \in e} a(u, v), \qquad (24)$$

where $\delta(e)$ is the vertex degree of the hyperedge mean . This hyperedge weight computation is the inverse process of Clique Averaging. However, this case is actually equivalent to SUM. So, we will not adopt it for comparison. The third hyperedge weighting scheme stems from KNN-based hyperedge generation. In this case, each hyperedge has a seed point, which is also known as the centroid of the neighborhood. The hyperedge weight is the sum of the distances between the centroid and each of its neighbors in the hyperedge,

$$w(e) = \sum_{v \in e,\, v \neq c} a(v_c, v), \qquad (25)$$

where $c$ is the vertex subscript of the centroid in hyperedge $e$ phr ; adaptive . For convenience, we name it CENTROID. Recall that our proposed hyperedge weights, based on the volume of the simplex, the trace of the scatter matrix and the local linear reconstruction errors, have been respectively named VOLUME, TRACE and LLRE in the introduction.
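For reference, the two baseline weights SUM and CENTROID can be sketched from a pairwise affinity matrix (we assume a symmetric matrix `A` with zero diagonal; function names ours):

```python
import numpy as np

def sum_weight(A, e):
    """SUM: total pairwise edge weight inside hyperedge e (Eq. (23)).
    A is a symmetric affinity matrix with zero diagonal."""
    idx = np.asarray(e)
    return A[np.ix_(idx, idx)].sum() / 2.0   # each unordered pair once

def centroid_weight(A, e, c):
    """CENTROID: sum of pairwise weights between the seed vertex c
    and the other vertices of hyperedge e (Eq. (25))."""
    return sum(A[c, v] for v in e if v != c)
```

These serve as drop-in baselines against which VOLUME, TRACE and LLRE can be compared.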
5.2 Implementation Details
It is impracticable to enumerate all possible hyperedges. For example, for a 400-vertex undirected hypergraph, there are more than eight billion 5-degree hyperedges. Therefore, in this paper, we generate hyperedges following Huang's strategy, in which each hyperedge is generated by a KNN search given a vertex phr . The neighborhood size $k$ differs between databases. The values of $k$ for the ORL and COIL20 databases follow the choices in adaptive . With regard to the Sheffield and JAFFE databases, we apply two-fold cross validation to learn the optimal $k$, which was found to be equal to 5. Similarly, we apply two-fold cross validation to learn the optimal scaling parameter $\alpha$ under the different hypergraph frameworks, and we fix the classification trade-off parameter $\xi$ to 1. With regard to the two larger datasets, Scene15 and Caltech256, we follow the same experimental settings as adaptive .
Database  Mean Classification Errors ± Standard Deviations (%)  

BINARY  SUM  CENTROID  VOLUME  TRACE  LLRE  
ORL  7.75±1.77  6.25±1.06  7.75±1.77  5.50±1.41  4.50±1.41  7.25±1.77 
JAFFE  15.00±3.93  12.22±1.57  15.00±2.36  12.78±0.79  13.33±0.00  13.89±2.36 
Sheffield  18.79±0.73  15.34±1.71  17.59±3.90  8.79±3.66  11.21±2.68  18.79±0.70 
COIL20  5.35±0.88  4.86±0.00  7.22±2.55  3.68±0.69  4.79±0.29  7.85±4.03 
Average  11.72  9.67  11.89  7.69  8.46  11.94 
Database  Mean Classification Errors ± Standard Deviations (%)  

BINARY  SUM  CENTROID  VOLUME  TRACE  LLRE  
ORL  9.75±0.35  7.75±0.35  9.75±0.35  6.75±0.35  5.75±0.35  9.75±0.35 
JAFFE  16.11±5.50  14.44±1.57  15.00±3.93  15.00±0.79  14.44±1.57  16.11±5.50 
Sheffield  20.00±1.95  15.52±2.44  18.97±2.44  8.97±3.90  11.21±2.19  20.00±1.95 
COIL20  5.42±0.98  5.56±0.00  8.06±2.16  4.10±0.29  4.72±0.59  8.75±3.34 
Average  12.82  10.82  12.94  8.70  9.03  13.65 
Database  Mean Classification Errors ± Standard Deviations (%)  

BINARY  SUM  CENTROID  VOLUME  TRACE  LLRE  
ORL  8.00±1.41  6.25±0.35  8.00±1.41  5.50±1.41  4.75±1.06  7.25±1.77 
JAFFE  14.44±3.14  13.33±0.00  14.44±3.14  12.78±0.79  13.89±0.79  14.44±3.14 
Sheffield  19.48±0.24  16.21±1.46  19.48±0.24  10.17±3.66  11.38±2.93  19.48±0.24 
COIL20  5.49±1.08  5.28±0.59  7.36±2.75  3.89±0.98  5.00±0.59  8.06±4.32 
Average  11.85  10.27  12.32  8.08  8.75  12.31 
5.3 Evaluation in Classification
We conduct some experiments to study the influence of hyperedge weight to the hypergraph in classification. The cross validation scheme is applied in these experiments.
Tables 2, 3 and 4 respectively report the classification errors of the Zhou's Normalized Laplacian, Clique Expansion and Star Expansion frameworks using different hyperedge weights in the two-fold cross validation case. Figure 3 presents the comprehensive evaluation results of the six weighting schemes under the three hypergraph frameworks. The Y-axis of this figure indicates the mean classification error over the four databases. According to Tables 2, 3 and 4 and Figure 3, it is clear that the proposed weighting schemes VOLUME and TRACE outperform the other four weighting schemes. For example, on the Sheffield database, the classification accuracy gains of VOLUME over BINARY, SUM and CENTROID are 11.03%, 6.45% and 10% respectively using Clique Expansion. The corresponding gains for TRACE are 8.79%, 4.31% and 7.76%. From a comprehensive perspective, the average classification accuracy gains of VOLUME over the frequently adopted weighting scheme CENTROID are 4.2%, 4.24% and 4.24% using Zhou's Normalized Hypergraph, Clique Expansion and Star Expansion respectively. The corresponding gains of TRACE are 3.43%, 3.91% and 3.47%.
Moreover, several experiments are conducted to study the influence of the choice of hypergraph framework versus the choice of hyperedge weighting scheme on classification performance. To measure the impact of the choice of hypergraph framework, we measure the classification accuracy improvement of the best framework choice over the worst choice. We use the same strategy to measure the impact of the choice of hyperedge weighting scheme, and of the choice of their combination, on classification performance. Figure 4 reports the results of these experiments. The results demonstrate that classification performance benefits much more from a good choice of hyperedge weight than from a good choice of hypergraph framework in all experiments. In most cases, the positive impact of a good hyperedge weight is five to ten times that of a good hypergraph framework. This phenomenon reveals the importance of the hyperedge weight in hypergraph-based classification.
5.4 Evaluation in Clustering
In this section, we report several experiments conducted to study the influence of the hyperedge weight on a clustering task. First, we apply the different hypergraph algorithms to learn the embedding of the data. After that, k-means is adopted to predict the label of each sample based on the embedding results, and the number of clusters is fixed to the number of classes. The clustering result is evaluated by comparing the predicted label of each sample with the label provided by the dataset. Two metrics, the accuracy (AC) and the normalized mutual information metric (NMI), are used to measure the clustering performance. Please see lpi for the detailed definitions of these two metrics.

Database  Accuracy (Mutual Information Metric, %)  

BINARY  SUM  CENTROID  TRACE  VOLUME  LLRE  
ORL  66.75(82.62)  66.75(81.57)  69.50(81.62)  68.50(81.62)  69.50(81.62)  67.50(79.96) 
JAFFE  59.44(65.86)  65.00(65.20)  65.00(73.36)  64.44(72.10)  58.33(67.72)  58.33(67.72) 
Sheffield  64.35(77.77)  65.04(75.82)  64.87(75.82)  64.87(75.82)  64.87(75.82)  64.87(75.82) 
COIL20  60.42(74.30)  76.60(85.26)  76.60(85.26)  76.60(85.26)  76.46(85.58)  79.86(88.02) 
Average  62.74(75.16)  68.35(76.96)  68.99(79.01)  68.60(78.7)  67.29(77.69)  67.64(77.88) 
Database  Accuracy (Mutual Information Metric, %)  

BINARY  SUM  CENTROID  TRACE  VOLUME  LLRE  
ORL  65.25(82.20)  67.50(80.98)  65.75(82.9)  70.50(82.75)  65.00(82.44)  65.00(80.76) 
JAFFE  71.11(70.16)  70.00(71.62)  69.44(68.71)  71.67(72.87)  74.44(76.27)  71.11(69.26) 
Sheffield  61.91(78.13)  66.09(80.47)  66.09(80.47)  66.09(80.47)  66.09(80.47)  66.09(80.69) 
COIL20  69.72(79.82)  82.29(89.12)  79.31(89.00)  82.29(89.12)  82.29(89.12)  82.29(89.12) 
Average  67.00(77.56)  71.47(80.55)  70.15(80.27)  72.64(81.30)  71.96(82.08)  71.12(79.96) 
Database  Accuracy (Mutual Information Metric, %)  

BINARY  SUM  CENTROID  TRACE  VOLUME  LLRE  
ORL  63.75(79.01)  65.75(79.48)  65.75(79.48)  67.00(80.48)  65.75(79.48)  68.25(80.52) 
JAFFE  61.11(67.51)  63.33(70.23)  61.11(67.51)  65.56(70.54)  61.67(68.16)  63.89(69.23) 
Sheffield  58.43(72.7)  62.96(75.10)  62.43(77.62)  62.26(73.73)  58.26(74.28)  58.26(73.89) 
COIL20  49.65(64.67)  62.36(73.15)  62.43(73.86)  56.46(71.29)  59.03(72.53)  62.99(73.62) 
Average  58.23(70.97)  63.60(74.49)  62.93(74.62)  62.82(74.01)  61.18(73.61)  63.35(74.31) 
Tables 5, 6 and 7 respectively report the clustering results of the Zhou's Normalized Laplacian, Clique Expansion and Star Expansion frameworks using different hyperedge weights on the four databases. From the experimental results, it seems that different weighting schemes perform well on different databases. From the comprehensive evaluation based on the mean accuracy and mean mutual information metric over the four databases, CENTROID and TRACE perform slightly better than VOLUME, SUM and LLRE. However, another interesting phenomenon is that all five weighting schemes significantly outperform the unweighted case, BINARY. According to these observations, we conclude that the hyperedge weight still plays an important role in hypergraph-based clustering. However, unlike the classification case, hypergraph-based clustering is not as sensitive to the choice of hyperedge weight. We follow the same evaluation strategy as in Section 5.3 to study the influence of the choices of hypergraph framework and hyperedge weighting scheme on clustering performance. Figure 5 shows the experimental results under the two evaluation metrics. In most cases, hypergraph-based clustering benefits more from a good choice of hypergraph framework than from a good choice of hyperedge weight. But these two benefits are on the same level. So, we still cannot ignore the positive influence of a good hyperedge weight on clustering.
5.5 The Setting of Scaling Parameter
In this section, we conduct several experiments to study the influence of the scaling parameter $\alpha$ on the performance of hypergraph learning, and also experimentally find the optimal $\alpha$ on the ORL, COIL20, Sheffield and JAFFE databases. We conduct the experiments on clustering tasks. The learned optimal $\alpha$ is directly applied to hypergraph-based classification, which was introduced in Section 5.3. The experimental settings are the same as those of the hypergraph-based clustering introduced in Section 5.4. We normalize all hyperedge weights by dividing them by the mean of the hyperedge weights before exponentiation. After this operation, the different hyperedge weighting schemes are on the same scale. Since there are two evaluation metrics for clustering, we use the mean of these two metrics as a new metric to evaluate the clustering performance and then find the optimal $\alpha$ for each weighting scheme.
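The normalization-then-exponentiation step described above can be sketched as follows (the function name is ours; the exp(-raw/alpha) form matches the scaling used for the proposed weights):

```python
import numpy as np

def normalize_and_scale(raw, alpha):
    """Divide the raw hyperedge quantities (volume, trace, error, ...)
    by their mean, then apply the exponential scaling exp(-x / alpha),
    so that different schemes live on a comparable scale."""
    raw = np.asarray(raw, dtype=float)
    return np.exp(-(raw / raw.mean()) / alpha)
```

After normalization, a raw value equal to the mean always maps to exp(-1/alpha), regardless of the weighting scheme.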
Figure 6 shows the experimental results. From Figure 6, the optimal scaling parameter appears to be related more to the database than to the hypergraph model. In most cases, CENTROID, TRACE and LLRE achieve reasonable performance when the scaling parameter is equal to 1 or 10. SUM and VOLUME are more sensitive to this parameter, and it is hard to find an optimal value that works for all databases. In any case, a good value can be learned from the training set, which is a commonly adopted strategy in many practical applications.
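The selection procedure described in this section (scoring each candidate scaling parameter by the mean of the two clustering metrics) can be sketched as a simple grid search. The `evaluate` callback is a hypothetical stand-in for the full hypergraph clustering pipeline, which is not reproduced here.

```python
# Hedged sketch of the scaling-parameter search: score each candidate by
# the mean of clustering accuracy and NMI, keep the best.
# `evaluate(sigma)` stands in for running the whole clustering pipeline.

def select_sigma(candidates, evaluate):
    """Return the sigma maximizing (accuracy + NMI) / 2 and its score."""
    best_sigma, best_score = None, -1.0
    for sigma in candidates:
        acc, nmi = evaluate(sigma)
        score = (acc + nmi) / 2.0
        if score > best_score:
            best_sigma, best_score = sigma, score
    return best_sigma, best_score

# toy evaluation table: pretend sigma = 1 works best on this database
scores = {0.1: (0.60, 0.55), 1.0: (0.72, 0.80), 10.0: (0.70, 0.78)}
best, score = select_sigma(scores, lambda s: scores[s])  # best == 1.0
```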
5.6 Comparison with the State of the Art
To further demonstrate the value of our work, we select the best-performing weighting schemes based on the results of the prior experiments and apply them to both image classification and data representation for clustering. TRACE- and VOLUME-based Zhou's Normalized Hypergraph are used for classification, while CENTROID-based Zhou's Normalized Hypergraph and TRACE-based Clique Expansion are adopted for clustering. In these experiments, two challenging databases, Scene15 and Caltech256, are added. Adaptive Hypergraph (AdaHyper) adaptive , Relaxed Collaborative Representation classifier (RCR) rcr , Sparse Representation Classifier (SRC) sparse , Random Forest classifier (RF) rf and a graph-based classifier are the compared methods for classification. Graph Regularized Nonnegative Matrix Factorization (GNMF) gnmf , Landmark-Based Spectral Clustering (LSC) lbsc , Sparse Representation-based Embedding (SRE) sgraph , graph-based Normalized Cut ncut (or Laplacian Eigenmapping eigenmap ) and K-means are the compared algorithms for clustering. All classification experiments are conducted with two-fold cross-validation. For the clustering experiments, the number of clusters is set to the number of classes of each database. The experimental settings for image classification follow adaptive , while those for the clustering task follow gnmf . The choices of the scaling parameter in the image classification experiments follow the same setting as in the previous experiments. For image clustering, we find that a different combination of parameter settings is more suitable for the challenging Scene15 and Caltech256 databases, so we adopt a separate setting on each of these two datasets; for the ORL and COIL20 datasets, we follow the same setting as in Section 5.4.
Table 8. Mean classification error ± standard deviation (%).

Method              ORL          COIL20       Scene15      Caltech256-2000
ZNH+TRACE           5.50±1.41    3.68±0.69    25.47±1.89   42.75±2.90
ZNH+VOLUME          4.50±1.41    4.79±0.29    25.40±1.98   42.70±2.97
Graph               24.75±1.05   43.13±0.49   32.40±2.07   51.35±3.75
AdaHyper adaptive   11.81±0.42   10.13±1.41   25.54±1.59   43.30±2.10
RCR rcr             8.00±2.07    11.19±1.10   26.73±1.43   41.21±2.11
Random Forest rf    11.54±2.13   6.00±0.10    27.66±2.63   43.74±3.02
SRC sparse          8.25±1.06    11.81±0.98   26.80±2.83   42.80±1.41
Table 9. Clustering accuracy (normalized mutual information in parentheses, %).

Method                 ORL            COIL20         Scene15        Caltech256-2000
CliqueExp+TRACE        70.50 (82.75)  82.29 (89.12)  67.47 (62.03)  36.30 (34.01)
ZNH+CENTROID           69.50 (81.62)  76.60 (85.26)  63.33 (59.53)  44.15 (43.29)
GNMF gnmf              65.75 (82.19)  82.22 (89.99)  62.87 (61.32)  39.80 (38.45)
LSC lbsc               66.00 (82.39)  76.04 (86.13)  66.33 (64.01)  43.95 (41.32)
Graph ncut ; eigenmap  67.75 (82.01)  69.60 (77.00)  56.87 (60.21)  38.00 (38.69)
SRE sgraph             70.00 (83.10)  69.24 (76.16)  58.27 (59.09)  35.70 (38.91)
K-means                57.75 (78.38)  63.70 (73.40)  63.20 (62.45)  41.15 (40.55)
Table 8 reports the classification performances of five different classifiers and two hypergraph-based classifiers. The VOLUME- and TRACE-based hypergraph classifiers clearly outperform all other classifiers on the ORL, COIL20 and Scene15 databases, and take the second and third places, respectively, on the Caltech256 database. Compared to the regular pairwise graph-based classifier, the gains of the VOLUME-based hypergraph classifier are 20.25%, 38.34%, 7.00% and 8.65% on the ORL, COIL20, Scene15 and Caltech256 databases, respectively, while those of the TRACE-based hypergraph classifier are 19.25%, 39.45%, 6.93% and 8.60%. Even compared with state-of-the-art classifiers such as the random forest and the sparse representation classifier, our proposed hyperedge-weight-based hypergraph classifiers still show their superiority. Table 9 shows the clustering results of the different algorithms. The TRACE- and CENTROID-based hypergraph models obtain the best clustering accuracies on all four databases and also achieve promising NMIs in comparison with the state-of-the-art algorithms. Similar to the classification results, the advantage of hypergraph models over the graph model is obvious. For example, the clustering accuracy gains of the TRACE-based hypergraph over the graph are 2.75%, 12.69% and 10.60% on the ORL, COIL20 and Scene15 databases, respectively. These experiments verify that a good hypergraph framework with a carefully chosen hyperedge weight is very competitive for classification and clustering. Moreover, it is still possible to further improve the performance of these hypergraphs with our proposed weighting schemes, since many settings of the proposed schemes have not been explored yet. For example, VOLUME computes the hyperedge weight from edge weights, but in these experiments we only adopt the Heat-Kernel weighting scheme to compute the edge weights; many other edge weighting schemes could be applied.
Similarly, LLRE only uses ordinary linear regression to compute the local linear reconstruction errors; more advanced regression methods, such as sparse representation and collaborative representation, have not been tried yet.
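The local linear reconstruction error that LLRE is built on can be sketched as follows. This is an illustrative reading of the idea, not the paper's implementation: each vertex of a hyperedge is reconstructed by ordinary least squares from the remaining vertices, and the errors are averaged; how the errors are aggregated into the final hyperedge weight is an assumption here.

```python
# Hedged sketch of the LLRE ingredient: local linear reconstruction error
# of the vertices in one hyperedge, using ordinary least squares.
import numpy as np

def local_reconstruction_error(X):
    """X: (n_points, dim) features of the vertices in one hyperedge.
    Reconstruct each point as a linear combination of the others and
    return the mean squared reconstruction error."""
    n = X.shape[0]
    errors = []
    for i in range(n):
        others = np.delete(X, i, axis=0)          # (n-1, dim)
        target = X[i]
        # coefficients minimizing ||others.T @ c - target||_2
        c, *_ = np.linalg.lstsq(others.T, target, rcond=None)
        errors.append(np.sum((others.T @ c - target) ** 2))
    return float(np.mean(errors))

# points that lie in each other's span reconstruct almost exactly
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
err = local_reconstruction_error(X)               # close to 0
```

Replacing the least-squares step with a sparse or collaborative representation solver, as suggested above, would only change the line that computes the coefficients.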
5.7 Experimental Analysis and Discussion
Several conclusions can be drawn from the experimental results in Figures 3 - 6 and Tables 2 - 9. We believe these conclusions are instructive for researchers who work on hypergraph learning:

Similar to the choice of the hypergraph algorithm itself, the choice of the hyperedge weight plays a very important, sometimes even more important, role. A prominent improvement can be obtained by carefully choosing a suitable hyperedge weight. This can be observed widely in our experiments, particularly in the classification experiments.

A hypergraph model is more sensitive to the hyperedge weight when it is used for classification than when it is used for clustering.

The proposed hyperedge weights VOLUME and TRACE can be deemed two representative hyperedge weighting schemes for classification, since they distinctly outperform the other schemes in all classification experiments. For the clustering task, however, it is hard to single out one representative weighting scheme, since all five weighting schemes achieve similar overall performance; comparatively speaking, TRACE and CENTROID perform slightly better. Moreover, in the clustering case, all five weighting schemes significantly outperform the unweighted case.

Based on our experimental study, we select two combinations of hypergraph model and weighting scheme for classification and two for clustering: Zhou's normalized hypergraph with VOLUME and with TRACE for classification, and clique expansion with TRACE and Zhou's normalized hypergraph with CENTROID for clustering. The results reported in Tables 8 and 9 demonstrate that such simple combinations achieve very promising performance in comparison with state-of-the-art algorithms. These phenomena not only verify the importance of the hyperedge weight in hypergraph learning, but also show the potential of hypergraph learning for addressing visual tasks.
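For reference, the Zhou's normalized hypergraph framework used in the combinations above is built on the normalized Laplacian L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}, where H is the vertex-hyperedge incidence matrix, W the diagonal matrix of hyperedge weights, and Dv, De the vertex and hyperedge degree matrices. A minimal dense sketch (the paper's own implementation details, such as sparse storage, are not shown here):

```python
# Hedged sketch of Zhou's normalized hypergraph Laplacian:
#   L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
import numpy as np

def zhou_normalized_laplacian(H, w):
    """H: (n_vertices, n_edges) 0/1 incidence matrix; w: hyperedge weights."""
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    dv = H @ w                    # vertex degree: sum of incident edge weights
    de = H.sum(axis=0)            # hyperedge degree: number of vertices in edge
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    De_inv = np.diag(1.0 / de)
    theta = Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt
    return np.eye(H.shape[0]) - theta

# 4 vertices, 2 hyperedges: {0, 1, 2} and {2, 3}
H = np.array([[1, 0], [1, 0], [1, 1], [0, 1]])
L = zhou_normalized_laplacian(H, w=[1.0, 1.0])
# L is symmetric positive semidefinite with smallest eigenvalue 0
```

Any of the weighting schemes discussed in this paper (VOLUME, TRACE, CENTROID, ...) simply supplies the vector `w`; the framework itself is unchanged.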
6 Conclusion
We presented a comprehensive experimental study of the hyperedge weight in hypergraph learning, to draw researchers' attention to the importance of designing hyperedge weights. To verify this importance, three novel hyperedge weights, namely VOLUME, TRACE and LLRE, were proposed from the perspectives of geometry, multivariate statistical analysis and linear regression, respectively. These three novel hyperedge weights and three other commonly adopted hyperedge weights were applied to three popular hypergraph frameworks, including Zhou's normalized Laplacian, clique expansion and star expansion, for two fundamental learning tasks: clustering and classification. Extensive experiments on the ORL, COIL20, JAFFE and Sheffield databases demonstrated that a good hyperedge weight can significantly improve the performance of hypergraph learning. Moreover, we compared the simple combination of a conventional hypergraph framework and a carefully chosen weighting scheme with state-of-the-art algorithms in image classification and clustering on two larger and more challenging databases, namely Scene15 and Caltech256. The results show that such a simple combination achieves promising performance. Since our work is a fundamental study, much meaningful work can be built on it. Applying the hypergraph frameworks and new hyperedge weights to dimensionality reduction dhlp , feature selection nmi and multi-label classification mhyper are directions for our future work.
Acknowledgement
This work has been partially funded by the Fundamental Research Funds for the Central Universities (Grant No. CDJXS11181162) and the National Natural Science Foundation of China (Grant No. 91118005). The authors would like to thank the reviewers for their useful comments.
References
 (1) S. Gao, I. Tsang, L. Chia, Laplacian sparse coding, hypergraph laplacian sparse coding, and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 92–104.
 (2) Y. Gao, M. Wang, D. Tao, R. Ji, Q. Dai, 3d object retrieval and recognition with hypergraph analysis, IEEE Transactions on Image Processing 21 (9) (2012) 4290–4303.
 (3) Y. Huang, Q. Liu, S. Zhang, D. N. Metaxas, Image retrieval via probabilistic hypergraph ranking, in: IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 3376–3383.
 (4) P. Ochs, T. Brox, Higher order motion models and spectral clustering, in: IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 614–621.
 (5) T. Parag, A. Elgammal, Supervised hypergraph labeling, in: IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 2289–2296.
 (6) J. Yu, D. Tao, M. Wang, Adaptive hypergraph learning and its application in image classification, IEEE Transactions on Image Processing 21 (7) (2012) 3262–3272.
 (7) Z. Zhang, P. Ren, E. Hancock, Unsupervised feature selection via hypergraph embedding, in: British Machine Vision Conference (BMVC), 2012, pp. 1–11.
 (8) L. Pu, B. Faltings, Hypergraph learning with hyperedge expansion, in: Machine Learning and Knowledge Discovery in Databases, 2012, pp. 410–425.
 (9) J. Y. Zien, M. D. Schlag, P. K. Chan, Multilevel spectral hypergraph partitioning with arbitrary vertex sizes, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 18 (9) (1999) 1389–1399.
 (10) S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, S. Belongie, Beyond pairwise clustering, in: IEEE conference on Computer Vision and Pattern Recognition (CVPR), Vol. 2, 2005, pp. 838–845.
 (11) D. Zhou, J. Huang, B. Schölkopf, Learning with hypergraphs: Clustering, classification, and embedding, in: Advances in neural information processing systems (NIPS), 2006, pp. 1601–1608.
 (12) M. Bolla, Spectra, euclidean representations and clusterings of hypergraphs, Discrete Mathematics 117.
 (13) S. Agarwal, K. Branson, S. Belongie, Higher order learning with graphs, in: International Conference on Machine Learning (ICML), 2006, pp. 17–24.
 (14) J. Rodríguez, On the laplacian spectrum and walkregular hypergraphs, Linear and Multilinear Algebra 51.
 (15) D. Cai, X. He, J. Han, Document clustering using locality preserving indexing, IEEE Transactions on Knowledge and Data Engineering 17 (12) (2005) 1624–1637.
 (16) D. Cai, X. He, J. Han, T. S. Huang, Graph regularized nonnegative matrix factorization for data representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (8) (2011) 1548–1560.

 (17) X. He, S. Yan, Y. Hu, P. Niyogi, H.J. Zhang, Face recognition using laplacianfaces, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (3) (2005) 328–340.
 (18) S. Huang, A. Elgammal, L. Huangfu, D. Yang, X. Zhang, Globalitylocality preserving projections for biometric data dimensionality reduction, in: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2014, pp. 15–20.
 (19) J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2) (2009) 210–227.
 (20) L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: Which helps face recognition?, in: International Conference on Computer Vision (ICCV), 2011, pp. 471–478.
 (21) P. Gritzmann, V. Klee, On the complexity of some basic problems in computational convexity, in: Polytopes: Abstract, Convex and Computational, 1994, pp. 373–466.
 (22) R. Timofte, L. Van Gool, Sparse representation based projections, in: British Machine Vision Conference (BMVC), 2011.
 (23) F. S. Samaria, A. C. Harter, Parameterisation of a stochastic model for human face identification, in: IEEE Workshop on Applications of Computer Vision, 1994.
 (24) S. A. Nene, S. K. Nayar, H. Murase, Columbia object image library (COIL-20), Technical Report CUCS-005-96.
 (25) D. B. Graham, N. M. Allinson, Face recognition: From theory to applications, NATO ASI Series F, Computer and Systems Sciences 163.
 (26) M. N. Dailey, C. Joyce, M. J. Lyons, M. Kamachi, H. Ishi, J. Gyoba, G. W. Cottrell, Evidence and a computational explanation of cultural differences in facial expression recognition., Emotion 10.
 (27) S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, in: IEEE conference on Computer Vision and Pattern Recognition (CVPR), Vol. 2, 2006, pp. 2169–2178.
 (28) G. Griffin, A. Holub, P. Perona, Caltech256 object category dataset.
 (29) A. Bergamo, L. Torresani, A. Fitzgibbon, Picodes: Learning a compact code for novel-category recognition, in: Advances in neural information processing systems (NIPS), 2011, pp. 2088–2096.
 (30) M. Yang, D. Zhang, S. Wang, Relaxed collaborative representation for pattern classification, in: IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 2224–2231.
 (31) L. Breiman, Random forests, Machine learning 45 (1) (2001) 5–32.

 (32) X. Chen, D. Cai, Large scale spectral clustering with landmark-based representation, in: AAAI Conference on Artificial Intelligence (AAAI), 2011.
 (33) J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (8) (2000) 888–905.
 (34) M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural computation 15 (6) (2003) 1373–1396.
 (35) S. Huang, D. Yang, Y. Ge, D. Zhao, X. Feng, Discriminant hyper-Laplacian projections with its applications to face recognition, in: IEEE conference on Multimedia and Expo Workshop on HIM (ICMEW), 2014.
 (36) L. Sun, S. Ji, J. Ye, Hypergraph spectral learning for multilabel classification, in: ACM international conference on Knowledge discovery and data mining (SIGKDD), 2008, pp. 668–676.