On The Effect of Hyperedge Weights On Hypergraph Learning

10/24/2014 · Sheng Huang et al., Chongqing University

Hypergraphs are a powerful representation in several computer vision, machine learning and pattern recognition problems. In the last decade, many researchers have been keen to develop different hypergraph models. In contrast, much less attention has been paid to the design of hyperedge weights. However, many studies on pairwise graphs show that the choice of edge weight can significantly influence the performance of graph algorithms. We argue that this also applies to hypergraphs. In this paper, we empirically study the influence of hyperedge weights on hypergraph learning by proposing three novel hyperedge weights from the perspectives of geometry, multivariate statistical analysis and linear regression. Extensive experiments on the ORL, COIL20, JAFFE, Sheffield, Scene15 and Caltech256 databases verify our hypothesis. Similar to graph learning, several representative hyperedge weighting schemes can be identified from our experimental studies. Moreover, the experiments also demonstrate that combining such weighting schemes with conventional hypergraph models yields very promising classification and clustering performance in comparison with some recent state-of-the-art algorithms.


1 Introduction

As a generalization of pairwise graphs, hypergraph learning is commonly used in the computer vision, machine learning and pattern recognition areas, e.g. lsc ; sum ; phr ; higher ; supervised ; adaptive ; nmi ; he , since it represents the similarity relation of data by measuring the similarity between groups of points, which is deemed a fundamental issue in the aforementioned research areas. Recently, many researchers have been keen to develop different hypergraph models for addressing different tasks, and many impressive hypergraph models have been proposed. Hypergraph algorithms can be roughly divided into two categories. The first category uses spectral clustering techniques to partition the vertices by constructing a simple pairwise graph from the original hypergraph. Representative methods include clique expansion

expansion , star expansion expansion and clique averaging mean , etc. The second category defines a hypergraph Laplacian using analogies from the simple pairwise graph Laplacian. Its representative methods include Zhou's normalized Laplacian zhou , Bolla's Laplacian bolla , etc. However, interestingly, as was shown in hol , all of the previous algorithms, despite their very different formulations, can be reduced to two graph constructions, the star expansion and the clique expansion, and they are equivalent to each other under specific conditions.

Figure 1: Three interpretations of a hyperedge: (a) a simplex, (b) a cluster and (c) a linear combination of homogeneous vertices.

There are extensive studies on hypergraph construction, e.g. mean ; bolla ; rodriguez ; zhou ; expansion . But, to the best of our knowledge, there is no prior work that formally discusses the importance of the hyperedge weight to hypergraph learning. In graph learning, which is the pairwise case of hypergraph learning, extensive studies have already shown that a good choice of edge weight can significantly improve graph-based algorithms. The Heat-Kernel and Dot-Product weighting schemes are considered the two most representative edge weighting schemes lpi ; gnmf ; lpp ; glpp . Therefore, we argue that the choice of hyperedge weights should also play a crucial role in hypergraph learning. This motivated us to investigate whether there exists a representative hyperedge weighting scheme in hypergraph learning. Moreover, we believe that different hyperedge weights actually provide different ways to explain the hypergraph from different perspectives. In this paper, we try to fill this gap and empirically discuss the influence of the hyperedge weight on hypergraph learning by presenting and evaluating three novel hyperedge weighting schemes.

As several hypergraph algorithms have been proposed, a few hyperedge weighting schemes have been heuristically mentioned in such papers. For example, Huang et al. phr proposed a probabilistic hypergraph-based image retrieval system. In this system, the hyperedge is generated by k-nearest neighbour searching and its weight is the sum of the pairwise edge weights between the centroid (seed point) of the hyperedge and its neighbours. Zhang et al. nmi presented an unsupervised hypergraph-based feature selection method, which measures the high-order similarity of the vertices in a hyperedge using multidimensional interaction information (MII). For addressing a 3-D object retrieval task, Gao et al. sum calculated the hyperedge weight by directly summing the weights of all pairwise edges whose end points are all in the same hyperedge. Clearly, the computation of such a hyperedge weight is actually the inverse process of clique expansion; if we replace the sum operation with the mean operation, it becomes the inverse process of clique averaging. Different from the previous three methods, Yu et al. adaptive defined the hyperedge weight as a parameter of the hypergraph model by imposing a sparsity constraint. Thus, the hyperedge weights can be adaptively learned as the graph model is optimized. The initial hyperedge weights of this method are constructed following Huang's way phr , and the globally optimal weights still cannot be guaranteed. Certainly, there are also some other hyperedge weighting schemes higher ; supervised , but most of them are associated with very specific tasks.

Complementary to the previously proposed hyperedge weights, we carefully design three novel hyperedge weights from the perspectives of geometry, multivariate statistical analysis and linear regression sparse ; collabrative (see Figure 1). From the perspective of geometry, a hyperedge can be deemed a high-order simplex hol . Thus, the volume of the simplex (VOLUME) is a natural hyperedge weight, which provides a reasonable dissimilarity measure for a point set. Motivated by some studies from geometry det , we present three ways to compute the volume of the simplex for different situations. It is worthwhile to note that these three ways actually define the mathematical relationships between a hyperedge and its vertices, a hyperedge and its pairwise edges, and a hyperedge and its sub-hyperedges, respectively. From the perspective of data mining and multivariate statistical analysis, a hyperedge can be naturally regarded as a cluster in the sample space, so the trace of the scatter matrix (TRACE) of the samples in the same hyperedge should be a good hyperedge weight. From the perspective of linear regression sparse ; collabrative , the linear reconstruction error (LLRE) of homogeneous samples should be smaller than that of inhomogeneous samples. So, we consider a hyperedge as a small subset of samples, and use the local linear reconstruction error of each point in the hyperedge to measure the similarity of the point set.

In order to verify the importance of the hyperedge weighting scheme in hypergraph learning, three state-of-the-art hypergraph models, including Zhou's normalized Laplacian zhou , clique expansion and star expansion expansion , are adopted to evaluate the different hyperedge weights for clustering and classification. Several representative hyperedge weighting schemes for classification and clustering are identified from our experimental results on the ORL, COIL20, Sheffield and JAFFE databases. These experimental results also demonstrate that a carefully chosen hyperedge weight can significantly improve the performance of hypergraph algorithms. Moreover, we simply apply the combinations of the traditional hypergraph models and the learned representative weighting schemes to image classification and clustering on some larger databases, such as the Scene15 and Caltech256 databases. The results show that such a simple combination can also achieve very promising performance in comparison with state-of-the-art algorithms.

Our work makes three main contributions:

  1. Three novel weighting schemes, namely the volume of the simplex (VOLUME), the trace of the scatter matrix (TRACE) and the local linear reconstruction error (LLRE), are proposed from three different perspectives. A desirable property of VOLUME is that it defines the relationships between a hyperedge and its vertices, its pairwise edges, and its sub-hyperedges. Extensive experiments show that our proposed weighting schemes significantly outperform the conventional weighting schemes in classification and achieve competitive performance in clustering.

  2. We empirically verify the importance of the choice of hyperedge weight in hypergraph learning and draw researchers' attention to the importance of hyperedge weight design.

  3. Representative hyperedge weighting schemes for classification and clustering are experimentally identified, which is very instructive for hypergraph-based studies.

The rest of the paper is organized as follows: Section 2 presents the background of hypergraph learning; Section 3 describes the proposed hyperedge weights; Section 4 shows how to solve classification and clustering tasks using hypergraphs; Section 5 presents the experiments, and the conclusion is given in Section 6.

Figure 2: (a) An example of a hypergraph which has 18 hyperedges (15 pairwise edges + 3 three-order hyperedges), and (b) its corresponding vertex-edge incidence matrix.

2 Background

In this section, in order to analyze the influence of the hyperedge weight on hypergraph learning, the basic notation of hypergraphs and three common hypergraph frameworks are introduced: Clique Expansion expansion , Star Expansion expansion and Zhou's normalized hypergraph zhou . Besides these three frameworks, Bolla's Laplacian bolla , Rodriguez's Laplacian rodriguez and Clique Averaging mean are also very popular hypergraph frameworks. We did not choose them because Bolla's Laplacian and Rodriguez's Laplacian are defined on unweighted hypergraphs and Clique Averaging is deemed to solve the same approximation problem as Clique Expansion.

2.1 Notations

The key difference between a hypergraph and an ordinary graph is that each edge (hyperedge) of a hypergraph can connect more than two vertices (see Figure 2). Let G = (V, E) denote a hypergraph with vertex set V and edge set E. The edges are arbitrary subsets of V with weight w(e) associated with edge e. The degree of a vertex v ∈ V is

d(v) = \sum_{e \in E : v \in e} w(e)    (1)

The degree of a hyperedge e is denoted by δ(e) = |e|. For k-uniform hypergraphs, the degrees of all hyperedges are the same, δ(e) = k. In particular, for the case of ordinary graphs or 2-graphs, δ(e) = 2. The vertex-edge incidence matrix H is a |V| × |E| binary matrix whose (v, e)-th entry is h(v, e): if v ∈ e, h(v, e) is 1, otherwise it is 0. By these definitions, we have

d(v) = \sum_{e \in E} w(e) h(v, e)    (2)

and

δ(e) = \sum_{v \in V} h(v, e)    (3)

D_e and D_v are the diagonal matrices consisting of edge (hyperedge) and vertex degrees, respectively. W is the diagonal matrix of edge weights w(e).
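
To make the notation concrete, the following minimal Python sketch (assuming NumPy; the function and variable names are ours, for illustration only) builds the incidence matrix H, the weight matrix W and the degree matrices D_v and D_e from hyperedges given as vertex-index sets:

    import numpy as np

    def hypergraph_matrices(hyperedges, weights, n_vertices):
        """Build H, W, D_v, D_e for a hypergraph given as vertex-index sets."""
        n_edges = len(hyperedges)
        H = np.zeros((n_vertices, n_edges))     # |V| x |E| incidence matrix
        for j, e in enumerate(hyperedges):
            H[list(e), j] = 1.0                 # h(v, e) = 1 iff v is in e
        W = np.diag(weights)                    # diagonal hyperedge weights
        d_v = H @ np.asarray(weights)           # Eq. (2): d(v) = sum_e w(e) h(v, e)
        d_e = H.sum(axis=0)                     # Eq. (3): delta(e) = sum_v h(v, e)
        return H, W, np.diag(d_v), np.diag(d_e)

    # toy example: 5 vertices, two 3-degree hyperedges and one pairwise edge
    H, W, Dv, De = hypergraph_matrices([{0, 1, 2}, {2, 3, 4}, {0, 4}],
                                       [1.0, 0.5, 2.0], n_vertices=5)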

2.2 Clique Expansion

In the clique expansion algorithm expansion , each hyperedge is expanded to a clique. A pairwise graph G^c = (V^c, E^c) is expanded from the original hypergraph G = (V, E) using clique expansion. We have V^c = V and E^c = {(u, v) : u, v ∈ e, e ∈ E}. The edge weight w^c(u, v) minimizes the difference between the weight of the graph edge and the weight of each hyperedge e that contains both u and v:

w^c(u, v) = \arg\min_{w^c(u, v)} \sum_{e \in E : u, v \in e} ( w^c(u, v) - w(e) )^2    (4)

The solution of this criterion is

w^c(u, v) = \mu \sum_{e \in E : u, v \in e} w(e)    (5)

where μ is a fixed scalar. The combinatorial or normalized Laplacian of the constructed graph is then used to partition the vertices.
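
As an illustration, here is a minimal sketch of the construction in Eq. (5), reusing the H and weights built in the previous snippet (the helper name is ours; mu is the fixed scalar mentioned above):

    import numpy as np

    def clique_expansion_adjacency(H, weights, mu=1.0):
        """Pairwise adjacency of Eq. (5): w_c(u, v) = mu * sum of w(e) over e containing u and v."""
        W = np.diag(weights)
        A = mu * (H @ W @ H.T)       # entry (u, v) sums w(e) over shared hyperedges
        np.fill_diagonal(A, 0.0)     # no self-loops in the expanded graph
        return A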

2.3 Star Expansion

In the star expansion algorithm expansion , a new vertex is introduced for each hyperedge, and this new vertex is connected to each vertex in that hyperedge. More specifically, for a hypergraph G = (V, E), the vertex and edge sets of the star-expanded pairwise graph G^* = (V^*, E^*) are defined as V^* = V ∪ E and E^* = {(u, e) : u ∈ e, e ∈ E}, respectively. Thus each hyperedge in E is expanded into a star in G^*, which is a bipartite graph. The weight of the edge (u, e) in G^* is given by

w^*(u, e) = \frac{w(e)}{\delta(e)}    (6)

Since V^* = V ∪ E, we can assume that in V^* all vertices from V are ordered before those from E. Let M denote the matrix of weights between the vertices constructed from E and the vertices from V in G^*. The adjacency matrix for G^* can be obtained readily from M. Based on the adjacency matrix, the degree of every vertex can be computed. We use D_v^* and D_e^* to denote the diagonal matrices of vertex degrees for the vertices in V and in E of the expanded graph G^*, respectively. Finally, the Laplacian of star expansion is formulated as follows,

L^* = I - (D_v^*)^{-1/2} M^\top (D_e^*)^{-1} M (D_v^*)^{-1/2}    (7)

where I is an identity matrix.
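
The bipartite structure is easy to materialize. Below is a small sketch, under the same conventions as the previous snippets, that assembles the star-expanded adjacency matrix with the weights of Eq. (6); the block layout assumes the vertices of V are ordered before those of E, as in the text:

    import numpy as np

    def star_expansion_adjacency(H, weights):
        """Bipartite adjacency of the star-expanded graph, Eq. (6): w*(u, e) = w(e) / delta(e)."""
        d_e = H.sum(axis=0)                    # hyperedge degrees delta(e)
        M = H * (np.asarray(weights) / d_e)    # |V| x |E| block of star-edge weights
        n_v, n_e = M.shape
        A = np.zeros((n_v + n_e, n_v + n_e))   # vertices of V first, then hyperedge vertices
        A[:n_v, n_v:] = M
        A[n_v:, :n_v] = M.T
        return A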

2.4 Zhou’s Normalized Laplacian

Zhou's Normalized Laplacian zhou is a representative method that defines the hypergraph Laplacian using analogies from the graph Laplacian. Following the random walk model, Zhou et al. proposed the following normalized hypergraph Laplacian:

L = I - D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2}    (8)

In the random walk model, given the current position u ∈ V, the walker first chooses a hyperedge e over all hyperedges incident with u, with probability proportional to w(e), and then chooses a vertex v ∈ e uniformly at random.
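
Eq. (8) translates directly into code. The sketch below (ours, assuming NumPy and the H built earlier) computes Zhou's normalized Laplacian:

    import numpy as np

    def zhou_laplacian(H, weights):
        """Normalized hypergraph Laplacian of Eq. (8) (Zhou et al., 2006)."""
        w = np.asarray(weights)
        d_v = H @ w                              # vertex degrees, Eq. (2)
        d_e = H.sum(axis=0)                      # hyperedge degrees, Eq. (3)
        Dv_isqrt = np.diag(1.0 / np.sqrt(d_v))   # D_v^{-1/2}
        De_inv = np.diag(1.0 / d_e)              # D_e^{-1}
        Theta = Dv_isqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_isqrt
        return np.eye(H.shape[0]) - Theta        # L = I - Theta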

3 Hyperedge Weight Computation

In this section, we design three novel hyperedge weighting schemes from the perspectives of geometry, multivariate statistical analysis and linear regression. The volume of the simplex, the scatter, and the linear reconstruction error are respectively adopted as the similarity measures of a point set.

3.1 Volume of Simplex

From the geometry perspective, each hyperedge can be deemed a simplex hol . Thus, a geometric measure of a set of points can be naturally obtained by computing the volume of the simplex, since a smaller volume indicates a closer geometric relationship between the vertices in the hyperedge, and vice versa.

There are three ways to compute the volume of the simplex. The first way is to use the vertices of the simplex to compute its volume. Let the vertices of the simplex associated with the i-th hyperedge e_i of degree k be represented as d-dimensional column vectors x_1, ..., x_k. According to the Gram Determinant formula det , we can define a matrix X whose j-th column vector is x_{j+1} - x_1, j = 1, ..., k-1. The volume of the simplex can be computed as follows

V_i = \frac{1}{(k-1)!} \sqrt{\det(X^\top X)}    (9)

where det(·) is the matrix determinant and ! denotes the factorial. This way defines the relationship between the hyperedge weight and its vertices.
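
A small sketch of Eq. (9) follows (our code; X holds the k vertices of the hyperedge as d-dimensional columns, and the clamp to zero guards against tiny negative determinants from floating-point error):

    import numpy as np
    from math import factorial

    def simplex_volume_gram(X):
        """Volume of the simplex spanned by the columns of X (d x k), Eq. (9)."""
        E = X[:, 1:] - X[:, :1]         # edge vectors x_{j+1} - x_1
        gram = E.T @ E                  # Gram matrix of the edge vectors
        k = X.shape[1]
        return np.sqrt(max(np.linalg.det(gram), 0.0)) / factorial(k - 1)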

The second way is to utilize the edges of the simplex to compute its volume. This way is very crucial, since it defines the relationship between the hyperedge weight and the pairwise edge weights. Let d_{jl} denote the distance between the j-th vertex and the l-th vertex (or use the pairwise edge weight instead). Then, we can construct a (k+1) × (k+1) pseudo-affinity matrix \hat{B} as follows

\hat{B} = \begin{pmatrix} 0 & 1 & 1 & \cdots & 1 \\ 1 & 0 & d_{12}^2 & \cdots & d_{1k}^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & d_{k1}^2 & d_{k2}^2 & \cdots & 0 \end{pmatrix}    (10)

According to the Cayley-Menger Determinant formula det , the volume of the simplex associated with the hyperedge e_i satisfies

V_i^2 = \frac{(-1)^{k}}{2^{k-1} ((k-1)!)^2} \det(\hat{B})    (11)
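
The following sketch implements Eqs. (10)-(11) given only the pairwise distance matrix of the hyperedge's vertices (our code; again the result is clamped to zero before the square root to absorb numerical noise):

    import numpy as np
    from math import factorial

    def simplex_volume_cayley_menger(D):
        """Volume from the k x k pairwise-distance matrix D, via Eqs. (10)-(11)."""
        k = D.shape[0]
        B = np.ones((k + 1, k + 1))     # bordered matrix of Eq. (10)
        B[0, 0] = 0.0
        B[1:, 1:] = D ** 2              # squared distances d_jl^2
        v2 = (-1.0) ** k * np.linalg.det(B) / (2 ** (k - 1) * factorial(k - 1) ** 2)
        return np.sqrt(max(v2, 0.0))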

The third way is to use the hyperfaces of the simplex to compute its volume. A simplex associated with a hyperedge of degree k has k hyperfaces, where each hyperface is a hyperplane whose Cartesian equation is given by

a_{j1} x_1 + a_{j2} x_2 + \cdots + a_{jd} x_d + b_j = 0, \quad j = 1, \ldots, k    (12)

where x_1, ..., x_d are variables standing for real numbers and the a_{ji}, b_j are real constants. Let \hat{A} be the matrix with elements a_{ji} and b_j, and let C_j be the cofactor of \hat{A} with respect to b_j. Then, according to the Klebaner-Sudbury-Satterson Determinant formula det , the volume of the simplex can be computed as follows

V_i = \frac{| \det(\hat{A}) |^{k-1}}{(k-1)! \, | \prod_{j=1}^{k} C_j |}    (13)

Admittedly, the case where only information about the hyperfaces is known is very restrictive and may seldom happen in practical applications. But we still think this formulation is noteworthy, since it theoretically sets up a link between sub-hyperedges and the hyperedge. The reason why it can establish such a link is that a hyperface of a simplex is also a simplex; for example, a 2-hyperface of a simplex is a 2-simplex (triangle).

After obtaining the simplex volume, the weight of the hyperedge e_i associated with the simplex is given as follows

w(e_i) = \exp\left( -\frac{V_i}{\beta} \right)    (14)

where β is a positive parameter that controls the scaling of the hyperedge weight.

The previous formulas hold for arbitrary d and k. However, the feature dimension d should be no smaller than the intrinsic dimension of the simplex, i.e., d ≥ k-1, since the volume degenerates to zero when d < k-1. In computer vision applications d ≫ k typically holds anyway, so the degeneracy is unlikely to happen.

3.2 Trace of The Scatter Matrix

From the perspectives of multivariate statistical analysis and data mining, each hyperedge can be considered a cluster in the sample space. So it is very natural to use the scatter matrix to measure the compactness of a cluster (hyperedge); we denote this weight by TRACE. Let the d × k matrix X_i denote the sample matrix associated with the vertices of a k-degree hyperedge e_i. Then, the scatter matrix S_i, which is a positive semi-definite matrix, is computed as follows

S_i = (X_i - M_i)(X_i - M_i)^\top    (15)

where m_i is the mean of the samples and M_i is a d × k matrix whose columns are all m_i. Finally, we can compute the weight of the hyperedge e_i from the trace of the scatter matrix

w(e_i) = \exp\left( -\frac{\operatorname{tr}(S_i)}{\beta} \right)    (16)

where tr(·) denotes the matrix trace, exp(·) is the exponential function and β is a positive parameter for controlling the scale of the weight.
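
Since tr(S_i) equals the squared Frobenius norm of the centered sample matrix, Eqs. (15)-(16) reduce to a few lines. A sketch (our code, assuming NumPy):

    import numpy as np

    def trace_weight(X, beta=1.0):
        """TRACE hyperedge weight, Eqs. (15)-(16); X is the d x k sample matrix of a hyperedge."""
        centered = X - X.mean(axis=1, keepdims=True)   # subtract the hyperedge mean m_i
        trace = np.sum(centered ** 2)                  # tr(S_i) = ||X_i - M_i||_F^2
        return np.exp(-trace / beta)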

3.3 Local Linear Reconstruction Error

We can measure the similarity between a single point and a point set by the linear reconstruction error: the reconstruction error is expected to be smaller if a sample is reconstructed from homogeneous samples rather than inhomogeneous ones. More specifically, each hyperedge is considered a subset of the samples; we denote this scheme by LLRE. We follow a leave-one-out strategy to obtain the reconstruction error of each sample in this subset via linear regression. In the case of an undirected hypergraph, each vertex gets a reconstruction error. We assume the d-dimensional samples x_1, ..., x_k are associated with the ordered vertices of a k-degree hyperedge e_i. The reconstruction coefficients α_j of sample x_j, which minimize the reconstruction error, can be solved as a least-squares problem as follows

\alpha_j = \arg\min_{\alpha_j} \| x_j - \bar{X}_j \alpha_j \|_2^2    (17)

where \bar{X}_j is a d × (k-1) matrix whose columns are the samples x_l with l ∈ {1, ..., k} and l ≠ j. The solution of this problem is \alpha_j = \bar{X}_j^{+} x_j, where \bar{X}_j^{+} is the generalized inverse of \bar{X}_j. After obtaining α_j, its corresponding reconstruction error r_j can be computed as follows

r_j = \| x_j - \bar{X}_j \alpha_j \|_2^2    (18)

For an undirected hypergraph, the overall reconstruction error r of a hyperedge can be flexibly assigned as the mean, the minimum or the maximum of the reconstruction errors {r_j} of its samples. In the case of a directed hypergraph, each directed hyperedge gets one reconstruction error, since the subscripts of the hyperedge are ordered and only the vertex corresponding to the first subscript is considered as the reconstructed point; in that case, the overall reconstruction error of the hyperedge is directly equal to r_1. When the hyperedge is generated by k-nearest neighbor searching, the overall reconstruction error of the hyperedge can be assigned as the reconstruction error of the sample corresponding to the seed point, to save time. Finally, we use a positive β to scale the hyperedge weight with the reconstruction error r:

w(e_i) = \exp\left( -\frac{r}{\beta} \right)    (19)

Some reasonable constraints can be imposed on the coefficients α_j to further refine this model. For example, a sparsity constraint may make sense when the degree of the hyperedge is extremely high; in that case the coefficients can be solved as a sparse representation task sparse ; sgraph or a collaborative representation task collabrative .
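
A sketch of the leave-one-out procedure of Eqs. (17)-(19) for an undirected hyperedge (our code; the reduce argument selects the mean/min/max aggregation discussed above):

    import numpy as np

    def llre_weight(X, beta=1.0, reduce=np.mean):
        """LLRE hyperedge weight, Eqs. (17)-(19): leave-one-out linear reconstruction errors."""
        d, k = X.shape
        errors = []
        for j in range(k):
            others = np.delete(X, j, axis=1)      # d x (k-1) matrix of the remaining samples
            alpha, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)   # Eq. (17)
            errors.append(np.sum((X[:, j] - others @ alpha) ** 2))     # Eq. (18)
        return np.exp(-reduce(errors) / beta)     # aggregate, then Eq. (19)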

4 Clustering and Classification

After obtaining the hyperedge weights, the aforementioned three hypergraph learning frameworks (Section 2) are utilized to learn the corresponding hypergraph Laplacians, which can be used for clustering or classification tasks. According to Zhou's work zhou , the hypergraph-based clustering and embedding problem is formulated as the following standard Normalized cut (Ncut) problem,

\arg\min_{f} f^\top L f \quad \text{s.t.} \quad \|f\|_2 = 1    (20)

where the |V| × |V| matrix L is the learned hypergraph Laplacian. According to Zhou's work, the solution f is an eigenvector of L corresponding to the smallest eigenvalue, which should be equal to zero. Clearly, this problem can be solved as an eigenvalue problem, and the solution of the c-way partition consists of the eigenvectors of L corresponding to the c smallest nonzero eigenvalues.
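
For illustration, here is a minimal sketch of this clustering pipeline (our code; it assumes NumPy and scikit-learn's KMeans, and simply skips the trivial zero-eigenvalue eigenvector):

    import numpy as np
    from sklearn.cluster import KMeans

    def hypergraph_clustering(L, n_clusters):
        """Embed with the eigenvectors of the smallest nonzero eigenvalues of L,
        then run k-means on the embedding (Section 4)."""
        eigvals, eigvecs = np.linalg.eigh(L)           # ascending eigenvalues of symmetric L
        embedding = eigvecs[:, 1:n_clusters + 1]       # skip the trivial zero eigenvector
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedding)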

With regard to hypergraph-based classification, a |V|-dimensional vector f is deemed a classification function over V, which classifies each vertex v by the sign of f(v). On the one hand, in order to assign the same labels to vertices that have many incident hyperedges in common, a functional should be defined to minimize the sum of the changes of a function over the hyperedges of the hypergraph. According to Zhou's work zhou , such a functional is exactly f^\top L f. On the other hand, the initial label assignment should be changed as little as possible. Let the |V|-dimensional vector y_i be the label function of the i-th class, where y_i(v) = 1 or -1 if the vertex v belongs to the i-th class or to another class respectively, and y_i(v) = 0 if the vertex is unlabeled. Thus, the hypergraph-based classification can be formulated as the following optimization problem

\arg\min_{F} \sum_{i=1}^{c} \left( f_i^\top L f_i + \lambda \| f_i - y_i \|^2 \right)    (21)

where c is the number of classes, the matrix F is the collection of the vectors f_i, and λ is the parameter specifying the tradeoff between the two competing terms. According to Yu's work adaptive , the solution of this problem is as follows

F = \lambda (L + \lambda I)^{-1} Y    (22)

where the matrix Y is a label matrix whose i-th column is y_i. After obtaining F, the classification of the j-th sample is accomplished by assigning it to the i-th class that satisfies i = \arg\max_{i} F_{ji}.
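
A sketch consistent with our reading of Eq. (22) (our code; it uses a linear solve instead of an explicit matrix inverse for numerical stability):

    import numpy as np

    def hypergraph_classify(L, Y, lam=1.0):
        """Closed-form solution of Eq. (22): F = lam * (L + lam * I)^{-1} Y,
        then assign sample j to the class with the largest entry in row j of F."""
        n = L.shape[0]
        F = lam * np.linalg.solve(L + lam * np.eye(n), Y)
        return np.argmax(F, axis=1)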

5 Experiments

In order to evaluate the influence of the hyperedge weight choice on hypergraph learning, three classical hypergraph frameworks, namely Zhou's Normalized Laplacian zhou , Clique Expansion expansion and Star Expansion expansion , are used to address clustering and classification tasks on six databases: ORL orl , COIL20 coil20 , Sheffield umist , JAFFE jaffe , Scene15 sence15 and Caltech256 caltech256 . In these experiments, our proposed hyperedge weighting schemes, as well as three other commonly adopted hyperedge weighting schemes, are applied to the previous hypergraph frameworks.

5.1 Data Sets and Experimental Configurations

Six datasets, namely ORL, JAFFE, COIL20, Sheffield, Scene15 and Caltech256-2000, are used in our experiments and their details are reported in Table 1. Among them, the Caltech256-2000 dataset adaptive is a subset of Caltech256 caltech256 . We use the first four databases to experimentally study the impact of the hyperedge weight on the performance of hypergraph learning, since these four datasets possess manifold structures and hypergraph learning is a manifold learning technique.

Database Name | Classes | Total Samples | Feature | Dimension | Manifold
ORL orl | 40 | 400 | Grayscale | 10304 | Pose
COIL20 coil20 | 20 | 1440 | Grayscale | 1024 | View
JAFFE jaffe | 10 | 213 | Grayscale | 4096 | Expression
Sheffield umist | 20 | 564 | Grayscale | 10304 | Pose
Scene15 sence15 | 15 | 1500 | PiCoDes picodes | 2048 | unknown
Caltech256-2000 adaptive | 20 | 2000 | PiCoDes picodes | 2048 | unknown
Table 1: The datasets used in our experiments

Three commonly used hyperedge weighting schemes are implemented for comparison with our proposed hyperedge weights. The first is the 0-1 weighting scheme zhou ; bolla , which is also commonly adopted in the regular graph case gnmf . A hypergraph with this weighting scheme can be regarded as an unweighted hypergraph, since all hyperedge weights are equal to 1; we name this weighting scheme BINARY in our experiments. The second hyperedge weight is the sum of the weights of the pairwise edges inside the hyperedge sum ; expansion ,

w(e_i) = \sum_{u, v \in e_i} w_{uv}    (23)

where w_{uv} denotes the pairwise edge weight. This hyperedge weight computation is the inverse process of Clique Expansion, and this weight can be deemed the perimeter of the simplex. For convenience of discussion, we name this weight SUM in the experiment section. Another frequently used hyperedge weight is the mean of the weights of the pairwise edges inside the hyperedge,

w(e_i) = \frac{2}{\delta(e_i)(\delta(e_i) - 1)} \sum_{u, v \in e_i} w_{uv}    (24)

where δ(e_i) is the degree of the hyperedge mean . This hyperedge weight computation is the inverse process of Clique Averaging. However, since all our hyperedges are generated with the same degree, this case is actually equivalent to SUM up to a constant factor, so we do not adopt it for comparison. The third hyperedge weighting scheme stems from KNN-based hyperedge generation. In this case, each hyperedge has a seed point, which is also known as the centroid of the neighborhood, and the hyperedge weight is the sum of the pairwise edge weights between the centroid and each of its neighbors in the hyperedge,

w(e_i) = \sum_{v \in e_i} w_{cv}    (25)

where c is the vertex subscript of the centroid in hyperedge e_i phr ; adaptive . For convenience, we name it CENTROID. Recall that our proposed hyperedge weights, based on the volume of the simplex, the trace of the scatter matrix and the local linear reconstruction error, have been named VOLUME, TRACE and LLRE respectively in the introduction.
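
For reference, here are minimal sketches of the SUM and CENTROID baselines (our code; A is assumed to be a precomputed pairwise affinity matrix, e.g. with Heat-Kernel weights, and e a set of vertex indices):

    import numpy as np

    def sum_weight(A, e):
        """SUM, Eq. (23): add up all pairwise edge weights inside hyperedge e."""
        idx = np.asarray(sorted(e))
        sub = A[np.ix_(idx, idx)]          # pairwise weights within the hyperedge
        return np.triu(sub, k=1).sum()     # each unordered pair counted once

    def centroid_weight(A, e, c):
        """CENTROID, Eq. (25): sum of pairwise weights between seed vertex c and its neighbours."""
        return sum(A[c, v] for v in e if v != c)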

5.2 Implementation Details

It is impracticable to enumerate all possible hyperedges. For example, for an undirected hypergraph with 400 vertices, there are more than eight billion 5-degree hyperedges. Therefore, in this paper, we generate hyperedges following Huang's strategy, in which each hyperedge is generated by a KNN search seeded at a vertex phr . The neighborhood size k differs across databases: on the ORL and COIL20 databases we follow the choices of k in adaptive , while on the Sheffield and JAFFE databases we apply two-fold cross validation to learn the optimal k, which was found to be equal to 5. Similarly, we apply two-fold cross validation to learn the optimal scaling parameter β under the different hypergraph frameworks, and we fix the classification trade-off parameter λ to 1. With regard to the two larger datasets, Scene15 and Caltech256, we follow the same experimental settings as adaptive , including the choices of k.

(a) Zhou’s Normalized Hypergraph
(b) Clique Expansion
(c) Star Expansion
Figure 3: The mean classification errors over the four databases using six different hyperedge weighting schemes under three different hypergraph frameworks. (Lower is better.)
Database | BINARY | SUM | CENTROID | VOLUME | TRACE | LLRE
ORL | 7.75±1.77 | 6.25±1.06 | 7.75±1.77 | 5.50±1.41 | 4.50±1.41 | 7.25±1.77
JAFFE | 15.00±3.93 | 12.22±1.57 | 15.00±2.36 | 12.78±0.79 | 13.33±0.00 | 13.89±2.36
Sheffield | 18.79±0.73 | 15.34±1.71 | 17.59±3.90 | 8.79±3.66 | 11.21±2.68 | 18.79±0.70
COIL20 | 5.35±0.88 | 4.86±0.00 | 7.22±2.55 | 3.68±0.69 | 4.79±0.29 | 7.85±4.03
Average | 11.72 | 9.67 | 11.89 | 7.69 | 8.46 | 11.94
Table 2: Classification performance (mean classification error ± standard deviation, %) of Zhou's Normalized Laplacian using different hyperedge weights on the ORL, COIL20, JAFFE and Sheffield databases. (Lower is better.)
Database | BINARY | SUM | CENTROID | VOLUME | TRACE | LLRE
ORL | 9.75±0.35 | 7.75±0.35 | 9.75±0.35 | 6.75±0.35 | 5.75±0.35 | 9.75±0.35
JAFFE | 16.11±5.50 | 14.44±1.57 | 15.00±3.93 | 15.00±0.79 | 14.44±1.57 | 16.11±5.50
Sheffield | 20.00±1.95 | 15.52±2.44 | 18.97±2.44 | 8.97±3.90 | 11.21±2.19 | 20.00±1.95
COIL20 | 5.42±0.98 | 5.56±0.00 | 8.06±2.16 | 4.10±0.29 | 4.72±0.59 | 8.75±3.34
Average | 12.82 | 10.82 | 12.94 | 8.70 | 9.03 | 13.65
Table 3: Classification performance (mean classification error ± standard deviation, %) of Clique Expansion using different hyperedge weights on the ORL, COIL20, JAFFE and Sheffield databases. (Lower is better.)
Database | BINARY | SUM | CENTROID | VOLUME | TRACE | LLRE
ORL | 8.00±1.41 | 6.25±0.35 | 8.00±1.41 | 5.50±1.41 | 4.75±1.06 | 7.25±1.77
JAFFE | 14.44±3.14 | 13.33±0.00 | 14.44±3.14 | 12.78±0.79 | 13.89±0.79 | 14.44±3.14
Sheffield | 19.48±0.24 | 16.21±1.46 | 19.48±0.24 | 10.17±3.66 | 11.38±2.93 | 19.48±0.24
COIL20 | 5.49±1.08 | 5.28±0.59 | 7.36±2.75 | 3.89±0.98 | 5.00±0.59 | 8.06±4.32
Average | 11.85 | 10.27 | 12.32 | 8.08 | 8.75 | 12.31
Table 4: Classification performance (mean classification error ± standard deviation, %) of Star Expansion using different hyperedge weights on the ORL, COIL20, JAFFE and Sheffield databases. (Lower is better.)

5.3 Evaluation in Classification

We conduct experiments to study the influence of the hyperedge weight on hypergraph-based classification. A cross-validation scheme is applied in these experiments.

Tables 2, 3 and 4 respectively report the classification errors of the Zhou's Normalized Laplacian, Clique Expansion and Star Expansion frameworks using different hyperedge weights under two-fold cross validation. Figure 3 presents the comprehensive evaluation results of the six weighting schemes under the three hypergraph frameworks; its Y-axis indicates the mean classification error over the four databases. According to Tables 2-4 and Figure 3, it is clear that the proposed weighting schemes VOLUME and TRACE outperform the other four weighting schemes. For example, on the Sheffield database, the classification accuracy gains of VOLUME over BINARY, SUM and CENTROID are 11.03%, 6.45% and 10% respectively using Clique Expansion; the corresponding gains for TRACE are 8.79%, 4.31% and 7.76%. From a comprehensive perspective, the average classification accuracy gains of VOLUME over the frequently adopted weighting scheme CENTROID are 4.2%, 4.24% and 4.24% using Zhou's Normalized Hypergraph, Clique Expansion and Star Expansion respectively; the corresponding numbers for TRACE are 3.43%, 3.91% and 3.47%.

Moreover, several experiments are conducted to study the influence of the choice of hypergraph framework versus the choice of hyperedge weighting scheme on classification performance. To measure the impact of the choice of hypergraph framework, we measure the classification accuracy improvement of the best framework choice over the worst choice. We use the same strategy to measure the impact of the choice of hyperedge weighting scheme, and of the choice of their combination, on classification performance. Figure 4 reports the results of these experiments. The results demonstrate that classification performance benefits much more from a good choice of hyperedge weight than from a good choice of hypergraph framework in all experiments. In most cases, the positive impact of a good hyperedge weight is five to ten times that of a good hypergraph framework. This phenomenon reveals the importance of the hyperedge weight in hypergraph-based classification.

(a) 20% samples for training
(b) 30% samples for training
(c) 40% samples for training
(d) 50% samples for training
Figure 4: The classification accuracy improvements of the best choices of hyperedge weight, hypergraph model and their combination over the worst choices, under different training sample percentages.

5.4 Evaluation in Clustering

In this section, we report several experiments conducted to study the influence of the hyperedge weight on clustering. We first apply the different hypergraph algorithms to learn an embedding of the data. After that, k-means is adopted to predict the label of each sample based on the embedding, with the number of clusters fixed to the number of classes. The clustering result is evaluated by comparing the predicted label of each sample with the label provided by the dataset. Two metrics, the accuracy (AC) and the normalized mutual information (NMI), are used to measure clustering performance; see lpi for their detailed definitions.

Database | BINARY | SUM | CENTROID | TRACE | VOLUME | LLRE
ORL | 66.75(82.62) | 66.75(81.57) | 69.50(81.62) | 68.50(81.62) | 69.50(81.62) | 67.50(79.96)
JAFFE | 59.44(65.86) | 65.00(65.20) | 65.00(73.36) | 64.44(72.10) | 58.33(67.72) | 58.33(67.72)
Sheffield | 64.35(77.77) | 65.04(75.82) | 64.87(75.82) | 64.87(75.82) | 64.87(75.82) | 64.87(75.82)
COIL20 | 60.42(74.30) | 76.60(85.26) | 76.60(85.26) | 76.60(85.26) | 76.46(85.58) | 79.86(88.02)
Average | 62.74(75.16) | 68.35(76.96) | 68.99(79.01) | 68.60(78.70) | 67.29(77.69) | 67.64(77.88)
Table 5: Clustering performance, accuracy (normalized mutual information), %, of Zhou's Normalized Laplacian using different hyperedge weights on the ORL, COIL20, JAFFE and Sheffield databases. (Higher is better.)
Database | BINARY | SUM | CENTROID | TRACE | VOLUME | LLRE
ORL | 65.25(82.20) | 67.50(80.98) | 65.75(82.90) | 70.50(82.75) | 65.00(82.44) | 65.00(80.76)
JAFFE | 71.11(70.16) | 70.00(71.62) | 69.44(68.71) | 71.67(72.87) | 74.44(76.27) | 71.11(69.26)
Sheffield | 61.91(78.13) | 66.09(80.47) | 66.09(80.47) | 66.09(80.47) | 66.09(80.47) | 66.09(80.69)
COIL20 | 69.72(79.82) | 82.29(89.12) | 79.31(89.00) | 82.29(89.12) | 82.29(89.12) | 82.29(89.12)
Average | 67.00(77.56) | 71.47(80.55) | 70.15(80.27) | 72.64(81.30) | 71.96(82.08) | 71.12(79.96)
Table 6: Clustering performance, accuracy (normalized mutual information), %, of Clique Expansion using different hyperedge weights on the ORL, COIL20, JAFFE and Sheffield databases. (Higher is better.)
Database | BINARY | SUM | CENTROID | TRACE | VOLUME | LLRE
ORL | 63.75(79.01) | 65.75(79.48) | 65.75(79.48) | 67.00(80.48) | 65.75(79.48) | 68.25(80.52)
JAFFE | 61.11(67.51) | 63.33(70.23) | 61.11(67.51) | 65.56(70.54) | 61.67(68.16) | 63.89(69.23)
Sheffield | 58.43(72.70) | 62.96(75.10) | 62.43(77.62) | 62.26(73.73) | 58.26(74.28) | 58.26(73.89)
COIL20 | 49.65(64.67) | 62.36(73.15) | 62.43(73.86) | 56.46(71.29) | 59.03(72.53) | 62.99(73.62)
Average | 58.23(70.97) | 63.60(74.49) | 62.93(74.62) | 62.82(74.01) | 61.18(73.61) | 63.35(74.31)
Table 7: Clustering performance, accuracy (normalized mutual information), %, of Star Expansion using different hyperedge weights on the ORL, COIL20, JAFFE and Sheffield databases. (Higher is better.)

Tables 5, 6 and 7 respectively report the clustering results of the Zhou's Normalized Laplacian, Clique Expansion and Star Expansion frameworks using different hyperedge weights on the four databases. From the experimental results, it seems that different weighting schemes perform well on different databases. From a comprehensive evaluation based on the mean accuracy and mean normalized mutual information over the four databases, CENTROID and TRACE perform slightly better than VOLUME, SUM and LLRE. Another interesting phenomenon is that all five weighting schemes significantly outperform the unweighted case, BINARY. From these observations, we conclude that the hyperedge weight still plays an important role in hypergraph-based clustering; however, unlike the classification case, hypergraph-based clustering is not very sensitive to the choice of hyperedge weight. We follow the same evaluation strategy as in Section 5.3 to study the influence of the choices of hypergraph framework and hyperedge weighting scheme on clustering performance. Figure 5 shows the experimental results under the two evaluation metrics. In most cases, hypergraph-based clustering benefits more from a good choice of hypergraph framework than from a good choice of hyperedge weight, but these two benefits are at the same level. So, we still cannot ignore the positive influence of a good hyperedge weight on clustering.

(a) Accuracy
(b) NMI
Figure 5: The average clustering performance improvements of the best choices of hyperedge weight, hypergraph model and their combination over the worst choices on the four databases.

5.5 The Setting of Scaling Parameter

In this section, we conduct several experiments to study the influence of the scaling parameter β on the performance of hypergraph learning, and also experimentally find the optimal β on the ORL, COIL20, Sheffield and JAFFE databases. We conduct these experiments on the clustering tasks, and the learned optimal β is directly applied to hypergraph-based classification, as already described in Section 5.3. The experimental settings are the same as those of the hypergraph-based clustering experiments in Section 5.4. We normalize each hyperedge weight statistic by dividing it by its mean over all hyperedges before exponentiation, so that the different hyperedge weighting schemes are on the same scale. Since there are two evaluation metrics for clustering, we use the mean of these two metrics as a new metric to evaluate the clustering performance and then find the optimal β for each weighting scheme.

(a) ZNH on ORL
(b) CliqueExp on ORL
(c) StarExp on ORL
(d) ZNH on JAFFE
(e) CliqueExp on JAFFE
(f) StarExp on JAFFE
(g) ZNH on Sheffield
(h) CliqueExp on Sheffield
(i) StarExp on Sheffield
(j) ZNH on COIL20
(k) CliqueExp on COIL20
(l) StarExp on COIL20
Figure 6: The effect of β for clustering on the ORL, JAFFE, Sheffield and COIL20 databases using the SUM, CENTROID, TRACE, VOLUME and LLRE hyperedge weights. (ZNH = Zhou's Normalized Hypergraph, CliqueExp = Clique Expansion, StarExp = Star Expansion)

Figure 6 shows the experimental results. From these observations, it seems that the optimal β is related to the database rather than to the hypergraph model. In most cases, CENTROID, TRACE and LLRE obtain a reasonable performance when β is equal to 1 or 10. SUM and VOLUME are more sensitive to β, and it is hard to find an optimal β that works for all databases. Nevertheless, a good β can be learned from the training set, which is a commonly adopted strategy in many practical applications.

5.6 Comparison with The State-of-the-Arts

To further show the value of our work, we select the best-performing weighting schemes based on the results of the prior experiments and apply them to both image classification and data representation for clustering. The TRACE- and VOLUME-based Zhou's Normalized Hypergraph are used for classification, while the CENTROID-based Zhou's Normalized Hypergraph and the TRACE-based Clique Expansion are adopted for clustering. In these experiments, two challenging databases, Scene15 and Caltech256, are added. Adaptive-Hypergraph (Ada-Hyper) adaptive , the Relaxed Collaborative Representation classifier (RCR) rcr , the Sparse Representation Classifier (SRC) sparse , the Random Forest classifier (RF) rf and a graph-based classifier are the compared methods for classification.

Graph Regularized Nonnegative Matrix Factorization (GNMF) gnmf , Landmark-Based Spectral Clustering (LSC) lbsc , Sparse Representation-based Embedding (SRE) sgraph , a graph-based method (Normalized cut ncut , or equivalently Laplacian EigenMapping eigenmap ) and k-means are the compared algorithms for clustering. All classification experiments are conducted with two-fold cross validation. With regard to the clustering experiments, the number of clusters is set to the number of classes of each database. The experimental settings of image classification follow adaptive , while the experimental settings of the clustering tasks follow gnmf . The choices of k in the image classification experiments all follow the settings of the previous experiments. With regard to image clustering, we find that a different combination of k and β is much more suitable for the challenging Scene15 and Caltech256 databases, so we tune these parameters separately on these two datasets. With regard to the ORL and COIL20 datasets, we follow the same settings as in Section 5.4.

Database | ORL | COIL20 | Scene15 | Caltech256-2000
ZNH+TRACE | 5.50±1.41 | 3.68±0.69 | 25.47±1.89 | 42.75±2.90
ZNH+VOLUME | 4.50±1.41 | 4.79±0.29 | 25.40±1.98 | 42.70±2.97
Graph | 24.75±1.05 | 43.13±0.49 | 32.40±2.07 | 51.35±3.75
Adap-Hyper adaptive | 11.81±0.42 | 10.13±1.41 | 25.54±1.59 | 43.30±2.10
RCR rcr | 8.00±2.07 | 11.19±1.10 | 26.73±1.43 | 41.21±2.11
Random Forest rf | 11.54±2.13 | 6.00±0.10 | 27.66±2.63 | 43.74±3.02
SRC sparse | 8.25±1.06 | 11.81±0.98 | 26.80±2.83 | 42.80±1.41
Table 8: Image classification performance (mean classification error ± standard deviation, %) of the two hypergraph-based classifiers and five compared classifiers. (ZNH = Zhou's Normalized Hypergraph)
Database | ORL | COIL20 | Scene15 | Caltech256-2000
CliqueExp+TRACE | 70.50(82.75) | 82.29(89.12) | 67.47(62.03) | 36.30(34.01)
ZNH+CENTROID | 69.50(81.62) | 76.60(85.26) | 63.33(59.53) | 44.15(43.29)
GNMF gnmf | 65.75(82.19) | 82.22(89.99) | 62.87(61.32) | 39.80(38.45)
LSC lbsc | 66.00(82.39) | 76.04(86.13) | 66.33(64.01) | 43.95(41.32)
Graph ncut ; eigenmap | 67.75(82.01) | 69.60(77.00) | 56.87(60.21) | 38.00(38.69)
SRE sgraph | 70.00(83.10) | 69.24(76.16) | 58.27(59.09) | 35.70(38.91)
Kmeans | 57.75(78.38) | 63.70(73.40) | 63.20(62.45) | 41.15(40.55)
Table 9: Clustering performance, accuracy (normalized mutual information), %, of the two hypergraph-based models and five compared algorithms. (CliqueExp = Clique Expansion, ZNH = Zhou's Normalized Hypergraph)

Table 8 reports the classification performances of five different classifiers and two hypergraph-based classifiers. It is clear that the VOLUME- and TRACE-based hypergraph classifiers outperform all other classifiers on the ORL, COIL20 and Scene15 databases, and respectively take second and third place on the Caltech256 database. Compared to the regular pairwise graph-based classifier, the gains of the VOLUME-based hypergraph classifier are 20.45%, 38.34%, 7.00% and 8.65% on the ORL, COIL20, Scene15 and Caltech256 databases respectively, while the corresponding gains of the TRACE-based hypergraph classifier are 19.45%, 39.45%, 6.93% and 8.60%. Even compared with state-of-the-art classifiers such as the random forest and the sparse representation classifier, our proposed hyperedge weight-based hypergraph classifiers still show their superiority. Table 9 shows the clustering results of the different algorithms. From the results, we can see that the TRACE- and CENTROID-based hypergraph models obtain the best clustering accuracies on all four databases, and they also obtain promising NMIs in comparison with the state-of-the-art algorithms. Similar to the classification results, the advantage of the hypergraph models over the graph model is very obvious. For example, the clustering accuracy gains of the TRACE-based hypergraph over the graph are 2.75%, 12.69% and 10.60% on the ORL, COIL20 and Scene15 databases respectively. These experiments verify that a good hypergraph framework with a carefully chosen hyperedge weight is very competitive for classification and clustering. Moreover, it is still possible to further improve the performance of these hypergraphs using our proposed weighting schemes, since many other settings of the proposed weighting schemes have not been explored yet. For example, VOLUME can compute the hyperedge weight from edge weights, but in these experiments we only adopt the Heat-Kernel weighting scheme to compute the edge weight; many other edge weighting schemes can be applied. Similarly, LLRE only uses ordinary linear regression to compute the local linear reconstruction errors; more advanced linear regression methods, such as sparse representation and collaborative representation, have not been tried yet.

5.7 Experimental Analysis and Discussion

Several conclusions can be drawn from the experimental results in Figures 3-6 and Tables 2-9. We believe these conclusions are very instructive for researchers who work on hypergraph learning:

  1. Similar to the choice of the hypergraph algorithm itself, the choice of the hyperedge weight plays a very important, even more important, role. A prominent improvement can be obtained by carefully choosing a suitable hyperedge weight. This can be noticed widely in our experiments, particularly in the classification experiments.

  2. The hypergraph model is more sensitive to the hyperedge weight when it is used for classification than for clustering.

  3. The proposed hyperedge weights, VOLUME and TRACE, can be deemed two representative hyperedge weighting schemes for classification, since they distinctly outperform the other weighting schemes in all experiments. However, for the clustering task, it is hard to single out a representative weighting scheme, since all five weighting schemes have similar comprehensive performance; comparatively speaking, TRACE and CENTROID perform slightly better. Moreover, in the clustering case, all five weighting schemes significantly outperform the unweighted case.

  4. Based on our experimental study, we respectively select two combinations of hypergraph model and weighting scheme for classification and clustering: Zhou's normalized hypergraph with VOLUME or TRACE for classification, and clique expansion with TRACE or Zhou's normalized hypergraph with CENTROID for clustering. The results reported in Tables 8 and 9 demonstrate that such simple combinations achieve very promising performance in comparison with state-of-the-art algorithms. Clearly, these phenomena not only verify the importance of the hyperedge weight in hypergraph learning, but also show the potential of hypergraph learning for addressing visual tasks.

6 Conclusion

We presented a comprehensive experimental study of the hyperedge weight in hypergraph learning to draw researchers' attention to the importance of designing hyperedge weights. In order to verify the importance of the hyperedge weight, three novel hyperedge weights, namely VOLUME, TRACE and LLRE, were proposed from the perspectives of geometry, multivariate statistical analysis and linear regression respectively. These three novel hyperedge weights and three other commonly adopted hyperedge weights were applied to three popular hypergraph frameworks, namely Zhou's normalized Laplacian, clique expansion and star expansion, for two fundamental learning tasks: clustering and classification. Extensive experiments on the ORL, COIL20, JAFFE and Sheffield databases demonstrated that a good hyperedge weight can significantly improve the performance of hypergraph learning. Moreover, we compared the simple combination of a conventional hypergraph framework and a carefully chosen weighting scheme with some state-of-the-art algorithms in image classification and clustering on two larger and more challenging databases, namely Scene15 and Caltech256. The results show that such a simple combination can achieve promising performance. Our work is a fundamental study, so much meaningful work can be done based on it. Applying the hypergraph frameworks and the new hyperedge weights to dimensionality reduction dhlp , feature selection nmi and multi-label classification mhyper may be our future work.

Acknowledgement

This work has been partially funded by Fundamental Research Funds for the Central Universities (Grant No. CDJXS11181162) and the National Natural Science Foundation of China (Grant No. 91118005). The authors would like to thank the reviewers for their useful comments.

References

  • (1) S. Gao, I. Tsang, L. Chia, Laplacian sparse coding, hypergraph laplacian sparse coding, and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 92–104.
  • (2) Y. Gao, M. Wang, D. Tao, R. Ji, Q. Dai, 3-d object retrieval and recognition with hypergraph analysis, IEEE Transactions on Image Processing 21 (9) (2012) 4290–4303.
  • (3) Y. Huang, Q. Liu, S. Zhang, D. N. Metaxas, Image retrieval via probabilistic hypergraph ranking, in: IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 3376–3383.
  • (4) P. Ochs, T. Brox, Higher order motion models and spectral clustering, in: IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 614–621.
  • (5) T. Parag, A. Elgammal, Supervised hypergraph labeling, in: IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 2289–2296.
  • (6) J. Yu, D. Tao, M. Wang, Adaptive hypergraph learning and its application in image classification, IEEE Transactions on Image Processing 21 (7) (2012) 3262–3272.
  • (7) Z. Zhang, P. Ren, E. Hancock, Unsupervised feature selection via hypergraph embedding, in: British Machine Vision Conference (BMVC), 2012, pp. 1–11.
  • (8) L. Pu, B. Faltings, Hypergraph learning with hyperedge expansion, in: Machine Learning and Knowledge Discovery in Databases, 2012, pp. 410–425.
  • (9) J. Y. Zien, M. D. Schlag, P. K. Chan, Multilevel spectral hypergraph partitioning with arbitrary vertex sizes, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 18 (9) (1999) 1389–1399.
  • (10) S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, S. Belongie, Beyond pairwise clustering, in: IEEE conference on Computer Vision and Pattern Recognition (CVPR), Vol. 2, 2005, pp. 838–845.
  • (11) D. Zhou, J. Huang, B. Schölkopf, Learning with hypergraphs: Clustering, classification, and embedding, in: Advances in neural information processing systems (NIPS), 2006, pp. 1601–1608.
  • (12) M. Bolla, Spectra, euclidean representations and clusterings of hypergraphs, Discrete Mathematics 117.
  • (13) S. Agarwal, K. Branson, S. Belongie, Higher order learning with graphs, in: International Conference on Machine Learning (ICML), 2006, pp. 17–24.
  • (14) J. Rodríguez, On the laplacian spectrum and walk-regular hypergraphs, Linear and Multilinear Algebra 51.
  • (15) D. Cai, X. He, J. Han, Document clustering using locality preserving indexing, IEEE Transactions on Knowledge and Data Engineering 17 (12) (2005) 1624–1637.
  • (16) D. Cai, X. He, J. Han, T. S. Huang, Graph regularized nonnegative matrix factorization for data representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (8) (2011) 1548–1560.
  • (17) X. He, S. Yan, Y. Hu, P. Niyogi, H.-J. Zhang, Face recognition using laplacianfaces, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (3) (2005) 328–340.
  • (18) S. Huang, A. Elgammal, L. Huangfu, D. Yang, X. Zhang, Globality-locality preserving projections for biometric data dimensionality reduction, in: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2014, pp. 15–20.
  • (19) J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2) (2009) 210–227.
  • (20) L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: Which helps face recognition?, in: International Conference on Computer Vision (ICCV), 2011, pp. 471–478.
  • (21) P. Gritzmann, V. Klee, On the complexity of some basic problems in computational convexity, in: Polytopes: Abstract, Convex and Computational, 1994, pp. 373–466.
  • (22) R. Timofte, L. Van Gool, Sparse representation based projections, in: British Machine Vision Conference (BMVC), 2011, pp. 61–1.
  • (23) F. S. Samaria, A. C. Harter, Parameterisation of a stochastic model for human face identification, in: IEEE Workshop on Applications of Computer Vision, 1994.
  • (24) S. A. Nene, S. K. Nayar, H. Murase, Columbia object image library (coil-20), Technical Report CUCS-005-96.
  • (25) D. B. Graham, N. M. Allinson, Face recognition: From theory to applications, NATO ASI Series F, Computer and Systems Sciences 163.
  • (26) M. N. Dailey, C. Joyce, M. J. Lyons, M. Kamachi, H. Ishi, J. Gyoba, G. W. Cottrell, Evidence and a computational explanation of cultural differences in facial expression recognition, Emotion 10.
  • (27) S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, in: IEEE conference on Computer Vision and Pattern Recognition (CVPR), Vol. 2, 2006, pp. 2169–2178.
  • (28) G. Griffin, A. Holub, P. Perona, Caltech-256 object category dataset.
  • (29) A. Bergamo, L. Torresani, A. Fitzgibbon, Picodes: Learning a compact code for novel-category recognition, in: Advances in neural information processing systems (NIPS), 2011, pp. 2088–2096.
  • (30) M. Yang, D. Zhang, S. Wang, Relaxed collaborative representation for pattern classification, in: IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 2224–2231.
  • (31) L. Breiman, Random forests, Machine learning 45 (1) (2001) 5–32.
  • (32) X. Chen, D. Cai, Large scale spectral clustering with landmark-based representation, in: AAAI Conference on Artificial Intelligence (AAAI), 2011.

  • (33) J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (8) (2000) 888–905.
  • (34) M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural computation 15 (6) (2003) 1373–1396.
  • (35) S. Huang, D. Yang, Y. Ge, D. Zhao, X. Feng, Discriminant hyper-laplacian projections with its applications to face recognition, in: IEEE conference on Multimedia and Expo Workshop on HIM (ICMEW), 2014.
  • (36) L. Sun, S. Ji, J. Ye, Hypergraph spectral learning for multi-label classification, in: ACM international conference on Knowledge discovery and data mining (SIGKDD), 2008, pp. 668–676.