Hypergraph Learning with Line Expansion

05/11/2020 ∙ by Chaoqi Yang, et al. ∙ University of Illinois at Urbana-Champaign

Previous hypergraph expansions are carried out solely on either the vertex level or the hyperedge level, thereby missing the symmetric nature of data co-occurrence and resulting in information loss. To address the problem, this paper treats vertices and hyperedges equally and proposes a new hypergraph formulation named the line expansion (LE) for hypergraph learning. The new expansion bijectively induces a homogeneous structure from the hypergraph by treating vertex-hyperedge pairs as "line nodes". By reducing the hypergraph to a simple graph, the proposed line expansion makes existing graph learning algorithms compatible with the higher-order structure and is proven to be a unifying framework for various hypergraph expansions. For simple graphs, we demonstrate that learning algorithms defined on LEs match their performance on the original graphs, implying that no loss of information occurs in the expansion. For hypergraphs, we show that learning over the new representation leads to algorithms that beat all prior state-of-the-art hypergraph learning baselines.




1 Introduction

This paper proposes a new hypergraph formulation, line expansion (LE), for the problem of hypergraph learning. The proposed LE is a topological mapping, transforming the hypergraph into a homogeneous structure, while preserving all the higher-order relations. LE allows all the existing graph learning algorithms to work on hypergraphs.

The problem of hypergraph learning is important. Graph-structured data are ubiquitous in practical machine/deep learning applications, such as social networks (ChitraR19), protein networks (klamt2009hypergraphs), and co-author networks (ZhouHS06). Intuitive pairwise connections among nodes are usually insufficient for capturing real-world higher-order relations. For example, in social networks, many relations (such as trust, friendship, or interest) are not transitive. Thus, it is difficult to infer trust or user interest groups from pairwise associations. For another example, in biology, proteins are bound by polypeptide chains, thus their relations are naturally higher-order. Hypergraphs allow modeling such multi-way relations, where edges could be incident to more than two nodes.

Figure 1: Bipartite Relation in Hypergraphs

However, the spectral theory of hypergraphs is far less developed (ChitraR19). Hypergraph learning was first introduced in (ZhouHS06) as a propagation process on the hypergraph structure; however, (AgarwalBB06) indicated that their Laplacian matrix is equivalent to pairwise operation. Since then, researchers have explored non-pairwise relationships by developing nonlinear Laplacian operators (chan2018spectral; li2017inhomogeneous), utilizing random walks (ChitraR19; bellaachia2013random) and learning the optimal weights (li2017inhomogeneous; li2018submodular) of hyperedges. Essentially, all of these algorithms focus on vertices, viewing hyperedges as connectors, and they explicitly break the bipartite property of hypergraphs (shown in Fig. 1).

The investigation of deep learning on hypergraphs is also at a nascent stage. (feng2019hypergraph) developed a Chebyshev formula for hypergraph Laplacians and proposed HGNN. Using a similar hypergraph Laplacian, (yadati2018hypergcn) proposed HyperGCN, while (hyperOperation) generalized (KipfW17; VelickovicCCRLB18) and defined two neural hypergraph operators. However, all of them in fact construct a simple weighted graph and apply mature graph learning algorithms by introducing only vertex functions, which is not sufficient for higher-order learning.

Our motivation stems from the lack of powerful tools for representing the hyper-structure. We are also motivated by the richness of literature on graphs as well as recent success of graph representation learning (GRL) (henaff2015deep; KipfW17; DefferrardBV16; VelickovicCCRLB18) with powerful neural operators (convolution, attention, spectral, etc). The point is that if we could develop a mapping from the hypergraph to a simple graph, without losing information, then these theoretical properties and representation algorithms on graphs would be applicable and flexible for hypergraphs.

For a hypergraph, our line expansion (LE) induces a new structure, where each “node” is a vertex-hyperedge pair, and an “edge” connects two “nodes” that share either their vertex or their hyperedge; both cases yield the same kind of edge. It is obvious that the new structure is (i) homogeneous (i.e., a graph where nodes have the same semantics) and (ii) symmetric with respect to the original vertices and hyperedges. We further prove that LE is also (iii) bijective. To conduct hypergraph learning, we first transform the hypergraph to LE, where the actual learning happens. Features from vertices/hyperedges will be projected to nodes on the induced graph, and final representations of those nodes will be aggregated and back-projected to the original vertices/hyperedges.

The proposed line expansion of hypergraphs is novel and informative, compared to traditional formulations, where the hyperedges are usually transformed into cliques of edges (e.g., clique/star expansions (AgarwalBB06)) or hypergraph cuts (ZhouHS06), or the learning solely depends on edge connectivity (e.g., hyperedge expansions (pu2012hypergraph)). In contrast, LE treats vertices and hyperedges equally, thus preserving the nature of hypergraphs.

Note that LE is also significantly different from those theoretical hypergraph transformations, such as tensor-based hyper-matrix representation (ouvrard2017adjacency), line graphs of hypergraphs (bermond1977line), intersection graphs of hypergraphs (naik2018intersection), or middle graphs of hypergraphs (cockayne1978properties). These formulations either require strong constraints (e.g., uniform hypergraphs) or result in heterogeneous topologies as well as other structures that complicate practical usage. For example, such formulations may restrict applicability of simple graph-based algorithms due to their special structures.

Theoretically, this paper further revisits the formulation of the standard star/clique expansion and simple graph learning algorithms. We conclude that they can be unified as special cases of LE. From an algebra-geometric view, traditional hypergraph/graph learning algorithms are proven to only pass information following the 0-chain form. It is however possible to operate on the higher-order chain on our LE.

Empirically, this paper demonstrates the effectiveness of LE in two types of experiments. First, using citation networks, we show that the performance of graph algorithms defined on simple graphs and on LE is similar. Second, for six real-world hypergraphs, we apply the popular graph convolutional networks (GCNs) on LE. Our model consistently outperforms other hypergraph learning baselines.

Our paper is organized as follows. In Section 2, we introduce the general notations of hypergraphs and formulate our problem. In Section 3, we propose line expansion of hypergraphs and show some interesting properties. In Section 4, we generalize graph convolutional networks (GCNs) to hypergraphs by line expansion. In Section 5, we theoretically analyze two commonly used hypergraph expansions and show that our proposed line expansion could unify them as well as simple graph adjacency. We empirically evaluate line expansion on two-fold experiments in Section 6 and conclude our work in Section 7.

2 Preliminaries

2.1 Hypergraphs

Research on graph-structured deep learning (KipfW17; VelickovicCCRLB18) stems mostly from the Laplacian matrix and vertex functions of simple graphs. Only recently (feng2019hypergraph; hyperOperation) has it become possible to learn higher-order relations on hypergraphs.

Hypergraphs. Let $G_H = (V, E)$ denote a hypergraph, with vertex set $V$ and edge set $E \subset 2^V$. A hyperedge $e \in E$ (we sometimes also call it “edge” interchangeably in this paper) is a subset of $V$. Given an arbitrary set $S$, let $|S|$ denote the cardinality of $S$. A regular graph is thus a special case of a hypergraph, with $|e| = 2$ uniformly, which is also called a 2-order hypergraph. A hyperedge $e$ is said to be incident to a vertex $v$ when $v \in e$. One can represent a hypergraph by a $|V| \times |E|$ incidence matrix $H$ with its entry $h(v, e) = 1$ if $v \in e$ and 0 otherwise. For each vertex $v \in V$ and hyperedge $e \in E$, $d(v) = \sum_{e \in E} h(v, e)$ and $\delta(e) = \sum_{v \in V} h(v, e)$ denote their degree functions, respectively. The vertex-degree matrix $D_v$ of a hypergraph $G_H$ is a $|V| \times |V|$ matrix with each diagonal entry corresponding to the node degree, and the edge-degree matrix $D_e$ is $|E| \times |E|$ and also diagonal, which is defined on the hyperedge degree.
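As a minimal sketch of the notation above, the following NumPy snippet builds the incidence matrix and the two degree matrices for a small hypothetical hypergraph. The toy hyperedges and the helper name `incidence_matrix` are illustrative assumptions, not part of the paper.

```python
import numpy as np

# Hypothetical toy hypergraph: 4 vertices, 3 hyperedges given as vertex sets.
hyperedges = [{0, 1}, {0, 1, 2}, {2, 3}]
num_vertices = 4


def incidence_matrix(hyperedges, num_vertices):
    """Return the |V| x |E| incidence matrix: entry (v, e) is 1 iff v is in e."""
    H = np.zeros((num_vertices, len(hyperedges)), dtype=int)
    for j, edge in enumerate(hyperedges):
        for v in edge:
            H[v, j] = 1
    return H


H = incidence_matrix(hyperedges, num_vertices)
vertex_degree = H.sum(axis=1)   # number of hyperedges incident to each vertex
edge_degree = H.sum(axis=0)     # number of vertices inside each hyperedge
D_v = np.diag(vertex_degree)    # |V| x |V| vertex-degree matrix
D_e = np.diag(edge_degree)      # |E| x |E| edge-degree matrix
```

A simple graph is recovered when every column of the incidence matrix sums to exactly 2.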

2.2 Problem Setup

In this paper, we are interested in the transductive problems on hypergraphs, specifically node classification, similar to (feng2019hypergraph; hyperOperation). It aims to induce a labeling from the labeled data as well as the geometric structure of the graph and then assigns a class label to unlabeled vertices by transductive inference.

Specifically, given a hypergraph $G_H = (V, E)$ with the labeled vertex subset $T \subset V$ and the label $y_t$ of each $v_t \in T$, we propose minimizing the empirical risk,

$$f^* = \arg\min_{f} \frac{1}{|T|} \sum_{v_t \in T} \mathcal{L}\big(f(v_t), y_t\big),$$

where cross-entropy error (KipfW17) is commonly applied in $\mathcal{L}(\cdot)$. Intuitively, node similarity indicates similar labels on graphs. Given the bipartite symmetry in hypergraphs, we posit that vertex similarity and edge similarity are equally important. For more details on transductive learning, one can refer to (zhu2005semi).
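A hedged sketch of the transductive objective: predictions exist for every vertex, but the cross-entropy is averaged only over the labeled subset. The function name `masked_cross_entropy` and the toy logits are assumptions made for illustration.

```python
import numpy as np


def masked_cross_entropy(logits, labels, labeled_idx):
    """Mean cross-entropy over the labeled vertices only."""
    z = logits[labeled_idx]
    z = z - z.max(axis=1, keepdims=True)              # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(len(labeled_idx)), labels[labeled_idx]]
    return -picked.mean()


# Three vertices, two classes; only vertices 0 and 1 carry labels.
logits = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
labels = np.array([0, 1, 0])
loss = masked_cross_entropy(logits, labels, np.array([0, 1]))
```

The unlabeled vertex still receives a prediction; it simply does not contribute to the loss.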

3 Hypergraph Line Expansion

Most well-known graph-based algorithms (ng2002spectral; grover2016node2vec) are defined for graphs but not hypergraphs. Therefore, in real applications, hypergraphs are often transformed into simple graphs (ZhouHS06; AgarwalBB06) that are easier to handle.

3.1 Traditional Hypergraph Expansions

Two main ways of approximating hypergraphs by graphs are the clique expansion (sun2008hypergraph) and the star expansion (zien1999multilevel). The clique expansion algorithm (left side of Fig. 2) constructs a graph $G_c = (V, E_c)$ from the original hypergraph by replacing each hyperedge with a clique in the resulting graph (i.e., $E_c = \{(u, v) \mid u, v \in e,\ u \neq v,\ e \in E\}$), while the star expansion algorithm (right side of Fig. 2) constructs a new graph $G_s = (V_s, E_s)$ by augmenting the vertex set with hyperedges, $V_s = V \cup E$, where vertices and hyperedges are connected by their incident relations (i.e., $E_s = \{(v, e) \mid v \in e,\ v \in V,\ e \in E\}$). Note that the star expansion induces a heterogeneous graph structure.
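The two traditional expansions can be sketched directly from the incidence matrix. The toy matrix below is a hypothetical example; the snippet assumes the unweighted variants described in the text.

```python
import numpy as np

# Hypothetical incidence matrix: 4 vertices (rows), 3 hyperedges (columns).
H = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
n, m = H.shape

# Clique expansion: two vertices are adjacent iff they share a hyperedge.
co_occurrence = H @ H.T                 # entry (u, v) counts shared hyperedges
A_clique = (co_occurrence > 0).astype(int)
np.fill_diagonal(A_clique, 0)           # no self-loops

# Star expansion: bipartite graph on V ∪ E with one edge per incidence.
A_star = np.block([
    [np.zeros((n, n), dtype=int), H],
    [H.T, np.zeros((m, m), dtype=int)],
])
```

In this toy case vertices 0 and 1 share two hyperedges (`co_occurrence[0, 1] == 2`), but the unweighted clique expansion keeps only a single edge between them, which is the kind of collapse discussed in the next paragraph.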

Unfortunately, these two approximations cannot retain or represent the higher-order properties well. Let us consider the co-authorship network, as in Fig. 2, where we view authors as nodes and papers as hyperedges. From the hypergraph we immediately know that two of the authors have jointly written one paper and that, together with a third author, they have co-authored another paper. This hierarchical and multi-way connection is an example of a higher-order relation. If we follow the clique expansion, we obviously lose the information about author activity rate and about whether the same persons jointly wrote two or more articles. Though researchers have remedially used weighted edges (li2017inhomogeneous; ChitraR19), the hyper-dependency still collapses or fuses into linearity. Star expansion expresses the whole incidence information, but the remaining heterogeneous structure (i) has no explicit vertex-vertex link and (ii) is too complicated for those well-studied graph algorithms, which are mostly designed for simple graphs. One can summarize (HeinSJR13) that these two expansions are not good enough for many applications.

3.2 Our Proposed Line Expansion

Since the commonly used expansions cannot give a satisfactory representation, we seek a new expansion that preserves all the original higher-order relations, while presenting an easy-to-learn graph structure. Motivated by the special symmetric structure of hypergraphs that vertices are connected to multiple edges and edges are conversely connected to multiple vertices, we treat vertices and edges equally and propose hypergraph Line Expansion (LE).

Figure 2: Hypergraph Expansion

The Line Expansion of the hypergraph $G_H$ is constructed as follows (shown in Fig. 2, bottom): (i) each incident vertex-hyperedge pair is considered as a “line node”; (ii) “line nodes” are connected when either the vertex or the hyperedge is the same. Essentially, the induced structure is a graph, where each vertex and each hyperedge (from the original hypergraph) induces a clique. We now formally define the line expansion of hypergraph $G_H$.

Line Expansion. Let $G_l = (V_l, E_l)$ denote the graph induced by the line expansion of hypergraph $G_H = (V, E)$. The node set $V_l$ of $G_l$ is defined by the vertex-hyperedge pairs $\{(v, e) \mid v \in e,\ v \in V,\ e \in E\}$ from the original hypergraph. The edge set $E_l$ and adjacency $A_l \in \{0, 1\}^{|V_l| \times |V_l|}$ are defined by the pairwise relation with $A_l(u_l, v_l) = 1$ for distinct line nodes $u_l = (v, e)$ and $v_l = (v', e')$ if either $v = v'$ or $e = e'$.

The construction of the line expansion follows the neighborhood feature sharing mechanism. For graph node representation learning, (DefferrardBV16; KipfW17) first encode local structure by aggregating information from a node’s immediate neighborhood. In line expansion, we view the incidence of vertex-hyperedge as a whole and generalize the concept of neighbors by defining that two line nodes are neighbors when they contain the same vertex (vertex similarity) or the same hyperedge (edge similarity). We argue that the line expansion consequently preserves higher-order associations.
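The construction rule above can be sketched in a few lines. This is an illustrative, quadratic-time implementation on a hypothetical toy hypergraph; the function name `line_expansion` is an assumption, and a sparse construction would be preferable for real datasets.

```python
import numpy as np
from itertools import combinations


def line_expansion(H):
    """Return the line nodes (v, e) and the adjacency matrix of the line expansion."""
    # One line node per incident vertex-hyperedge pair.
    line_nodes = [(int(v), int(e)) for v, e in zip(*np.nonzero(H))]
    n_l = len(line_nodes)
    A_l = np.zeros((n_l, n_l), dtype=int)
    for i, j in combinations(range(n_l), 2):
        (v1, e1), (v2, e2) = line_nodes[i], line_nodes[j]
        # Neighbors share the vertex (vertex similarity)
        # or the hyperedge (edge similarity).
        if v1 == v2 or e1 == e2:
            A_l[i, j] = A_l[j, i] = 1
    return line_nodes, A_l


H = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
line_nodes, A_l = line_expansion(H)
```

Two distinct line nodes can never share both their vertex and their hyperedge, so each pair is connected for exactly one of the two reasons.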

3.3 Entity Projection

In this section, we define the projection matrices for hypergraph entities (i.e., vertices and hyperedges) for the topological map from $G_H = (V, E)$ to $G_l = (V_l, E_l)$.

In $G_l$, each line node $(v, e)$ could be viewed as a vertex with hyperedge context or a hyperedge with vertex context, which means that it encodes part of the vertex (related to that hyperedge) or part of the hyperedge (related to that particular vertex). In short, the line expansion creates information linkage in the higher-order space.

To scatter the information, a vertex $v \in V$ from $G_H$ is mapped to a set of line nodes $\{v_l = (v, e) : e \in E,\ v \in e\}$ in $G_l$. We introduce the vertex projection matrix $P_{vertex} \in \{0, 1\}^{|V_l| \times |V|}$,

$$P_{vertex}(v_l, v) = \begin{cases} 1 & \text{if the vertex part of } v_l \text{ is } v, \\ 0 & \text{otherwise,} \end{cases}$$

where each entry records whether the line node contains the vertex. Similarly, we also define an edge projection matrix $P_{edge} \in \{0, 1\}^{|V_l| \times |E|}$ that encodes the projection of hyperedges to sets of line nodes.
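A minimal sketch of the two projection matrices, assuming line nodes are enumerated in the row-major order of the nonzero entries of the incidence matrix (an implementation choice, not something the paper prescribes). The function name `projection_matrices` is illustrative.

```python
import numpy as np


def projection_matrices(H):
    """Build the vertex and edge projection matrices from the incidence matrix."""
    vs, es = np.nonzero(H)            # line node k is the pair (vs[k], es[k])
    n_l = len(vs)
    P_vertex = np.zeros((n_l, H.shape[0]), dtype=int)
    P_edge = np.zeros((n_l, H.shape[1]), dtype=int)
    P_vertex[np.arange(n_l), vs] = 1  # row k marks the vertex of line node k
    P_edge[np.arange(n_l), es] = 1    # row k marks the hyperedge of line node k
    return P_vertex, P_edge


H = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
P_vertex, P_edge = projection_matrices(H)
```

Each row of either matrix contains exactly one 1, and the column sums recover the vertex and hyperedge degrees.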

Theorem 1.

Under the construction, for a hypergraph $G_H$ and its line expansion $G_l$, the mapping from hypergraph to line expansion (i.e., $\phi: G_H \to G_l$) is bijective.

The inverse mapping from $G_l$ to $G_H$ is guaranteed by Theorem 1 (proofs are in Appendix B), where the complete information of vertex $v$ is re-obtained by aggregating all the small parts from its line nodes $(v, e)$. Naturally, the overall information from hyperedge $e$ is shared (divided by edge degree $\delta(e)$) by vertices under hyperedge $e$.

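Therefore, we fuse the higher-order information by defining the vertex back-projection matrix $P'_{vertex} \in \mathbb{R}^{|V| \times |V_l|}$,

$$P'_{vertex}(v, v_l) = \begin{cases} \dfrac{1 / \delta(e)}{\sum_{(v, e') \in V_l} 1 / \delta(e')} & \text{if } v_l = (v, e), \\ 0 & \text{otherwise.} \end{cases}$$

Similarly, we could also get an edge back-projection matrix $P'_{edge} \in \mathbb{R}^{|E| \times |V_l|}$ to integrate all partial information of hyperedges into one piece.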
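The back-projection can be sketched as a row-normalized weighting of line nodes by inverse hyperedge degree. The row normalization is an assumption of this sketch (the text only states that information is divided by the edge degree), and the code assumes every vertex belongs to at least one hyperedge.

```python
import numpy as np


def vertex_back_projection(H):
    """Weight each line node (v, e) by 1/delta(e), then normalize per vertex."""
    vs, es = np.nonzero(H)                 # line node k is (vs[k], es[k])
    edge_degree = H.sum(axis=0)
    B = np.zeros((H.shape[0], len(vs)))
    B[vs, np.arange(len(vs))] = 1.0 / edge_degree[es]
    return B / B.sum(axis=1, keepdims=True)  # assumed normalization: rows sum to 1


H = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
vs, es = np.nonzero(H)
P_vertex = np.zeros((len(vs), H.shape[0]))
P_vertex[np.arange(len(vs)), vs] = 1.0

P_back = vertex_back_projection(H)
round_trip = P_back @ P_vertex   # scatter to line nodes, then fuse back
```

Under this normalization, projecting vertex features to line nodes and immediately back-projecting them returns the original features, since `round_trip` is the identity matrix.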

3.4 Additional Properties and Discussion

In this section, we first present an interesting observation between characteristic matrices from $G_H$ and $G_l$. Then, we connect our line expansion with the “line graph” in graph theory, based on which some sound properties are provided.

Observation 1.

Let $H$ be the incidence matrix of a hypergraph $G_H$. $D_v$ and $D_e$ are the vertex and hyperedge degree matrices. Let $P_{vertex}$ and $P_{edge}$ be the vertex and edge projection matrices, respectively. $A_l$ is the adjacency matrix of the line expansion $G_l$. Let $H_r = [P_{vertex}, P_{edge}] \in \{0, 1\}^{|V_l| \times (|V| + |E|)}$; it satisfies the following equations,

$$H_r^\top H_r = \begin{bmatrix} D_v & H \\ H^\top & D_e \end{bmatrix}, \tag{4}$$

$$H_r H_r^\top = 2I + A_l. \tag{5}$$

From Observation 1 (see proof in Appendix C), the left-hand sides of both Eqn. (4) and Eqn. (5) are built from the projection matrices, and the right-hand sides carry information from the hypergraph and the line expansion, respectively. Essentially, they quantify the transition from $G_H$ to $G_l$. For Eqn. (5), we are interested in the product $H_r H_r^\top$, which equals $A_l$ with two orders of self-loop and is useful in the analytical aspects of line expansion (shown in Section 5).
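Both identities can be checked numerically on a small example. The sketch below assumes the stacked matrix is the column-wise concatenation of the vertex and edge projection matrices and uses a hypothetical toy hypergraph; it is a sanity check, not a proof.

```python
import numpy as np

H = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
vs, es = np.nonzero(H)                      # line node k is (vs[k], es[k])
n_l = len(vs)

# Projection matrices: one-hot rows for the vertex / hyperedge of each line node.
P_vertex = (vs[:, None] == np.arange(H.shape[0])[None, :]).astype(int)
P_edge = (es[:, None] == np.arange(H.shape[1])[None, :]).astype(int)

# Line expansion adjacency: same vertex or same hyperedge, no self-loops.
A_l = ((vs[:, None] == vs[None, :]) | (es[:, None] == es[None, :])).astype(int)
np.fill_diagonal(A_l, 0)

H_r = np.hstack([P_vertex, P_edge])
D_v = np.diag(H.sum(axis=1))
D_e = np.diag(H.sum(axis=0))

gram_small = H_r.T @ H_r                    # (|V|+|E|) x (|V|+|E|)
gram_large = H_r @ H_r.T                    # |V_l| x |V_l|
expected_small = np.block([[D_v, H], [H.T, D_e]])
expected_large = 2 * np.eye(n_l, dtype=int) + A_l
```

The diagonal of `gram_large` is 2 because every line node shares both its vertex and its hyperedge with itself, which is where the two orders of self-loop come from.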

Theorem 2.

For a hypergraph $G_H$, its line expansion $G_l$ is equivalent to the line graph of its star expansion, $L(G_s)$, where $L(\cdot)$ is the line graph notation from graph theory.

Theorem 2 provides a theoretical interpretation and enriches our expansion with sound graph theoretical properties (chung1997spectral). That is why we name our formulation “line expansion”. Note that the line expansion is significantly different from the “line graph of hypergraphs” discussed in (bermond1977line). Instead, it is the line graph of the star expansion. Detailed proofs of Theorem 2 can be found in Appendix A.
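The equivalence in Theorem 2 can be illustrated on a toy example: the edges of the star expansion are exactly the vertex-hyperedge incidences, and two of them are adjacent in the line graph when they share an endpoint. The helper names `line_graph` and `relabel` are hypothetical, and the check below is a sanity test on one small hypergraph only.

```python
from itertools import combinations

hyperedges = [{0, 1}, {0, 1, 2}, {2, 3}]   # hypothetical toy hypergraph


def line_graph(edges):
    """Edge set of the line graph: pairs of edges that share an endpoint."""
    return {frozenset((a, b)) for a, b in combinations(edges, 2) if set(a) & set(b)}


# Star expansion: one bipartite edge per incidence, with typed endpoints.
star_edges = [(("v", v), ("e", j)) for j, e in enumerate(hyperedges) for v in sorted(e)]
star_line_graph = line_graph(star_edges)

# Line expansion built directly from its definition.
line_nodes = [(v, j) for j, e in enumerate(hyperedges) for v in sorted(e)]
le_edges = {frozenset((a, b)) for a, b in combinations(line_nodes, 2)
            if a[0] == b[0] or a[1] == b[1]}


def relabel(star_edge):
    """Map a star-expansion edge (("v", v), ("e", j)) to the line node (v, j)."""
    return (star_edge[0][1], star_edge[1][1])


relabelled = {frozenset(relabel(s) for s in pair) for pair in star_line_graph}
same_graph = relabelled == le_edges
```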

Based on Theorem 2, we know that $G_l$ is homogeneous and has the same connectivity as $G_H$. The number of new edges in $G_l$ could be calculated as $|E_l| = \sum_{v \in V} \frac{d(v)(d(v) - 1)}{2} + \sum_{e \in E} \frac{\delta(e)(\delta(e) - 1)}{2}$ and new nodes as $|V_l| = \sum_{v \in V} d(v) = \sum_{e \in E} \delta(e)$. In the worst case, for a fully-connected k-order hypergraph (), and . However, many social networks are indeed sparse, so the cardinality could reduce to and . According to (ramezanpour2003generating), the small-world property and the shape of the hypergraph degree distribution are preserved in $G_l$. (evans2009line) also found that the clustering property of $G_l$ could be used to represent the original hypergraph. More interesting properties can be found in (chung1997spectral).
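The size of the line expansion follows from the degrees alone: each vertex with degree d and each hyperedge with degree δ induces a clique, contributing d(d-1)/2 and δ(δ-1)/2 edges respectively, and there is one line node per incidence. A small sketch on a hypothetical incidence matrix:

```python
import numpy as np
from math import comb

H = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
vertex_degree = H.sum(axis=1)
edge_degree = H.sum(axis=0)

# One line node per incident vertex-hyperedge pair.
num_line_nodes = int(H.sum())

# Each vertex and each hyperedge induces a clique among its line nodes.
num_line_edges = (sum(comb(int(d), 2) for d in vertex_degree)
                  + sum(comb(int(d), 2) for d in edge_degree))
```

No edge is counted twice, because two distinct line nodes cannot share both their vertex and their hyperedge.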

4 Hypergraph Representation Learning

Transductive learning on graphs is successful due to the fast localization and neighbor aggregation (DefferrardBV16; KipfW17; VelickovicCCRLB18). It is easy to define the info-propagation pattern upon simple structures. For real-world cases, relationships among objects are usually more complex than pairwise. Therefore, to apply these algorithms, we need a succinct but informative representation of the higher order relations.

As shown in Section 3, the bijective map from $G_H$ to $G_l$ equipped with four entity projectors ($P_{vertex}$, $P_{edge}$, $P'_{vertex}$, $P'_{edge}$) fills the conceptual gap between hypergraphs and graphs. With this powerful tool, it is possible to transfer hypergraph learning problems into graph structures and address them by using well-studied graph representation algorithms. Note that this work focuses on generic hypergraphs without edge weights.

4.1 Hypergraph Learning with Line Expansion

In this section, we generalize graph convolution networks (GCNs) (KipfW17) to hypergraphs and introduce a new learning algorithm defined on line expansion for hypergraph representation. Note that, on our proposed structure, other graph representation algorithms could be extended similarly (perozzi2014deepwalk; tang2015line; HamiltonYL17; VelickovicCCRLB18).

To address the transductive node classification problems on hypergraphs (in Section 2.2), we design the pipeline of our proposed model as the following three steps. First, vertices of the hypergraph are mapped into multiple related line nodes. Specifically, we use the proposed vertex projection matrix $P_{vertex}$ to conduct feature mapping. Second, we apply deep graph learning algorithms (e.g., GCNs) to learn the representation of each line node in the higher-order space. Third, the learned representation is fused by $P'_{vertex}$, the vertex back-projection matrix, for each vertex in an inverse edge degree manner. The labelling of vertices is predicted on the fused representation.

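Feature Projection. For $G_H = (V, E)$, given the initial state matrix $X \in \mathbb{R}^{|V| \times d_i}$, whose row $x_v$ is the state vector of vertex $v$ ($d_i$ is the input dimension), we project it as the initial feature matrix in $G_l$ by matrix $P_{vertex}$,

$$H^{(0)} = P_{vertex} X \in \mathbb{R}^{|V_l| \times d_i},$$

which essentially scatters features from each vertex of $G_H$ to the feature vectors of its line nodes in $G_l$.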

4.2 Convolution on Line Domain

In line expansion, a line node could be adjacent to other line nodes that contain the same vertex (vertex similarity) or the same hyperedge (edge similarity). Let us denote $h^{(k)}_{(v,e)}$ as the representation of line node $(v, e)$ in the $k$-th layer.

Convolution Layer. By incorporating information from both vertex-similar neighbors and hyperedge-similar neighbors, the convolution is defined as,

$$h^{(k+1)}_{(v,e)} = \sigma\Big( \sum_{v' \in e} w_e\, h^{(k)}_{(v',e)} \Theta^{(k)} + \sum_{e' \ni v} w_v\, h^{(k)}_{(v,e')} \Theta^{(k)} \Big). \tag{7}$$

$\sigma(\cdot)$ is a non-linear activation function like ReLU (KipfW17) or LeakyReLU (VelickovicCCRLB18). $\Theta^{(k)}$ is the filter parameter matrix for layer $k$. Two hyper-parameters, $w_v$ and $w_e$, are used to parametrize vertex similarity and edge similarity. Specifically, in Eqn. (7), the first term (weighted by $w_e$) convolves information from neighbors who share the same hyperedge, whereas the second term (weighted by $w_v$) convolves information from neighbors who share the same vertex. In the experiment, we set .

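We present the parameterized adjacency matrix of $G_l$, with the vertex-similarity and edge-similarity weights $w_v$ and $w_e$, for line nodes $u_l = (v, e)$ and $v_l = (v', e')$,

$$A_l(u_l, v_l) = \begin{cases} w_e & \text{if } e = e',\ v \neq v', \\ w_v & \text{if } v = v',\ e \neq e', \\ 0 & \text{otherwise,} \end{cases}$$

and adopt the renormalization trick (KipfW17) with the adjustment (two orders of self-loop, referring to Section 3.4): $2I + D_l^{-1/2} A_l D_l^{-1/2} \rightarrow \tilde{D}_l^{-1/2} \tilde{A}_l \tilde{D}_l^{-1/2}$, with $\tilde{A}_l = 2I + A_l$ and $\tilde{D}_l(i, i) = \sum_j \tilde{A}_l(i, j)$, to make Eqn. (7) compact,

$$H^{(k+1)} = \sigma\big( \tilde{D}_l^{-1/2} \tilde{A}_l \tilde{D}_l^{-1/2} H^{(k)} \Theta^{(k)} \big).$$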

Representation Projection. After the $K$ convolution layers, $H^{(K)}$ is the representation matrix for all line nodes, from which we could derive fused representations for both vertices and hyperedges in $G_H$. The representation for each vertex can be obtained by aggregating line node representations based on the reciprocal of edge degree by using the back-projector $P'_{vertex}$, formally,

$$Y = P'_{vertex} H^{(K)} \in \mathbb{R}^{|V| \times d_o},$$

where $d_o$ is the dimension of the output representation. Note that, in this work, we are only interested in the node representation. However, due to the symmetry of hypergraphs, this work also sheds some light on the applications of learning hyperedges (e.g., relation mining) by using $P'_{edge}$ with node classification algorithms. We leave it to future work.

In sum, the complexity of -layer convolution is of , since the convolution operator could be efficiently implemented as the product of a sparse matrix with a dense matrix.
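The three-step pipeline of this section (feature projection, convolution on the line expansion, representation back-projection) can be sketched as a forward pass in NumPy. This is a sketch under stated assumptions rather than the authors' implementation: dense matrices are used for brevity, the renormalization is taken to be the symmetric normalization of the adjacency plus two self-loops, the back-projection is row-normalized, the last layer is left linear, and the weights are random. The function name `le_gcn_forward` and its parameters are hypothetical.

```python
import numpy as np


def le_gcn_forward(H, X, thetas, w_v=1.0, w_e=1.0):
    """Forward pass of a GCN on the line expansion of the hypergraph H."""
    vs, es = np.nonzero(H)                       # line node k is (vs[k], es[k])
    n_l = len(vs)

    # Step 1: feature projection, scattering vertex features to line nodes.
    P_vertex = np.zeros((n_l, H.shape[0]))
    P_vertex[np.arange(n_l), vs] = 1.0
    Z = P_vertex @ X

    # Step 2: convolution with the parameterized, renormalized adjacency.
    same_v = vs[:, None] == vs[None, :]
    same_e = es[:, None] == es[None, :]
    A = w_v * same_v + w_e * same_e
    np.fill_diagonal(A, 0.0)
    A_tilde = A + 2.0 * np.eye(n_l)              # two orders of self-loop
    deg = A_tilde.sum(axis=1)
    A_hat = A_tilde / np.sqrt(np.outer(deg, deg))  # D^{-1/2} A D^{-1/2}
    for k, theta in enumerate(thetas):
        Z = A_hat @ Z @ theta
        if k < len(thetas) - 1:
            Z = np.maximum(Z, 0.0)               # ReLU on hidden layers only

    # Step 3: back-projection, fusing line nodes by inverse edge degree.
    B = np.zeros((H.shape[0], n_l))
    B[vs, np.arange(n_l)] = 1.0 / H.sum(axis=0)[es]
    B /= B.sum(axis=1, keepdims=True)
    return B @ Z                                 # one row of logits per vertex


rng = np.random.default_rng(0)
H = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
X = rng.normal(size=(4, 5))                      # 4 vertices, 5 input features
thetas = [rng.normal(size=(5, 8)), rng.normal(size=(8, 3))]
logits = le_gcn_forward(H, X, thetas)
```

In practice the normalized adjacency would be stored as a sparse matrix, so that each layer reduces to a sparse-dense product as noted above.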

4.3 Message Passing in Higher-order Space

Current graph learning methods perform neighborhood message passing (Pearl82) by applying convolution operations in both spatial (KipfW17; VelickovicCCRLB18; hyperOperation; feng2019hypergraph) and spectral domains (BrunaZSL13; HenaffBL15). However, one critical weakness is that the convolution operations are only applied to vertices. The interchangeable and complementary nature of nodes and edges is generally ignored in previous research (abs-1806-00770).

Let us cast the problem in the space of algebraic geometry. A graph can be represented by points in an abstract space with lines connecting them as edges. Then a vertex function is defined for message passing through node adjacency. In graphs, edges can be viewed as 1-simplex line segments with vertices located at the corners. The edge topology, however, could be generalized to higher-order simplex structures in the case of hypergraphs. According to previous studies (chung1997spectral; forman2003bochner), topologists also define functions on sets of vertices (i.e., simplices), where the vertex function is a special case (i.e., the 0-chain operator). The functions on sets of vertices are referred to as $p$-chains, where $p$ is the size of the simplices on which they are defined.

For simple graphs, the current learning methods mostly operate on 0-chains, and edge variations are generally missing. For hypergraphs, researchers often collapse the higher-order structure by attaching weights and only apply 0-chain operators on the remaining topology. We conjecture that operators in the node regime are not sufficient. It is better to also consider 1-chains or functions defined on higher-order edges. In this work, instead of designing a local vertex-to-vertex operator (feng2019hypergraph; ZhouHS06; ZhangHTC17), we treat the vertex-hyperedge relation as a whole. Our convolution operator (in Section 4.2) on the line-induced $G_l$ is equivalent to exchanging information simultaneously across vertices and hyperedges of $G_H$, thus the learning process on line expansion goes beyond 0-chains. The total variations of higher-order chains enable our model to capture higher-order relationships.

5 Unifying Hypergraph Expansion

In this section, we show that our proposed line expansion is powerful in that it unifies clique and star expansions, as well as simple graph adjacency cases.

5.1 Clique Expansion and Star Expansion

Given a hypergraph , consider the clique expansion . For each pair ,


where in standard clique expansion, we have,


For the same hypergraph , star expansion gives . We adopt adjacency formulation from (AgarwalBB06), formally,


5.2 Line Expansion

To analyze the message passing on line expansion, we begin by introducing some notations. Let us use (in short, ) to denote the representation of line node at the -th layer. The convolution operator on line expansion, in Eqn. (7), can be presented,


We augment Eqn. (15) by applying 2-order self-loops (mentioned in Section 4.2), and it yields,


It is hard and unfair to directly compare the proposed algorithm with clique/star expansions, since our graph operator is not defined on hypergraph vertices. Thus, we calculate the expected representation for vertex denoted as , i.e., aggregating line node representations by back-projector ,


After organizing the formula, we calculate that for each hypergraph vertex pair , they are adjacent by,


or by the following form after symmetric re-normalization,


5.3 Analysis of Unification

We have already shown that line expansion enables exchanging information beyond 0-chains and can thus utilize higher-order relations. In this subsection, we illustrate why line expansion is more powerful in the actual message passing.

Unifying Star and Clique Expansion. We start by considering the clique expansion graph with weighting function,


Note that this is equivalent to vanish Eqn. (12) by a factor of . We plug the value into Eqn. (13), then adjacency of clique expansion transforms into,


Note that when we set the edge-similarity weight to zero (no message passing from hyperedge-similar neighbors), the higher-order relation of line expansion in Eqn. (19) degrades into,


Eqn. (22) is exactly the adjacency of star expansion in Eqn. (14), and Eqn. (21) (adjacency of clique expansion) is the 1-order self-loop form of the degraded line expansion.

Unifying Simple Graph. The convolution operator (KipfW17) on a simple graph can be briefly presented as,


A graph could be regarded as a 2-order hypergraph, where each hyperedge has exactly two vertices, i.e., $\delta(e) = 2$, and each pair of vertices has at most one common edge. Plugging this value into Eqn. (22) yields,


Comparing Eqn. (23) and (24), the only difference is a scaling factor , which could be absorbed into filter .

To sum up, we prove that clique and star expansions and simple graph adjacency could all be unified as a special class of line expansion, where there is no information sharing between hyperedge-similar neighbors.

6 Experiments

We empirically evaluated the representation power of line expansion (LE). Our experiments are two-fold: we first compare the performance of popular graph learning algorithms on graphs and on line expansion. This experiment is to verify the “generalization” conclusion. Then, we further demonstrate the effectiveness of our proposed model on line expansion for six real-world hypergraphs. All the experiments are conducted 50 times on one Linux server with 256 GB memory and 48 CPUs.

6.1 Simple Citation Network Classification

Since simple graphs are a special case of hypergraphs, we apply line expansion to three citation networks. The Cora dataset has 2,708 vertices, and 5.2% of them have class labels. Nodes contain sparse bag-of-words feature vectors and are connected by a list of citation links. The other two datasets, Citeseer and Pubmed, are constructed similarly (sen2008collective). Basic statistics are reported in Table 1.

Dataset  | Nodes  | Edges  | Features | Classes | Label rate
Cora     | 2,708  | 5,429  | 1,433    | 7       | 0.052
Citeseer | 3,327  | 4,732  | 4,732    | 6       | 0.036
Pubmed   | 19,717 | 44,338 | 500      | 3       | 0.003
Table 1: Overview of Citation Network Statistics

We consider the popular deep end-to-end learning methods GCN (KipfW17) and well-known graph representation methods SpectralClustering (SC) (ng2002spectral), Node2Vec (grover2016node2vec), DeepWalk (perozzi2014deepwalk) and LINE (tang2015line). We follow the same experimental setting from (yang2016revisiting).

Model       | Cora             | Citeseer          | Pubmed
SC          | 53.3 ± 0.2       | 50.8 ± 0.7        | 55.2 ± 0.4
Node2Vec    | 66.3 ± 0.3       | 46.2 ± 0.7        | 71.6 ± 0.5
DeepWalk    | 62.8 ± 0.6       | 45.7 ± 1.2        | 63.4 ± 0.4
LINE        | 27.7 ± 1.1       | 30.8 ± 0.2        | 53.5 ± 0.8
GCN         | 82.6 ± 0.7 (3s)  | 70.5 ± 0.3 (9s)   | 78.2 ± 0.6 (12s)
LE+SC       | 56.9 ± 0.2       | 50.7 ± 0.2        | 71.9 ± 0.7
LE+Node2Vec | 74.3 ± 0.4       | 46.2 ± 0.1        | 74.3 ± 0.4
LE+DeepWalk | 68.3 ± 0.1       | 50.4 ± 0.4        | 68.0 ± 0.8
LE+LINE     | 51.7 ± 0.2       | 34.9 ± 0.5        | 57.5 ± 0.3
LE+GCN      | 82.3 ± 0.5 (8s)  | 70.4 ± 0.3 (11s)  | 78.7 ± 0.4 (31s)
Table 2: Results for Citation Network Node Classification (%)

Analysis. The results of transductive node classification for citation networks are shown in Table 2. The experiment demonstrates that LE yields comparable results in graph node classification tasks. Specifically, for the non-end-to-end methods, the LE variants match or outperform the original algorithms on simple graphs in nearly all cases.

End-to-end GCNs reach a much higher accuracy compared to the other baselines. We observe that LE+GCN ties with the original GCN on the three datasets. However, the expansion of the original network provides equal or lower variance on all three datasets and contributes to more robust models.

Columns Vertices through Exp. density describe the hypergraph; columns Line node through Density describe its line expansion.

Dataset    | Vertices | Hyperedges | Incidence | Exp. edge  | Exp. density | Line node | Line edge  | Max   | Density | Features | Class | Label rate
20News     | 16,242   | 100        | 65,363    | 26,634,200 | 2.0e-1       | 64,363    | 34,426,427 | 2,241 | 1.6e-2  | 100      | 4     | 0.025
Mushroom   | 8,124    | 112        | 40,620    | 6,964,876  | 2.1e-1       | 40,620    | 11,184,292 | 1,808 | 1.2e-2  | 112      | 2     | 0.006
Zoo        | 101      | 42         | 1,717     | 5,050      | 1.0e-0       | 1,717     | 62,868     | 93    | 4.3e-2  | 17       | 7     | 0.650
ModelNet40 | 12,311   | 12,321     | 61,555    | 68,944     | 9.1e-4       | 61,555    | 317,083    | 30    | 1.7e-4  | 2048     | 40    | 0.800
NTU2012    | 2,012    | 2,012      | 10,060    | 10,013     | 4.9e-3       | 10,060    | 48,561     | 19    | 9.6e-4  | 2048     | 67    | 0.800
BCancer    | 699      | 90         | 6,291     | 205,237    | 8.4e-1       | 6,291     | 784,129    | 579   | 4.0e-2  | 9        | 2     | 0.075

Exp. edge is given by the clique expansion, and density is computed by (coleman1983estimation). Exp. density is computed on clique expansion.

Table 3: Overview Statistics of Hypergraphs and Their Line Expansions

Model                         | 20News             | Mushroom           | Zoo               | ModelNet40         | NTU2012            | BCancer
LR                            | 57.5 ± 0.7         | 81.6 ± 0.1         | 74.3 ± 0.0        | 59.0 ± 2.8         | 37.5 ± 2.1         | 84.6 ± 1.0
H-NCut (ZhouHS06)             | 57.3 ± 0.5         | 87.7 ± 0.2         | 87.3 ± 0.5        | 91.4 ± 1.1         | 74.8 ± 0.9         | 87.6 ± 0.1
Hyper-Conv (hyperOperation)   | 57.8 ± 0.7 (22.8s) | 93.7 ± 0.6 (10.6s) | 93.1 ± 2.3 (0.8s) | 91.1 ± 0.8 (38.5s) | 79.4 ± 1.3 (6.3s)  | 93.4 ± 0.6 (4.2s)
HGNN (feng2019hypergraph)     | 58.1 ± 0.2 (23.7s) | 93.1 ± 0.5 (11.2s) | 92.0 ± 2.8 (0.8s) | 91.7 ± 0.4 (37.3s) | 80.0 ± 0.7 (5.6s)  | 93.5 ± 0.2 (5.3s)
HyperGCN (yadati2018hypergcn) | 58.8 ± 0.3 (27.9s) | 92.3 ± 0.3 (13.4s) | 93.1 ± 2.3 (1.1s) | 91.4 ± 0.9 (49.6s) | 80.4 ± 0.7 (9.7s)  | 94.2 ± 0.2 (5.8s)
LE+GCN                        | 60.8 ± 0.2 (38.6s) | 95.2 ± 0.1 (25.6s) | 97.0 ± 0.0 (2.8s) | 93.6 ± 0.3 (89.2s) | 84.3 ± 0.2 (16.9s) | 95.8 ± 0.1 (11.1s)
Table 4: Accuracy and Executed Time for Real-World Hypergraph Tasks (%)

6.2 Real-world Hypergraph Classification

In this section, we employ four SOTA hypergraph learning methods and six real-world datasets to evaluate hypergraph learning with line expansion. We use the generalized GCN model, named LE+GCN, introduced in Sec. 4.

Hypergraph Datasets. The first dataset, 20Newsgroups, is a modified version (http://www.cs.nyu.edu/~roweis/). It contains 16,242 articles with binary occurrence values of 100 words. Each word is regarded as a hyperedge. The next two datasets are from the UCI Categorical Machine Learning Repository (Dua:2019): Mushroom and Zoo. For these two, a hyperedge is created from all data points which have the same value of a categorical feature. We follow the same setting for 20Newsgroups, Mushroom, and Zoo as in (HeinSJR13). The other two are computer vision/graphics datasets: Princeton CAD ModelNet40 (wu20153d) and the National Taiwan University (NTU) 3D dataset (chen2003visual). We follow the same setting as (feng2019hypergraph): 80% of the data is used to train our model and the remaining 20% is used for testing. The hyperedges are constructed from MVCNN features with 10 nearest neighbors, and the actual feature vectors are given by GVCNN. BCancer is a health-related public dataset (Dua:2019). Basic statistics of the datasets are reported in Table 3.

Baselines. In this experiment, we carefully select baselines to compare with our LE+GCN. Logistic Regression (LR) works as a standard baseline, which only uses independent feature information. H-NCut (ZhouHS06), equivalent to iH-NCut (li2017inhomogeneous) with uniform hyperedge cost, is a generalized spectral method for hypergraph partition. The general goal is,

Hyper-Conv (hyperOperation), HGNN (feng2019hypergraph) and HyperGCN (yadati2018hypergcn) are three state-of-the-art graph-based hypergraph learning methods.

Analysis. As shown in Table 4, LE+GCN consistently outperforms the SOTA methods on all datasets. Every model works better than LR, which means transductive feature sharing helps prediction. H-NCut does not perform as well as the graph-based baselines. The reason is that graph cut methods depend on linear matrix factorization, whereas graph convolution methods are more robust and effective thanks to their non-linearity. The remaining three are all graph-based deep learning methods. By only utilizing vertex functions on the flattened graph, these algorithms run faster than our LE+GCN, but are much less effective at representation learning.

In essence, current hypergraph deep learning operators are defined on a flattened topology (identical to clique expansion) with the specially designed edge weights. In Table 3, we calculate the edges and density for that topology, denoted as Exp. edge and Exp. density, and find that the scale of line expansion is within times of the flattened topology, except for Zoo (flattened topology is a complete graph). It is also interesting that for most of the datasets, the density of the line expansion graphs is less than of the flattened graph, especially for ModelNet40 and NTU2012, where the factor is about . For each dataset, the training time of our method is within times of these deep graph-based learning algorithms, which we think is acceptable when considering our state-of-the-art performance.
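To make the comparison of scale and density concrete, the sketch below counts nodes and edges of the flattened topology (clique expansion) and of the line expansion for a small hypothetical hypergraph. The toy hypergraph and all function names are assumptions for illustration; the figures in Table 3 come from the real datasets, not from this code.

```python
from itertools import combinations


def clique_expansion_edges(hyperedges):
    """Flattened topology: connect every pair of vertices that share a hyperedge."""
    edges = set()
    for members in hyperedges.values():
        edges.update(frozenset(pair) for pair in combinations(sorted(members), 2))
    return edges


def line_expansion(hyperedges):
    """Line nodes are incident (vertex, hyperedge) pairs; adjacent iff they share either part."""
    nodes = [(v, e) for e in sorted(hyperedges) for v in sorted(hyperedges[e])]
    edges = {
        frozenset((a, b))
        for a, b in combinations(nodes, 2)
        if a[0] == b[0] or a[1] == b[1]
    }
    return nodes, edges


def density(num_nodes, num_edges):
    """Fraction of possible undirected edges that are present."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


# Hypothetical toy hypergraph: two hyperedges overlapping in vertex 3.
toy = {"e1": {1, 2, 3}, "e2": {3, 4}}
num_vertices = len(set().union(*toy.values()))
flat_edges = clique_expansion_edges(toy)
le_nodes, le_edges = line_expansion(toy)

print("flattened:", num_vertices, len(flat_edges), density(num_vertices, len(flat_edges)))
print("line exp.:", len(le_nodes), len(le_edges), density(len(le_nodes), len(le_edges)))
```

On this toy input the line expansion has one more node and one more edge than the flattened graph, yet a lower density, which is the qualitative pattern the paragraph describes.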

7 Discussion and Conclusion

In this paper, we proposed a novel hypergraph representation, Line Expansion (LE), which is able to utilize higher-order relations for message passing. With line expansion and four entity projectors, we customize graph convolution and develop a novel transductive learning method, LE+GCN, for hypergraphs. Further, we provide sound properties of line expansion and theoretically prove that simple graph adjacency and clique/star expansion could all be unified as special cases of hypergraph learning with line expansion. We evaluate our LE formulation and demonstrate that learning algorithms defined on LEs tie with their performance on the original three citation networks. We further conduct extensive experiments on six real-world hypergraphs and show that LE+GCN can beat SOTA by a significant margin.

A possible future direction is to exploit the symmetry and apply LE for edge learning in complex graphs using node classification algorithms. Another interesting extension is to extend line expansion to directed graphs, where the relation between two nodes is not mutual.


Appendix A Proof of Theorem 2

Statement of the Theorem. For a hypergraph $\mathcal{G}_H$, its line expansion $\mathcal{G}_l$ is equivalent to the line graph of its star expansion, $L(\mathcal{G}_s)$, where $L(\cdot)$ is the line graph notation from graph theory.

We first state the notation: hypergraph $\mathcal{G}_H$, its bipartite representation $\mathcal{G}_b$, its star expansion $\mathcal{G}_s$, and its line expansion $\mathcal{G}_l$. Our proof of Theorem 2 will be based on the following three definitions.

Figure 3: Hypergraphs to Bipartite Representation
Definition 1.

(Line Graph.) Given a graph $G$, its line graph $L(G)$ is a graph such that each vertex of $L(G)$ represents an edge of $G$; and two vertices of $L(G)$ are adjacent if and only if their corresponding edges share a common endpoint ("are incident") in $G$.

Definition 2.

(Star Expansion of Hypergraph.) Given a hypergraph $\mathcal{G}_H$, its star expansion $\mathcal{G}_s$ is a graph such that the vertex set consists of both the vertices and the hyperedges of $\mathcal{G}_H$, and the edges are defined by their incidence relations in $\mathcal{G}_H$.

Definition 3.

(Line Expansion of Hypergraph.) Given a hypergraph $\mathcal{G}_H$, its line expansion $\mathcal{G}_l$ is a graph such that each vertex of $\mathcal{G}_l$ represents an incident vertex-hyperedge pair; and two vertices of $\mathcal{G}_l$ are adjacent if and only if they share a common vertex or hyperedge in $\mathcal{G}_H$.


First, it is easy to conclude that the star expansion $\mathcal{G}_s$ of the hypergraph is equivalent to the bipartite representation $\mathcal{G}_b$. According to Definition 2, when we rearrange the nodes of $\mathcal{G}_s$, listing vertices on the left and hyperedges on the right while keeping the incidence edges between them, the resulting structure is identical to $\mathcal{G}_b$ shown in Fig. 3 (right).

Second, when we view the bipartite representation $\mathcal{G}_b$ as a graph and take its line graph $L(\mathcal{G}_b)$, then according to Definition 1, each node of the resulting structure is an incident vertex-hyperedge pair, and two nodes are adjacent if and only if their corresponding pairs share a common endpoint (the same vertex or hyperedge in $\mathcal{G}_H$). This constructs the same graph as the line expansion $\mathcal{G}_l$ (according to Definition 3).

To sum up, the line expansion $\mathcal{G}_l$ of $\mathcal{G}_H$ is equivalent to the line graph of the bipartite representation, $L(\mathcal{G}_b)$, which is also equivalent to the line graph of its star expansion, $L(\mathcal{G}_s)$. ∎
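The equivalence can be checked mechanically on a small instance. The sketch below builds the star expansion of a hypothetical toy hypergraph, takes its line graph following Definition 1, and compares the result with the line expansion built directly from Definition 3. The toy hypergraph and the function names are assumptions for illustration; vertex labels (integers) and hyperedge labels (strings) are deliberately kept disjoint so that they can coexist as nodes of the star expansion.

```python
from itertools import combinations

# Hypothetical toy hypergraph: hyperedge name -> set of vertices.
toy = {"e1": {1, 2, 3}, "e2": {3, 4}}

# Star expansion: one edge per incident (vertex, hyperedge) pair.
star_edges = [(v, e) for e in sorted(toy) for v in sorted(toy[e])]


def line_graph(edges):
    """Definition 1: nodes are edges; adjacent iff they share an endpoint."""
    return {
        frozenset((a, b))
        for a, b in combinations(edges, 2)
        if set(a) & set(b)
    }


def line_expansion(hyperedges):
    """Definition 3: nodes are incident pairs; adjacent iff same vertex or same hyperedge."""
    nodes = [(v, e) for e, members in hyperedges.items() for v in members]
    return {
        frozenset((a, b))
        for a, b in combinations(nodes, 2)
        if a[0] == b[0] or a[1] == b[1]
    }


same = line_graph(star_edges) == line_expansion(toy)
print("L(star expansion) == line expansion:", same)
```

The two constructions coincide because an edge of the star expansion is exactly an incident vertex-hyperedge pair, and sharing an endpoint means sharing either the vertex or the hyperedge.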

Appendix B Proof of Theorem 1

Statement of the Theorem. Under the construction, for a hypergraph $\mathcal{G}_H$ and its line expansion $\mathcal{G}_l$, the mapping from hypergraph to line expansion (i.e., $\mathcal{G}_H \mapsto \mathcal{G}_l$) is bijective.

To prove the bijectivity of this mapping, we present a graph isomorphism theorem (whitney1992congruent) below. Note that the mapping from a hypergraph $\mathcal{G}_H$ to its bipartite representation $\mathcal{G}_b$ is one-to-one, and that $\mathcal{G}_b$ is a graph with heterogeneous nodes. In the following, we will use $\mathcal{G}_b$ to represent $\mathcal{G}_H$.

Figure 4: The Exception of Whitney’s Theorem
Theorem 3.

(Whitney Graph Isomorphism Theorem.) Two connected graphs are isomorphic if and only if their line graphs are isomorphic, with a single exception: $K_3$, the complete graph on three vertices, and the complete bipartite graph $K_{1,3}$, which are not isomorphic but both have $K_3$ as their line graph.

Definition 4.

(Maximum Independent Set.) An independent set is a set of vertices in a graph, no two of which are adjacent. A maximum independent set is an independent set of the largest possible size for a given graph $G$.


The bipartite representation $\mathcal{G}_b$ of the hypergraph may be disconnected, which happens when some of the vertices are incident only to some of the hyperedges. In that case, we can consider it as a union of several disjoint connected components and prove the claim for each component separately. So we mainly discuss the case that $\mathcal{G}_b$ is connected.

The proof consists of three parts. First, we show that for the class of bipartite graphs, Theorem 3 holds without exception. Second, we show how to construct a line expansion $\mathcal{G}_l$ from the bipartite representation $\mathcal{G}_b$. Third, we show how to recover the bipartite graph $\mathcal{G}_b$ from $\mathcal{G}_l$.

First, for the exception in Whitney's theorem, $K_3$ (in Fig. 4) is an odd cycle and hence not bipartite, so it cannot be the bipartite representation of any hypergraph. Therefore, for bipartite graphs, Theorem 3 holds without exception.
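The exceptional pair in Whitney's theorem is small enough to verify by hand or by a few lines of code. The sketch below is an illustration with assumed node labels and helper names: it checks that the line graphs of the triangle and of the three-leaf star are both triangles, and that only the star is bipartite.

```python
from collections import deque
from itertools import combinations


def line_graph_edge_count(edges):
    """Number of adjacent pairs in the line graph (pairs of edges sharing an endpoint)."""
    return sum(1 for a, b in combinations(edges, 2) if set(a) & set(b))


def is_bipartite(edges):
    """Two-colour the graph by breadth-first search; fail on a same-colour edge."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return False
    return True


# Hypothetical labels for the two exceptional graphs.
triangle = [("a", "b"), ("b", "c"), ("a", "c")]           # complete graph on 3 vertices
star = [("hub", "x"), ("hub", "y"), ("hub", "z")]         # complete bipartite graph 1 x 3

# Each line graph has 3 nodes (one per edge); 3 adjacent pairs means it is a triangle.
print(len(triangle), line_graph_edge_count(triangle), is_bipartite(triangle))
print(len(star), line_graph_edge_count(star), is_bipartite(star))
```

Since the triangle fails the two-colouring test, only the star can arise as a bipartite representation, so restricting to bipartite graphs removes the ambiguity.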

(Injectivity) Second, according to Theorem 2, the line expansion $\mathcal{G}_l$ of the hypergraph is equivalent to the line graph of the star expansion $\mathcal{G}_s$, which is the line graph of the bipartite representation $\mathcal{G}_b$, i.e., $\mathcal{G}_l = L(\mathcal{G}_s) = L(\mathcal{G}_b)$. Also, Theorem 3 guarantees that the topology of $\mathcal{G}_l$ is unique. The actual construction is given by Definition 1 or Definition 3.

Figure 5: The construction from $\mathcal{G}_l$ to $\mathcal{G}_b$

(Surjectivity) Third, given a line graph topology (of a bipartite graph), we know from Theorem 3 immediately that the original bipartite structure is unique. We now provide a construction from $\mathcal{G}_l$ to $\mathcal{G}_b$. Given a line graph structure, we first find a maximum independent set (Definition 4) and color its nodes red (shown in Fig. 5 (a)). (paschos2010combinatorial) proves that it can be found in polynomial time.

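Matrix | Size | Detail
$H$ | $|\mathcal{V}| \times |\mathcal{E}|$ | $H(v, e) = 1$ if and only if $v$ is incident to $e$.
$D_v$ | $|\mathcal{V}| \times |\mathcal{V}|$ | $D_v(v, v) = \sum_{e} H(v, e)$, diagonal
$D_e$ | $|\mathcal{E}| \times |\mathcal{E}|$ | $D_e(e, e) = \sum_{v} H(v, e)$, diagonal
$P_{vertex}$ | $|\mathcal{V}_l| \times |\mathcal{V}|$ | $P_{vertex}(v_l, v) = 1$ if and only if $v \in v_l$
$P_{edge}$ | $|\mathcal{V}_l| \times |\mathcal{E}|$ | $P_{edge}(v_l, e) = 1$ if and only if $e \in v_l$
$H_r$ | $|\mathcal{V}_l| \times (|\mathcal{V}| + |\mathcal{E}|)$ | see $P_{vertex}$ and $P_{edge}$
$A_l$ | $|\mathcal{V}_l| \times |\mathcal{V}_l|$ | $A_l(u_l, v_l) = 1$ for $u_l = (v, e)$ and $v_l = (v', e')$ if and only if $v = v'$ or $e = e'$
Table 5: Details of Matrices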

Every vertex and every hyperedge of $\mathcal{G}_H$ spans a clique in $\mathcal{G}_l$. Consider a node $(v, e)$ in this topology: it is a vertex-hyperedge pair in the original hypergraph. Therefore, each node must be connected to exactly two cliques: one spanned by vertex $v$ and one spanned by hyperedge $e$. Essentially, we try to project these cliques back to the original vertices or hyperedges in $\mathcal{G}_H$. In fact, for each colored node (three in Fig. 5 (a)), we choose one of the two cliques connected to it so as to ensure that i) the selected cliques have no intersections (there are only two choices. In this case, choose cliques with on their edges or cliques with on their edges) and ii) the set of cliques covers all nodes in the topology, shown in Fig. 5 (b).

For any given line graph topology (of a bipartite graph), we can always find the set of cliques with on edges or the set of cliques with on edges that satisfies i) and ii), guaranteed by Definition 4. Conceptually, due to the bipartite nature, one set will be the cliques spanned by the original hyperedges and the other set will be the cliques spanned by the original vertices. Either will work for us. Note that the set of cliques with on edges also includes two size-1 cliques, denoted as and in Fig. 5 (b). They seem to connect to only one clique with on edges, i.e., clique; however, they are actually size-1 cliques spanned by original vertices which belong to only one hyperedge in $\mathcal{G}_H$.

The construction of the bipartite representation $\mathcal{G}_b$ is as follows: essentially, each clique in the given topology will be either a vertex or a hyperedge in $\mathcal{G}_H$. Suppose we have chosen the set of cliques with on edges; we transform each selected clique into a hyperedge of $\mathcal{G}_H$. The vertex set is created in two ways: i) a clique with on its edges is a vertex in $\mathcal{G}_H$ (in this case, we have three size-2 cliques with on their edges, i.e., , , ), and the vertex will be connected to the corresponding hyperedges. For example, in Fig. 5 (c), will be connected to and in , because represents a size-2 clique with on its edges, and represent two selected cliques, and clique is connected to both clique and clique in this topology; ii) nodes connected to only one clique with on their edges are also designated as vertices in $\mathcal{G}_H$, and we connect them to the represented hyperedge. For example, consider the two nodes denoted as and in Fig. 5 (b). They are in fact two size-1 cliques with on their edges (this is not obvious because a size-1 clique has no edge). These size-1 cliques are spanned by vertices in the original $\mathcal{G}_H$ that connect to only one hyperedge.

So far, we have reconstructed the bipartite representation in Fig. 5 (c) from a given line graph structure in Fig. 5 (a). For a disconnected bipartite representation, we apply the same reconstruction to each connected component. Thus, we conclude the bijectivity of line expansion. ∎

Appendix C Proof of Observation 1

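Statement of the Observation. Let $H$ be the incidence matrix of a hypergraph $\mathcal{G}_H$. $D_v$ and $D_e$ are the vertex and hyperedge degree matrices. Let $P_{vertex}$ and $P_{edge}$ be the vertex and edge projection matrices, respectively. $A_l$ is the adjacency matrix of the line expansion $\mathcal{G}_l$. Let $H_r = [P_{vertex}, P_{edge}]$; it satisfies the following equations,

$$H_r^\top H_r = \begin{bmatrix} D_v & H \\ H^\top & D_e \end{bmatrix}, \tag{25}$$

$$H_r H_r^\top = 2I + A_l. \tag{26}$$
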
The detailed meaning of these matrices is given in Table 5. Let us provide an example for intuition. For the hypergraph shown in Fig. 3, we list the matrices below. It is easy to verify that they satisfy Eqn. (25) and Eqn. (26).
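Independently of the example from Fig. 3, both identities can be checked numerically. The sketch below uses a hypothetical toy hypergraph (not the one in Fig. 3) and plain Python lists; the names `H_r`, `A_l`, `matmul`, and `block` are chosen for this illustration, with `H_r` taken to be the row-wise concatenation of the vertex and edge projection matrices, as the discussion of Eqn. (26) suggests.

```python
def matmul(a, b):
    """Dense matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


# Hypothetical toy hypergraph: two hyperedges overlapping in vertex 3.
vertices = [1, 2, 3, 4]
hyperedges = {"e1": {1, 2, 3}, "e2": {3, 4}}
edge_names = sorted(hyperedges)
line_nodes = [(v, e) for e in edge_names for v in sorted(hyperedges[e])]

# Incidence matrix H (|V| x |E|).
H = [[1 if v in hyperedges[e] else 0 for e in edge_names] for v in vertices]

# H_r: one row per line node, one-hot for its vertex followed by one-hot for its hyperedge.
H_r = [
    [1 if v == u else 0 for u in vertices] + [1 if e == f else 0 for f in edge_names]
    for v, e in line_nodes
]

# Adjacency of the line expansion: distinct line nodes sharing a vertex or a hyperedge.
A_l = [
    [1 if a != b and (a[0] == b[0] or a[1] == b[1]) else 0 for b in line_nodes]
    for a in line_nodes
]

n, m = len(vertices), len(edge_names)
D_v = [sum(row) for row in H]          # vertex degrees
D_e = [sum(col) for col in zip(*H)]    # hyperedge degrees


def block(i, j):
    """Entry (i, j) of the block matrix [[D_v, H], [H^T, D_e]]."""
    if i < n and j < n:
        return D_v[i] if i == j else 0
    if i >= n and j >= n:
        return D_e[i - n] if i == j else 0
    return H[i][j - n] if i < n else H[j][i - n]


gram_small = matmul(transpose(H_r), H_r)   # left-hand side of Eqn. (25)
gram_large = matmul(H_r, transpose(H_r))   # left-hand side of Eqn. (26)
expected_small = [[block(i, j) for j in range(n + m)] for i in range(n + m)]
expected_large = [
    [2 * (i == j) + A_l[i][j] for j in range(len(line_nodes))]
    for i in range(len(line_nodes))
]

print("Eqn. (25) holds:", gram_small == expected_small)
print("Eqn. (26) holds:", gram_large == expected_large)
```

Exact integer arithmetic is used on purpose, so the comparisons are equality checks rather than tolerance checks.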


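For Eqn. (25), writing $H_r = [P_{vertex}, P_{edge}]$ for the concatenation of the vertex and edge projection matrices,

$$H_r^\top H_r = \begin{bmatrix} P_{vertex}^\top \\ P_{edge}^\top \end{bmatrix} \begin{bmatrix} P_{vertex} & P_{edge} \end{bmatrix} = \begin{bmatrix} P_{vertex}^\top P_{vertex} & P_{vertex}^\top P_{edge} \\ P_{edge}^\top P_{vertex} & P_{edge}^\top P_{edge} \end{bmatrix} = \begin{bmatrix} D_v & H \\ H^\top & D_e \end{bmatrix},$$

where the last equation is easy to verify since i) $P_{vertex}^\top P_{vertex}$ implies the vertex degree matrix, which is $D_v$; ii) $P_{edge}^\top P_{edge}$ implies the hyperedge degree matrix, which is $D_e$; iii) $P_{vertex}^\top P_{edge}$ implies the vertex-hyperedge incidence, which is $H$.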

For Eqn. (26), each row of $H_r$ is a vector of size $|\mathcal{V}| + |\mathcal{E}|$ with each dimension indicating a vertex or a hyperedge. The vector has exactly two $1$s, because a line node contains exactly one vertex and one hyperedge.

The $(i, j)$-th entry of $H_r H_r^\top$ is the dot product of row $i$ (line node $i$) and row $j$ (line node $j$) of $H_r$. If $i = j$, this entry is $2$ (the dot product of a vector with two $1$s with itself). If $i \neq j$, the result is $0$ if line node $i$ and line node $j$ have no common vertex or hyperedge, and $1$ if they have either a common vertex or a common hyperedge (the corresponding dimension contributes $1$ and all other dimensions contribute $0$, summing to $1$). This matches Definition 3. In sum, $H_r H_r^\top$ is equal to the adjacency with 2-order self-loops, quantitatively,