UniGNN: a Unified Framework for Graph and Hypergraph Neural Networks

05/03/2021 ∙ by Jing Huang, et al. ∙ Shanghai Jiao Tong University

Hypergraphs, expressive structures that flexibly model higher-order correlations among entities, have recently attracted increasing attention from various research domains. Despite the success of Graph Neural Networks (GNNs) for graph representation learning, how to adapt powerful GNN variants directly to hypergraphs remains a challenging problem. In this paper, we propose UniGNN, a unified framework for interpreting the message passing process in graph and hypergraph neural networks, which can generalize general GNN models to hypergraphs. In this framework, meticulously designed architectures aiming to deepen GNNs can also be incorporated into hypergraphs with the least effort. Extensive experiments on multiple real-world datasets demonstrate the effectiveness of UniGNN, which outperforms the state-of-the-art approaches by a large margin. In particular, on the DBLP dataset we increase the accuracy from 77.4% to 88.8% in the semi-supervised hypernode classification task. We further prove that the proposed message-passing based UniGNN models are at most as powerful as the 1-dimensional Generalized Weisfeiler-Leman (1-GWL) algorithm in terms of distinguishing non-isomorphic hypergraphs. Our code is available at <https://github.com/OneForward/UniGNN>.




Code Repositories


Source code for the paper UniGNN: a Unified Framework for Graph and Hypergraph Neural Networks (IJCAI 2021).


1 Introduction

Hypergraphs are natural extensions of graphs that allow an edge to join any number of vertices, and can therefore represent higher-order relationships involving multiple entities. Recently, hypergraphs have drawn attention from a wide range of fields, such as computer vision [6, 12], recommendation systems [19] and natural sciences [7], and have been incorporated into various domain-specific tasks.

In parallel, Graph Representation Learning has attracted a surge of interest from researchers. Numerous powerful Graph Neural Networks (GNNs) have been presented, achieving the state of the art in specific graph-based tasks, such as node classification [4], link prediction [24] and graph classification [10]. Most GNNs are message-passing based models, like GCN [9], GAT [18], GIN [20] and GraphSAGE [8], which iteratively update node embeddings by aggregating neighboring nodes’ information. The expressive power of GNNs is well known to be upper bounded by the 1-Weisfeiler-Leman (1-WL) test [20], and many provably more powerful GNNs mimicking the higher-order WL test have been presented [13, 14, 11].

Furthermore, several works, like JKNet [21], DropEdge [15], DGN [25] and GCNII [4], have devoted substantial effort to tackling the problem of over-smoothing, in which node embeddings in GNNs tend to converge as layers are stacked up and the performance degrades significantly.

Despite the success of GNNs, how to learn powerful representative embeddings for hypergraphs remains a challenging problem. HGNN [5] is the first hypergraph neural network; it uses the clique expansion technique to approximate hypergraphs as graphs, which reduces the problem above to the graph embedding framework. This approach, as illustrated in Fig 1, cannot preserve substructures such as hyperedges that recursively contain other hyperedges, since these are discarded by clique expansion. HyperGCN [22] enhances the generalized hypergraph Laplacian with additional weighted pairwise edges (a.k.a. mediators). This approach still fails to preserve complete hypergraph information, since the graph Laplacian can only describe pairwise connections between vertices in one training epoch. Another work, HyperSAGE [1], learns to embed hypergraphs directly by propagating messages in a two-stage procedure. Although HyperSAGE shows the capability to capture information from hypergraph structures with a giant leap in performance, it fails to adapt powerful classic GNN designs to hypergraphs.
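The information loss attributed to clique expansion can be illustrated with a small sketch. The helper `clique_expansion` below is a hypothetical, unweighted simplification written for this illustration (HGNN itself works with a normalized hypergraph Laplacian); it only shows that a hypergraph with a nested hyperedge and one without it can collapse to the same pairwise graph.

```python
from itertools import combinations


def clique_expansion(hyperedges):
    """Replace every hyperedge by the pairwise edges of a clique over its vertices.

    Illustrative, unweighted simplification (an assumption of this sketch).
    """
    edges = set()
    for e in hyperedges:
        for u, v in combinations(sorted(e), 2):
            edges.add((u, v))
    return edges


# Two different hypergraphs on the vertices {1, 2, 3}.
h_a = [{1, 2, 3}]            # a single 3-vertex hyperedge
h_b = [{1, 2, 3}, {1, 2}]    # additionally contains the nested hyperedge {1, 2}

graph_a = clique_expansion(h_a)
graph_b = clique_expansion(h_b)

# Both hypergraphs map to the same triangle, so the nested hyperedge is lost.
print(graph_a == graph_b)
```

A weighted expansion would record that the pair (1, 2) is covered twice, but the set-of-vertices structure of each hyperedge would still not be recoverable from pairwise weights in general.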

In view of the fact that more and more meticulously-designed network architectures and learning strategies have appeared in graph learning, we are naturally motivated to ask the following intriguing question:

Can the network design and learning strategy for GNNs be applied to HyperGNNs directly?


This paper proposes the UniGNN, a unified framework for graph and hypergraph neural networks, with contributions unfolded by the following questions:

Q 1. Can we generalize the well-designed GNN architecture for hypergraphs with the least effort?

Q 2. Can we utilize the learning strategies for circumventing the over-smoothing in the graph learning and design deep neural networks that adapt to hypergraphs?

Q 3. How powerful are hypergraph neural networks?

By addressing the above questions, we highlight our contributions as follows:

A 1. We present the UniGNN and use it to generalize several classic GNNs, like GCN, GAT, GIN and GraphSAGE, directly to hypergraphs, termed UniGCN, UniGAT, UniGIN and UniSAGE, respectively. UniGNNs consistently outperform the state-of-the-art approaches in hypergraph learning tasks.

A 2. We propose the UniGCNII, the first deep hypergraph neural network, and verify its effectiveness in resolving the over-smoothing issue.

A 3. We prove that message-passing based UniGNNs are at most as powerful as the 1-dimensional Generalized Weisfeiler-Leman (1-GWL) algorithm in terms of distinguishing non-isomorphic hypergraphs.

2 Preliminaries

2.1 Notations

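Let $G = (V, E)$ denote a directed or undirected graph consisting of a vertex set $V$ and an edge set $E$ (pairs of vertices). A self-looped graph $\tilde{G} = (V, \tilde{E})$ is constructed from $G$ by adding a self-loop to each of its non-self-looped nodes. The neighbor-nodes of vertex $i$ are denoted by $N_i = \{ j \mid (i, j) \in E \}$. We also denote vertex $i$’s neighbor-nodes together with itself as $\tilde{N}_i = N_i \cup \{i\}$. We use $x_i \in \mathbb{R}^d$ to represent a $d$-dimensional feature of vertex $i$.

A hypergraph $H = (V, E)$ is defined as a generalized graph by allowing an edge to connect any number of vertices, where $V$ is a set of vertices and a hyperedge $e \in E$ is a non-empty subset of $V$. The incident-edges of vertex $i$ are denoted by $E_i = \{ e \in E \mid i \in e \}$. We say two hypergraphs $H_1 = (V_1, E_1)$ and $H_2 = (V_2, E_2)$ are isomorphic, written $H_1 \simeq H_2$, if there exists a bijection $f : V_1 \to V_2$ such that $\forall e = \{v_1, \dots, v_k\} \subseteq V_1 : e \in E_1 \Leftrightarrow \{f(v_1), \dots, f(v_k)\} \in E_2$.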
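As a small illustration of the notation, the sketch below stores a hypergraph as a list of vertex sets and derives, for every vertex, the indices of its incident hyperedges. The function name `incident_edges` and the list-of-sets representation are assumptions made for this example, not part of the paper's code.

```python
from collections import defaultdict


def incident_edges(hyperedges):
    """Map each vertex to the indices of the hyperedges that contain it."""
    inc = defaultdict(list)
    for idx, e in enumerate(hyperedges):
        for v in sorted(e):
            inc[v].append(idx)
    return dict(inc)


# A hypergraph with five vertices and three hyperedges of different sizes.
hyperedges = [{0, 1, 2}, {1, 3}, {2, 3, 4}]
E_i = incident_edges(hyperedges)

for vertex in sorted(E_i):
    print(vertex, E_i[vertex])
```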

2.2 Graph Neural Networks

General GNNs

Graph Neural Networks (GNNs) learn the informative embedding of a graph by utilizing the feature matrix and the graph structure. A broad range of GNNs can be built up from message passing layers, in which each node embedding is updated by aggregating the information of its neighbor embeddings. The message passing process in the $l$-th layer of a GNN is formulated as

$$\text{(GNN)} \quad x_i^{l+1} = \phi^l\left( x_i^l, \{ x_j^l \}_{j \in N_i} \right). \tag{1}$$

Classic GNN models sharing this paradigm include GCN [9], GAT [18], GIN [20], GraphSAGE [8], etc.

In the following sections, we omit the superscript $l$ for the sake of simplicity and use $\tilde{x}_i$ to indicate the output of the message passing layer before activation or normalization.

2.3 HyperGraph Neural Networks

Spectral-based HyperGNNs

HGNN [5] and HyperConv [2] utilize the normalized hypergraph Laplacian, which essentially converts the hypergraphs to conventional graphs by viewing each hyperedge as a complete graph. HyperGCN [22] uses the generalized hypergraph Laplacian (changing between epochs) and injects information of mediators to represent hyperedges. Both methods depend on the hypergraph Laplacian, which, however, emphasizes the pairwise relations between vertices. Another work, MPNN-R [23], regards hyperedges as new vertices with $\hat{V} = V \cup E$ and represents the hypergraph by a $|V| \times |E|$ matrix. MPNN-R can effectively capture the recursive property of hyperedges, but fails to describe other high-order relationships, like complex and diverse intersections between hyperedges.

Spatial-based HyperGNNs

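A recent work, HyperSAGE [1], pioneers exploiting the structure of hypergraphs by aggregating messages in a two-stage procedure, avoiding the information loss due to the reduction of hypergraphs to graphs. With $N(i, e)$ denoting vertex $i$’s intra-edge neighborhood for hyperedge $e$, HyperSAGE aggregates information with the following rules:

$$\text{(HyperSAGE)} \quad \begin{cases} h_{i,e} = \mathcal{M}_1\left( \{ x_j \}_{j \in N(i, e; \alpha)} \right), \\ \tilde{x}_i = W\left( x_i + \mathcal{M}_2\left( \{ h_{i,e} \}_{e \in E_i} \right) \right), \end{cases} \tag{2}$$

where $N(i, e; \alpha)$ is a sampled subset of $\alpha$ vertices from $N(i, e)$, $W$ is the linear transform, and $\mathcal{M}_1$ and $\mathcal{M}_2$ are power mean functions.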

HyperSAGE is the current state-of-the-art algorithm for hypergraph representation learning. However, there are some issues associated with HyperSAGE. Firstly, since the calculation of $h_{i,e}$ is distinct for different $(i, e)$ pairs, the original algorithm uses nested loops over hyperedges and vertices within hyperedges, which results in redundant computation and poor parallelism. Secondly, applying the power mean functions in both stages, neither of which is injective, fails to distinguish structures with the same distribution but different multiplicities of elements [20]. Lastly, the original work still fails to address the over-smoothing issue associated with deep hypergraph neural networks.
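The non-injectivity of power means can be checked on a tiny example. The sketch below uses a generic `power_mean` helper written for this illustration (not HyperSAGE's implementation) and compares two multisets with the same distribution of values but different multiplicities; a sum aggregator separates them while every power mean does not.

```python
def power_mean(values, p=1.0):
    """Generalized (power) mean of positive numbers with exponent p != 0."""
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


# Same distribution of values, different multiplicities.
a = [1.0, 2.0]
b = [1.0, 1.0, 2.0, 2.0]

for p in (1.0, 2.0, 3.0):
    # The two multisets are indistinguishable under any power mean.
    print(p, power_mean(a, p), power_mean(b, p))

# A sum aggregator tells them apart.
print(sum(a), sum(b))
```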

3 UniGNN: a Unified Framework

To resolve the issues associated with HyperSAGE, we propose the UniGNN, a unified framework to characterize the message-passing process in GNNs and HyperGNNs:

$$\text{(UniGNN)} \quad \begin{cases} h_e = \phi_1\left( \{ x_j \}_{j \in e} \right), \\ \tilde{x}_i = \phi_2\left( x_i, \{ h_e \}_{e \in E_i} \right), \end{cases} \tag{3}$$

where $\phi_1$ and $\phi_2$ are permutation-invariant functions for aggregating messages from vertices and hyperedges respectively. The update rule for UniGNN is illustrated in Fig 2. The key insight is that if we rethink Eq. (1) in GNNs as a two-stage aggregation process, then the designs for GNNs can be naturally generalized to hypergraphs in Eq. (3).

Figure 1: Toy examples of a graph and a hypergraph. (a) A graph $G$. (b) A hypergraph $H$. Note that $G$ can be reduced from $H$ using clique expansion.

Figure 2: An illustration of how the UniGNN can be applied to graphs (left) and hypergraphs (right). (a) Two-stage message passing for the graph in Figure 1. (b) Two-stage message passing for the hypergraph in Figure 1. Note that edges showing how messages flow to vertex 2 are marked in red.

In the first stage, for each hyperedge $e \in E$, we use $\phi_1$ to aggregate features of all vertices within it. $\phi_1$ can be any permutation-invariant function satisfying $\phi_1(\{x_j\}) = x_j$, such as the mean function $h_e = \frac{1}{|e|} \sum_{j \in e} x_j$ or the sum function $h_e = \sum_{j \in e} x_j$. It is obvious that if we let $E_i = \{ e = \{j\} \mid \forall j \in N_i \}$, then $h_e = x_j$ holds for any $j \in N_i$. Therefore, UniGNN (3) can be reduced to GNN (1), which unifies both formulations into the same framework.

In the second stage, we update each vertex with its incident hyperedges using the aggregating function $\phi_2$, whose design can be borrowed directly from existing GNNs. We will exhibit several effective examples in the following section.
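The two stages can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: features are lists of floats stored in a dict keyed by vertex, the first stage uses the mean, and the default second stage simply sums the vertex feature with its incident hyperedge features. The names `unignn_layer`, `mean_vec` and `sum_combine` are hypothetical and do not come from the authors' repository.

```python
def mean_vec(vectors):
    """First-stage aggregator: element-wise mean of a list of feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sum_combine(x_i, messages):
    """Second-stage aggregator: element-wise sum of x_i and its hyperedge messages."""
    return [sum(col) for col in zip(*([x_i] + messages))]


def unignn_layer(x, hyperedges, phi1=mean_vec, phi2=sum_combine):
    """One two-stage message passing step on a hypergraph.

    x          : dict mapping vertex -> feature vector (list of floats)
    hyperedges : list of sets of vertices
    """
    # Stage 1: one feature per hyperedge, computed once and shared by its vertices.
    h = [phi1([x[j] for j in sorted(e)]) for e in hyperedges]
    # Stage 2: each vertex aggregates the features of its incident hyperedges.
    out = {}
    for i in x:
        messages = [h[k] for k, e in enumerate(hyperedges) if i in e]
        out[i] = phi2(x[i], messages)
    return out


x = {0: [1.0], 1: [2.0], 2: [3.0]}
hyperedges = [{0, 1, 2}, {1, 2}]
out = unignn_layer(x, hyperedges)
print(out)
```

Because the hyperedge features are computed once per hyperedge rather than once per (vertex, hyperedge) pair, the first stage avoids the nested per-pair loop described for HyperSAGE.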

3.1 Generalize Powerful GNNs for Hypergraphs


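Graph Convolutional Networks (GCN) [9] propagate the features using the weighted sum (where weights are specified by the node degrees),

$$\tilde{x}_i = \frac{1}{\sqrt{d_i}} \sum_{j \in \tilde{N}_i} \frac{1}{\sqrt{d_j}} W x_j,$$

where $d_i = |\tilde{N}_i|$.

Based on our framework, we can generalize the above aggregation process to hypergraphs as

$$\tilde{x}_i = \frac{1}{\sqrt{d_i}} \sum_{e \in \tilde{E}_i} \frac{1}{\sqrt{d_e}} W h_e,$$

where we define $d_e = \frac{1}{|e|} \sum_{i \in e} d_i$ as the average degree of a hyperedge $e$. UniGCN assigns less weight to high-degree hyperedges in aggregation. It is trivial that by letting $e = \{j\}$, we have $d_e = d_j$ and $h_e = x_j$, and thus UniGCN is reduced to GCN.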
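A minimal sketch of the UniGCN aggregation follows. It makes several assumptions that are not stated in this section: the linear transform is taken as the identity, the first stage is the mean, the vertex degree is the number of incident hyperedges after self-loops are added, and the function name `unigcn_aggregate` is hypothetical.

```python
import math


def unigcn_aggregate(x, hyperedges):
    """Degree-normalized two-stage aggregation (linear transform omitted).

    x          : dict mapping vertex -> feature vector (list of floats)
    hyperedges : list of sets of vertices
    """
    # Add a singleton self-loop hyperedge {i} for every vertex that lacks one.
    edges = [frozenset(e) for e in hyperedges]
    present = set(edges)
    for i in x:
        if frozenset([i]) not in present:
            edges.append(frozenset([i]))

    # Assumed vertex degree: number of incident hyperedges (self-loop included).
    d_v = {i: sum(1 for e in edges if i in e) for i in x}
    # Hyperedge degree: average degree of the vertices it contains.
    d_e = [sum(d_v[i] for i in e) / len(e) for e in edges]

    dim = len(next(iter(x.values())))
    # Stage 1: mean of vertex features inside each hyperedge.
    h = [[sum(x[j][c] for j in e) / len(e) for c in range(dim)] for e in edges]

    # Stage 2: weighted sum with weights 1 / sqrt(d_i * d_e).
    out = {}
    for i in x:
        acc = [0.0] * dim
        for k, e in enumerate(edges):
            if i in e:
                w = 1.0 / math.sqrt(d_v[i] * d_e[k])
                for c in range(dim):
                    acc[c] += w * h[k][c]
        out[i] = acc
    return out


x = {0: [1.0], 1: [3.0]}
out = unigcn_aggregate(x, [{0, 1}])
print(out)
```

In this toy case every vertex and hyperedge has degree 2, so each message is weighted by 1/2: vertex 0 receives half of the shared hyperedge mean (2.0) plus half of its own feature (1.0).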


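Graph Attention Networks (GAT) [18] adopt the attention mechanism to assign an importance score to each of the center node’s neighbors, leading to more effective aggregation. The attention mechanism is formulated as

$$\alpha_{ij} = \sigma\left( a^T [W x_i ; W x_j] \right), \quad \tilde{\alpha}_{ij} = \frac{\exp(\alpha_{ij})}{\sum_{k \in \tilde{N}_i} \exp(\alpha_{ik})}, \quad \tilde{x}_i = \sum_{j \in \tilde{N}_i} \tilde{\alpha}_{ij} W x_j,$$

where $\sigma$ is the leaky ReLU function, $a$ is the learnable attentional parameter and $[\cdot\,;\cdot]$ means concatenation.

By rewriting the above equations, we can get UniGAT for hypergraphs as follows,

$$\alpha_{ie} = \sigma\left( a^T [W h_{\{i\}} ; W h_e] \right), \quad \tilde{\alpha}_{ie} = \frac{\exp(\alpha_{ie})}{\sum_{e' \in \tilde{E}_i} \exp(\alpha_{ie'})}, \quad \tilde{x}_i = \sum_{e \in \tilde{E}_i} \tilde{\alpha}_{ie} W h_e.$$

In this way, UniGAT learns to reweight the center node’s neighboring hyperedges.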
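The reweighting of incident hyperedges can be sketched as a softmax over leaky-ReLU scores. The sketch assumes the linear transform is the identity and that the center vertex is represented by its own feature; the function name `unigat_weights`, the slope value 0.2 and the toy attention vector are all hypothetical choices for this illustration.

```python
import math


def leaky_relu(z, slope=0.2):
    """Leaky ReLU with an assumed negative slope of 0.2."""
    return z if z > 0 else slope * z


def unigat_weights(x_i, edge_feats, a):
    """Attention weights of a center vertex over its incident hyperedges.

    x_i        : feature vector of the center vertex
    edge_feats : list of hyperedge feature vectors
    a          : attention vector of length len(x_i) + len(edge feature)
    """
    # Score each hyperedge on the concatenation [x_i ; h_e].
    scores = [
        leaky_relu(sum(w * v for w, v in zip(a, x_i + h_e)))
        for h_e in edge_feats
    ]
    # Numerically stable softmax over the incident hyperedges.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


weights = unigat_weights([1.0], [[1.0], [2.0]], a=[0.0, 1.0])
print(weights)
```

With this toy attention vector the scores are 1.0 and 2.0, so the second hyperedge receives e times the weight of the first and the weights sum to one.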

UniGAT is essentially different from HyperGAT [2] since HyperGAT requires the hyperedges to be preprocessed into the same homogeneous domain of vertices before training, which is inflexible and unreliable in practice.

Note that based on the formulation for UniGCN and UniGAT, hypergraphs should be preprocessed with self-loops; that is, $\tilde{E}_i = E_i \cup \{ \{i\} \}$.


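Graph Isomorphism Networks (GIN) [20] is a simple yet effective model whose expressive power achieves the upper bound of message passing based GNNs. GIN updates node embeddings as

$$\tilde{x}_i = W\left( (1 + \varepsilon) x_i + \sum_{j \in N_i} x_j \right),$$

where $\varepsilon$ is a learnable parameter or some fixed scalar.

Similar to the previous deduction, UniGIN is formulated as

$$\tilde{x}_i = W\left( (1 + \varepsilon) x_i + \sum_{e \in E_i} h_e \right).$$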

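GraphSAGE [8] uses a general aggregating function, like the mean aggregator, LSTM aggregator or max-pooling aggregator, which can be designed according to various tasks. We use a variant of GraphSAGE where the combining process is sum instead of concatenation, following [1]:

$$\tilde{x}_i = W\left( x_i + \text{AGGREGATE}\left( \{ x_j \}_{j \in N_i} \right) \right).$$

UniSAGE is naturally generalized as

$$\tilde{x}_i = W\left( x_i + \text{AGGREGATE}\left( \{ h_e \}_{e \in E_i} \right) \right).$$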
3.2 Towards Deep Hypergraph Neural Networks

Current hypergraph representation learning methods, like HGNN, HyperGCN and HyperSAGE, use a shallow network with two layers, and their performance drops significantly when layers are stacked up, which is consistent with the behavior of GNNs. This phenomenon is called over-smoothing. Although many works have focused on tackling this problem for graphs, like JKNet [21], DropEdge [15] and GCNII [4], how to make hypergraph neural networks deeper remains unexplored.

Since in our framework, learning strategies from graph learning domain can be incorporated into hypergraphs with the least effort, we solve this problem by presenting UniGCNII, a deep hypergraph neural network inspired from GCNII.


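GCNII [4] is a powerful deep graph convolutional network enhanced with Initial Residual Connection and Identity Mapping to overcome the over-smoothing problem. We generalize GCNII to hypergraphs, dubbed UniGCNII, with the aggregation process defined as

$$\begin{cases} \hat{x}_i = \frac{1}{\sqrt{d_i}} \sum_{e \in \tilde{E}_i} \frac{1}{\sqrt{d_e}} h_e, \\ \tilde{x}_i = \left( (1 - \beta) I + \beta W \right) \left( (1 - \alpha) \hat{x}_i + \alpha x_i^0 \right), \end{cases}$$

where $\alpha$ and $\beta$ are hyperparameters, $I$ is the identity matrix and $x_i^0$ is the initial feature of vertex $i$.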

In each layer, UniGCNII employs the same two-stage aggregation as UniGCN to exploit the hypergraph structure, and then injects the jumping knowledge from the initial features and previous features. Experiments validate that UniGCNII enjoys the advantage of circumventing the over-smoothing issue when models are getting deeper.
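The second step of the UniGCNII update, which mixes the aggregated feature with the initial feature and then applies a transform interpolated with the identity, can be sketched as below. The function name `unigcnii_update`, the argument names and the toy values are assumptions for illustration; the aggregated input is taken as given rather than computed here.

```python
def unigcnii_update(x_hat, x0, W, alpha, beta):
    """Initial residual connection followed by identity mapping for one vertex.

    x_hat : aggregated feature of the vertex after the two-stage step
    x0    : initial feature of the same vertex
    W     : square weight matrix as a list of rows
    alpha : strength of the initial residual connection
    beta  : strength of the learned transform relative to the identity
    """
    # Initial residual: blend the aggregated feature with the initial feature.
    mixed = [(1 - alpha) * a + alpha * b for a, b in zip(x_hat, x0)]
    # Identity mapping: ((1 - beta) I + beta W) applied to the blended feature.
    w_mixed = [sum(W[r][c] * mixed[c] for c in range(len(mixed)))
               for r in range(len(W))]
    return [(1 - beta) * m + beta * w for m, w in zip(mixed, w_mixed)]


x_hat = [2.0, 4.0]
x0 = [0.0, 0.0]
W_zero = [[0.0, 0.0], [0.0, 0.0]]

halved = unigcnii_update(x_hat, x0, W_zero, alpha=0.5, beta=0.5)
identity_only = unigcnii_update(x_hat, x0, W_zero, alpha=0.5, beta=0.0)
print(halved, identity_only)
```

Setting `beta` to zero makes the layer ignore `W` entirely, which is one way to see why a small `beta` keeps deep stacks close to a smoothed copy of the input rather than collapsing them.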

4 How Powerful are UniGNNs?

Message-passing based GNNs are capable of distinguishing local substructures (like the $k$-height subtree rooted at a node) or the global structure of graphs, with the expressive power upper bounded by the 1-WL test. In view of this, we are motivated to investigate the expressive power of UniGNNs for hypergraphs. We start by presenting a variant of the 1-dimensional Generalized Weisfeiler-Leman Algorithm (1-GWL) for the hypergraph isomorphism test, following the work of [3].

4.1 Generalized Weisfeiler-Leman Algorithm

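1-GWL sets up by labeling the vertices of a hypergraph $H = (V, E)$ with $l_v^{(0)} = 0$ for any $v \in V$, and in the $t$-th iteration the labels are updated by

$$l_e^{(t)} = \{\!\{ l_j^{(t)} \}\!\}_{j \in e}, \qquad l_v^{(t+1)} = \{\!\{ ( l_v^{(t)}, l_e^{(t)} ) \}\!\}_{e \in E_v},$$

where $l_e^{(t)}$ denotes the label of a hyperedge $e$, and $\{\!\{ \cdot \}\!\}$ denotes a multiset.

1-GWL distinguishes $H_1$ and $H_2$ as non-isomorphic if there exists a $t \geq 0$ such that

$$\{\!\{ l_{H_1, v}^{(t)} \mid v \in V_1 \}\!\} \neq \{\!\{ l_{H_2, v}^{(t)} \mid v \in V_2 \}\!\},$$

where the subscripts $H_1$ and $H_2$ are added for discrimination.

Proposition 1 (1-GWL).

If the 1-GWL test decides $H_1$ and $H_2$ are non-isomorphic, then $H_1 \not\simeq H_2$.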

We leave all the proofs in the supplemental files.
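The label refinement described in this subsection can be prototyped directly. The sketch below is an illustrative implementation under simplifying assumptions: multisets are encoded as sorted nested tuples (which grow quickly, so it is only suitable for toy inputs), all vertices start with label 0, and the function names `gwl_labels` and `gwl_distinguishes` are hypothetical.

```python
from collections import Counter


def gwl_labels(vertices, hyperedges, iterations):
    """Run the refinement and return the multiset of vertex labels per iteration."""
    labels = {v: 0 for v in vertices}
    history = [Counter(labels.values())]
    for _ in range(iterations):
        # Hyperedge label: multiset of the labels of its vertices.
        edge_label = [tuple(sorted(labels[j] for j in e)) for e in hyperedges]
        # Vertex label: multiset of (own label, hyperedge label) over incident hyperedges.
        new_labels = {}
        for v in vertices:
            new_labels[v] = tuple(sorted(
                (labels[v], edge_label[k])
                for k, e in enumerate(hyperedges) if v in e
            ))
        labels = new_labels
        history.append(Counter(labels.values()))
    return history


def gwl_distinguishes(h1, h2, iterations=3):
    """True if the label multisets differ in some iteration."""
    hist1 = gwl_labels(h1[0], h1[1], iterations)
    hist2 = gwl_labels(h2[0], h2[1], iterations)
    return any(c1 != c2 for c1, c2 in zip(hist1, hist2))


one_triple = ([0, 1, 2], [{0, 1, 2}])          # a single 3-vertex hyperedge
two_pairs = ([0, 1, 2], [{0, 1}, {1, 2}])      # a path made of two 2-vertex hyperedges
relabelled = ([5, 6, 7], [{7, 5, 6}])          # isomorphic copy of one_triple

print(gwl_distinguishes(one_triple, two_pairs))
print(gwl_distinguishes(one_triple, relabelled))
```

The first pair is separated after one iteration because the middle vertex of the path has two incident hyperedges, while the relabelled copy is never separated, in line with Proposition 1.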

4.2 Discriminative Power of UniGNNs

We assign the same features to all vertices of a hypergraph so that the UniGNN depends only on the hypergraph structure to learn. Let $\mathcal{A}$ be a UniGNN abiding by the aggregation rule (3). The following proposition indicates that $\mathcal{A}$’s expressive power is upper bounded by the 1-GWL test.

Proposition 2.

Given two non-isomorphic hypergraphs $H_1$ and $H_2$, if $\mathcal{A}$ can distinguish them by $\mathcal{A}(H_1) \neq \mathcal{A}(H_2)$, then the 1-GWL test also decides $H_1 \not\simeq H_2$.

The following theorem characterizes the conditions for UniGNNs to reach the expressive power of 1-GWL test.

Theorem 1.

Given two hypergraphs $H_1$ and $H_2$ such that the 1-GWL test decides them as non-isomorphic, a UniGNN $\mathcal{A}$ suffices to distinguish them by $\mathcal{A}(H_1) \neq \mathcal{A}(H_2)$ under the following conditions:

  1. Local Level. Two-stage aggregating functions $\phi_1$ and $\phi_2$ are both injective.

  2. Global Level. In addition to the local-level conditions, $\mathcal{A}$’s graph-level READOUT function is injective.

We are also interested in UniGNNs’ capability of distinguishing local substructures of hypergraphs. We define the local substructure of a hypergraph $H$ as the $k$-height subtree of its incidence graph $I_H$, where $I_H$ is the bipartite graph with vertices $V(I_H) = V \cup E$ and edges $E(I_H) = \{ (v, e) \mid v \in e,\ e \in E \}$.

Corollary 1.

Assume that the 1-GWL test can distinguish two distinct local substructures from hypergraphs. Then the UniGNN $\mathcal{A}$ can also distinguish them as long as the Local Level condition is satisfied.

5 Experiments

In this section, we evaluate the performance of the proposed methods in extensive experiments.


We use the standard academic network datasets DBLP [16], Pubmed, Citeseer and Cora [17] for all the experiments. The hypergraph is created with each vertex representing a document. The co-authorship hypergraphs, constructed from DBLP and Cora, connect all documents co-authored by one author as one hyperedge. The co-citation hypergraphs are built with Pubmed, Citeseer and Cora, using one hyperedge to represent all documents cited by an author. We use the same preprocessed hypergraphs as HyperGCN, which are publicly available in their official implementation (https://github.com/malllabiisc/HyperGCN).

| Method | DBLP (co-authorship) | Cora (co-authorship) | Pubmed (co-citation) | Citeseer (co-citation) | Cora (co-citation) |
|---|---|---|---|---|---|
| MLP+HLR | 63.6 ± 4.7 | 59.8 ± 4.7 | 64.7 ± 3.1 | 56.1 ± 2.6 | 61.0 ± 4.1 |
| HGNN | 69.2 ± 5.1 | 63.2 ± 3.1 | 66.8 ± 3.7 | 56.7 ± 3.8 | 70.0 ± 2.9 |
| FastHyperGCN | 68.1 ± 9.6 | 61.1 ± 8.2 | 65.7 ± 11.1 | 56.2 ± 8.1 | 61.3 ± 10.3 |
| HyperGCN | 70.9 ± 8.3 | 63.9 ± 7.3 | 68.3 ± 9.5 | 57.3 ± 7.3 | 62.5 ± 9.7 |
| HyperSAGE | 77.4 ± 3.8 | 72.4 ± 1.6 | 72.9 ± 1.3 | 61.8 ± 2.3 | 69.3 ± 2.7 |
| UniGAT | 88.7 ± 0.2 | 75.0 ± 1.1 | 74.7 ± 1.2 | 63.8 ± 1.6 | 69.2 ± 2.9 |
| UniGCN | 88.8 ± 0.2 | 75.3 ± 1.2 | 74.4 ± 1.0 | 63.6 ± 1.3 | 70.1 ± 1.4 |
| UniGIN | 88.6 ± 0.3 | 74.8 ± 1.3 | 74.4 ± 1.1 | 63.3 ± 1.2 | 69.2 ± 1.5 |
| UniSAGE | 88.5 ± 0.2 | 75.1 ± 1.2 | 74.3 ± 1.0 | 63.8 ± 1.3 | 70.2 ± 1.5 |

Table 1: Testing accuracy (%) of UniGNNs and other hypergraph models on co-authorship and co-citation datasets for Semi-supervised Hypernode Classification. The best or competitive results are highlighted for each dataset.

| Method | DBLP seen | DBLP unseen | Pubmed seen | Pubmed unseen | Citeseer seen | Citeseer unseen | Cora (cocitation) seen | Cora (cocitation) unseen |
|---|---|---|---|---|---|---|---|---|
| MLP+HLR | 64.5 | 58.7 | 66.8 | 62.4 | 60.1 | 58.2 | 65.7 | 64.2 |
| HyperSAGE | 78.1 | 73.2 | 81.0 | 80.4 | 69.3 | 67.9 | 71.3 | 66.8 |
| UniGAT | 88.4 | 82.7 | 83.5 | 83.4 | 70.9 | 71.3 | 72.4 | 70.1 |
| UniGCN | 88.5 | 82.6 | 83.7 | 83.3 | 71.2 | 70.6 | 74.3 | 71.5 |
| UniGCN* | 88.4 | 82.8 | 85.7 | 85.1 | 68.2 | 70.6 | 74.1 | 71.8 |
| UniGIN | 89.6 | 83.4 | 83.8 | 83.3 | 71.5 | 70.8 | 73.7 | 71.3 |
| UniSAGE | 89.3 | 83.0 | 83.6 | 83.1 | 71.1 | 70.8 | 74.2 | 71.5 |

Table 2: Testing accuracies (%) of multiple UniGNN variants and other hypergraph learning methods for Inductive Learning on Evolving Hypergraphs. The best or competitive results are highlighted for each dataset.
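
| Dataset | Model | 2 layers | 4 layers | 8 layers | 16 layers | 32 layers | 64 layers |
|---|---|---|---|---|---|---|---|
| DBLP | UniGAT | 89.1 | 66.4 | 21.2 | 16.2 | 16.2 | OOM |
| DBLP | UniGCN | 89.2 | 79.2 | 18.6 | 16.2 | 16.2 | 16.2 |
| DBLP | UniGIN | 89.6 | 88.3 | 47.9 | 26.6 | 23.1 | 16.3 |
| DBLP | UniSAGE | 89.4 | 88.2 | 46.7 | 31.0 | 20.6 | 16.2 |
| DBLP | UniGCNII | 88.4 | 87.6 | 88.4 | 89.3 | 89.3 | 89.4 |
| Cora Coauthor | UniGAT | 76.0 | 64.0 | 31.7 | 29.4 | 29.1 | 30.4 |
| Cora Coauthor | UniGCN | 76.2 | 68.7 | 38.2 | 28.7 | 29.2 | 29.4 |
| Cora Coauthor | UniGIN | 75.8 | 68.3 | 39.0 | 28.3 | 28.5 | 30.2 |
| Cora Coauthor | UniSAGE | 75.9 | 68.7 | 37.4 | 28.9 | 28.3 | 28.6 |
| Cora Coauthor | UniGCNII | 75.1 | 74.2 | 75.1 | 76.1 | 76.6 | 76.5 |
| Pubmed | UniGAT | 75.2 | 68.8 | 61.9 | 55.8 | 41.1 | 39.7 |
| Pubmed | UniGCN | 74.9 | 73.7 | 61.2 | 49.5 | 41.7 | 39.8 |
| Pubmed | UniGIN | 74.8 | 73.6 | 60.6 | 49.7 | 41.6 | 40.3 |
| Pubmed | UniSAGE | 74.8 | 73.3 | 61.6 | 50.2 | 41.5 | 39.7 |
| Pubmed | UniGCNII | 75.6 | 75.8 | 75.8 | 75.4 | 75.4 | 75.4 |
| Citeseer | UniGAT | 65.4 | 51.9 | 33.9 | 27.2 | 21.3 | 19.9 |
| Citeseer | UniGCN | 64.5 | 58.7 | 35.5 | 23.3 | 21.0 | 20.2 |
| Citeseer | UniGIN | 64.6 | 59.3 | 36.9 | 27.0 | 21.9 | 20.0 |
| Citeseer | UniSAGE | 65.0 | 59.0 | 36.6 | 26.8 | 21.4 | 20.6 |
| Citeseer | UniGCNII | 64.1 | 63.3 | 63.9 | 65.8 | 66.4 | 66.5 |
| Cora Cocitation | UniGAT | 70.9 | 55.4 | 30.7 | 27.9 | 23.6 | 27.3 |
| Cora Cocitation | UniGCN | 71.2 | 62.9 | 31.5 | 25.3 | 26.7 | 27.2 |
| Cora Cocitation | UniGIN | 70.9 | 62.7 | 35.1 | 26.9 | 28.0 | 27.2 |
| Cora Cocitation | UniSAGE | 71.4 | 63.3 | 33.6 | 26.6 | 26.1 | 27.2 |
| Cora Cocitation | UniGCNII | 70.0 | 70.4 | 72.3 | 73.4 | 73.6 | 73.3 |

Table 3: Testing accuracies (%) of multiple UniGNN variants with different depths in the Semi-supervised Hypernode Classification task. The result of the best performing model for each dataset is bolded. Note that additional validation data are used in this experiment.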

| Dataset | UniGAT w/o self-loops | UniGAT w/ self-loops | UniGCN w/o self-loops | UniGCN w/ self-loops |
|---|---|---|---|---|
| DBLP | 88.1 ± 0.1 | 88.7 ± 0.2 | 88.1 ± 0.1 | 88.8 ± 0.2 |
| Cora 1 | 67.4 ± 1.5 | 75.0 ± 1.1 | 67.3 ± 2.0 | 75.3 ± 1.2 |
| Pubmed | 30.1 ± 0.8 | 74.7 ± 1.2 | 30.2 ± 0.9 | 74.4 ± 1.0 |
| Citeseer | 39.8 ± 1.2 | 63.8 ± 1.6 | 40.2 ± 1.3 | 63.6 ± 1.3 |
| Cora 2 | 43.8 ± 3.9 | 69.2 ± 2.9 | 44.1 ± 3.6 | 70.1 ± 1.4 |

Table 4: Testing accuracies (%) of UniGCN and UniGAT in the Semi-supervised Hypernode Classification task, when input hypergraphs are with or without self-loops. Cora 1 is for Cora coauthorship and Cora 2 is for Cora cocitation.

5.1 Semi-supervised Hypernode Classification

Setting Up and Baselines.

The semi-supervised hypernode classification task aims to predict labels for the test nodes, given the hypergraph structure, all nodes’ features and very limited training labels. The label rate of each dataset can be found in the supplemental materials.

We employ four two-layer UniGNN variants: UniGCN, UniGAT, UniGIN and UniSAGE. For all models, mean function is used as the first-stage aggregation. UniSAGE uses the SUM function for the second-stage aggregation. Note that as described in Section 3.1, hypergraphs are preprocessed with self-loops for UniGCN and UniGAT.

We compare UniGNN models against the following baselines: (a) Multi-Layer Perceptron with explicit Hypergraph Laplacian Regularization (MLP+HLR), (b) HyperGraph Neural Network (HGNN [5]), (c) HyperGraph Convolutional Network (HyperGCN [22]) and (d) HyperSAGE [1].

Closely following the previous works, for each model on each dataset, we repeat experiments over 10 data splits with 8 different random seeds, amounting to 80 experiments. We use the Adam optimizer with a learning rate of 0.01 and the weight decay of 0.0005. We fix the training epochs as 200 and report the performance of the model of the last epoch. The same training/testing split as [22] is used. We run all experiments on a single NVIDIA 1080Ti(11GB).

Comparison to SOTAs.

Table 1 summarizes the mean classification accuracy with the standard deviation on the test split for the UniGNN variants after 80 runs. We reuse the metrics already reported in [1] for MLP+HLR, HGNN and HyperGCN, together with the best metrics reported for HyperSAGE.

Results in Table 1 demonstrate that UniGNNs are consistently better than the baselines by a considerable margin, achieving a new state of the art. In particular, on the DBLP dataset we significantly improve the accuracy from 77.4% to 88.8% with negligible variance. On all datasets, our results are generally more stable than the baselines, as indicated by the lower standard deviation. For the Cora cocitation dataset, however, we report only slight improvements over SOTA. We argue that this is because the Cora cocitation hypergraph has the smallest mean hyperedge size, for which the information loss from clique expansion in HGNN might be negligible.

Overall, with the powerful aggregation designs inspired by GNNs, our UniGNN models can effectively capture the intrinsic structure information of hypergraphs and produce consistently better predictions with less deviation.

Effect of Self-loops for UniGCN and UniGAT

We further study the effect of self-loops for UniGCN and UniGAT. Table 4 reports the mean accuracies for UniGCN and UniGAT when input hypergraphs are with or without self-loops. We observe that when hypergraphs are not self-looped, the performance drops significantly on most datasets, which supports the formulations in Section 3.1.

5.2 Inductive Learning on Evolving Hypergraphs

Setting Up

The task for inductive learning on evolving hypergraph takes the historical hypergraph as input and predicts the unseen nodes’ labels.

We closely follow [1] and use the corrupted hypergraph, in which 40% of the vertices are randomly removed and held out as unseen data during training. 20% of the vertices are used for training and the remaining 40% form the seen part of the testing vertices. The other experimental settings are similar to those in the transductive semi-supervised learning task. We employ an additional UniGCN variant, denoted as UniGCN*, which applies the linear transform after aggregation and normalization. We compare our models against MLP+HLR and HyperSAGE and use the best reported results for them.
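The 20% / 40% / 40% partition described above can be sketched as a seeded random split. This is only an illustration of the proportions: the function name `inductive_split`, its parameters and the use of Python's `random` module are assumptions, and the actual splits used in the experiments follow [1].

```python
import random


def inductive_split(vertices, seed=0, train_frac=0.2, unseen_frac=0.4):
    """Partition vertices into training, seen-test and unseen-test subsets."""
    rng = random.Random(seed)
    order = list(vertices)
    rng.shuffle(order)
    n = len(order)
    n_unseen = int(round(unseen_frac * n))
    n_train = int(round(train_frac * n))
    unseen = order[:n_unseen]                       # removed from the training hypergraph
    train = order[n_unseen:n_unseen + n_train]      # labeled training vertices
    seen_test = order[n_unseen + n_train:]          # present in training hypergraph, tested later
    return train, seen_test, unseen


train, seen_test, unseen = inductive_split(range(10), seed=42)
print(len(train), len(seen_test), len(unseen))
```

The unseen vertices would be dropped from every hyperedge before training and restored at test time, so that the model is evaluated on nodes it never observed.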


Comparison to SOTAs.

Table 2 reports the mean classification accuracy on seen part and unseen part of the testing data. We observe that our UniGNN models consistently show better scores across the benchmark datasets. Similar to the semi-supervised setting, our models notably show significant improvements in dataset DBLP, where the prediction accuracy increases from 78.1% to 89.6% in the seen data and from 73.2% to 83.4% in the unseen data.

Results from Table 2 confirm that UniGNNs can capture the global structure information and perform well for predicting unseen nodes in the inductive learning task.

5.3 Performance of Deep-layered UniGNNs

Setting Up.

To verify the effectiveness of UniGCNII, we study how the performance of vanilla UniGNN models changes with depth. In this experiment, we use the same setting as described in the semi-supervised hypernode classification task, except that an additional 20% of the original testing split is used as the validation split.

For UniGCNII, we perform all experiments in 1000 epochs and early stopping with a patience of 150 epochs. We use the Adam Optimizer with a learning rate of 0.01. We set the L2 regularizer factor to 0.01 for the convolutional layers, 0.0005 for the dense layer, which is the same as described in GCNII [4]. Please refer to the supplemental materials for more details.

Comparison with Other Deep-layered UniGNNs.

Table 3 summarizes the results, in which the best performing model for each dataset is bolded. We see that UniGCNII benefits from deep network structures and shows generally better results as the number of layers increases. We highlight that UniGCNII outperforms the best shallow models on the Cora, Pubmed and Citeseer datasets, and obtains competitive results on DBLP. In contrast, the performance of the vanilla models drops significantly as depth increases.

Overall, the results suggest that our proposed framework is capable of incorporating meticulously-designed deep GNN models for deep hypergraph learning.

6 Conclusion

We propose the UniGNN, a unified framework for graph and hypergraph neural networks. Under this framework, we naturally generalize several classic GNNs to HyperGNNs, which consistently outperform recent state-of-the-art methods with more stable results. We are the first to address the over-smoothing problem of deep hypergraph neural networks, by presenting the UniGCNII. Our models learn expressive representations of hypergraphs, which can be beneficial for a broad range of downstream tasks. Future work includes designing provably more powerful UniGNNs based on the higher-order GWL test. Another interesting direction is to design a hypergraph subtree kernel for hypergraph classification.


  • [1] D. Arya, D. K. Gupta, S. Rudinac, and M. Worring (2020) HyperSAGE: Generalizing Inductive Representation Learning on Hypergraphs. arXiv:2010.04558. https://openreview.net/forum?id=cKnKJcTPRcV
  • [2] S. Bai, F. Zhang, and P. H. Torr (2019) Hypergraph convolution and hypergraph attention. arXiv preprint arXiv:1901.08150.
  • [3] J. Böker (2019) Color refinement, homomorphisms, and hypergraphs. In International Workshop on Graph-Theoretic Concepts in Computer Science, pp. 338–350.
  • [4] M. Chen, Z. Wei, Z. Huang, B. Ding, and Y. Li (2020) Simple and Deep Graph Convolutional Networks. International Conference on Machine Learning. arXiv:2007.02133.
  • [5] Y. Feng, H. You, Z. Zhang, R. Ji, and Y. Gao (2019) Hypergraph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 3558–3565.
  • [6] Y. Gao, Z. Zhang, H. Lin, X. Zhao, S. Du, and C. Zou (2020) Hypergraph learning: methods and practices. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • [7] X. Gu, L. Chen, and M. Krenn (2020) Quantum experiments and hypergraphs: multiphoton sources for quantum interference, quantum computation, and quantum entanglement. Physical Review A. ISSN 2469-9934.
  • [8] W. L. Hamilton, R. Ying, and J. Leskovec (2017) Inductive Representation Learning on Large Graphs. NeurIPS. arXiv:1706.02216.
  • [9] T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations. arXiv:1609.02907.
  • [10] M. Li, S. Chen, Y. Zhang, and I. W. Tsang (2020) Graph Cross Networks with Vertex Infomax Pooling. NeurIPS. arXiv:2010.01804.
  • [11] P. Li, Y. Wang, H. Wang, and J. Leskovec (2020) Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning. NeurIPS. arXiv:2009.00142.
  • [12] S. Liu, P. Lv, Y. Zhang, J. Fu, J. Cheng, W. Li, B. Zhou, and M. Xu (2020) Semi-dynamic hypergraph neural network for 3d pose estimation. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, C. Bessiere (Ed.). Main track.
  • [13] C. Morris, G. Rattan, and P. Mutzel (2019) Weisfeiler and Leman go sparse: Towards scalable higher-order graph embeddings. International Conference on Machine Learning. arXiv:1904.01543.
  • [14] C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe (2019) Weisfeiler and Leman go neural: higher-order graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence.
  • [15] Y. Rong, W. Huang, T. Xu, and J. Huang (2019) DropEdge: Towards Deep Graph Convolutional Networks on Node Classification. International Conference on Learning Representations. arXiv:1907.10903.
  • [16] R. Rossi and N. Ahmed (2015) The network data repository with interactive graph analytics and visualization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 29.
  • [17] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad (2008) Collective classification in network data. AI Magazine.
  • [18] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio (2017) Graph Attention Networks. International Conference on Learning Representations. arXiv:1710.10903.
  • [19] X. Xia, H. Yin, J. Yu, Q. Wang, L. Cui, and X. Zhang (2021) Self-supervised hypergraph convolutional networks for session-based recommendation. In Proceedings of the Thirty-Fifth Conference on Association for the Advancement of Artificial Intelligence (AAAI).
  • [20] K. Xu, W. Hu, J. Leskovec, and S. Jegelka (2018) How Powerful are Graph Neural Networks? International Conference on Learning Representations. arXiv:1810.00826.
  • [21] K. Xu, C. Li, Y. Tian, T. Sonobe, K. I. Kawarabayashi, and S. Jegelka (2018) Representation learning on graphs with jumping knowledge networks. 35th International Conference on Machine Learning. arXiv:1806.03536. ISBN 9781510867963.
  • [22] N. Yadati, M. Nimishakavi, P. Yadav, V. Nitin, A. Louis, and P. Talukdar (2019) HyperGCN: A New Method of Training Graph Convolutional Networks on Hypergraphs. NeurIPS. arXiv:1809.02589.
  • [23] N. Yadati (2020) Neural Message Passing for Multi-Relational Ordered and Recursive Hypergraphs. NeurIPS.
  • [24] M. Zhang and Y. Chen (2018) Link prediction based on graph neural networks. NeurIPS. arXiv:1802.09691. ISSN 10495258.
  • [25] K. Zhou, X. Huang, Y. Li, D. Zha, R. Chen, and X. Hu (2020) Towards Deeper Graph Neural Networks with Differentiable Group Normalization. NeurIPS. arXiv:2006.06972.

Appendix A Proofs

A.1 Proof of Proposition 1


We use to denote the labels for in iteration and use to denote the incident hyperedges of vertex in .

We prove this proposition by contraposition. Suppose , then there exists a bijection and a bijection such that .

Since implies , in iteration 0, we have

Assume that in iteration , holds. Then in iteration , we have

Note that is bijective; thus, we have .

By induction, for any iteration , holds, which proves the proposition. ∎
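
The invariance used in this proof can be checked numerically on a toy example. The Python sketch below runs a label refinement in the spirit of the 1-GWL test on a small hypergraph, on a vertex-relabelled copy of it, and on a structurally different hypergraph. The update rule (a hyperedge takes the multiset of its vertices' labels, and a vertex takes its previous label together with the multiset of its incident hyperedges' labels), the uniform initial labels, and the names `gwl_refine` and `relabel` are assumptions made for this illustration; they are not taken from the authors' code.

```python
from collections import Counter


def gwl_refine(num_vertices, hyperedges, iterations):
    """Return the multiset (Counter) of vertex labels after each iteration.

    hyperedges: list of tuples of vertex ids in range(num_vertices).
    Labels are nested tuples, so labels from two hypergraphs can be compared.
    """
    labels = [0] * num_vertices  # uniform initial labels (assumption)
    history = [Counter(labels)]
    for _ in range(iterations):
        # Stage 1: each hyperedge takes the sorted multiset of its vertices' labels.
        edge_labels = [
            tuple(sorted((labels[v] for v in e), key=repr)) for e in hyperedges
        ]
        # Stage 2: each vertex combines its old label with the sorted multiset
        # of the labels of its incident hyperedges.
        new_labels = []
        for v in range(num_vertices):
            incident = sorted(
                (edge_labels[i] for i, e in enumerate(hyperedges) if v in e),
                key=repr,
            )
            new_labels.append((labels[v], tuple(incident)))
        labels = new_labels
        history.append(Counter(labels))
    return history


def relabel(hyperedges, perm):
    """Apply a vertex permutation (a bijection on vertex ids) to every hyperedge."""
    return [tuple(perm[v] for v in e) for e in hyperedges]


H1 = [(0, 1, 2), (2, 3), (3, 4, 0)]
H2 = relabel(H1, perm=[3, 0, 4, 1, 2])   # isomorphic copy of H1
H3 = [(0, 1, 2), (2, 3), (3, 4)]         # one incidence removed

same = gwl_refine(5, H1, 3) == gwl_refine(5, H2, 3)
differs = gwl_refine(5, H1, 3) != gwl_refine(5, H3, 3)
print(same, differs)
```

The isomorphic pair yields identical label multisets at every iteration, while the third hypergraph is separated from the first; the nested-tuple labels grow quickly, so this form is only suitable for small examples.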

A.2 Proof of Proposition 2

We use to denote the feature of vertex in iteration and use to denote the multiset of all vertices’ features in hypergraph .

Lemma 1.

Let denote the set . If we have


then , we have

Proof of Lemma.

In iteration , we do not need the second equation in (19). The lemma holds since the 1-GWL test and UniGNN start with the same vertex labels/features.

Assume that and , holds. If in iteration , we also have , that is,

Based on our assumption, we must have

Note that UniGNN sees the same neighborhood for vertex and vertex ; thus, the two-stage aggregation functions generate the same output for them, i.e. . Therefore, the lemma holds. ∎

Proof of Proposition 2.

Suppose, for contradiction, that there exists a minimal such that after iterations, UniGNN has , i.e. , but the 1-GWL test cannot decide and as non-isomorphic, i.e. .

If , then implies since the 1-GWL test and UniGNN start with the same vertex labels/features. This contradicts our assumption.

If , we have and for . By assumption , then the conditions for the lemma hold. Thus, we have

which indicates there exists a valid mapping such that . Since , then

Therefore, we have

This contradicts our assumption, which proves the original proposition. ∎

A.3 Proof of Theorem 1


We start by proving that there exists an injective mapping such that .

If , it is trivial to prove that exists since and are both initialized with the same labels/features.

Assume that and the injective mapping exists, then for iteration ,

Since the composition of injective functions is still injective, we can rewrite it as

where injective function is induced from and injective function is induced by .

Therefore, by induction, there always exists an injective mapping such that .

Thus, if in iteration , , then

Therefore, by injectivity, we have

Since the global READOUT function is also injective, this implies that if the 1-GWL test decides and as non-isomorphic, UniGNN can also distinguish them by . ∎

A.4 Proof of Corollary 1


We can remap the local substructure from incident graph to a subhypergraph and repeat the proof above on the subhypergraph. Since we do not need a global READOUT, the Local Level conditions are sufficient. ∎

Appendix B Implementation Details

B.1 Hyper-parameter Details

Table 5 summarizes the model hyper-parameters and optimizers for all experiments. represents the weight decay for shallow-UniGNNs. and denote the weight decay for the dense layers and the convolutional layers of UniGCNII, respectively. For the DBLP dataset, we remove the normalization layer from each model.

Method     Hyper-parameters
UniGAT     hidden: 8, heads: 8, dropout: 0.6, lr: 0.01, weight decay: 5e-4, attentional dropout: 0.6
UniGCN     hidden: 64, dropout: 0.6, lr: 0.01, weight decay: 5e-4
UniGIN     hidden: 64, dropout: 0.6, lr: 0.01, weight decay: 5e-4
UniSAGE    hidden: 64, dropout: 0.6, lr: 0.01, weight decay: 5e-4, AGGREGATOR: SUM
UniGCNII   hidden: 64, dropout: 0.2, lr: 0.01, weight decays: 0.01, 5e-4
Table 5: The model hyper-parameters for all experiments.
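
For readers who want these settings in machine-readable form, the following Python sketch collects the values of Table 5 in a dictionary. The key names (`hidden`, `attn_dropout`, `weight_decays`, `use_norm`) and the helper `get_hparams` are hypothetical and chosen for this illustration; they are not the option names of the released code. The two UniGCNII weight-decay values are stored as an unlabeled pair, since no assignment to the dense or convolutional layers is assumed here.

```python
# Settings shared by the shallow UniGNN variants (values from Table 5).
SHALLOW_DEFAULTS = {"hidden": 64, "dropout": 0.6, "lr": 0.01, "weight_decay": 5e-4}

HPARAMS = {
    "UniGAT": {**SHALLOW_DEFAULTS, "hidden": 8, "heads": 8, "attn_dropout": 0.6},
    "UniGCN": dict(SHALLOW_DEFAULTS),
    "UniGIN": dict(SHALLOW_DEFAULTS),
    "UniSAGE": {**SHALLOW_DEFAULTS, "aggregator": "sum"},
    # Two weight-decay groups (dense and convolutional layers); which value
    # belongs to which group is deliberately not assumed in this sketch.
    "UniGCNII": {"hidden": 64, "dropout": 0.2, "lr": 0.01,
                 "weight_decays": (0.01, 5e-4)},
}


def get_hparams(method, dataset=None):
    """Return a copy of the settings for `method`, adjusted for `dataset`."""
    cfg = dict(HPARAMS[method])
    # The normalization layer is removed for DBLP, as stated in the text.
    cfg["use_norm"] = dataset != "DBLP"
    return cfg


print(get_hparams("UniGAT", dataset="DBLP"))
```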

B.2 Experimental Details

Semi-supervised Hypernode Classification

We train each model on each dataset for 200 epochs and report the performance of the model at the last epoch. Each model is run 80 times on each dataset, using 10 train/test splits and 8 different random seeds.
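
The 80 runs are the product of 10 splits and 8 seeds. A minimal sketch of such a protocol is given below, assuming a placeholder `train_and_eval` function that stands in for real training and returns a synthetic accuracy; the function names and the use of mean and standard deviation as the summary are assumptions for illustration, not details taken from the released code.

```python
import itertools
import random
import statistics

NUM_SPLITS, NUM_SEEDS, EPOCHS = 10, 8, 200


def train_and_eval(split_id, seed, epochs):
    """Hypothetical stand-in for one training run.

    Returns a synthetic 'accuracy' so that the sketch is runnable; a real
    implementation would train for `epochs` epochs and evaluate the
    last-epoch model on the test part of split `split_id`.
    """
    rng = random.Random(split_id * 1000 + seed)
    return 0.75 + 0.05 * rng.random()


def run_protocol(train_fn=train_and_eval):
    """Run every (split, seed) combination and summarize the results."""
    accs = [
        train_fn(split_id, seed, EPOCHS)
        for split_id, seed in itertools.product(range(NUM_SPLITS), range(NUM_SEEDS))
    ]
    return len(accs), statistics.mean(accs), statistics.stdev(accs)


n_runs, mean_acc, std_acc = run_protocol()
print(f"{n_runs} runs: {mean_acc:.3f} ± {std_acc:.3f}")
```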

Inductive Learning on Evolving Hypergraphs

We train each model on each dataset for 200 epochs and report the performance of the model at the last epoch. For the PubMed dataset, we set the input dropout to 0 instead of 0.6 and train for 300 epochs. Each model is run 80 times on each dataset, using 10 train/test splits and 8 different random seeds.

Performance of Deep-layered UniGNNs

In this experiment, 20% of the original testing split is held out as the validation split and the remaining 80% forms the new testing split. For the shallow UniGNNs, we train each model on each dataset for 200 epochs and report the performance of the model with the highest validation score. For UniGCNII, we train for up to 1000 epochs with early stopping at a patience of 150 epochs. Each model is run 80 times on each dataset, using 10 train/val/test splits and 8 different random seeds.
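
The validation split and the stopping rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the split is drawn by a seeded random shuffle (how the 20% is selected is not specified in the text), and the helpers `resplit_test` and `train_with_early_stopping` are hypothetical names that consume a precomputed sequence of validation scores rather than training a model.

```python
import random


def resplit_test(test_idx, val_fraction=0.2, seed=0):
    """Split the original test indices into (validation, new test) parts.

    A seeded random shuffle is an assumption of this sketch.
    """
    idx = list(test_idx)
    random.Random(seed).shuffle(idx)
    n_val = int(round(val_fraction * len(idx)))
    return idx[:n_val], idx[n_val:]


def train_with_early_stopping(val_scores, max_epochs=1000, patience=150):
    """Track the best validation score and stop after `patience` stale epochs.

    val_scores: iterable yielding one validation score per epoch.
    Returns (best_epoch, best_score, epochs_run).
    """
    best_score, best_epoch, epochs_run = float("-inf"), -1, 0
    for epoch, score in zip(range(max_epochs), val_scores):
        epochs_run = epoch + 1
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_score, epochs_run


val_idx, new_test_idx = resplit_test(range(100))
scores = [0.5, 0.6] + [0.4] * 500
result = train_with_early_stopping(scores)
print(len(val_idx), len(new_test_idx), result)
```

With `patience=None`-style behaviour not being needed for the shallow models, the same tracking of the best validation epoch covers their 200-epoch runs by passing `max_epochs=200` and a patience at least that large.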

Appendix C Details of Datasets

Dataset    # hypernodes   # hyperedges   Avg. hyperedge size   Feature dim.   # classes   Label rate
DBLP       43413          22535          4.7 ± 6.1             1425           6           4.0%
PubMed     19717          7963           4.3 ± 5.7             500            3           0.8%
Citeseer   3312           1079           3.2 ± 2.0             3703           6           4.2%
Cora 1     2708           1072           4.2 ± 4.1             1433           7           5.2%
Cora 2     2708           1579           3.0 ± 1.1             1433           7           5.2%
Table 6: Dataset statistics. The columns give the number of hypernodes, the number of hyperedges, the average size of hyperedges, the dimension of input features, the number of classes and the label rate. Cora 1 is Cora coauthorship and Cora 2 is Cora cocitation.

Statistics of the datasets are summarized in Table 6.
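
Statistics of the kind listed in Table 6 can be computed directly from a hyperedge list. The sketch below does this for a toy hypergraph; the function name `hypergraph_stats`, the use of the population standard deviation for the spread of hyperedge sizes, and the definition of label rate as labeled training nodes divided by all nodes are assumptions of this illustration rather than details confirmed by the text.

```python
import statistics


def hypergraph_stats(num_vertices, hyperedges, num_train_labels):
    """Summarize a hypergraph given as a list of vertex-id tuples."""
    sizes = [len(e) for e in hyperedges]
    return {
        "num_vertices": num_vertices,
        "num_hyperedges": len(hyperedges),
        "avg_edge_size": statistics.mean(sizes),
        # Population standard deviation (assumption; sample std is the alternative).
        "std_edge_size": statistics.pstdev(sizes),
        # Fraction of vertices whose labels are available for training.
        "label_rate": num_train_labels / num_vertices,
    }


toy = hypergraph_stats(6, [(0, 1, 2), (2, 3), (3, 4, 5, 0)], num_train_labels=3)
print(toy)
```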