Source code for the paper UniGNN: a Unified Framework for Graph and Hypergraph Neural Networks (IJCAI 2021).
Hypergraph, an expressive structure with the flexibility to model higher-order correlations among entities, has recently attracted increasing attention from various research domains. Despite the success of Graph Neural Networks (GNNs) for graph representation learning, how to adapt the powerful GNN-variants directly to hypergraphs remains a challenging problem. In this paper, we propose UniGNN, a unified framework for interpreting the message passing process in graph and hypergraph neural networks, which can generalize general GNN models to hypergraphs. In this framework, meticulously-designed architectures aiming to deepen GNNs can also be incorporated into hypergraphs with the least effort. Extensive experiments have been conducted to demonstrate the effectiveness of UniGNN on multiple real-world datasets, which outperform the state-of-the-art approaches by a large margin. Especially for the DBLP dataset, we increase the accuracy from 77.4% to 88.8% in the semi-supervised hypernode classification task. We further prove that the proposed message-passing based UniGNN models are at most as powerful as the 1-dimensional Generalized Weisfeiler-Leman (1-GWL) algorithm in terms of distinguishing non-isomorphic hypergraphs. Our code is available at <https://github.com/OneForward/UniGNN>.
Hypergraphs are natural extensions of graphs that allow an edge to join any number of vertices, and can thus represent higher-order relationships involving multiple entities. Recently, hypergraphs have drawn attention from a wide range of fields, such as computer vision [6, 12], recommendation systems and the natural sciences, and have been incorporated into various domain-specific tasks.
In a parallel note, Graph Representation Learning has raised a surge of interest among researchers. Numerous powerful Graph Neural Networks (GNNs) have been presented, achieving state-of-the-art results on graph-based tasks such as node classification, link prediction and graph classification. Most GNNs are message-passing based models, like GCN, GAT, GIN and GraphSAGE, which iteratively update node embeddings by aggregating neighboring nodes' information. The expressive power of GNNs is well known to be upper bounded by the 1-Weisfeiler-Leman (1-WL) test, and many provably more powerful GNNs mimicking higher-order WL tests have been presented [13, 14, 11].
Furthermore, several works, like JKNet , DropEdge , DGN  and GCNII , have devoted substantial efforts to tackling the problem of over-smoothing, an issue when node embeddings in GNNs tend to converge as layers are stacked up and the performance downgrades significantly.
Despite the success of GNNs, how to learn powerful representative embeddings for hypergraphs remains a challenging problem. HGNN is the first hypergraph neural network, which uses the clique expansion technique to approximate hypergraphs as graphs and thereby reduces the problem to the graph embedding framework. This approach, as illustrated in Fig 1, cannot cover substructures such as hyperedges that recursively contain other hyperedges, which are discarded by clique expansion. HyperGCN
enhances the generalized hypergraph Laplacian with additional weighted pairwise edges (a.k.a. mediators). This approach still fails to preserve complete hypergraph information, since the graph Laplacian can only describe pairwise connections between vertices in one training epoch. Another work, HyperSAGE, learns to embed hypergraphs directly by propagating messages in a two-stage procedure. Although HyperSAGE shows the capability to capture information from hypergraph structures with a giant leap in performance, it fails to adapt powerful classic GNN designs to hypergraphs.
In view of the fact that more and more meticulously-designed network architectures and learning strategies have appeared in graph learning, we are naturally motivated to ask the following intriguing question:
Can the network design and learning strategy for GNNs be applied to HyperGNNs directly?
This paper proposes the UniGNN, a unified framework for graph and hypergraph neural networks, with contributions unfolded by the following questions:
Can we generalize the well-designed GNN architecture for hypergraphs with the least effort?
Can we utilize the learning strategies for circumventing the over-smoothing in the graph learning and design deep neural networks that adapt to hypergraphs?
How powerful are hypergraph neural networks?
By addressing the above questions, we highlight our contributions as follows:
We present the UniGNN and use it to generalize several classic GNNs, like GCN, GAT, GIN and GraphSAGE, directly to hypergraphs, termed UniGCN, UniGAT, UniGIN and UniSAGE, respectively. UniGNNs consistently outperform the state-of-the-art approaches in hypergraph learning tasks.
We propose the UniGCNII, the first deep hypergraph neural network and verify its effectiveness in resolving the over-smoothing issue.
We prove that message-passing based UniGNNs are at most as powerful as 1-dimensional Generalized Weisfeiler-Leman (1-GWL) algorithm in terms of distinguishing non-isomorphic hypergraphs.
Let $G = (V, E)$ denote a directed or undirected graph consisting of a vertex set $V$ and an edge set $E$ (pairs of vertices). A self-looped graph $\tilde{G}$ is constructed from $G$ by adding a self-loop to each of its non-self-looped nodes. The neighbor-nodes of vertex $i$ are denoted by $N(i)$. We also denote vertex $i$'s neighbor-nodes together with itself as $\tilde{N}(i)$. We use $x_i \in \mathbb{R}^d$ to represent the $d$-dimensional feature of vertex $i$.
A hypergraph $H = (V, E)$ is defined as a generalized graph by allowing an edge to connect any number of vertices, where $V$ is a set of vertices and a hyperedge $e \in E$ is a non-empty subset of $V$. The incident-edges of vertex $i$ are denoted by $E_i = \{e \in E : i \in e\}$. We say two hypergraphs $H_1 = (V_1, E_1)$ and $H_2 = (V_2, E_2)$ are isomorphic, written $H_1 \simeq H_2$, if there exists a bijection $\pi: V_1 \to V_2$ such that $e \in E_1 \iff \pi(e) \in E_2$.
Graph Neural Networks (GNNs) learn an informative embedding of a graph by utilizing the feature matrix and the graph structure. A broad range of GNNs can be built up from message passing layers, in which each node embedding is updated by aggregating the information of its neighbor embeddings. The message passing process in the $l$-th layer of a GNN is formulated as
$$ \tilde{x}_i^{(l)} = \phi\big(x_i^{(l)}, \{x_j^{(l)}\}_{j \in N(i)}\big) \qquad (1) $$
In the following sections, we omit the superscript $(l)$ for the sake of simplicity and use $\tilde{x}_i$ to indicate the output of the message passing layer before activation or normalization.
HGNN and HyperConv utilize the normalized hypergraph Laplacian, which essentially converts the hypergraph to a conventional graph by viewing each hyperedge as a complete graph. HyperGCN uses the generalized hypergraph Laplacian (changing between epochs) and injects the information of mediators to represent hyperedges. Both methods depend on the hypergraph Laplacian, which, however, emphasizes pairwise relations between vertices. Another work, MPNN-R, regards hyperedges as new vertices and represents the hypergraph by an incidence-style matrix. MPNN-R can effectively capture the recursive property of hyperedges, but fails to describe other high-order relationships, like complex and diverse intersections between hyperedges.
A recent work, HyperSAGE, pioneers to exploit the structure of hypergraphs by aggregating messages in a two-stage procedure, avoiding the information loss caused by reducing hypergraphs to graphs. With $N(i, e) = e \setminus \{i\}$ denoting vertex $i$'s intra-edge neighborhood for hyperedge $e \ni i$, HyperSAGE aggregates information with the following rules:
$$ \tilde{x}_i = W \cdot M_2\Big(x_i, \big\{\, M_1\big(\{x_j\}_{j \in s(N(i, e))}\big) \,\big\}_{e \in E_i}\Big) \qquad (2) $$
where $s(N(i, e))$ is a sampled subset of vertices from $N(i, e)$, $W$ is the linear transform and $M_1$ and $M_2$ are power mean functions.
HyperSAGE is the current state-of-the-art algorithm for hypergraph representation learning. However, there are some issues associated with HyperSAGE. Firstly, since the calculation is distinct for different vertex-hyperedge pairs $(i, e)$, the original algorithm uses nested loops over hyperedges and vertices within hyperedges, which results in redundant computation and poor parallelism. Secondly, applying the power mean functions in both stages, neither of which is injective, fails to distinguish structures with the same distribution but different multiplicities of elements. Lastly, the original work still fails to address the over-smoothing issue associated with deep hypergraph neural networks.
To resolve the issues associated with HyperSAGE, we propose UniGNN, a unified framework to characterize the message-passing process in GNNs and HyperGNNs:
$$ h_e = \phi_1\big(\{x_j\}_{j \in e}\big), \qquad \tilde{x}_i = \phi_2\big(x_i, \{h_e\}_{e \in E_i}\big) \qquad (3) $$
where $\phi_1$ and $\phi_2$ are permutation-invariant functions for aggregating messages from vertices and hyperedges respectively. The update rule for UniGNN is illustrated in Fig 2. The key insight is that if we rethink Eq. (1) in GNNs as a two-stage aggregation process, then the designs for GNNs can be naturally generalized to hypergraphs in Eq. (3).
In the first stage, for each hyperedge $e$, we use $\phi_1$ to aggregate the features of all vertices within it. $\phi_1$ can be any permutation-invariant function satisfying $\phi_1(\{x\}) = x$, such as the mean function or the sum function. It is obvious that if, on an ordinary graph, the first-stage message of an edge $e = \{i, j\} \in E_i$ seen from vertex $i$ is taken to be $x_j$, then $\{h_e\}_{e \in E_i} = \{x_j\}_{j \in N(i)}$ holds for any vertex $i$. Therefore, UniGNN (3) can be reduced to GNN (1), which unifies both formulations into the same framework.
In the second stage, we update each vertex with its incident hyperedges using aggregating function , of which the design can be inspired from existent GNNs directly. We will exhibit several effective examples in the following section.
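To make the two-stage process concrete, here is a minimal numpy sketch of one UniGNN layer following Eq. (3). The function name `unignn_layer` and the default aggregator choices (mean for the first stage, vertex-plus-mean for the second) are illustrative assumptions, not the released implementation.

```python
import numpy as np

def unignn_layer(X, hyperedges, phi1=None, phi2=None):
    """One UniGNN message-passing layer, as a minimal numpy sketch.

    X          : (n, d) vertex features
    hyperedges : list of vertex-index lists, one list per hyperedge
    phi1       : first-stage aggregator over a hyperedge's vertices (default: mean)
    phi2       : second-stage aggregator combining a vertex with its
                 incident hyperedge messages (default: vertex + mean of messages)
    """
    if phi1 is None:
        phi1 = lambda xs: xs.mean(axis=0)          # permutation-invariant
    if phi2 is None:
        phi2 = lambda x, hs: x + hs.mean(axis=0)   # simple combine rule

    # Stage 1: aggregate vertex features into each hyperedge message h_e.
    H = np.stack([phi1(X[list(e)]) for e in hyperedges])

    # Stage 2: update each vertex from its incident hyperedges E_i.
    out = np.empty_like(X)
    for i in range(X.shape[0]):
        incident = [k for k, e in enumerate(hyperedges) if i in e]
        out[i] = phi2(X[i], H[incident]) if incident else X[i]
    return out
```

Swapping in a different `phi2` is exactly how the GNN designs below (GCN-, GAT-, GIN-style updates) transfer to hypergraphs.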
Graph Convolutional Networks (GCN) propagate features using a weighted sum, where the weights are specified by the node degrees,
$$ \tilde{x}_i = \sum_{j \in \tilde{N}(i)} \frac{1}{\sqrt{d_i}\sqrt{d_j}} W x_j \qquad (4) $$
Based on our framework, we can generalize the above aggregation process to hypergraphs as
$$ \tilde{x}_i = \frac{1}{\sqrt{d_i}} \sum_{e \in \tilde{E}_i} \frac{1}{\sqrt{d_e}} W h_e \qquad (5) $$
where we define $d_e = \frac{1}{|e|} \sum_{i \in e} d_i$ as the average degree of a hyperedge $e$. UniGCN assigns less weight to high-degree hyperedges in aggregation. It is trivial that by letting $h_e = x_j$ and $d_e = d_j$ for each graph edge $e = \{i, j\}$, Eq. (5) recovers Eq. (4), and thus UniGCN is reduced to GCN.
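A dense numpy sketch of this UniGCN update follows (our own illustrative code: no bias, activation, or the sparse scatter operations a practical implementation would use; isolated vertices simply receive a zero output).

```python
import numpy as np

def unigcn_layer(X, hyperedges, W):
    """One UniGCN layer as a minimal dense sketch."""
    n = X.shape[0]
    d = np.zeros(n)                    # vertex degree d_i = |E_i|
    for e in hyperedges:
        for i in e:
            d[i] += 1
    # Stage 1: h_e = mean of member vertex features.
    H = np.stack([X[list(e)].mean(axis=0) for e in hyperedges])
    # Average hyperedge degree d_e = mean degree of the member vertices.
    d_e = np.array([d[list(e)].mean() for e in hyperedges])
    out = np.zeros((n, W.shape[1]))
    for k, e in enumerate(hyperedges):
        msg = (H[k] @ W) / np.sqrt(d_e[k])   # down-weight high-degree edges
        for i in e:
            out[i] += msg
    return out / np.sqrt(np.maximum(d, 1))[:, None]   # 1/sqrt(d_i) scaling
```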
Graph Attention Networks (GAT) adopt the attention mechanism to assign an importance score to each of the center node's neighbors, leading to more effective aggregation. The attention mechanism is formulated as
$$ \alpha_{ij} = \frac{\exp\big(\sigma(a^\top [W x_i \,\|\, W x_j])\big)}{\sum_{k \in \tilde{N}(i)} \exp\big(\sigma(a^\top [W x_i \,\|\, W x_k])\big)}, \qquad \tilde{x}_i = \sum_{j \in \tilde{N}(i)} \alpha_{ij} W x_j \qquad (6) $$
where $\sigma$ is the leaky ReLU function, $a$ is the learnable attentional parameter and $\|$ means concatenation.
By rewriting the above equations, we can get UniGAT for hypergraphs as follows,
$$ \alpha_{ie} = \frac{\exp\big(\sigma(a^\top [W x_i \,\|\, W h_e])\big)}{\sum_{e' \in \tilde{E}_i} \exp\big(\sigma(a^\top [W x_i \,\|\, W h_{e'}])\big)}, \qquad \tilde{x}_i = \sum_{e \in \tilde{E}_i} \alpha_{ie} W h_e \qquad (7) $$
In this way, UniGAT learns to reweight the center node’s neighboring hyperedges.
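The attention weights of a vertex over its incident hyperedge messages can be sketched as follows (illustrative single-head code; `unigat_scores` and its arguments are our own names, and the leaky-ReLU slope of 0.2 follows the common GAT default).

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def unigat_scores(x_i, H_inc, W, a):
    """Softmax attention of vertex i over its incident hyperedge messages.

    x_i   : (d,) feature of the center vertex
    H_inc : (m, d) messages h_e of the m incident hyperedges
    W     : (d, d') shared linear transform
    a     : (2*d',) attention vector applied to [W x_i || W h_e]
    """
    logits = np.array([leaky_relu(a @ np.concatenate([W @ x_i, W @ h]))
                       for h in H_inc])
    w = np.exp(logits - logits.max())   # numerically stable softmax
    return w / w.sum()
```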
UniGAT is essentially different from HyperGAT  since HyperGAT requires the hyperedges to be preprocessed into the same homogeneous domain of vertices before training, which is inflexible and unreliable in practice.
Note that based on the formulations for UniGCN and UniGAT, hypergraphs should be preprocessed with self-loops; that is, $E \leftarrow E \cup \big\{\{i\} : i \in V\big\}$, which yields the self-looped incident-edge set $\tilde{E}_i$.
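One simple way to realize this preprocessing is to append a singleton hyperedge per vertex; the exact convention (singleton edges rather than inserting the vertex into existing edges) is our reading of the formulation above.

```python
def add_self_loops(hyperedges, n):
    """Return E ∪ {{v} : v ∈ V}: each vertex v gets a singleton hyperedge,
    so v's own feature always appears among its incident-edge messages."""
    return [list(e) for e in hyperedges] + [[v] for v in range(n)]
```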
Graph Isomorphism Networks (GIN) are simple yet effective models whose expressive power achieves the upper bound of message-passing based GNNs. GIN updates node embeddings as
$$ \tilde{x}_i = W\Big( (1 + \varepsilon)\, x_i + \sum_{j \in N(i)} x_j \Big) \qquad (8) $$
where $\varepsilon$ is a learnable parameter or some fixed scalar.
Similar to the previous deduction, UniGIN is formulated as
$$ \tilde{x}_i = W\Big( (1 + \varepsilon)\, x_i + \sum_{e \in E_i} h_e \Big) \qquad (9) $$
GraphSAGE uses a general aggregating function, like a mean, LSTM or max-pooling aggregator, which can be designed according to the task at hand. We use a variant of GraphSAGE where the combining step is a sum instead of concatenation, following prior work:
$$ \tilde{x}_i = W x_i + \mathrm{AGGREGATE}\big(\{x_j\}_{j \in N(i)}\big) \qquad (10) $$
UniSAGE is naturally generalized as
$$ \tilde{x}_i = W x_i + \mathrm{AGGREGATE}\big(\{h_e\}_{e \in E_i}\big) \qquad (11) $$
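Both updates are one-liners once the hyperedge messages $h_e$ are available; the sketch below (our own illustrative code, assuming a square weight matrix so the two UniSAGE terms share a dimension) mirrors the UniGIN and UniSAGE updates above.

```python
import numpy as np

def unigin_update(x_i, edge_msgs, W, eps=0.0):
    # UniGIN: W((1 + eps) * x_i + sum of incident hyperedge messages h_e)
    return ((1 + eps) * x_i + edge_msgs.sum(axis=0)) @ W

def unisage_update(x_i, edge_msgs, W, aggregate=np.sum):
    # UniSAGE: W x_i + AGGREGATE({h_e}); SUM aggregation by default
    return x_i @ W + aggregate(edge_msgs, axis=0)
```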
Current hypergraph representation learning methods, like HGNN, HyperGCN and HyperSAGE, use a shallow network with two layers, and the performance degrades significantly when layers are stacked up, mirroring the behavior of GNNs. This phenomenon is called over-smoothing. Although many works have focused on tackling this problem for graphs, like JKNet, DropEdge and GCNII, how to make hypergraph networks deeper remains unexplored.
Since in our framework, learning strategies from graph learning domain can be incorporated into hypergraphs with the least effort, we solve this problem by presenting UniGCNII, a deep hypergraph neural network inspired from GCNII.
GCNII is a powerful deep graph convolutional network enhanced with Initial Residual Connection and Identity Mapping to vanquish the over-smoothing problem. We generalize GCNII to hypergraphs, dubbed UniGCNII, with the aggregation process defined as
$$ \hat{x}_i = \frac{1}{\sqrt{d_i}} \sum_{e \in E_i} \frac{1}{\sqrt{d_e}} h_e, \qquad \tilde{x}_i = \big( (1-\beta) I + \beta W \big)\big( (1-\alpha)\, \hat{x}_i + \alpha\, x_i^{(0)} \big) \qquad (12) $$
where $I$ is the identity matrix, $x_i^{(0)}$ is the initial feature of vertex $i$, and $\alpha$, $\beta$ are hyperparameters.
In each layer, UniGCNII employs the same two-stage aggregation as UniGCN to exploit the hypergraph structure, and then injects the jumping knowledge from the initial features and previous features. Experiments validate that UniGCNII enjoys the advantage of circumventing the over-smoothing issue when models are getting deeper.
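A compact numpy sketch of one UniGCNII layer follows (illustrative code only: it uses a uniform mean over incident hyperedges in the second stage instead of the degree-normalized weights, to keep the example short).

```python
import numpy as np

def unigcnii_layer(X, X0, hyperedges, W, alpha=0.1, beta=0.5):
    """One UniGCNII layer (sketch): two-stage aggregation, then an
    initial residual connection (alpha) and identity mapping (beta)."""
    n, _ = X.shape
    # Two-stage aggregation (uniform weights here for brevity).
    H = np.stack([X[list(e)].mean(axis=0) for e in hyperedges])
    agg = np.zeros_like(X)
    cnt = np.zeros(n)
    for k, e in enumerate(hyperedges):
        for i in e:
            agg[i] += H[k]
            cnt[i] += 1
    agg /= np.maximum(cnt, 1)[:, None]
    # Initial residual connection: mix in the layer-0 features X0.
    Z = (1 - alpha) * agg + alpha * X0
    # Identity mapping: ((1-beta) I + beta W) applied to Z.
    return (1 - beta) * Z + beta * (Z @ W)
```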
Message-passing based GNNs are capable of distinguishing local substructures (like the $k$-height subtree rooted at a node) or the global structure of graphs, with expressive power upper bounded by the 1-WL test. In view of this, we are motivated to investigate the expressive power of UniGNNs for hypergraphs. We start by presenting a variant of the 1-dimensional Generalized Weisfeiler-Leman algorithm (1-GWL) for hypergraph isomorphism testing, following prior work.
1-GWL sets up by labeling the vertices of a hypergraph with a uniform label $c_i^{(0)}$ for any $i \in V$, and in the $t$-th iteration the labels are updated by
$$ c_e^{(t)} = \{\!\!\{\, c_i^{(t-1)} : i \in e \,\}\!\!\}, \qquad c_i^{(t)} = \mathrm{HASH}\Big( c_i^{(t-1)}, \{\!\!\{\, c_e^{(t)} : e \in E_i \,\}\!\!\} \Big) \qquad (13) $$
where $c_e^{(t)}$ denotes the label of a hyperedge $e$, $\{\!\!\{\cdot\}\!\!\}$ denotes a multiset and $\mathrm{HASH}$ is injective.
1-GWL distinguishes $H_1$ and $H_2$ as non-isomorphic if there exists a $t$ such that
$$ \{\!\!\{\, c_{i,1}^{(t)} : i \in V_1 \,\}\!\!\} \neq \{\!\!\{\, c_{i,2}^{(t)} : i \in V_2 \,\}\!\!\} \qquad (14) $$
where the subscripts 1 and 2 are added for discrimination.
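The refinement above can be implemented as joint color refinement over the hypergraphs being compared, replacing HASH with shared lookup tables so labels remain comparable across graphs. This is a sketch; `gwl_labels` and the fixed round count are our own choices.

```python
def gwl_labels(hypergraphs, rounds=3):
    """Run 1-GWL jointly on several hypergraphs.

    Each hypergraph is a pair (n, hyperedges) with n vertices and a list of
    vertex-index lists.  Shared relabeling tables play the role of an
    injective HASH, so the returned sorted label multisets are comparable.
    """
    cs = [[0] * n for n, _ in hypergraphs]   # uniform initial vertex labels
    for _ in range(rounds):
        edge_tab, vert_tab = {}, {}
        new_cs = []
        for (n, E), c in zip(hypergraphs, cs):
            # Hyperedge labels: multiset of member vertex labels.
            edge_sig = [tuple(sorted(c[i] for i in e)) for e in E]
            ce = [edge_tab.setdefault(s, len(edge_tab)) for s in edge_sig]
            # Vertex labels: old label + multiset of incident edge labels.
            vert_sig = [(c[i], tuple(sorted(ce[k] for k, e in enumerate(E)
                                            if i in e)))
                        for i in range(n)]
            new_cs.append([vert_tab.setdefault(s, len(vert_tab))
                           for s in vert_sig])
        cs = new_cs
    return [sorted(c) for c in cs]
```

Two hypergraphs are reported non-isomorphic when their sorted label lists differ.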
If the 1-GWL test decides $H_1$ and $H_2$ to be non-isomorphic, then $H_1 \not\simeq H_2$.
We leave all the proofs in the supplemental files.
We assign the same features to all vertices of a hypergraph so that the UniGNN depends only on the hypergraph structure to learn. Let $\mathcal{A}$ be a UniGNN abiding by the aggregation rule (3); the following proposition indicates that $\mathcal{A}$'s expressive power is upper bounded by the 1-GWL test.
Given two non-isomorphic hypergraphs $H_1$ and $H_2$, if $\mathcal{A}$ can distinguish them by $\mathcal{A}(H_1) \neq \mathcal{A}(H_2)$, then the 1-GWL test also decides $H_1 \not\simeq H_2$.
The following theorem characterizes the conditions for UniGNNs to reach the expressive power of 1-GWL test.
Given two hypergraphs $H_1$ and $H_2$ such that the 1-GWL test decides them to be non-isomorphic, a UniGNN $\mathcal{A}$ suffices to distinguish them by $\mathcal{A}(H_1) \neq \mathcal{A}(H_2)$ under the following conditions:
Local Level. The two-stage aggregating functions $\phi_1$ and $\phi_2$ are both injective.
Global Level. In addition to the local-level conditions, $\mathcal{A}$'s graph-level READOUT function is injective.
We are also interested in UniGNNs' capability of distinguishing local substructures of hypergraphs. We define the local substructure of a hypergraph $H = (V, E)$ as the $k$-height subtree of its incidence graph $G(H)$, where $G(H)$ is the bipartite graph with vertices $V \cup E$ and edges $\{(i, e) : i \in e\}$.
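The incidence graph is easy to materialize; in the sketch below hyperedge $k$ becomes node $n + k$, an indexing convention we chose for illustration.

```python
def incidence_graph(n, hyperedges):
    """Edges of the bipartite incidence graph G(H): vertex i is linked to
    the node n + k representing hyperedge k iff i belongs to hyperedge k."""
    return [(i, n + k) for k, e in enumerate(hyperedges) for i in e]
```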
Assume that the 1-GWL test can distinguish two distinct local substructures of hypergraphs; then a UniGNN can also distinguish them as long as the Local Level condition is satisfied.
In this section, we evaluate the performance of the proposed methods in extensive experiments.
We use the standard academic network datasets DBLP, PubMed, Citeseer and Cora for all the experiments. Each hypergraph is created with every vertex representing a document. The co-authorship hypergraphs, constructed from DBLP and Cora, connect all documents co-authored by one author as one hyperedge. The co-citation hypergraphs are built from PubMed, Citeseer and Cora, using one hyperedge to represent all documents cited by a document. We use the same preprocessed hypergraphs as HyperGCN, which are publicly available in their official implementation (https://github.com/malllabiisc/HyperGCN).
Table 1: Test accuracy (%) for semi-supervised hypernode classification (mean ± std over 80 runs).

| Method | DBLP (co-authorship) | Cora (co-authorship) | Pubmed (co-citation) | Citeseer (co-citation) | Cora (co-citation) |
|---|---|---|---|---|---|
| MLP+HLR | 63.6 ± 4.7 | 59.8 ± 4.7 | 64.7 ± 3.1 | 56.1 ± 2.6 | 61.0 ± 4.1 |
| HGNN | 69.2 ± 5.1 | 63.2 ± 3.1 | 66.8 ± 3.7 | 56.7 ± 3.8 | 70.0 ± 2.9 |
| FastHyperGCN | 68.1 ± 9.6 | 61.1 ± 8.2 | 65.7 ± 11.1 | 56.2 ± 8.1 | 61.3 ± 10.3 |
| HyperGCN | 70.9 ± 8.3 | 63.9 ± 7.3 | 68.3 ± 9.5 | 57.3 ± 7.3 | 62.5 ± 9.7 |
| HyperSAGE | 77.4 ± 3.8 | 72.4 ± 1.6 | 72.9 ± 1.3 | 61.8 ± 2.3 | 69.3 ± 2.7 |
| UniGAT | 88.7 ± 0.2 | 75.0 ± 1.1 | 74.7 ± 1.2 | 63.8 ± 1.6 | 69.2 ± 2.9 |
| UniGCN | 88.8 ± 0.2 | 75.3 ± 1.2 | 74.4 ± 1.0 | 63.6 ± 1.3 | 70.1 ± 1.4 |
| UniGIN | 88.6 ± 0.3 | 74.8 ± 1.3 | 74.4 ± 1.1 | 63.3 ± 1.2 | 69.2 ± 1.5 |
| UniSAGE | 88.5 ± 0.2 | 75.1 ± 1.2 | 74.3 ± 1.0 | 63.8 ± 1.3 | 70.2 ± 1.5 |
Table 4: Mean accuracy (%) of UniGAT and UniGCN with and without self-loops.

| Dataset | UniGAT (w/o self-loops) | UniGAT (w/ self-loops) | UniGCN (w/o self-loops) | UniGCN (w/ self-loops) |
|---|---|---|---|---|
| DBLP | 88.1 ± 0.1 | 88.7 ± 0.2 | 88.1 ± 0.1 | 88.8 ± 0.2 |
| Cora (co-authorship) | 67.4 ± 1.5 | 75.0 ± 1.1 | 67.3 ± 2.0 | 75.3 ± 1.2 |
| Pubmed | 30.1 ± 0.8 | 74.7 ± 1.2 | 30.2 ± 0.9 | 74.4 ± 1.0 |
| Citeseer | 39.8 ± 1.2 | 63.8 ± 1.6 | 40.2 ± 1.3 | 63.6 ± 1.3 |
| Cora (co-citation) | 43.8 ± 3.9 | 69.2 ± 2.9 | 44.1 ± 3.6 | 70.1 ± 1.4 |
The semi-supervised hypernode classification task aims to predict labels for the test nodes, given the hypergraph structure, all nodes’ features and very limited training labels. The label rate of each dataset can be found in the supplemental materials.
We employ four two-layer UniGNN variants: UniGCN, UniGAT, UniGIN and UniSAGE. For all models, the mean function is used as the first-stage aggregation; UniSAGE uses the SUM function for the second-stage aggregation. Note that, as described in Section 3.1, hypergraphs are preprocessed with self-loops for UniGCN and UniGAT.
We compare UniGNN models against the following baselines: (a) Multi-Layer Perceptron with explicit Hypergraph Laplacian Regularization (MLP+HLR), (b) HyperGraph Neural Network (HGNN ), (c) HyperGraph Convolutional Network (HyperGCN ) and (d) HyperSAGE .
Closely following previous works, for each model on each dataset we repeat experiments over 10 data splits with 8 different random seeds, amounting to 80 experiments. We use the Adam optimizer with a learning rate of 0.01 and a weight decay of 0.0005. We fix the number of training epochs at 200 and report the performance of the model at the last epoch. The same training/testing split as in previous work is used. We run all experiments on a single NVIDIA 1080Ti (11GB).
Table 1 summarizes the mean classification accuracy with the standard deviation on the test split for the UniGNN variants after 80 runs. We reuse the metrics already reported for MLP+HLR, HGNN, FastHyperGCN and HyperGCN, and the best metrics reported for HyperSAGE.
Results in Table 1
demonstrate that UniGNNs are consistently better than the baselines by a considerable margin, achieving a new state-of-the-art. Especially for the DBLP dataset, we significantly improve the accuracy from 77.4% to 88.8% with negligible variance. On all datasets, our results are generally more stable than the baselines, as indicated by lower standard deviations. For the Cora co-citation dataset, where we report only a slight improvement over the previous state-of-the-art, we argue that this is because the Cora co-citation hypergraph has the smallest mean hyperedge size, for which the information loss from clique expansion in HGNN may be negligible.
Overall, with the powerful aggregation designs inspired by GNNs, our UniGNN models can effectively capture the intrinsic structural information of hypergraphs and deliver stably better predictions with less deviation.
We further study the effect of self-loops for UniGCN and UniGAT. Table 4 reports the mean accuracies for UniGCN and UniGAT when input hypergraphs are with or without self-loops. We observe that when hypergraphs are un-self-looped, the performance drops significantly on most datasets, which supports the correctness of the formulations in Section 3.1.
The task of inductive learning on evolving hypergraphs takes the historical hypergraph as input and predicts the labels of unseen nodes.
We closely follow the setting of HyperSAGE and use a corrupted hypergraph in which 40% of the vertices are randomly removed as unseen data during training. 20% of the vertices are used for training and the remaining 40% as the seen part of the testing vertices. The other experimental settings are similar to those in the transductive semi-supervised learning task. We employ an additional UniGCN variant, denoted UniGCN*, which applies the linear transform after aggregation and normalization. We compare our models against MLP+HLR and HyperSAGE and use the best results reported therein.
Table 2 reports the mean classification accuracy on the seen and unseen parts of the testing data. We observe that our UniGNN models consistently show better scores across the benchmark datasets. Similar to the semi-supervised setting, our models show significant improvements on the DBLP dataset, where the prediction accuracy increases from 78.1% to 89.6% on the seen data and from 73.2% to 83.4% on the unseen data.
Results from Table 2 confirm that UniGNNs can capture global structural information and perform well at predicting unseen nodes in the inductive learning task.
To verify the effectiveness of UniGCNII, we study how the performance of the vanilla UniGNN models changes with various depths. In this experiment, we use the same setting as described in the semi-supervised hypernode classification task, except that an additional 20% of the original testing split is used as the validation split.
For UniGCNII, we run all experiments for 1000 epochs with early stopping with a patience of 150 epochs. We use the Adam optimizer with a learning rate of 0.01. We set the L2 regularization factor to 0.01 for the convolutional layers and 0.0005 for the dense layer, the same as described in GCNII. Please refer to the supplemental materials for more details.
Table 3 summarizes the results, in which the best-performing model for each dataset is bolded. We see that UniGCNII enjoys the benefit of deep network structures and shows generally better results as layers increase. We highlight that UniGCNII outperforms the best shallow models on the Cora, Pubmed and Citeseer datasets, and obtains competitive results on DBLP. On the contrary, the performance of the vanilla models drops significantly as depth increases.
Overall, the results suggest that our proposed framework is capable of incorporating meticulously-designed deep GNN models for deep hypergraph learning.
We propose UniGNN, a unified framework for graph and hypergraph neural networks. Under this framework, we naturally generalize several classic GNNs to HyperGNNs, which consistently and stably outperform recent state-of-the-art methods. We are the first to address the over-smoothing problem of deep hypergraph neural networks, by presenting UniGCNII. Our models learn expressive representations of hypergraphs, which can benefit a broad range of downstream tasks. Future work includes designing provably more powerful UniGNNs with the higher-order GWL test. Another interesting direction is to design hypergraph subtree kernels for hypergraph classification.
We use $c_i^{(t)}$ to denote the label of vertex $i$ in iteration $t$ and use $E_i$ to denote the incident hyperedges of vertex $i$ in $H$.
We prove this proposition by contrapositive. Suppose $H_1 \simeq H_2$; then there exists a vertex bijection $\pi: V_1 \to V_2$ and the induced hyperedge bijection such that $e \in E_1 \iff \pi(e) \in E_2$.
Since all vertices start with the same label, in iteration 0 we have $c_{i,1}^{(0)} = c_{\pi(i),2}^{(0)}$ for every $i \in V_1$.
Assume that in iteration $t-1$, $c_{i,1}^{(t-1)} = c_{\pi(i),2}^{(t-1)}$ holds for every $i \in V_1$. Then in iteration $t$, we have $c_{e,1}^{(t)} = c_{\pi(e),2}^{(t)}$ for every $e \in E_1$ and hence $c_{i,1}^{(t)} = c_{\pi(i),2}^{(t)}$ for every $i \in V_1$.
Note that $\pi$ is bijective; thus, we have $\{\!\!\{ c_{i,1}^{(t)} : i \in V_1 \}\!\!\} = \{\!\!\{ c_{i,2}^{(t)} : i \in V_2 \}\!\!\}$.
By induction, this holds for any iteration $t$, which proves the proposition. ∎
We use $x_i^{(t)}$ to denote the feature of vertex $i$ in iteration $t$ and use $\{\!\!\{ x_i^{(t)} : i \in V \}\!\!\}$ to denote the multiset of all vertices' features in hypergraph $H$.
Let $V$ denote the set $V_1 \cup V_2$. We claim that if, for any $u, v \in V$ and any iteration $t$, we have
$$ c_u^{(t)} = c_v^{(t)}, $$
then, in the same iteration, we have
$$ x_u^{(t)} = x_v^{(t)}. $$
In iteration 0 the lemma holds directly, since the 1-GWL test and the UniGNN start with the same vertex labels/features.
Assume that the claim holds up to iteration $t-1$. If in iteration $t$ we also have $c_u^{(t)} = c_v^{(t)}$, that is,
$$ \mathrm{HASH}\Big( c_u^{(t-1)}, \{\!\!\{ c_e^{(t)} : e \in E_u \}\!\!\} \Big) = \mathrm{HASH}\Big( c_v^{(t-1)}, \{\!\!\{ c_e^{(t)} : e \in E_v \}\!\!\} \Big), $$
then, based on our assumption and the injectivity of HASH, we must have
$$ x_u^{(t-1)} = x_v^{(t-1)} \quad \text{and} \quad \{\!\!\{ \{x_j^{(t-1)}\}_{j \in e} : e \in E_u \}\!\!\} = \{\!\!\{ \{x_j^{(t-1)}\}_{j \in e} : e \in E_v \}\!\!\}. $$
Note that the UniGNN witnesses the same neighborhood for vertex $u$ and vertex $v$; thus, the two-stage aggregating functions generate the same output for them, i.e. $x_u^{(t)} = x_v^{(t)}$. Therefore, the lemma holds. ∎
Suppose that there exists a minimal $T$ such that after $T$ iterations the UniGNN has $\mathcal{A}(H_1) \neq \mathcal{A}(H_2)$, i.e. $\{\!\!\{ x_{i,1}^{(T)} \}\!\!\} \neq \{\!\!\{ x_{i,2}^{(T)} \}\!\!\}$, but the 1-GWL test cannot decide $H_1$ and $H_2$ to be non-isomorphic, i.e. $\{\!\!\{ c_{i,1}^{(T)} \}\!\!\} = \{\!\!\{ c_{i,2}^{(T)} \}\!\!\}$.
If $T = 0$, then $\{\!\!\{ c_{i,1}^{(0)} \}\!\!\} = \{\!\!\{ c_{i,2}^{(0)} \}\!\!\}$ implies $\{\!\!\{ x_{i,1}^{(0)} \}\!\!\} = \{\!\!\{ x_{i,2}^{(0)} \}\!\!\}$, since the 1-GWL test and the UniGNN start with the same vertex labels/features. This contradicts our assumption.
If $T > 0$, we have $\{\!\!\{ c_{i,1}^{(t)} \}\!\!\} = \{\!\!\{ c_{i,2}^{(t)} \}\!\!\}$ and $\{\!\!\{ x_{i,1}^{(t)} \}\!\!\} = \{\!\!\{ x_{i,2}^{(t)} \}\!\!\}$ for $t < T$. By the assumption $\{\!\!\{ c_{i,1}^{(T)} \}\!\!\} = \{\!\!\{ c_{i,2}^{(T)} \}\!\!\}$, the conditions of the lemma hold. Thus, there exists a valid mapping $\eta: V_1 \to V_2$ such that $c_{i,1}^{(T)} = c_{\eta(i),2}^{(T)}$ for every $i \in V_1$. Since equal labels imply equal features by the lemma, $x_{i,1}^{(T)} = x_{\eta(i),2}^{(T)}$.
Therefore, we have
$$ \{\!\!\{ x_{i,1}^{(T)} \}\!\!\} = \{\!\!\{ x_{i,2}^{(T)} \}\!\!\}. $$
This contradicts our assumption, and thereby we prove the original proposition. ∎
We start by proving that there exists an injective mapping $\varphi$ such that $x_i^{(t)} = \varphi(c_i^{(t)})$.
If $t = 0$, it is trivial that $\varphi$ exists since the labels and features of all vertices are initialized identically.
Assume that for iteration $t-1$ the injective mapping $\varphi$ exists; then for iteration $t$,
$$ x_i^{(t)} = \phi_2\Big( x_i^{(t-1)}, \{\!\!\{ \phi_1(\{x_j^{(t-1)}\}_{j \in e}) : e \in E_i \}\!\!\} \Big) = \phi_2\Big( \varphi(c_i^{(t-1)}), \{\!\!\{ \phi_1(\{\varphi(c_j^{(t-1)})\}_{j \in e}) : e \in E_i \}\!\!\} \Big). $$
Since the composition of injective functions is still injective, we can rewrite it as
$$ x_i^{(t)} = \psi\Big( c_i^{(t-1)}, \{\!\!\{ c_e^{(t)} : e \in E_i \}\!\!\} \Big), $$
where the injective function $\psi$ is induced from $\phi_1$, $\phi_2$ and $\varphi$. As $c_i^{(t)} = \mathrm{HASH}\big(c_i^{(t-1)}, \{\!\!\{ c_e^{(t)} : e \in E_i \}\!\!\}\big)$ with an injective HASH, $x_i^{(t)}$ is an injective function of $c_i^{(t)}$.
Therefore, by induction, there always exists an injective mapping $\varphi$ such that $x_i^{(t)} = \varphi(c_i^{(t)})$.
Thus, if in iteration $t$, $\{\!\!\{ c_{i,1}^{(t)} \}\!\!\} \neq \{\!\!\{ c_{i,2}^{(t)} \}\!\!\}$, then by injectivity we have
$$ \{\!\!\{ x_{i,1}^{(t)} \}\!\!\} \neq \{\!\!\{ x_{i,2}^{(t)} \}\!\!\}. $$
Since the global READOUT function is also injective, this implies that if the 1-GWL test decides $H_1$ and $H_2$ to be non-isomorphic, the UniGNN can also distinguish them by $\mathcal{A}(H_1) \neq \mathcal{A}(H_2)$. ∎
We can remap the local substructure from the incidence graph back to a subhypergraph and repeat the proof above on that subhypergraph. Since no global READOUT is needed, the Local Level conditions are sufficient. ∎
Table 5 summarizes the model hyper-parameters and optimizers for all experiments. $\lambda$ represents the weight decay for the shallow UniGNNs; $\lambda_1$ and $\lambda_2$ denote the weight decay for the dense layer and the convolutional layers of UniGCNII, respectively. For the DBLP dataset, we remove the normalization layer for each model.
| Model | Hyper-parameters |
|---|---|
| UniGAT | hidden: 8, heads: 8, dropout: 0.6, lr: 0.01, $\lambda$: 5e-4, attentional dropout: 0.6 |
| UniGCN | hidden: 64, dropout: 0.6, lr: 0.01, $\lambda$: 5e-4 |
| UniGIN | hidden: 64, dropout: 0.6, lr: 0.01, $\lambda$: 5e-4 |
| UniSAGE | hidden: 64, dropout: 0.6, lr: 0.01, $\lambda$: 5e-4, aggregator: SUM |
| UniGCNII | hidden: 64, dropout: 0.2, lr: 0.01, $\lambda_1$: 0.01, $\lambda_2$: 5e-4 |
We train each model on each dataset for 200 epochs and report the performance of the model at the last epoch. The experiment of each model for each dataset is conducted 80 times, with 10 train/test splits and 8 different random seeds.
We train each model on each dataset for 200 epochs and report the performance of the model at the last epoch. For the PubMed dataset, we set the input dropout to 0 instead of 0.6 and run 300 epochs. The experiment of each model for each dataset is conducted 80 times, with 10 train/test splits and 8 different random seeds.
In this experiment, an additional 20% of the original testing split is used as the validation split and the remaining 80% is used as the new testing split. For the shallow UniGNNs, we train each model on each dataset for 200 epochs and report the performance of the model with the highest validation score. For UniGCNII, we run all experiments for 1000 epochs with early stopping with a patience of 150 epochs. The experiment of each model for each dataset is conducted 80 times, with 10 train/val/test splits and 8 different random seeds.
| Dataset | # vertices | # hyperedges | hyperedge size | # features | # classes | label rate |
|---|---|---|---|---|---|---|
| DBLP | 43413 | 22535 | 4.7 ± 6.1 | 1425 | 6 | 4.0% |
| PubMed | 19717 | 7963 | 4.3 ± 5.7 | 500 | 3 | 0.8% |
| Citeseer | 3312 | 1079 | 3.2 ± 2.0 | 3703 | 6 | 4.2% |
| Cora (co-authorship) | 2708 | 1072 | 4.2 ± 4.1 | 1433 | 7 | 5.2% |
| Cora (co-citation) | 2708 | 1579 | 3.0 ± 1.1 | 1433 | 7 | 5.2% |
Statistics of the datasets are summarized in Table 6.