As data becomes more plentiful, we find ourselves with access to increasingly complex networks that can be used to express many different types of information. Hypergraphs have been used to recommend music Bu et al. (2010), model cellular and protein-protein interaction networks Klamt et al. (2009); Lugo-Martinez and Radivojac (2017), classify images and perform 3D object recognition Yu et al. (2012); Gao et al. (2012), diagnose Alzheimer’s disease Liu et al. (2017), analyze social relationships Yang et al. (2019); Feng et al. (2019), and classify gene expression Tian et al. (2009); the list goes on. However, despite the clear utility of hypergraphs as expressive objects, research on applying deep learning to hypergraphs is still emerging. There is certainly a much larger and more mature body of research on applying deep learning to graphs, and even there the first notions of graph neural networks were introduced only a little over a decade ago by Gori et al. in Gori et al. (2005) and Scarselli et al. in Scarselli et al. (2008). Despite its nascency, the field of deep learning on hypergraphs has recently attracted growing attention from researchers, which we review in Section 2.
In 2017, Zaheer et al. Zaheer et al. (2017) proposed a novel architecture, DeepSets, for performing deep machine learning tasks on permutation-invariant and permutation-equivariant objects. This work was extended and generalized by Hartford et al. Hartford et al. (2018) to describe and study interactions between multiple sets.
In this work, we go a step further, proposing a novel framework that uses ideas from context-based graph embedding approaches and permutation-invariant learning to perform transductive and inductive inference on hypergraphs: sets of sets with an underlying contextual graph structure. This framework can be used for classification and regression of vertices and hyperedges alike; this work focuses on the classification of hyperedges.
2 Related Work
In 2013, Mikolov et al. proposed an unsupervised learning procedure, skip-gram, in Mikolov et al. (2013), which uses negative sampling to create context-based embeddings of terms from the sentences of a text corpus. Perozzi et al. used this procedure in DeepWalk Perozzi et al. (2014) to treat random walks on a graph as “sentences” and vertices as “terms”, outperforming existing spectral clustering and weighted vote-based relational neighbor classifiers Tang and Liu (2011); Huang et al. (2017); Tang and Liu (2009b, a); Macskassy and Provost (2003). Similar random walk-based approaches followed, such as random walks with bias parameters Grover and Leskovec (2016), methods that use network attributes as additional features Yang et al. (2016); Gao and Huang (2018); Yang et al. (2015), and approaches that can be extended to perform inductive learning Yang et al. (2016); Hamilton et al. (2017). Graph convolutions were formally defined by Bruna et al. Bruna et al. (2013) and elaborated upon by Kipf and Welling in Kipf and Welling (2016a), who proposed graph convolutional networks. Graph convolutions have also been used for variational graph autoencoding in Kipf and Welling (2016b). Each of these approaches is limited to the graph domain, but they are relevant to the proposed framework.
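To make the "walks as sentences" analogy concrete, here is a minimal Python sketch, assuming an unweighted graph stored as an adjacency dict. It draws DeepWalk-style uniform random walks and then enumerates the (center, context) pairs that a skip-gram model would be trained on within a fixed window. The function names `uniform_walks` and `skipgram_pairs` and the window size are illustrative choices, not taken from the cited papers.

```python
import random

def uniform_walks(adj, num_walks, walk_length, seed=0):
    """DeepWalk-style corpus: `num_walks` uniform random walks starting from every vertex."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                # Step to a neighbor chosen uniformly at random.
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

def skipgram_pairs(walk, window):
    """(center, context) pairs: every token within `window` positions of the center."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

# Toy graph: the 4-cycle 0-1-2-3-0.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
walks = uniform_walks(adj, num_walks=2, walk_length=5)
print(len(walks))  # -> 8
print(skipgram_pairs(["a", "b", "c"], window=1))
# -> [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
```

The pairs produced from many walks form the training corpus; the embedding model itself is independent of how the walks were generated, which is why the random walk procedure can be swapped out.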
Deep learning on sets has been studied in both the permutation-invariant and the permutation-equivariant cases Zaheer et al. (2017). In particular, these architectures insert a linear transformation (summing the representations of the set elements) between layers of nonlinear transformations to learn continuous representations of permutation-invariant objects, which we use in this work. Multiset learning has been studied in Hartford et al. (2018), but these methods do not directly employ the context-based inference that can be drawn from the hypergraph structure. Further, methods that use membership exclusively, such as PointNet and PointNet++ Qi et al. (2016, 2017), have been compared with methods that also use the hypergraph structure, such as HGNNs Feng et al. (2019), and in the hypergraph setting it is typically better to include contextual information at inference time.
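The sum decomposition that DeepSets relies on is easy to check numerically. The NumPy sketch below, with randomly initialized weights standing in for trained ones, builds an invariant model $\rho\big(\sum_i \phi(x_i)\big)$ and a sum-based variant of the permutation-equivariant layer of Zaheer et al. (2017), $\sigma(X\Lambda + \mathbf{1}\mathbf{1}^\top X\Gamma)$. Layer widths and the tanh/ReLU nonlinearities are arbitrary assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 4, 8, 3

# Randomly initialized weights stand in for trained parameters.
W_phi = rng.normal(size=(d_in, d_hid))
W_rho = rng.normal(size=(d_hid, d_out))
Lam = rng.normal(size=(d_in, d_hid))
Gam = rng.normal(size=(d_in, d_hid))

def invariant(X):
    """rho(sum_i phi(x_i)): phi is applied row-wise, then the rows are summed."""
    pooled = np.tanh(X @ W_phi).sum(axis=0)  # order-independent pooling
    return np.tanh(pooled @ W_rho)

def equivariant(X):
    """sigma(X Lam + 1 1^T X Gam): each row sees itself plus the set-wide sum."""
    set_sum = X.sum(axis=0, keepdims=True)  # shape (1, d_in), broadcast to every row
    return np.maximum(0.0, X @ Lam + set_sum @ Gam)

X = rng.normal(size=(5, d_in))  # a set of 5 elements, one per row
perm = rng.permutation(5)
print(np.allclose(invariant(X), invariant(X[perm])))            # True
print(np.allclose(equivariant(X)[perm], equivariant(X[perm])))  # True
```

Reordering the rows leaves the invariant output unchanged and simply reorders the equivariant output, which is the property the membership branch of a set model needs.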
Hypergraph learning is less studied, but a variety of approaches have nonetheless been proposed. In 2007, Zhou et al. proposed methods for hypergraph clustering and embedding in Zhou et al. (2007), but these methods incur high computational and space complexity. Random walks on hypergraphs have been established and have likewise been shown to be useful in inference tasks Chitra and Raphael (2019); Satchidanand et al. (2015); Sharma et al. (2018), but these methods do not directly account for the set membership and contextual properties of hyperedges simultaneously and efficiently. Very recently, hypergraph convolution and attention approaches have been proposed Feng et al. (2019); Bai et al. (2019); these define a hypergraph Laplacian matrix and perform convolutions on it. Our framework specifically uses a random walk-based model for its particular benefits: parallelizability, scalability, accommodation of inductive learning, and direct baseline comparisons between random walk models in the graph domain and our own random walk procedure. Convolutional approaches could conceivably be integrated into this framework; we leave this line of exploration to future work.
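For comparison with the convolutional route, the following NumPy sketch builds the normalized hypergraph Laplacian of Zhou et al. (2007), $\Delta = I - D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2}$, from a vertex-hyperedge incidence matrix $H$, a diagonal hyperedge-weight matrix $W$, and the vertex and hyperedge degree matrices $D_v$ and $D_e$. The HGNN layer of Feng et al. (2019) propagates features with the same normalized operator. The toy incidence matrix and the unit hyperedge weights are assumptions for illustration.

```python
import numpy as np

def hypergraph_operator(H, w=None):
    """Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} for an incidence matrix H of shape |V| x |E|."""
    n_v, n_e = H.shape
    w = np.ones(n_e) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w          # weighted vertex degrees: d(v) = sum_e w(e) h(v, e)
    d_e = H.sum(axis=0)  # hyperedge degrees: delta(e) = |e|
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    return Dv_inv_sqrt @ H @ np.diag(w) @ np.diag(1.0 / d_e) @ H.T @ Dv_inv_sqrt

def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian Delta = I - Theta (Zhou et al., 2007)."""
    return np.eye(H.shape[0]) - hypergraph_operator(H, w)

def hgnn_layer(H, X, weight):
    """One HGNN-style convolution with unit hyperedge weights: ReLU(Theta X weight)."""
    return np.maximum(0.0, hypergraph_operator(H) @ X @ weight)

# 5 vertices, 3 hyperedges: {0, 1, 2}, {2, 3}, {3, 4}
H = np.array([[1, 0, 0],
              [1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
L = hypergraph_laplacian(H)
print(np.round(np.linalg.eigvalsh(L), 3))  # smallest eigenvalue is 0 for a connected hypergraph
```

Every vertex must belong to at least one hyperedge, otherwise $D_v^{-1/2}$ is undefined. Building and multiplying by this dense $|V| \times |V|$ operator is one reason the text contrasts convolutional methods with random walks on scalability grounds.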
3 Deep Hyperedges
We propose Deep Hyperedges (DHE), a framework that uses context and set membership to perform transductive and inductive learning on hypergraphs. We focus on classification of hyperedges, but recall that the hyperedges of the dual $\mathcal{H}^*$ of a hypergraph $\mathcal{H}$ correspond to the vertices of $\mathcal{H}$, so vertex classification may be approached similarly. This is described further in Appendix A. While DHE is compatible with random walk, spectral, and convolutional approaches for the vertex and hyperedge embedding steps, we investigate a random walk approach for the sake of parallelizability and scalability Perozzi et al. (2014) and inductive inference Hamilton et al. (2017).
3.1 Random Walks and Embeddings
To test this framework using random walk models, we propose a simple new random walk model for hypergraphs, Subsample and Traverse (SaT) Walks, which seeks to capture co-member information in each hyperedge. In this procedure, we start at a vertex $v$ in a selected hyperedge $e$. We define the probability $p$ of traversing (leaving the current hyperedge for another one) to be inversely proportional to the cardinality $|e|$ of the current hyperedge, with its exact value controlled by tunable parameters. The number of samples the walk draws from a given hyperedge before traversing is therefore geometrically distributed with expectation $1/p$, and is thus influenced by the cardinality of the hyperedge itself. Using SaT Walks, we can construct random walks of vertices in the hypergraph for embedding in the next stage. Likewise, we may perform SaT Walks on the dual $\mathcal{H}^*$ to embed the hyperedges of $\mathcal{H}$; for ease of notation, we refer to these walks as Traverse and Select (TaS) Walks. Using TaS Walks, we can construct random walks of hyperedges in the hypergraph for contextual embedding in the next stage. One could also easily define in-out and return parameters for TaS Walks (à la node2vec Grover and Leskovec (2016)) to control whether the embeddings of vertices and hyperedges emphasize topological proximity or structural similarity, by biasing the search strategy toward DFS-like or BFS-like exploration, respectively.
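The walk procedure can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation. It assumes a hypergraph stored as a dict from hyperedge id to member list, and it assumes the traversal probability takes the simple form $p = \min(1, \alpha/|e|)$ with a single hypothetical parameter `alpha`, which is one way to make $p$ inversely proportional to cardinality. At each step the walk either stays in the current hyperedge or, with probability $p$, hops to a hyperedge containing the current vertex. It then subsamples the next vertex uniformly from the hyperedge it is in. Running the same function on the dual (vertices and hyperedges swapped) yields TaS Walks over hyperedge ids.

```python
import random
from collections import defaultdict

def dual(hyperedges):
    """Swap roles: each vertex becomes a hyperedge listing the hyperedges that contain it."""
    d = defaultdict(list)
    for e_id, members in hyperedges.items():
        for v in members:
            d[v].append(e_id)
    return dict(d)

def sat_walk(hyperedges, start_edge, length, alpha=1.0, rng=random):
    """One Subsample-and-Traverse walk (sketch). Assumes p_traverse = min(1, alpha / |e|)."""
    member_of = dual(hyperedges)  # vertex -> hyperedges containing it (recomputed for brevity)
    edge = start_edge
    vertex = rng.choice(hyperedges[edge])
    walk = [vertex]
    while len(walk) < length:
        p_traverse = min(1.0, alpha / len(hyperedges[edge]))
        if rng.random() < p_traverse:
            # Traverse: move to a hyperedge that contains the current vertex.
            edge = rng.choice(member_of[vertex])
        # Subsample: the next vertex is drawn uniformly from the current hyperedge.
        vertex = rng.choice(hyperedges[edge])
        walk.append(vertex)
    return walk

H = {"e1": ["a", "b", "c"], "e2": ["c", "d"], "e3": ["d", "e", "f", "g"]}
rng = random.Random(0)
sat = sat_walk(H, "e1", length=10, rng=rng)       # SaT: a sequence of vertices
tas = sat_walk(dual(H), "c", length=10, rng=rng)  # TaS: a sequence of hyperedge ids
print(sat)
print(tas)
```

Under this assumed form, the expected number of steps spent in a hyperedge before a traversal is $1/p = |e|/\alpha$ (for $\alpha \le |e|$), so larger hyperedges contribute proportionally more co-member pairs to the skip-gram corpus.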
3.2 Contextual and Permutation-Invariant Representation Learning
At this stage, we employ two distinct networks to address two objectives: 1) we would like to learn a representation of each hyperedge that captures its contextual information, and 2) we would also like to create a representation of each hyperedge that captures the membership of its vertices in a manner that is invariant to permutation. To address the first objective, we construct a context network that applies hidden layers to the hyperedge embedding $\mathbf{z}_e$ to output a learned contextual representation $\mathbf{c}_e$. The second objective requires a linear transformation (a sum) at a hidden layer for permutation invariance, for which we use the $\phi$ and $\rho$ networks of the DeepSets architecture. The membership representation network takes as input the vertex embedding $\mathbf{z}_v$ for each $v \in e$, applies a nonlinear transformation to these inputs individually (forming our $\phi$ network), sums the results to obtain $\sum_{v \in e} \phi(\mathbf{z}_v)$, and finally applies hidden layers to this sum (forming our $\rho$ network) to obtain a membership representation of the hyperedge that is invariant to permutation:
$$\mathbf{m}_e = \rho\Big(\sum_{v \in e} \phi(\mathbf{z}_v)\Big).$$
We then concatenate the representations output by the context network and the membership network and apply hidden layers $g$ and a final softmax layer:
$$\hat{y}_e = \mathrm{softmax}\big(g(\mathbf{c}_e \oplus \mathbf{m}_e)\big),$$
where $\oplus$ denotes concatenation. Optionally, we apply hidden layers to the input feature vector $\mathbf{x}_e$ to obtain $\mathbf{f}_e$, and our formulation then becomes
$$\hat{y}_e = \mathrm{softmax}\big(g(\mathbf{c}_e \oplus \mathbf{m}_e \oplus \mathbf{f}_e)\big).$$
This is the formulation shown in Figure 1. One could also include a permutation-invariant representation of the features of a hyperedge's vertices, again using DeepSets. For the inductive formulation, we want to learn a function that can generalize to unseen vertices. Hence, this function depends initially only on an input feature vector and on a uniform sampling and pooling aggregation procedure that arises naturally with SaT Walks and TaS Walks (à la GraphSAGE Hamilton et al. (2017)). The forward propagation algorithm for embedding generation is described in this work; with these embeddings, we proceed as before.
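A forward pass through this architecture can be sketched with NumPy. Everything here is an illustrative assumption: random weights stand in for trained ones, each branch is a single dense layer, the final hidden layers are collapsed into one linear map, and the sizes (16-dimensional walk embeddings, 10 input features, 32 hidden units, 7 classes as in Cora) are placeholders. The point is the data flow: a context branch on the hyperedge embedding (called `z_e` here), a $\phi$/sum/$\rho$ membership branch on the member-vertex embeddings, an optional feature branch, concatenation, and a softmax.

```python
import numpy as np

rng = np.random.default_rng(0)
D_EMB, D_FEAT, HID, N_CLASSES = 16, 10, 32, 7  # placeholder sizes

def dense(d_in, d_out):
    """Randomly initialized dense layer with ReLU (stands in for a trained layer)."""
    W, b = rng.normal(scale=0.1, size=(d_in, d_out)), np.zeros(d_out)
    return lambda x: np.maximum(0.0, x @ W + b)

context_net = dense(D_EMB, HID)   # context branch on the hyperedge embedding z_e
phi = dense(D_EMB, HID)           # applied to each member-vertex embedding z_v
rho = dense(HID, HID)             # applied after summing the phi outputs
feature_net = dense(D_FEAT, HID)  # optional branch on the feature vector x_e
W_out = rng.normal(scale=0.1, size=(3 * HID, N_CLASSES))

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    return np.exp(z) / np.exp(z).sum()

def dhe_forward(z_e, Z_members, x_e):
    c = context_net(z_e)                 # contextual representation c_e
    m = rho(phi(Z_members).sum(axis=0))  # membership representation m_e (order-free)
    f = feature_net(x_e)                 # feature representation f_e
    return softmax(np.concatenate([c, m, f]) @ W_out)  # g is a single linear map here

z_e = rng.normal(size=D_EMB)
Z_members = rng.normal(size=(5, D_EMB))  # embeddings of the 5 vertices in e
x_e = rng.normal(size=D_FEAT)
probs = dhe_forward(z_e, Z_members, x_e)
print(probs.shape, probs.sum())
```

Because the membership branch sums over rows before $\rho$, shuffling `Z_members` leaves the class probabilities unchanged, while the context and feature branches carry information that membership alone cannot.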
To evaluate DHE on benchmark citation network datasets that have been configured as hypergraphs in which each hyperedge consists of a paper and its 1-neighborhood (described in Appendix C), we compare to several state-of-the-art approaches (in order): DeepWalk (DW) by Perozzi et al., 2014 Perozzi et al. (2014), node2vec (N2V) by Grover and Leskovec, 2016 Grover and Leskovec (2016), GraRep by Cao et al., 2015 Cao et al. (2015), and LINE by Tang et al., 2015 Tang et al. (2015), which do not use network attributes; and TADW by Yang et al., 2015 Yang et al. (2015), ANE by Huang et al., 2017 Huang et al. (2017), Graph Auto-Encoder (GAE) by Kipf and Welling, 2016 Kipf and Welling (2016b), Variational Graph Auto-Encoder (VGAE) by Kipf and Welling, 2016 Kipf and Welling (2016b), GraphSAGE (SAGE) by Hamilton et al., 2017 Hamilton et al. (2017), and DANE by Gao and Huang, 2018 Gao and Huang (2018), which, like our model, do use network attributes. We benchmark scores for both datasets with randomly sampled train:test splits of 10:90, 30:70, and 50:50, as reported by Gao and Huang (2018) (no split is used for validation). We also explore model performance on several nonstandard hypergraph datasets in Appendix D.
We use 25 random walks of length 25 for each hyperedge and vertex, and embed each hyperedge and vertex into a low-dimensional vector space using skip-gram. Each hidden layer in each of our networks has 100 neurons, except for the output of one network, which has 30 neurons. We use dropout Srivastava et al. (2014) in the network layers, as suggested in Zaheer et al. (2017). Finally, we use SGD as our optimization algorithm with a categorical cross-entropy loss. The scores reported are the average of 5 runs for each test. Our model scales well: it has been tested on hyperedges of large cardinality while maintaining a low per-epoch runtime on a CPU. DHE outperforms the other methods in most cases, with the additional, unique advantage that it can be extended to hypergraphs.
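The evaluation protocol (random train:test splits at 10:90, 30:70, and 50:50, with each reported score averaged over 5 runs) can be sketched as follows. The `evaluate` callback is a hypothetical placeholder for "train DHE on the training indices and report accuracy on the test indices"; a dummy function and a made-up dataset size of 1000 are used so the snippet runs on its own.

```python
import numpy as np

def random_split(n, train_frac, rng):
    """Shuffle indices 0..n-1 and cut them into train and test parts."""
    idx = rng.permutation(n)
    n_train = int(round(train_frac * n))
    return idx[:n_train], idx[n_train:]

def averaged_score(n, train_frac, evaluate, runs=5, seed=0):
    """Mean of evaluate(train_idx, test_idx) over `runs` independent random splits."""
    rng = np.random.default_rng(seed)
    scores = [evaluate(*random_split(n, train_frac, rng)) for _ in range(runs)]
    return float(np.mean(scores))

# Hypothetical stand-in for training and scoring a model.
dummy_evaluate = lambda train_idx, test_idx: len(train_idx) / (len(train_idx) + len(test_idx))

for frac in (0.1, 0.3, 0.5):
    print(frac, averaged_score(1000, frac, dummy_evaluate))
```

Drawing a fresh split per run, rather than reusing one split with different seeds for the model only, means the averaged score also reflects variance due to which hyperedges happen to be labeled.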
In this work, we have proposed Deep Hyperedges (DHE), a framework that jointly uses contextual and permutation-invariant vertex membership properties of hyperedges in hypergraphs to perform classification and regression in transductive and inductive learning settings. We have also proposed a random walk model that can be used to obtain embeddings of hyperedges and vertices for evaluation. With these, we demonstrated that DHE matches and often surpasses state-of-the-art performance on benchmark graph datasets, and we explored nonstandard hypergraph datasets in Appendix D. We have also identified several exciting avenues of future work, including deeper exploration of the inductive learning case and the use of convolutional models in the embedding step. One aim of this framework is to provide flexibility in integrating other approaches and, in doing so, to encourage collaborative efforts toward cracking the hypergraph learning problem.
References
- Hypergraph convolution and hypergraph attention. arXiv preprint arXiv:1901.08150.
- Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.
- Music recommendation by unified hypergraph: combining social media information and music content. In Proceedings of the 18th ACM international conference on Multimedia, pp. 391–400.
- GraRep: learning graph representations with global structural information. In Proceedings of the 24th ACM international on conference on information and knowledge management, pp. 891–900.
- Random walks on hypergraphs with edge-dependent vertex weights. CoRR abs/1905.08287.
- Hypergraph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 3558–3565.
- Deep attributed network embedding. In IJCAI, Vol. 18, pp. 3364–3370.
- 3-d object retrieval and recognition with hypergraph analysis. IEEE Transactions on Image Processing 21 (9), pp. 4290–4303.
- CORUM: the comprehensive resource of mammalian protein complexes—2019. Nucleic acids research 47 (D1), pp. D559–D563.
- A new model for learning in graph domains. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., Vol. 2, pp. 729–734.
- node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 855–864.
- Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034.
- Deep models of interactions across sets. arXiv preprint arXiv:1803.02879.
- Label informed attributed network embedding. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 731–739.
- Modeling hypergraphs by graphs with the same mincut properties. Information Processing Letters 45 (4), pp. 171–175.
- Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
- Variational graph auto-encoders. arXiv preprint arXiv:1611.07308.
- Hypergraphs and cellular networks. PLoS computational biology 5 (5), pp. e1000385.
- View-aligned hypergraph learning for Alzheimer’s disease diagnosis with incomplete multi-modality data. Medical image analysis 36, pp. 123–134.
- Classification in biological networks with hypergraphlet kernels. arXiv preprint arXiv:1703.04823.
- Visualizing data using t-SNE. Journal of machine learning research 9 (Nov), pp. 2579–2605.
- A simple relational classifier. Technical report, New York University, Stern School of Business.
- Distributed representations of words and phrases and their compositionality.
- DeepWalk. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ’14).
- PointNet: deep learning on point sets for 3d classification and segmentation. CoRR abs/1612.00593.
- PointNet++: deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems, pp. 5099–5108.
- Deep learning with sets and point clouds. arXiv preprint arXiv:1611.04500.
- Extended discriminative random walk: a hypergraph approach to multi-view multi-relational transductive learning. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
- The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80.
- Hyperedge2vec: distributed representations for hyperedges.
- Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15 (1), pp. 1929–1958.
- LINE. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15).
- Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 817–826.
- Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the 18th ACM conference on Information and knowledge management, pp. 1107–1116.
- Leveraging social media networks for classification. Data Mining and Knowledge Discovery 23 (3), pp. 447–478.
- A hypergraph-based learning algorithm for classifying gene expression and arrayCGH data with prior knowledge. Bioinformatics 25 (21), pp. 2831–2838.
- Structural deep embedding for hyper-networks. In Thirty-Second AAAI Conference on Artificial Intelligence.
- Network representation learning with rich text information. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
- Revisiting user mobility and social relationships in LBSNs: a hypergraph embedding approach. In The World Wide Web Conference, pp. 2147–2157.
- Revisiting semi-supervised learning with graph embeddings.
- Adaptive hypergraph learning and its application in image classification. IEEE Transactions on Image Processing 21 (7), pp. 3262–3272.
- Deep sets. In Advances in neural information processing systems, pp. 3391–3401.
- Learning with hypergraphs: clustering, classification, and embedding. In Advances in neural information processing systems, pp. 1601–1608.
Appendix A Supplementary: Background on Graphs and Hypergraphs
A hypergraph $\mathcal{H} = (V, E)$ comprises a finite set of vertices $V$ and a set of hyperedges $E$, where each hyperedge $e \in E$ is a subset of $V$. We consider connected hypergraphs. The dual $\mathcal{H}^*$ of a hypergraph $\mathcal{H}$ is a hypergraph whose vertices and hyperedges are given by the hyperedges and vertices of $\mathcal{H}$, respectively. That is, a vertex $e^*$ of $\mathcal{H}^*$ is a member of a hyperedge $v^*$ of $\mathcal{H}^*$ if and only if the corresponding hyperedge $e$ of $\mathcal{H}$ contains the corresponding vertex $v$ of $\mathcal{H}$. Note that $(\mathcal{H}^*)^* = \mathcal{H}$. A graph is a hypergraph in which $|e| = 2$ for each $e \in E$. Graphs are well studied in the field of machine learning, but they are not, in general, capable of completely representing the information captured in hypergraphs. Ihler et al. showed in Ihler et al. (1993), for instance, that once hyperedges are sufficiently large, there does not in general exist a representation of a hypergraph by a graph with the same cut properties. This fact has practical implications as well: for instance, one cannot represent the protein complexes in the CORUM dataset Giurgiu et al. (2018) using pairwise relationships, as the proteins in a complex may not interact with each other independently. Such hyperedges are said to be indecomposable Tu et al. (2018). Hence, while the theory built for studying graphs can be used to some extent in the hypergraph context, there is a need for learning procedures that can effectively generalize to hypergraphs.
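Both the dual construction and the information lost by a pairwise view can be checked on toy data. The Python sketch below represents a hypergraph as a dict from hyperedge ids to frozensets of vertices (an assumed encoding). It verifies on an example that the dual of the dual recovers the original hypergraph, and shows that the clique expansion of a single 3-member hyperedge coincides with that of three separate pairwise edges, the kind of ambiguity behind the CORUM example. The clique expansion is used only as a simple illustration; it is not the cut-preserving construction studied by Ihler et al.

```python
from itertools import combinations

def dual(hyperedges):
    """Dual hypergraph: each vertex v becomes the hyperedge {e : v in e}."""
    vertices = set().union(*hyperedges.values())
    return {v: frozenset(e for e, members in hyperedges.items() if v in members)
            for v in vertices}

def is_graph(hyperedges):
    """A graph is a hypergraph whose hyperedges all have cardinality 2."""
    return all(len(m) == 2 for m in hyperedges.values())

def clique_expansion(hyperedges):
    """Replace each hyperedge by all pairwise edges among its members."""
    return {frozenset(p) for m in hyperedges.values() for p in combinations(sorted(m), 2)}

H = {"e1": frozenset("abc"), "e2": frozenset("cd")}
print(dual(dual(H)) == H)  # True

# A 3-protein complex and three separate pairwise interactions look identical as graphs.
complex_ = {"c": frozenset("abc")}
pairs = {"p1": frozenset("ab"), "p2": frozenset("bc"), "p3": frozenset("ac")}
print(clique_expansion(complex_) == clique_expansion(pairs))  # True
```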
Appendix B Supplementary: Background on Transductive and Inductive Learning
In transductive (or semi-supervised) inference tasks, one often seeks to learn from a small amount of labeled training data, where the model has access to both labeled and unlabeled data at training time. Formally, we have training instances $x_1, \dots, x_n$, where $x_1, \dots, x_l$ are labeled instances and $x_{l+1}, \dots, x_n$ are unlabeled instances, with corresponding labels $y_1, \dots, y_l \in \mathcal{Y}$. Our aim is to learn a function $F : \{x_1, \dots, x_n\} \to \mathcal{Y}$. Typically, in the case of transductive learning on graphs and hypergraphs, we seek to leverage topological information to represent the vertices in some continuous vector space by embeddings that capture the vertices’ or hyperedges’ context (homophily and structural equivalence). As such, in the pre-training procedure for finding vertex embeddings, we want to find an embedding function $f : V \to \mathbb{R}^d$ that maximizes the likelihood of observing a vertex in the sampled neighborhood $N(v)$ of $v$ given $f(v)$:
$$\max_f \sum_{v \in V} \log \Pr\big(N(v) \mid f(v)\big).$$
The procedure for finding contextual embeddings for hyperedges is similar. Once we have learned $f$, we can use the embeddings to learn $F$ in our transductive learning procedure.
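Normalizing the likelihood of a neighborhood over all vertices is expensive, so skip-gram models are commonly trained with negative sampling (Mikolov et al. (2013); node2vec uses the same approximation): each observed (center, context) pair is pushed together and a few randomly drawn "negative" vertices are pushed apart. The NumPy sketch below computes that per-pair loss, $-\log\sigma(\mathbf{u}_c^\top \mathbf{f}_v) - \sum_{n}\log\sigma(-\mathbf{u}_n^\top \mathbf{f}_v)$, with separate input embeddings $\mathbf{f}$ and output (context) embeddings $\mathbf{u}$. The embedding size, number of negatives, and uniform negative distribution are illustrative assumptions.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def sgns_loss(F, U, center, context, negatives):
    """Negative-sampling loss for one (center, context) pair.

    F: input (vertex) embeddings, U: output (context) embeddings, both of shape |V| x d."""
    f_v = F[center]
    pos = log_sigmoid(U[context] @ f_v)                 # reward the observed pair
    neg = log_sigmoid(-(U[negatives] @ f_v)).sum()      # penalize the sampled negatives
    return -(pos + neg)

rng = np.random.default_rng(0)
n_vertices, dim, k = 50, 8, 5
F = rng.normal(scale=0.1, size=(n_vertices, dim))
U = rng.normal(scale=0.1, size=(n_vertices, dim))
negatives = rng.integers(0, n_vertices, size=k)  # uniform negatives (an assumption)
print(sgns_loss(F, U, center=0, context=1, negatives=negatives))
```

Summing this loss over all pairs extracted from the walks and minimizing it with SGD yields the embedding function $f$ (the rows of `F`) used downstream.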
In inductive (or supervised) inference tasks, we are given a training sample to be seen by our model, and we want to learn a function that can generalize to unseen instances. This type of learning is particularly useful for dynamic graphs and hypergraphs, in which unseen vertices may appear as time passes, or when we want to apply transfer learning to new graphs and hypergraphs altogether. Here, the representation learning function $f$ is typically dependent on the input features $\mathbf{x}_v$ of a vertex $v$.
Several approaches have been proposed for each of these learning paradigms. Each has its uses, and each can be applied using this framework. However, for testing purposes, we focus primarily on the transductive case, leaving a quantitative investigation of the inductive case in this framework to future work.
Appendix C Supplementary: Benchmark Dataset Descriptions
Because far more literature exists on graph deep learning than on hypergraph deep learning, a typical benchmarking approach is to take a graph dataset, represent the data in a way that creates hyperedges with cardinality greater than two (without modifying the data itself), and compare with graph-based learning benchmarks. For this phase of the evaluation, we used the Cora and PubMed datasets, citation networks in which vertices are papers, edges are citations, and each paper is labeled as belonging to one of 7 or 3 fairly balanced classes, respectively. Each vertex also has a feature vector, a publication content representation in $\mathbb{R}^{1433}$ and $\mathbb{R}^{500}$ for the Cora and PubMed datasets, respectively. The hypergraph extension involves creating a hyperedge for each paper, where the paper is the centroid and its 1-neighborhood forms the hyperedge. As the mapping between papers and hyperedges is bijective, it remains to classify the hyperedges for the inference task.
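The hypergraph extension of a citation network can be sketched as follows, assuming the citations arrive as an undirected edge list of paper ids (the toy papers, citations, and labels below are made up). Each paper becomes the centroid of a hyperedge containing itself and its 1-neighborhood, so papers and hyperedges are in one-to-one correspondence and each hyperedge inherits its centroid's label.

```python
from collections import defaultdict

def neighborhood_hyperedges(papers, citations):
    """Map each paper p to the hyperedge {p} | N(p), treating citations as undirected."""
    nbrs = defaultdict(set)
    for a, b in citations:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return {p: frozenset({p} | nbrs[p]) for p in papers}

papers = ["p1", "p2", "p3", "p4"]
citations = [("p1", "p2"), ("p1", "p3"), ("p3", "p4")]
labels = {"p1": 0, "p2": 1, "p3": 0, "p4": 2}

hyperedges = neighborhood_hyperedges(papers, citations)
hyperedge_labels = {p: labels[p] for p in hyperedges}  # label of the centroid paper
print(sorted(hyperedges["p1"]))  # ['p1', 'p2', 'p3']
```

Including the centroid in its own hyperedge keeps even isolated papers (with an empty neighborhood) represented as singleton hyperedges.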
Appendix D Supplementary: Non-Standard Datasets and Comparisons Between Models
In this appendix, we explore and compare performance results between the three models presented (Deep Hyperedges, DeepSets + SaT Walks, Multilayer Perceptron + TaS Walks), as well as between the seven datasets we investigated. The training:validation:test split used is 80:10:10, and network features are not used; only the hypergraph structure is. As mentioned, these results could improve in future work with the use of graph convolutions in this framework, as well as with further model tuning.
From left to right, we can view a t-SNE plot of the hyperedge embeddings constructed using TaS Walks, training accuracy, training loss, validation accuracy, and validation loss. The green plot represents DHE, the blue plot represents DeepSets + SaT Walks, and the red plot represents MLP + TaS Walks. This is meant to provide an overview at-a-glance of the relative effectiveness and trends of each model for each dataset, and across datasets. We observe that DHE typically performs the best in validation and test accuracy, and that the relative level of performance between DeepSets + SaT Walks and MLP + TaS Walks depends on how important cardinality, vertex membership, and context are in the data.
These tests are all available as Jupyter Notebooks for ease of access and convenience of testing on other hypergraph data (code: https://github.com/Josh-Payne/deep-hyperedges).
The Cora dataset (https://relational.fit.cvut.cz/dataset/CORA) is a computer science publication citation network dataset where vertices are papers and hyperedges are the 1-neighborhood of a centroid paper. Each paper (and thus each hyperedge) is classified into one of seven classes based on topic. DHE achieved around 82.29% test accuracy.
The PubMed diabetes publication citation network dataset (https://linqs.soe.ucsc.edu/data) is a network where vertices are papers and hyperedges are the cited works of a given paper. Each paper is classified into one of three classes based on the type of diabetes it studies. DHE achieved around 82.35% test accuracy.
The CORUM protein complex dataset (https://mips.helmholtz-muenchen.de/corum/) represents a network where vertices are proteins and hyperedges are collections of proteins that interact to form a protein complex. Each hyperedge is labeled based on whether or not the collection forms a protein complex. We generate negative examples by selecting $k$ proteins at random to form a negative hyperedge, where $k$ is selected uniformly between the minimum and the maximum hyperedge cardinality. DHE achieves 95.53% test accuracy, with DeepSets + SaT Walks close behind, as cardinality is a critical signal here.
CORUM dataset, same distribution. We generate negative examples by selecting $k$ proteins at random to form a negative hyperedge, where $k$ is drawn from the cardinality distribution in the data. This is a trickier dataset, and DHE achieves 70.23% test accuracy. DeepSets + SaT Walks no longer benefits from the differences in cardinality it exploited on the previous dataset, and it trails behind MLP + TaS Walks. Perhaps it would have caught up with more training epochs, since DeepSets + SaT Walks captures context as well, just not as readily.
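The two negative-sampling schemes for CORUM differ only in how the cardinality $k$ of a fake complex is drawn. Below is a minimal Python sketch, assuming complexes are given as lists of protein ids. The function name `negative_hyperedges` and the `mode` flag are hypothetical. In "uniform" mode, $k$ is uniform between the minimum and maximum observed cardinality. In "empirical" mode, $k$ is drawn from the observed cardinalities, so cardinality alone no longer separates positives from negatives.

```python
import random

def negative_hyperedges(complexes, n_neg, mode="uniform", seed=0):
    """Sample `n_neg` random protein sets to serve as negative hyperedges."""
    rng = random.Random(seed)
    proteins = sorted(set().union(*map(set, complexes)))
    sizes = [len(c) for c in complexes]
    negatives = []
    for _ in range(n_neg):
        if mode == "uniform":
            k = rng.randint(min(sizes), max(sizes))  # inclusive on both ends
        elif mode == "empirical":
            k = rng.choice(sizes)  # matches the size distribution of real complexes
        else:
            raise ValueError(f"unknown mode: {mode}")
        negatives.append(rng.sample(proteins, k))  # k distinct proteins
    return negatives

complexes = [["A", "B"], ["B", "C", "D"], ["E", "F", "G", "H", "I"], ["A", "C"]]
uniform_negs = negative_hyperedges(complexes, 100, mode="uniform")
empirical_negs = negative_hyperedges(complexes, 100, mode="empirical")
```

This sketch does not reject a random set that happens to coincide with a true complex; with realistic numbers of proteins such collisions are rare, but a filter could be added.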
The Meetups dataset (https://www.kaggle.com/sirpunch/meetups-data-from-meetupcom) is a highly unbalanced social networking dataset where vertices are members and hyperedges are meetup events. Each meetup event is classified into one of 33 types. The model performed reasonably well, achieving around 60.10% accuracy.
Meetups dataset, balanced. We grouped “Tech" and “Career & Business" meetups together into one class and every other type of meetup into another class to give the dataset more balance. This time, DHE performed much better, achieving around 88.31% test accuracy. DeepSets + SaT Walks performed even better, with 91.31% test accuracy.
The DisGeNet dataset (http://www.disgenet.org/downloads) is a fairly unbalanced disease genomics dataset with 23 classes, where vertices are genes and hyperedges are diseases. Each disease is classified into one of 23 MeSH codes (if it has multiple, we randomly select one). The model achieved around 40% test accuracy, more than we expected given that no additional features are used in training and that the hyperedge embeddings are not linearly separable.