1. Introduction
Graphs cover a wide range of applications, but in some domains an ordinary graph fails to capture the relations among entities. Consider a research community, where authors publish papers in groups of more than two. Representing such groups of collaborators as just pairwise edges, as in an ordinary graph, incurs information loss. Such interactions are effectively captured by hyperedges, an extended notion of edges that join an arbitrary number of entities. Graphs with hyperedges, referred to as hypergraphs, are everywhere in our offline and online networks: people gather in groups (sinha2015MAG), biological phenomena are caused by joint protein interactions (navlakha2010power), and web posts contain tags (zhang2019language; ofli2017saki).
One of the critical issues in working with hypergraphs is how to process, simplify, and represent higher-order interactions for a given task. One may build a highly abstract representation of complex multi-way interactions, e.g., (zhang2018beyond; yadati2018link; benson2018simplicial; xu2013hyperlink; li2013link), while others may use the original hypergraph as it is, e.g., (sharma2014predicting; arya2018exploiting; huang2015scalable; yadati2018hypergcn; feng2019hypergraph; benson2018sequences). Despite recent advances in processing units and memory devices for high-performance data processing, it is still daunting and often computationally intractable to maintain whole group interactions in large-scale hypergraphs and use them for solving a given task.
We are motivated by this reality and ask the following question: how much abstraction of group interactions suffices for solving a given graph task, and how do the answers differ across datasets that vary in scale, entities, and patterns of interaction? The answers to this question would give us useful engineering guidelines on how to appropriately trade off between complexity in representing higher-order group interactions and accuracy in solving the task. In seeking to answer this question, we may also find a new method that outperforms existing algorithms in the literature while maintaining computational tractability. In this paper, we consider the hyperedge prediction task, a hypergraph extension of link prediction. Link prediction is a widely accepted means of assessing the validity of graph models (liben2007link; lu2011link; grover2016node2vec; zhou2017scalable; santolini2018predicting; you2019position; grover2019graphite).
As a key device to answer our question through the hyperedge prediction task, we introduce the n-projected graph, P_n(H), for a given hypergraph H. This is a modified version of the original hypergraph that contains its n-way group interactions. By incrementally stacking projected graphs, we can represent the original hypergraph with up to n-way interactions. As expected, as n grows, the information loss decreases, but the computational cost for processing increases. The notion of projected graphs is not entirely new, as adopted in (zhang2018beyond; yadati2018link; benson2018simplicial; sharma2014predicting). However, it has been limited to the pairwise relation, which turns out to be the 2-projected graph, a special case of the n-projected graph. We generalize this pairwise relation to n-way interactions in P_n(H) to quantify and decompose the degree of interactions, constructed as follows: each edge in P_n(H) is weighted by the number of hyperedges in which the corresponding node set of size n has appeared together (see Section 4 for details).
The value of the 3-projected graph is clear from the following example: suppose we want to predict whether four people will collaborate in the future. It is useful to know how much each pair has collaborated, as captured in the 2-projected graph. However, such a collaboration is often formed when a group of three people, who have collaborated as a group, recruits a fourth person; here the 3-way interaction becomes valuable.
We conduct experiments using 15 datasets spanning 8 domains provided by (benson2018simplicial; stehle2011high; mastrandrea2015contact; yin2017local; leskovec2007evolution; fowler2006connecting; fowler2006cosponsorship; sinha2015MAG). These datasets are highly heterogeneous in scale, pattern of interactions, and interacting entities, ranging from about 1,000 to 2,500,000 hyperedges. We use logistic regression for prediction, utilizing the features popularly used in link/hyperedge prediction tasks, generalized to n-projected graphs. The prediction results would change with different methods, but we observe similar trends. We summarize the key findings of our experiments in what follows:
Diminishing returns. We systematically analyze the gain of approximating a hypergraph with increasing orders of n. In particular, we find that small orders of n are enough to achieve accuracy comparable to near-perfect approximations.

Troubleshooter. As we explore the outcomes across possible variations of the task, we discover that higher-order information helps more in more challenging variations.

Irreducibility. We seek interpretations of why the benefit of higher n is greater in some datasets. These are the datasets whose higher-order relations share little information with pairwise relations and thus cannot be reduced to pairwise ones.
Our source code and appendix are available online at (appendix).
2. Related work
Hypergraphs have been used in various domains, including social networks (tan2014mapping; yang2019revisiting), text retrieval (hu2008hypergraph), recommendation (bu2010music; zhu2016heterogeneous; fatemi2019knowledge), bioinformatics (klamt2009hypergraphs; hwang2008learning), e-commerce (li2018tail; huang2015learning; chen2009efficient), and circuit design (ouyang2002multilevel; karypis1999multilevel). Learning tasks based on hypergraphs include clustering (zhou2007learning; agarwal2005beyond; karypis2000multilevel; huang2015scalable), classification (yadati2018hypergcn; feng2019hypergraph), and hyperedge prediction (zhang2018beyond; yadati2018link; benson2018simplicial; xu2013hyperlink; li2013link; sharma2014predicting; arya2018exploiting; benson2018sequences).
Hypergraph representation.
To represent hypergraphs in an abstract manner, one method is to perform dyadic projection, also known as clique expansion, which reflects two-way node relationships. This enables the use of powerful tools such as spectral clustering (zhou2007learning). Clique averaging is a similar method (agarwal2005beyond) that assigns edge weights differently. karypis2000multilevel create successively coarser versions of a hypergraph for partitioning. The category of using hypergraphs without modification includes star expansion (agarwal2006higher), which connects each node in a hyperedge to a new node that represents the hyperedge. Other works directly use hypergraphs with the idea of two resilient distributed datasets (RDDs) (huang2015scalable) and deep learning approaches (yadati2018hypergcn; feng2019hypergraph).
Representation in hyperedge prediction. We now focus on prior works on hyperedge prediction. Some works handle hypergraphs with pairwise relations only. zhang2018beyond project a hypergraph into a dyadic graph and use its adjacency matrix for factorization. yadati2018link propose a deep learning approach with a 2-projected graph as the input. benson2018simplicial compare the performances of various features from the 2-projected graph to predict the co-occurrence of node triples. xu2013hyperlink learn representations for the distance matrix constructed from dyadic hops. li2013link rank hyperedges according to the proximity between two users. Another line of research applies hypergraphs as they are, implying the importance of using higher-order interactions. sharma2014predicting claim that 2-projected graphs fail to capture higher-order relationships. arya2018exploiting represent the whole hypergraph as the matrix of a star-expanded graph (agarwal2006higher) and formulate hyperedge prediction as a matrix completion problem. benson2018sequences operate on sequences of sets, a timestamped representation of hyperedges, to generate the hyperedge at the next timestamp.
In this paper, we propose a parameterized representation framework that generates the entire spectrum of projected graphs, and we study the impact of the degree of simplification. An additional benefit of the n-way decomposition in our projected graphs is that each degree enforces a certain form of uniformity, which lets us enjoy computational amenity and mathematical tractability at each n. Such benefits are verified in other contexts by (kolda2009tensor; shashua2006multi; bulo2009game; ghoshdastidar2017uniform; lin2009metafac).
3. Problem formulation
In this section, we formulate the problem of hyperedge prediction (Sections 3.1 and 3.2), which serves as a tool to evaluate the accuracy of hypergraph abstractions, and introduce possible variations on the problem (Section 3.3).
3.1. Concepts: Hypergraphs
Let H = (V, E) be a hypergraph, where V is a set of nodes and E is a set of hyperedges. Each hyperedge e ∈ E represents a set of nodes that interacted as a group. We weight each hyperedge by its number of occurrences, so each hyperedge e has a positive weight w(e).
3.2. Problem: Hyperedge prediction
The problem of hyperedge prediction is generally defined as follows: given a hypergraph whose hyperedges have timestamps up to time t, predict the hyperedges that will appear between t and some future time point. However, a common practice is to remove some hyperedges from a snapshot of a hypergraph and regard them as the future ones (grover2016node2vec; yadati2018link), since timestamps are unavailable in many real-world data (though the datasets in this paper are originally timestamped, we follow this practice). Furthermore, it is unnecessary to generate all the missing hyperedges, since the extreme sparsity would lead to poor generalization (zhang2018beyond). Thus, we solve a standard binary classification problem (Problem 1), where we use E′ to denote the set of hyperedges remaining after some are removed from E:
Problem 1 (Hyperedge prediction).

Given:

a hypergraph H′ = (V, E′),

a candidate hyperedge set Λ, where Λ ∩ E′ = ∅,

Decide: whether each subset λ ∈ Λ belongs to E, where E′ ⊂ E.

We divide Λ into a set Λ⁺ of positive hyperedges in E and a set Λ⁻ of negative hyperedges not in E. That is, Λ⁺ = Λ ∩ E, while Λ⁻ = Λ \ E. Then, the objective is to find a classifier f : Λ → {0, 1} that is close to the perfect classifier f*, where f*(λ) = 1 for λ ∈ Λ⁺ and f*(λ) = 0 for λ ∈ Λ⁻.
3.3. Constructing the hyperedge candidate set
There are different ways of constructing the candidate set Λ. We thoroughly examine different choices of Λ, since experiments on a single choice could be biased toward that particular case.
Hyperedge size. We consider three cases where each candidate has cardinality 4, 5, and 10, respectively. For each size, we systematically analyze the effect of higher-order interactions.
Negative hyperedges. While positive hyperedges can be collected simply by removing a certain proportion of E, negative hyperedges need to be generated. If the nodes in each negative hyperedge are independently sampled, the resulting hyperedge will be implausible (e.g., total strangers are very unlikely to collaborate), making classification trivial. To avoid this situation, we select node sets that form stars or cliques in the pairwise projected graph as negative hyperedges. From the pairwise perspective, nodes that form a clique are more strongly tied, and thus more likely to form a hyperedge, than those that form a star. Thus, the task becomes more challenging when Λ⁻ is generated from cliques.
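For concreteness, the two structures can be tested on the pairwise projected graph with the following sketch, where `adj` maps each node to its neighbor set (the function names are ours, for illustration only):

```python
from itertools import combinations

def is_clique(nodes, adj):
    """True if every pair of `nodes` is connected in the pairwise graph."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def is_strict_star(nodes, adj):
    """True if some center node is adjacent to all other nodes, and
    those other nodes are mutually disconnected."""
    nodes = list(nodes)
    for center in nodes:
        leaves = [u for u in nodes if u != center]
        if all(u in adj[center] for u in leaves) and \
           not any(v in adj[u] for u, v in combinations(leaves, 2)):
            return True
    return False
```

Rejection sampling with these predicates yields the star- and clique-based negative sets.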
Class imbalance. Having considered the quality of negative hyperedges, we turn our attention to their quantity: how large should Λ⁻ be? Since only a few node sets among all possible ones form hyperedges, it is natural to make |Λ⁻| > |Λ⁺|, imposing class imbalance. We set the class ratio to 1:1, 1:2, 1:5, 1:10, and, for some cases, 1:200. Larger imbalance makes it more difficult to find all positive hyperedges while remaining precise enough not to falsely accept negative ones.
4. Methods
In this section, we formally define the n-projected graph and the n-order expansion (Section 4.1), and we describe our prediction methods based upon the n-order expansion (Section 4.2).
4.1. The n-order expansion
We propose a method of incrementally representing higher-order interactions in a given hypergraph, namely the n-order expansion. Each increment in the representation is given as the n-projected graph (n-pg for short), which captures the n-way interactions of nodes. We note that there could be other ways of extracting uniform-size interactions, but we choose the n-pg since its graphical representation enables the adoption of various principled link prediction features that are widely acknowledged in the literature (adamic2003friends; benson2018simplicial; liben2007link; grover2016node2vec). Furthermore, it is a generalization of the commonly used pairwise projected graph, providing conceptual consistency.
Definition 4.1 (n-projected graph).
The n-projected graph P_n(H) = (V_n, E_n, w_n) of a hypergraph H = (V, E) is defined by

V_n = { v ⊆ V : |v| = n − 1 },
E_n = { {u, v} : u, v ∈ V_n, |u ∪ v| = n, and u ∪ v ⊆ e for some e ∈ E },
w_n({u, v}) = Σ_{e ∈ E : u ∪ v ⊆ e} w(e).

That is, each node in the n-projected graph of a hypergraph H is a size-(n − 1) subset of nodes in V, and each edge represents a size-n subset of nodes in V contained in at least one hyperedge in E. The weight of an edge corresponds to the sum of the weights of the hyperedges in E that contain the size-n subset represented by the edge. In other words, the weight of each edge indicates how often the corresponding nodes interact as a group and thus how close they are as a group. Notice that the pairwise projected graph is a special case of the n-projected graph with n = 2. Figure 1 gives a visual description.
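Computing the edge weights of an n-pg amounts to enumerating the size-n subsets of each hyperedge. A minimal sketch, assuming hyperedges are stored as frozensets with positive weights (the function and variable names are ours, not from the paper's released code):

```python
from collections import Counter
from itertools import combinations

def npg_edges(hyperedges, n):
    """Edge weights of the n-projected graph: each size-n node subset
    contained in some hyperedge gets the summed weight of the
    hyperedges containing it."""
    edges = Counter()
    for e, w in hyperedges.items():
        for s in combinations(sorted(e), n):  # size-n subsets of e
            edges[frozenset(s)] += w
    return edges
```

Collecting npg_edges(H, j) for j = 2, …, n then yields the weights of the n-order expansion defined next.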
Based on n-projected graphs, we define the n-order expansion, our proposed way of incrementally approximating a hypergraph.
Definition 4.2 (n-order expansion).
The n-order expansion of a hypergraph H is the collection of j-projected graphs P_j(H) for j from 2 to n. That is, the n-order expansion is {P_2(H), P_3(H), …, P_n(H)}.
As n increases, the n-order representation captures more information about H, and once n reaches the maximum hyperedge size, H can be reconstructed from its n-order expansion. In Section 5, we experimentally study the marginal information gain (quantified by prediction accuracy) for each n of the n-order expansion in hyperedge prediction.
4.2. Prediction model
In this subsection, we describe the features and classifier that we use for hyperedge prediction.
Features. The n-order expansion of a hypergraph returns a series of projected graphs, from each of which we extract one of six features. Let N_j(v) be the set of neighbors of node v in the j-projected graph P_j(H), and, for each subset λ of nodes in V, let I_j(λ) be the set of "inner" edges in P_j(H) that represent a subset of λ. Then, we use the following features, extractable from P_j(H), for each hyperedge candidate λ ∈ Λ. The first three measures (GM, HM, and AM) are the geometric, harmonic, and arithmetic means of the inner edge weights in the projected graphs. These features are reported to work well in the task of predicting triangles in 2-projected graphs (benson2018simplicial). For the other three measures (CN, JC, and AA), we extend the well-known pairwise link prediction features, common neighbors, Jaccard coefficient, and Adamic-Adar (newman2001clustering; salton1983introduction; adamic2003friends), to larger groups of nodes.
When the input hypergraph is represented in the form of the n-order expansion, the features obtained in the different projected graphs are concatenated. That is, the feature vector of a subset λ of nodes obtained from the n-order expansion is the concatenation of the feature values computed from P_2(H) through P_n(H).
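As an illustration of the mean-based features, the following sketch computes GM, HM, and AM over a candidate's inner edge weights in one n-pg, treating absent inner edges as weight 0 so that GM and HM vanish whenever an inner edge is missing (names are ours, for illustration):

```python
import math
from itertools import combinations

def mean_features(candidate, npg, n):
    """GM, HM, AM of the inner edge weights of `candidate` in the
    n-projected graph `npg` (a mapping from size-n frozensets of
    nodes to weights)."""
    ws = [npg.get(frozenset(s), 0)
          for s in combinations(sorted(candidate), n)]
    gm = math.prod(ws) ** (1.0 / len(ws))
    hm = len(ws) / sum(1.0 / w for w in ws) if all(ws) else 0.0
    am = sum(ws) / len(ws)
    return gm, hm, am
```

Concatenating the outputs for n = 2, …, n across the expansion gives the feature vector described above.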
Classifier. We use the above features as inputs to a logistic regression classifier with L2 regularization, which has been widely used for link and hyperedge prediction (grover2016node2vec; benson2018simplicial; liben2007link). Although more complicated classifiers with more parameters, such as deep neural networks, could be used instead, their performance has higher variance and depends more heavily on hyperparameter values. We choose the simple classifier to provide stable comparisons across different orders of approximation.
5. Experiments
In this section, we present our experimental results to address our questions on the impact of higher-order interactions in the form of n-projected graphs (or simply pgs throughout this section).
5.1. Setup
We start by explaining our datasets and the experimental setup, followed by our results in each of the following subsections.
Datasets. We use 15 datasets generated across 8 domains from (benson2018simplicial), available at https://www.cs.cornell.edu/~arb/data/. The numbers of edges and hyperedges in them are summarized in Table 1. Hyperedges in each domain are defined as follows: (a) Email (email-Enron (klimt2004enron), email-Eu (yin2017local)): recipient addresses of an email, (b) Contact (contact-primary-school (stehle2011high), contact-high-school (mastrandrea2015contact)): persons that appeared in face-to-face proximity, (c) Drug components (NDC-classes, NDC-substances): classes or substances within a single drug, listed in the National Drug Code Directory, (d) Drug use (DAWN): drugs used by a patient, reported to the Drug Abuse Warning Network before an emergency visit, (e) US Congress (congress-bills (fowler2006cosponsorship)): congress members cosponsoring a bill, (f) Online tags (tags-ask-ubuntu, tags-math-sx): tags in a question in Stack Exchange forums, (g) Online threads (threads-ask-ubuntu, threads-math-sx): users answering a question in Stack Exchange forums, and (h) Coauthorship (coauth-MAG-History (sinha2015MAG), coauth-MAG-Geology (sinha2015MAG), coauth-DBLP): coauthors of a publication. We only consider hyperedges containing at most 10 nodes, since large hyperedges are reported to be rare and less meaningful (benson2018simplicial). As mentioned in Section 3.2, the datasets are timestamped, but we treat them as weighted hypergraphs with unique hyperedges.
Training and evaluation. For each target hyperedge size (4, 5, and 10), we generate positive hyperedges by randomly removing hyperedges of the target size until a fixed proportion of all hyperedges, or no hyperedges of the target size, remain. We randomly sample sets of nodes that form cliques and stars in the 2-pg as negative hyperedges until a certain multiple of the number of positive hyperedges (×1, ×2, ×5, ×10, or ×200) is gathered (see Section 3.3 for details). The positive and negative hyperedges are combined to form the candidate set, i.e., Λ = Λ⁺ ∪ Λ⁻. The candidate set is split into train and test sets. We evaluate classification performance by the area under the precision-recall curve (AUC-PR) (davis2006relationship), a measure sensitive to class imbalance.
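AUC-PR can be approximated by average precision: the precision values at the ranks of the true positives, averaged, when candidates are sorted by decreasing score. A minimal stdlib sketch (a common approximation, not the exact estimator of davis2006relationship):

```python
def average_precision(labels, scores):
    """Average precision over candidates ranked by decreasing score.
    `labels` are 0/1 ground truths, `scores` the classifier outputs.
    Assumes at least one positive label."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            precisions.append(tp / rank)  # precision at this recall level
    return sum(precisions) / tp
```

Because precision is computed relative to the number of predictions made so far, this measure penalizes false positives heavily under large class imbalance, unlike AUC-ROC.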
Dataset  

email-Enron  1,491  1,442  8,916  25,938 
email-Eu  24,223  21,465  143,238  440,916 
contact-primary-school  12,704  8,317  15,417  2,286 
contact-high-school  7,818  5,818  7,110  1,428 
NDC-classes  901  3,727  21,885  61,176 
NDC-substances  8,167  26,973  234,240  729,012 
DAWN  137,417  97,046  1,456,683  4,917,996 
congress-bills  57,887  178,647  2,439,960  8,117,514 
tags-ask-ubuntu  147,222  132,703  838,107  874,056 
tags-math-sx  170,476  91,685  748,644  936,774 
threads-ask-ubuntu  166,995  186,955  181,881  116,046 
threads-math-sx  595,648  1,083,531  2,184,567  2,174,994 
coauth-MAG-History  891,296  723,382  2,101,608  4,226,058 
coauth-MAG-Geology  1,189,770  4,241,817  18,870,564  40,067,280 
coauth-DBLP  2,454,734  7,123,888  26,398,201  46,071,251 
Size 4 prediction  Size 5 prediction  
2 to 3 gain (%)  2 to 3 gain (%)  3 to 4 gain (%)  
Dataset  GM  HM  AM  CN  JC  AA  GM  HM  AM  CN  JC  AA  GM  HM  AM  CN  JC  AA 
email-Enron  12.67  0.49  0.15  33.89  1.57  35.78  14.37  22.59  5.95  10.73  2.98  17.01  1.53  0.59  0.26  5.04  1.59  2.05 
email-Eu  0.69  0.13  1.67  153.81  9.16  148.09  2.44  0.78  4.04  158.79  13.93  157.73  0.64  0.32  1.23  2.01  18.86  2.31 
contact-primary-school  6.42  1.21  49.2  495.94  413.35  484.73  0  0  6.63  708.56  361.79  267.66  0  0  0  0  0  0 
contact-high-school  15.16  0.87  78.62  515.17  455.54  507.13  0  0  14.14  1623.33  221.51  1617.75  0  5.96  3.9  0  104.37  0 
NDC-classes  0.18  4.23  44.70  1.55  10.91  2.25  44.28  16.62  37.82  0.28  10.07  5.60  6.77  16.95  2.31  4.14  2.00  1.50 
NDC-substances  4.95  0.02  0.47  0.57  40.98  3.54  10.73  2.46  0.16  16.01  17.02  14.18  158.10  0.02  7.07  1.81  0.47  2.99 
DAWN  0.15  0.04  21.34  197.97  30.48  187.62  0.23  3e-4  3.48  220.79  42.80  212.33  0.49  4e-4  14.04  0.85  17.31  0.54 
congress-bills  7.92  0.99  14.53  328.76  16.49  294.16  11.84  0.03  30.86  271.64  48.55  259.22  0.07  4.98  0.26  0.93  0.57  0.16 
tags-ask-ubuntu  0.24  0.51  23.09  216.47  14.07  192.03  0.07  0.02  20.84  244.72  80.37  225.89  1e-05  1.13  2.96  0.85  5.50  1.35 
tags-math-sx  0.46  0.18  32.38  137.4  46.53  127.25  0.13  0.01  21.35  146.02  60.64  135.54  1e-05  0.63  9.73  0.67  5.86  0.74 
threads-ask-ubuntu  2e-3  10.05  2.47  2.34  2.34  1.48  1e-3  1.76  6.51  2.56  3.10  1.76  1e-4  9.62  0.07  0.01  0.05  1e-3 
threads-math-sx  0.03  0.44  8.52  6.01  5.61  5.10  5e-3  0.42  23.48  6.63  6.56  5.65  1e-3  0.61  0.01  2e-4  0.15  4e-6 
coauth-MAG-History  4e-3  8.48  1.81  1.69  3.00  1.94  0.08  0.13  3.00  2.32  4.43  2.37  0.16  137.94  2.36  0.23  0.53  0.10 
coauth-MAG-Geology  0.93  0.36  22.93  15.39  19.76  15.34  0.79  3.78  603.02  14.56  20.52  14.65  30.08  0.17  8.14  0.10  1.68  0.39 
coauth-DBLP  53.43  0.90  145.31  16.89  21.99  16.73  1.32  3.52  175.24  16.17  22.82  15.44  24.05  0.58  11.06  0.08  1.69  0.30 
Average  0.90  0.46  23.60  141.59  67.11  134.41  5.43  3.16  58.73  229.54  58.47  220.85  7.36  11.69  3.88  0.15  10.62  0.13 
5.2. Results and messages
In this subsection, we present our results by summarizing them with three main messages.
(M1) More higher-order information leads to better prediction quality, but with diminishing returns.
We investigate how the prediction performance changes as n increases in the n-order expansion. In particular, we predict hyperedges of size 4 with the features from the 2- and 3-order expansions, and hyperedges of size 5 with the features from the 2-, 3-, and 4-order expansions. Table 2 summarizes the results, and for readers’ convenience, we also plot the performance averaged across all features in Figure 2.
We clearly observe that a higher-order expansion gives better performance, where the amount of improvement differs across datasets and features. Performance gaps from the 2- to the 3-order expansion, averaged across features, are 61% for size 4 and 94% for size 5, whereas the gap from the 3- to the 4-order expansion is just 6% for size 5. As for the individual features, we see that the gain is larger with neighborhood-based features (CN, JC, AA) than with mean-based features (GM, HM, AM). The mean values are small in higher-order pgs, while neighborhood-based features still retain meaningfully large values. Entries with exactly zero gain (contact datasets) result from the sparsity of size-5 hyperedges. See more details in (appendix).
Interestingly, we see diminishing returns as n increases. To study this, we predict size-10 hyperedges with n-order expansions for increasing n. Figure 3 shows that, in most datasets, the performance tends to increase significantly from n = 2 to n = 3, but only marginally for larger n. However, somewhat unexpectedly, we also find that some datasets experience a small jump (not as high as that from n = 2 to n = 3) from n = 7 to n = 8 (NDC-classes, coauth-MAG-History) or from n = 8 to n = 9 (NDC-substances, coauth-MAG-Geology, coauth-DBLP). We speculate that this is because knowing 8- or 9-way interactions is often more useful than knowing 4- to 7-way interactions for predicting size-10 hyperedges. For illustration, papers with 10 authors are often made by a group of 9 existing collaborators inviting another author.
(M2) The harder the task, the higher the value of higher-order information.
As discussed in Section 3.3, we adjust the hardness of hyperedge prediction by varying the negative set Λ⁻ in terms of negative hyperedge types (stars and cliques) and class imbalance (1:1, 1:2, 1:5, 1:10). We investigate the impact of those variations on the performance gain, summarized in Figure 4. The x-axis represents the type of negative hyperedges (stars or cliques), extracted from the 2-pg, and the y-axis represents different class imbalances.
Regarding the types of negative hyperedges, we see that the gains from the 2- to the 3-order expansion are larger for cliques than for stars. As explained in Section 3.3, distinguishing whether a clique is a true hyperedge is a much harder task than doing so for a star. The troubleshooter is the 3-pg, that is, referring to higher-order information. Regarding class imbalance, the gain from the 2- to the 3-order expansion again grows as the class imbalance grows. More negative hyperedges imply increasing hardness in obtaining high precision while maintaining the same sensitivity, as there are more negative hyperedges that resemble positive ones. In such cases, incorporating the 3-pg in addition to the 2-pg, provided that it carries additional information, helps distinguish fake samples better. Note that GM shows reversed tendencies. This is explained by the property of GM (the geometric mean defined in Section 4.2) that a single pairwise disconnection makes GM = 0; strict stars are thus easily filtered using only the 2-pg.
(M3) Higher-order information helps more when (i) higher-order interactions are more frequent, and (ii) higher-order interactions share less information with pairwise ones.
We observed different performance gains across datasets in the two prior messages. For example, in Table 2, the average gain from the 2- to the 3-order expansion is about 284% in contact-primary-school, while it is merely 6.5% in NDC-classes. We now delve into why, measuring various statistics, including the edge density and information-theoretic quantities of some pgs.
(a) Edge density in the 3-pg. We first examine the edge density of the 3-pg, i.e., the number of its edges divided by the number of possible ones, which intuitively quantifies the abundance of 3-way interactions. We measure the Pearson correlation coefficient between edge density and performance gain, as shown in Figures 5A and D. We observe positive correlations between the two, 0.22 and 0.09 for size-5 and size-4 predictions, respectively, implying that more frequent higher-order interactions let higher-order representations lead to better prediction.
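Counting size-3 groups as the edges of the 3-pg, the density above can be computed as, e.g. (function name ours):

```python
from math import comb

def edge_density(num_edges, num_nodes, n=3):
    """Observed size-n groups over all possible size-n node subsets."""
    return num_edges / comb(num_nodes, n)
```
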
(b) Mutual information and conditional entropy. We next study the aforementioned observation with information-theoretic measures. We expect that adding the 3-pg has larger returns when it contains more information exclusive to itself. We define two random variables X and Y generated from three distinct nodes sampled uniformly at random: X is the vector of the weights of the three pairwise edges in the 2-pg, and Y is the weight of the triadic edge in the 3-pg. We consider the mutual information I(X; Y) and the conditional entropy H(Y | X). Note that I(X; Y) quantifies the amount of information shared between X and Y, while H(Y | X) is the remaining information (i.e., uncertainty) of Y given knowledge of X. (Direct estimation of these two measures requires a large number of samples when the domain spaces of the random variables are large; we simplify the domain spaces of X and Y by binning them.) Figures 5B and E show negative correlations, -0.33 and -0.3, between the mutual information and the performance gain, and similarly, Figures 5C and F show positive correlations, 0.67 and 0.6, between the conditional entropy and the performance gain. These results imply that the gain from the 2- to the 3-order expansion is large when the 3-pg is difficult to explain in terms of the 2-pg. We find that the conditional entropy has greater absolute correlations than the mutual information. A possible explanation is that the conditional entropy directly quantifies the information gain, while the mutual information focuses on the information shared between the 2-pg and the 3-pg.
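Both quantities can be estimated from binned samples with plain plug-in (empirical frequency) estimators; a sketch under that assumption, not the paper's exact procedure:

```python
import math
from collections import Counter

def mi_and_cond_entropy(samples):
    """Plug-in estimates of I(X;Y) and H(Y|X), in bits, from an
    iterable of (x, y) pairs over small discrete (binned) domains."""
    samples = list(samples)
    n = len(samples)
    pxy = Counter(samples)              # joint counts of (x, y)
    px = Counter(x for x, _ in samples)  # marginal counts of x
    py = Counter(y for _, y in samples)  # marginal counts of y
    mi = sum((c / n) * math.log2(c * n / (px[x] * py[y]))
             for (x, y), c in pxy.items())
    h_y_given_x = -sum((c / n) * math.log2(c / px[x])
                       for (x, y), c in pxy.items())
    return mi, h_y_given_x
```

For instance, when X and Y are independent the estimate of I(X; Y) approaches 0 and H(Y | X) approaches H(Y); when Y is a function of X, I(X; Y) = H(Y) and H(Y | X) = 0.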
6. Discussion and conclusion
In this paper, we studied how much abstraction of group interactions is needed to accurately represent a hypergraph, with hyperedge prediction as our downstream task. We devised the n-projected graph to capture the n-way interactions in a hypergraph and expressed the hypergraph as a collection of projected graphs. We investigated the performance gain as n grows. We conclude that a small n is sufficient due to diminishing returns, and that higher n acts as a troubleshooter in difficult task settings. We also provided interpretations of why different datasets show different gains. In summary, we investigated 1) how much, 2) when, and 3) why higher-order representations provide better accuracy. We expect our results to offer insights to relevant works that follow. We leave our source code at (appendix).