Graphs cover a wide range of applications, but there are domains in which an ordinary graph would fail to capture the relations of entities. Consider a research community, where authors publish papers in groups of more than two. It would involve information loss to represent such groups of collaborators as just pairwise edges as in an ordinary graph. Such interactions are effectively captured by hyperedges, an extended notion of edges that join an arbitrary number of entities. Graphs with hyperedges, referred to as hypergraphs, are everywhere in our offline/online networks. People gather in groups (sinha2015MAG), biological phenomena are caused by joint protein interactions (navlakha2010power), and web posts contain tags (zhang2019language; ofli2017saki).
One of the critical issues in working with hypergraphs is how to process, simplify, and represent higher-order interactions for a given task. One may build a highly abstract representation of complex multi-way interactions, e.g., (zhang2018beyond; yadati2018link; benson2018simplicial; xu2013hyperlink; li2013link), while others may use the original hypergraph as it is, e.g., (sharma2014predicting; arya2018exploiting; huang2015scalable; yadati2018hypergcn; feng2019hypergraph; benson2018sequences). Despite recent advances in processing units and memory devices for high-performance data processing, it remains daunting, and often computationally intractable, to maintain whole group interactions in large-scale hypergraphs and use them for solving a given task.
We are motivated by this reality and ask the following question: How much abstraction of group interactions suffices to solve a given graph task, and how do the answers differ across datasets that vary in scale, entities, and patterns of interaction? The answers to this question would give us useful engineering guidelines on how to trade off between the complexity of representing higher-order group interactions and the accuracy of solving the task. In seeking an answer, we may also find a new method that outperforms existing algorithms in the literature while maintaining computational tractability. In this paper, we consider the hyperedge prediction task, a hypergraph extension of link prediction, which is a widely accepted means of assessing the validity of graph models (liben2007link; lu2011link; grover2016node2vec; zhou2017scalable; santolini2018predicting; you2019position; grover2019graphite).
As a key device for answering our question via the hyperedge prediction task, we introduce the n-projected graph, P_n(G), for a given hypergraph G. This is a modified version of the original hypergraph that contains only n-way group interactions. By incrementally stacking n-projected graphs, we can represent the original hypergraph with up to n-way interactions. As expected, as n grows, the information loss decreases, but the computational cost of processing increases. The notion of projected graphs is not entirely new, as adopted in (zhang2018beyond; yadati2018link; benson2018simplicial; sharma2014predicting). However, it has been limited to pairwise relations, which turn out to correspond to the 2-projected graph, a special case of the n-projected graph. We generalize this pairwise relation to n-way interactions in P_n(G) to quantify and decompose the degree of interactions, constructed as follows: each edge in P_n(G) is weighted by the number of hyperedges in which the corresponding node set of size n has appeared together (see Section 4 for details).
The value of the n-projected graph is clear from the following example: suppose that we want to predict whether four people will collaborate in the future. It is useful to know how much each pair has collaborated, as captured by the 2-projected graph. However, such a collaboration often forms because a group of three people, who have collaborated as a group, recruit a fourth person, and this is where the 3-way interaction becomes valuable.
We conduct experiments using 15 datasets spanning 8 domains provided by (benson2018simplicial; stehle2011high; mastrandrea2015contact; yin2017local; leskovec2007evolution; fowler2006connecting; fowler2006cosponsorship; sinha2015MAG). These datasets are highly heterogeneous in terms of scale, pattern of interactions, and interacting entities, ranging from about 1,000 to 2,500,000 hyperedges. We use logistic regression for prediction, utilizing features popularly used in link/hyperedge prediction tasks, generalized to n-projected graphs. The prediction results may change with different methods, but we observe similar trends. We summarize the key findings of our experiments as follows:
Diminishing returns. We systematically analyze the gain of approximating a hypergraph with increasing orders of n. In particular, we find that small orders of n are enough to achieve accuracy comparable to near-perfect approximations.
Troubleshooter. Exploring possible variations of the task, we discover that higher-order information helps more in more challenging variations.
Irreducibility. We search for theoretical interpretations of why the benefit of higher n is greater in some datasets. These are datasets whose higher-order relations share little information with pairwise relations and thus cannot be reduced to pairwise ones.
Our source code and appendix are available online at (appendix).
2. Related work
Hypergraphs have been used in various domains, including social networks (tan2014mapping; yang2019revisiting), text retrieval (hu2008hypergraph), recommendation (bu2010music; zhu2016heterogeneous; fatemi2019knowledge), bioinformatics (klamt2009hypergraphs; hwang2008learning), e-commerce (li2018tail; huang2015learning; chen2009efficient), and circuit design (ouyang2002multilevel; karypis1999multilevel). Learning tasks based on hypergraphs include clustering (zhou2007learning; agarwal2005beyond; karypis2000multilevel; huang2015scalable), classification (yadati2018hypergcn; feng2019hypergraph), and hyperedge prediction (zhang2018beyond; yadati2018link; benson2018simplicial; xu2013hyperlink; li2013link; sharma2014predicting; arya2018exploiting; benson2018sequences).
To represent hypergraphs in an abstract manner, one method is to perform dyadic projection, also known as clique expansion, which reflects two-way node relationships. This enables the use of powerful tools such as spectral clustering (zhou2007learning). Clique averaging (agarwal2005beyond) is a similar method that assigns edge weights differently. karypis2000multilevel create successively coarser versions of a hypergraph for partitioning. The category of using hypergraphs without modification includes star expansion (agarwal2006higher), which connects each node in a hyperedge to a new node that represents the hyperedge. Other works use hypergraphs directly, e.g., with two resilient distributed datasets (RDDs) (huang2015scalable) or with deep learning approaches (yadati2018hypergcn; feng2019hypergraph).
Representation in hyperedge prediction. We now focus on prior works on hyperedge prediction. Some works handle hypergraphs with only pairwise relations. zhang2018beyond project a hypergraph into a dyadic graph and use its adjacency matrix for factorization. yadati2018link propose a deep learning approach with a 2-projected graph as the input. benson2018simplicial compare the performance of various features from the 2-projected graph to predict the co-occurrence of node triples. xu2013hyperlink learn representations for the distance matrix constructed from dyadic hops. li2013link rank hyperedges according to the proximity between two users. Another line of research applies hypergraphs as they are, implying the importance of using higher-order interactions. sharma2014predicting claim that 2-projected graphs fail to capture higher-order relationships. arya2018exploiting represent the whole hypergraph as the matrix of a star-expanded graph (agarwal2006higher) and formulate hyperedge prediction as a matrix completion problem. benson2018sequences operate on sequences of sets, a timestamped representation of hyperedges, to generate the hyperedge at the next timestamp.
In this paper, we propose a parameterized representation framework that generates the entire spectrum of projected graphs and study the impact of the degree of simplification. An additional benefit of n-way decomposition, as in our n-projected graphs, is that each degree allows a certain form of uniformity, which lets us enjoy computational amenity and mathematical tractability at each n. Such benefits are verified in other contexts by (kolda2009tensor; shashua2006multi; bulo2009game; ghoshdastidar2017uniform; lin2009metafac).
3. Problem formulation
In this section, we formulate the problem of hyperedge prediction (Sections 3.1 and 3.2), which serves as a tool to evaluate the accuracy of hypergraph abstractions, and introduce possible variations on the problem (Section 3.3).
3.1. Concepts: Hypergraphs
Let G = (V, E) be a hypergraph, where V is a set of nodes and E is a set of hyperedges. Each hyperedge e ∈ E represents a set of nodes that interacted as a group. We weight each hyperedge by its number of occurrences, so each hyperedge e has a positive weight w(e).
3.2. Problem: Hyperedge prediction
The problem of hyperedge prediction is generally defined as follows: given a hypergraph in which hyperedges have timestamps up to time t, predict the hyperedges that will appear from t until some future time point. However, a common practice is to remove some hyperedges from a snapshot of a hypergraph and regard them as the future ones (grover2016node2vec; yadati2018link), since timestamps are unavailable in many real-world data. (Though the datasets in this paper are originally timestamped, we follow this practice.) Furthermore, it is unnecessary to consider all missing hyperedges as candidates, since the extreme sparsity would lead to poor generalization (zhang2018beyond). Thus, we solve a standard binary classification problem (Problem 1), where we use E′ to denote the set of hyperedges remaining after some are removed from E:
Problem 1 (Hyperedge prediction).
Given: a hypergraph G′ = (V, E′) and a candidate hyperedge set C, where C ∩ E′ = ∅.
Decide: whether each subset c ∈ C belongs to E, where E ⊇ E′ is the original hyperedge set.
We divide C into a set C+ of positive hyperedges, which are in E, and a set C− of negative hyperedges, which are not in E. That is, C+ = C ∩ E, while C− = C \ E. Then, the objective is to find a classifier f that is close to the perfect classifier f*, where f*(c) = 1 for c ∈ C+ and f*(c) = 0 for c ∈ C−.
3.3. Constructing hyperedge candidate set
There are different ways of constructing the candidate set C. We thoroughly examine different choices of C, since experiments on a single choice could be biased toward that particular case.
Hyperedge size. We consider three cases where each candidate has cardinality 4, 5, and 10, respectively. For each size, we systematically analyze the effect of higher-order interactions.
Negative hyperedges. While positive hyperedges can be collected simply by removing a certain proportion of E, negative hyperedges need to be generated. If the nodes in each negative hyperedge are sampled independently, the resulting hyperedge is unlikely to occur (e.g., total strangers are very unlikely to collaborate), making classification trivial. To avoid this, we select node sets that form stars or cliques in the pairwise projected graph as negative hyperedges. From the pairwise perspective, nodes that form a clique are more strongly tied and thus more likely to form a hyperedge than those that form a star. Thus, the task becomes more challenging when C− is generated from cliques.
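To make this construction concrete, the following is a minimal Python sketch (ours, not the released code) of sampling a negative hyperedge as a star or a clique in the pairwise projected graph. Here `adj` is an assumed adjacency map from each node to its set of neighbors in the 2-pg, and a "star" is strict: the leaves are mutually non-adjacent.

```python
import random
from itertools import combinations

def sample_star(adj, size, rng):
    """Sample a node set forming a strict star in the 2-pg:
    a center adjacent to size-1 mutually non-adjacent leaves."""
    while True:
        center = rng.choice([v for v in adj if len(adj[v]) >= size - 1])
        leaves = rng.sample(sorted(adj[center]), size - 1)
        if all(u not in adj[v] for u, v in combinations(leaves, 2)):
            return frozenset([center, *leaves])

def sample_clique(adj, size, rng):
    """Sample a node set whose members are pairwise adjacent in the 2-pg,
    grown greedily from a random seed node."""
    while True:
        nodes = [rng.choice(list(adj))]
        for _ in range(size - 1):
            common = set(adj[nodes[0]])
            for v in nodes[1:]:
                common &= adj[v]          # candidates adjacent to all chosen
            common -= set(nodes)
            if not common:
                break                     # dead end; restart
            nodes.append(rng.choice(sorted(common)))
        if len(nodes) == size:
            return frozenset(nodes)
```

Rejection sampling keeps the sketch simple; a practical implementation would bound the number of restarts.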
Class imbalance. Having considered the quality of negative hyperedges, we turn to their quantity: how large should C− be? Since only a few node sets among all possible ones form hyperedges, it is natural to make |C−| larger than |C+|, imposing class imbalance. We set the class ratio to 1:1, 1:2, 1:5, 1:10, and, for some cases, 1:200. Larger imbalance makes it harder to find all of C+ while remaining precise enough not to falsely predict members of C−.
4.1. The n-order expansion
We propose a method of incrementally representing higher-order interactions in a given hypergraph, namely the n-order expansion. Each increment in the representation is given as the n-projected graph (or n-pg for short), which captures the interactions of n nodes. We note that there could be other ways of extracting uniform-size interactions, but we choose the n-pg since its graphical representation enables the adoption of various principled link prediction features that are widely acknowledged in the literature (adamic2003friends; benson2018simplicial; liben2007link; grover2016node2vec). Furthermore, it is a generalization of the commonly used pairwise projected graph, providing conceptual consistency.
Definition 4.1 (n-projected graph).
The n-projected graph P_n(G) = (V_n, E_n) of a hypergraph G = (V, E) is defined by
V_n = { S ⊆ V : |S| = n − 1 },
E_n = { {S, S′} ⊆ V_n : |S ∪ S′| = n and S ∪ S′ ⊆ e for some e ∈ E },
where each edge {S, S′} has weight w_n({S, S′}) = Σ { w(e) : e ∈ E, S ∪ S′ ⊆ e }.
That is, each node in the n-projected graph of a hypergraph G is a size-(n − 1) subset of the nodes in G, and each edge represents a size-n subset of nodes in G contained in at least one hyperedge in E. The weight of an edge is the sum of the weights of the hyperedges in E that contain the size-n subset represented by the edge. In other words, the weight of each edge indicates how often the corresponding nodes interact as a group and thus how close they are as a group. Notice that the pairwise projected graph is the special case of the n-projected graph where n = 2. Figure 1 gives a visual description.
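As a concrete sketch of the definition (our illustration, not the authors' implementation), the edge weights of the n-projected graph can be computed by enumerating the size-n subsets of each hyperedge; we key each weight by the size-n set an edge represents, since all edges among that set's (n − 1)-subsets share the same weight. Hyperedges are assumed to be given as a map from frozensets of nodes to weights.

```python
from collections import defaultdict
from itertools import combinations

def n_projected_weights(hyperedges, n):
    """For every size-n node set contained in at least one hyperedge,
    sum the weights of the hyperedges containing it. In graph form,
    each such set is represented by edges among its (n-1)-subsets."""
    weights = defaultdict(float)
    for e, w in hyperedges.items():
        for group in combinations(sorted(e), n):  # size-n subsets of e
            weights[frozenset(group)] += w
    return dict(weights)
```

Setting n = 2 recovers the usual pairwise (clique-expansion) edge weights.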
Based on n-projected graphs, we define the n-order expansion, our proposed way of incrementally approximating a hypergraph.
Definition 4.2 (n-order expansion).
The n-order expansion A_n(G) of a hypergraph G is the collection of k-projected graphs where k varies from 2 to n. That is, A_n(G) = { P_k(G) : 2 ≤ k ≤ n }.
As n increases, the n-order expansion captures more information about G, and when n reaches the maximum hyperedge size, G can be reconstructed from A_n(G). In Section 5, we experimentally study the marginal information gain (quantified by prediction accuracy) of each n in the n-order expansion for hyperedge prediction.
4.2. Prediction model
In this subsection, we describe the features and classifier that we use for hyperedge prediction.
Features. The n-order expansion of a hypergraph returns a series of projected graphs, from each of which we extract one of six features. Let N_n(v) be the set of neighbors of node v in the n-projected graph P_n(G), and for each subset c of nodes in V, let I_n(c) be the set of "inner" edges in P_n(G) that represent a subset of c. Then, we use the following features, extractable from P_n(G), for each hyperedge candidate c ∈ C:
The first three measures (GM, HM, and AM) are the geometric, harmonic, and arithmetic means of the inner edge weights in the n-projected graphs. These features are reported to work well in the task of predicting triangles from 2-projected graphs (benson2018simplicial). For the other three measures (CN, JC, and AA), we extend the well-known pairwise link prediction features, common neighbors, Jaccard coefficient, and Adamic-Adar (newman2001clustering; salton1983introduction; adamic2003friends), to larger groups of nodes.
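For illustration, here is a sketch of the three mean-based features over the inner edge weights of a candidate in the n-pg (the neighborhood-based extensions are analogous). The names `weights_n`, `inner_weights`, and `mean_features` are ours, and a missing inner edge contributes weight 0.

```python
from itertools import combinations
from statistics import geometric_mean, harmonic_mean, mean

def inner_weights(weights_n, candidate, n):
    """Weights in the n-pg of all size-n subsets of `candidate`
    (0 when a subset never co-occurs in any hyperedge)."""
    return [weights_n.get(frozenset(g), 0.0)
            for g in combinations(sorted(candidate), n)]

def mean_features(weights_n, candidate, n):
    """GM, HM, AM of the inner edge weights; GM and HM are set to 0
    when any inner edge is missing, mirroring their limit behavior."""
    ws = inner_weights(weights_n, candidate, n)
    positive = all(w > 0 for w in ws)
    gm = geometric_mean(ws) if positive else 0.0
    hm = harmonic_mean(ws) if positive else 0.0
    am = mean(ws)
    return gm, hm, am
```

This zero-on-disconnection behavior of GM is exactly the property discussed for strict stars in Section 5.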
When the input hypergraph is represented in the form of the n-order expansion, the features obtained from the different projected graphs are concatenated. That is, the feature vector of a subset c of nodes is [x_2(c), x_3(c), …, x_n(c)], where x_k(c) denotes the feature extracted from the k-projected graph.
Classifier. We use the above features as inputs to a logistic regression classifier with L2 regularization, which has been widely used for link and hyperedge prediction (grover2016node2vec; benson2018simplicial; liben2007link). Although more complicated classifiers with more parameters, such as deep neural networks, could be used instead, their performance has higher variance and depends more heavily on hyperparameter values. We use the simple classifier to provide stable comparisons across different orders of approximation.
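A minimal sketch of this classifier stage with scikit-learn (our illustration; the feature values below are made up): each candidate is a row of concatenated per-order features, and the fitted model's probabilities are used to score candidates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy feature matrix: each row is one candidate's concatenated features,
# e.g., [GM_2, ..., AA_2, GM_3, ..., AA_3]; labels mark true hyperedges.
X = np.array([[4.0, 3.0], [3.5, 2.5], [0.2, 0.1], [0.1, 0.0]])
y = np.array([1, 1, 0, 0])

# L2 ("ridge") regularization is scikit-learn's default penalty.
clf = make_pipeline(StandardScaler(),
                    LogisticRegression(penalty="l2", C=1.0, max_iter=1000))
clf.fit(X, y)
scores = clf.predict_proba(X)[:, 1]  # probability of being a hyperedge
```

Standardizing the features before fitting keeps the L2 penalty from favoring features on larger scales.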
In this section, we present our experimental results to address our questions on the impact of higher-order interactions in the form of n-projected graphs (or simply n-pgs throughout this section). We start by explaining our datasets and experimental setup, followed by our results in the subsequent subsections.
Datasets. We use 15 datasets generated across 8 domains from (benson2018simplicial), available at https://www.cs.cornell.edu/~arb/data/. The numbers of nodes and hyperedges in them are summarized in Table 1. Hyperedges in each domain are defined as follows: (a) Email (email-Enron (klimt2004enron), email-Eu (yin2017local)): recipient addresses of an email, (b) Contact (contact-primary-school (stehle2011high), contact-high-school (mastrandrea2015contact)): persons that appeared in face-to-face proximity, (c) Drug components (NDC-classes, NDC-substances): classes or substances within a single drug, listed in the National Drug Code Directory, (d) Drug use (DAWN): drugs used by a patient, reported to the Drug Abuse Warning Network before an emergency visit, (e) US Congress (congress-bills (fowler2006cosponsorship)): congress members cosponsoring a bill, (f) Online tags (tags-ask-ubuntu, tags-math-sx): tags in a question in Stack Exchange forums, (g) Online threads (threads-ask-ubuntu, threads-math-sx): users answering a question in Stack Exchange forums, and (h) Coauthorship (coauth-MAG-History (sinha2015MAG), coauth-MAG-Geology (sinha2015MAG), coauth-DBLP): coauthors of a publication. We only consider hyperedges containing at most 10 nodes, since large hyperedges are reported to be rare and less meaningful (benson2018simplicial). As mentioned in Section 3.2, the datasets are timestamped, but we treat them as weighted hypergraphs with unique hyperedges.
Training and evaluation. For each target hyperedge size (4, 5, 10), we generate positive hyperedges by randomly removing hyperedges of the target size until a fixed proportion of all hyperedges has been removed or no hyperedges of the target size remain. We randomly sample sets of nodes that form cliques and stars in the 2-pg as negative hyperedges until a chosen multiple of the number of positive hyperedges (×1, ×2, ×5, ×10, ×200) is gathered (see Section 3.3 for details). The positive and negative hyperedges are combined to form the candidate set, i.e., C = C+ ∪ C−. The candidate set is split into train and test sets. We evaluate classification performance by the area under the precision-recall curve (AUC-PR) (davis2006relationship), a measure sensitive to class imbalance.
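The evaluation step can be sketched with scikit-learn's average precision, which summarizes the PR curve; the labels and scores below are made up, with 2 positives among 12 candidates.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical test labels and classifier scores (1:5 imbalance).
y_true  = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.6, 0.8, 0.3, 0.2, 0.1,
                    0.4, 0.05, 0.3, 0.2, 0.1, 0.15])

# Average precision is a step-wise estimate of the area under the
# PR curve; unlike AUC-ROC, it degrades visibly under imbalance.
auc_pr = average_precision_score(y_true, y_score)
```

Here the two positives rank 1st and 3rd, so the average precision is (1 + 2/3) / 2 = 5/6.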
Table 2 (average over all datasets): gain (%) in AUC-PR from adding higher-order information, per feature.

                       GM      HM      AM      CN      JC      AA
Size 4: 2-to-3 gain  -0.90    0.46   23.60  141.59   67.11  134.41
Size 5: 2-to-3 gain   5.43    3.16   58.73  229.54   58.47  220.85
Size 5: 3-to-4 gain   7.36   11.69    3.88    0.15   10.62   -0.13
5.2. Results and messages
In this subsection, we present our results by summarizing them with three main messages.
(M1) More higher-order information leads to better prediction quality, but with diminishing returns.
We investigate how the prediction performance changes with increasing n in n-order expansions. In particular, we predict hyperedges of size 4 with features from 2- and 3-order expansions, and hyperedges of size 5 with features from 2-, 3-, and 4-order expansions. Table 2 summarizes the results; for readers' convenience, Figure 2 also plots the performance averaged across all features.
We clearly observe that higher-order expansion gives better performance, with the size of the improvement differing across datasets and features. Performance gains from the 2-order to the 3-order expansion, averaged across features, are 61% for size 4 and 94% for size 5, whereas the gain from the 3-order to the 4-order expansion is just 6% for size 5. As for individual features, the gain is larger for the neighborhood-based features (CN, JC, AA) than for the mean-based features (GM, HM, AM): the mean values are small in higher-order pgs, while neighborhood-based features still retain meaningfully large values. Entries with exactly zero gain (contact datasets) result from the sparsity of size-5 hyperedges. See more details in (appendix).
Interestingly, we see diminishing returns as n increases. To study this, we predict size-10 hyperedges with n-order expansions for increasing n. Figure 3 shows that, in most datasets, performance tends to increase significantly from n = 2 to n = 3, but only marginally for larger n. However, somewhat unexpectedly, some datasets experience a small jump (not as high as that from n = 2 to n = 3) at higher orders (NDC-classes, coauth-MAG-History; and NDC-substances, coauth-MAG-Geology, coauth-DBLP). We speculate that this is because knowing 8- or 9-way interactions is often more useful than knowing 4- to 7-way interactions when predicting size-10 hyperedges. For illustration, a paper with 10 authors is often produced when a group of 9 existing collaborators invites another author.
(M2) The harder the task, the higher the value of higher-order information.
As discussed in Section 3.3, we adjust the hardness of hyperedge prediction by varying the negative set C− in terms of negative hyperedge types (stars and cliques) and class imbalance (1:1, 1:2, 1:5, 1:10). We investigate the impact of these variations on the performance gain, summarized in Figure 4, where one axis distinguishes the types of negative hyperedges (stars or cliques), extracted from the 2-pg, and the other distinguishes the class imbalances.
Regarding the types of negative hyperedges, the gains from the 2-order to the 3-order expansion are larger for cliques than for stars. As explained in Section 3.3, distinguishing whether a clique is a true hyperedge is a much harder task than doing so for a star; the troubleshooter is the 3-pg, that is, referring to higher-order information. Regarding class imbalance, the gain from the 2-order to the 3-order expansion again grows as the imbalance grows. More negative hyperedges make it harder to obtain high precision while maintaining the same sensitivity, since more of them resemble positive hyperedges. In such cases, incorporating the 3-pg in addition to the 2-pg, provided that it carries additional information, helps distinguish the fake samples. Note that GM shows reversed tendencies. This is explained by the property of GM (the geometric mean defined in Section 4.2) that a single pairwise disconnection makes GM equal 0; i.e., strict stars are easily filtered with only the 2-pg, since any missing pairwise edge zeroes out the geometric mean.
(M3) Higher-order information helps more when (i) higher-order interactions are more frequent, and (ii) higher-order interactions share less information with pairwise ones.
The two prior messages revealed different performance gains across datasets. For example, in Table 2, the average gain from the 2-order to the 3-order expansion is about 284% in contact-primary-school, but merely 6.5% in NDC-classes. We now delve into why, measuring various statistics, including the edge density and information-theoretic quantities of some pgs.
(a) Edge density in 3-pg. We first examine the edge density in the 3-pg, i.e., the number of its edges divided by the number of possible edges, which intuitively quantifies the abundance of 3-way interactions. We measure the Pearson correlation coefficient between edge density and performance gain, as shown in Figures 5-A and D. We observe positive correlations of 0.22 and 0.09 for size-5 and size-4 predictions, respectively, implying that more frequent higher-order interactions let higher-order representations lead to better prediction.
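One plausible reading of this density (our interpretation, stated as an assumption) counts the fraction of node triples that co-occur in at least one hyperedge:

```python
from itertools import combinations
from math import comb

def triple_density(hyperedges, num_nodes):
    """Fraction of all node triples that appear together in some
    hyperedge: distinct co-occurring size-3 sets over C(|V|, 3)."""
    triples = {frozenset(t)
               for e in hyperedges
               for t in combinations(sorted(e), 3)}
    return len(triples) / comb(num_nodes, 3)
```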
(b) Mutual information and conditional entropy. We next study the above observation with information-theoretic measures. We expect that adding the 3-pg has larger returns when it contains more information exclusive to itself. We define two random variables X and Y generated by three distinct nodes sampled uniformly at random: X is the vector of the weights of the three pairwise edges among them, and Y is the weight of their triadic edge. We consider the mutual information I(X; Y) and the conditional entropy H(Y | X). Note that I(X; Y) quantifies the amount of information shared between X and Y, while H(Y | X) is the remaining information (i.e., uncertainty) of Y given X. (Direct estimation of these two measures requires a large number of samples when the domain spaces of the random variables are large; we simplify the domain spaces of X and Y by binning their values.)
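The estimation can be sketched as follows (our code; the exact binning function is not specified above, so the logarithmic binning here is an assumption):

```python
import math
import random
from collections import Counter
from itertools import combinations

def bin_weight(w):
    # Assumed binning: weight 0 keeps its own bin; positive weights
    # are grouped logarithmically into bin 1 + floor(log2 w).
    return 0 if w == 0 else 1 + int(math.log2(w))

def entropy(counts, total):
    """Plug-in entropy (bits) of an empirical distribution."""
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def mi_and_cond_entropy(pair_w, triple_w, nodes, samples, seed=0):
    """Monte-Carlo estimates of I(X; Y) and H(Y | X): for a random node
    triple, X is the binned weight vector of its three pairwise edges
    and Y is the binned weight of its triadic edge."""
    rng = random.Random(seed)
    xs, ys, xys = Counter(), Counter(), Counter()
    for _ in range(samples):
        t = rng.sample(sorted(nodes), 3)
        x = tuple(bin_weight(pair_w.get(frozenset(p), 0.0))
                  for p in combinations(sorted(t), 2))
        y = bin_weight(triple_w.get(frozenset(t), 0.0))
        xs[x] += 1; ys[y] += 1; xys[(x, y)] += 1
    h_x = entropy(xs, samples)
    h_y = entropy(ys, samples)
    h_xy = entropy(xys, samples)
    return h_x + h_y - h_xy, h_xy - h_x   # I(X;Y), H(Y|X)
```

The identities I(X; Y) = H(X) + H(Y) − H(X, Y) and H(Y | X) = H(X, Y) − H(X) reduce both quantities to three entropy estimates over the binned samples.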
Figures 5-B and E show negative correlations of -0.33 and -0.30 between the mutual information and the performance gain, and Figures 5-C and F show positive correlations of 0.67 and 0.60 between the conditional entropy and the performance gain. These results imply that the gain from the 2-order to the 3-order expansion is large when the 3-pg is difficult to explain in terms of the 2-pg. We find that the conditional entropy has larger absolute correlations than the mutual information. A possible explanation is that the conditional entropy directly quantifies the information gain, while the mutual information captures only the information shared between the 2-pg and the 3-pg.
6. Discussion and conclusion
In this paper, we studied how much abstraction of group interactions is needed to accurately represent a hypergraph, with hyperedge prediction as the downstream task. We devised the n-projected graph to capture the n-way interactions in a hypergraph and expressed the hypergraph as a collection of n-projected graphs. Investigating the performance gain as n grows, we conclude that a small n is sufficient due to diminishing returns, and that higher n acts as a troubleshooter in difficult task settings. We also provided interpretations of why different datasets show different gains. In summary, we investigated 1) how much, 2) when, and 3) why higher-order representations provide better accuracy. We expect our results to offer insights to follow-up work. Our source code is available at (appendix).