Love tHy Neighbour: Remeasuring Local Structural Node Similarity in Hypergraph-Derived Networks

10/30/2021
by   Govind Sharma, et al.

The problem of node-similarity in networks has motivated a plethora of such measures between node-pairs, which make use of the underlying graph structure. However, higher-order relations cannot be losslessly captured by mere graphs and hence, extensions thereof viz. hypergraphs are used instead. Measuring proximity between node pairs in such a setting calls for a revision in the topological measures of similarity, lest the hypergraph structure remains under-exploited. We, in this work, propose a multitude of hypergraph-oriented similarity scores between node-pairs, thereby providing novel solutions to the link prediction problem. As a part of our proposition, we provide theoretical formulations to extend graph-topology based scores to hypergraphs. We compare our scores with graph-based scores (over clique-expansions of hypergraphs into graphs) from the state-of-the-art. Using a combination of the existing graph-based and the proposed hypergraph-based similarity scores as features for a classifier predicts links much better than using the former solely. Experiments on several real-world datasets and both quantitative as well as qualitative analyses on the same exhibit the superiority of the proposed similarity scores over the existing ones.

1 Introduction

Measuring similarity between nodes of a graph has attracted the attention of network science researchers in all domains, be it social Gou et al. (2010), biological Bass et al. (2013), bibliographic Sun et al. (2011), or entertainment Kwon et al. (2012). One simple reason why similarity between two nodes is important is to make a decision as to whether two seemingly unconnected nodes should be connected or not – a problem more popularly known as recommendation Fouss et al. (2007). While the notion of similarity between two nodes is fairly intuitive when the underlying relational structure of the network is graph-like (i.e., edges connect two nodes), it is a different ball game altogether when it is not. More specifically, if the underlying relational structure of a network involves more than two entities in a single relation, the usual graph paradigm becomes lossy. Moreover, it is quite unclear how close or similar two nodes would be in the presence of “edges” of higher sizes. To make these two points clearer, let us divert our attention to Figure 1(a).

(a) Hypergraph (b) Derived network
Figure 1: A toy example showing the genesis of a co-authorship network from its originally occurring hypergraph (i.e., its higher-order counterpart).

We see five authors who are related to each other by co-authorship, which by nature possesses a higher-order property in that more than two authors can write a publication together. In this case, we see three co-authorship groups, each corresponding to a collaboration between the respective authors. Our first point, that we lose information when we pose these relations as pairwise, is clear from Figure 1(b), which is a graph-induced version of the network. Some illustrative pieces of information that the graph on the right loses are: (1) How many papers were written to start with? (2) Which groups of authors collaborated together? (3) Which author tends to collaborate with larger teams? Secondly, while it is clear how close or far apart two nodes are in a graph (say, any two authors in Figure 1(b)), the same is not true for the relational structure on the left (Figure 1(a)). The problem is compounded when we are asked to find the similarity between two nodes in such a scenario. As will be shown later, standard measures of node similarity do not transfer directly from the graph domain to hypergraphs Berge (1984); Bretto (2013), as structures capturing higher-order relations are called.

Multiple authors authoring a paper together, or multiple actors working on a movie project, or multiple proteins interacting with each other, etc. are examples of such networks, and are better represented using hypergraphs instead. Hypergraphs are composed of arbitrary-sized edges called hyperedges, which losslessly capture information from higher-order relations. While most of the literature on node-similarity is focused on graphs, we deal with the same problem for hypergraphs (please note that by “node similarity on hypergraphs”, we still refer to pairs of nodes (links/edges), not hyperlinks/hyperedges). Moreover, any given hypergraph could be converted into a graph (although lossily) by simple heuristics, weighted/unweighted clique-expansion Zhou et al. (2006); Agarwal et al. (2006) being one of them. We call the resulting network a hypergraph-expanded network or hypergraph-derived network, that is, any network having an underlying hypergraph structure, i.e., higher-order relations between vertices. The example in Figure 1(b) is such a network.

It is well-known that local similarity measures (e.g., common neighbors Newman (2001)) prove to be powerful measures of node similarity, either by themselves Liben-Nowell and Kleinberg (2003), or as classifier features Al Hasan et al. (2006). However, while computing them, the underlying hypergraph structure remains hidden. This deprives the algorithm of any extra information such higher-order structure could have contributed towards. We exploit the underlying hypergraph structure and extend popular local (i.e., neighborhood-based) similarity measures to their higher-order versions.

In the present work, we focus on exploiting the topological properties of the underlying hypergraph of a derived network to aid the process of measuring the similarity between two nodes. We restrict ourselves to local neighborhood based methods and support our claims both information-theoretically and by explicitly predicting links using the similarity scores. Our experiments cover both temporal and non-temporal datasets.

We first provide a generic formulation to convert any neighborhood based pairwise score to hypergraphs in Section 3. Then in Section 4, we describe procedures to carefully prepare data and hypergraph-oriented features so as to carry out experiments. In Section 6, we perform experiments that include both temporal as well as non-temporal datasets for the sake of completeness. We also compute mutual information scores to vouch for the relevance of our measures, and devise different feature combinations to test them in a supervised learning scenario. And finally, in Section 7, we compare our AUC scores with those of the baselines.

Our Contributions

  1. We formulate a theoretically-backed novel technique to convert graph topology-based pairwise node similarity measures into hypergraph-topology-based ones.

  2. We extend the local neighborhood based node similarity scores to their hypergraph variants.

  3. We propose fair and unbiased novel data preparation algorithms, so that similarity computation could be performed on both temporal and non-temporal hypergraph networks.

  4. We improve the quality of structural similarity between nodes by incorporating hypergraphs and the scores we formulate.

2 Background and Notation

We define a hypergraph by , where denotes the set of its vertices/nodes and , its hyperedges. Its temporal variant, a temporal hypergraph is one wherein for each hyperedge , information about the time of its (first) occurrence is communicated via a function . A graph, on the other hand, is denoted by (with being its temporal variant), where . A hypergraph could be converted into a graph with defined as , a process known as clique expansion Agarwal et al. (2006). Refer to Table 1 for a quick reference to notations.

Symbol Definition
Set of nodes
-power set of a set
Set of hyperedges
Set of edges
Hypergraph
Graph
Incidence Matrix
Adjacency Matrix
Set of neighbors of a node
Set of hyperneighbors of a node
Table 1: Notations used in this article
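The clique expansion described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a hypergraph is stored as a list of node sets, and it assumes that the weighted variant counts the number of hyperedges containing both endpoints (other weighting schemes exist). The names `clique_expansion` and `papers` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def clique_expansion(hyperedges):
    """Return (edges, weights) for the clique expansion of a hypergraph.

    edges:   set of frozenset({u, v}) pairs co-occurring in some hyperedge.
    weights: Counter mapping each pair to the number of hyperedges that
             contain both endpoints (an assumed weighting scheme).
    """
    weights = Counter()
    for hyperedge in hyperedges:
        # Every pair of nodes inside a hyperedge becomes a graph edge.
        for u, v in combinations(sorted(set(hyperedge)), 2):
            weights[frozenset((u, v))] += 1
    return set(weights), weights


# Toy co-authorship hypergraph (illustrative data, not the paper's Figure 1).
papers = [{"a", "b", "c"}, {"c", "d"}, {"a", "b"}]
edges, weights = clique_expansion(papers)
```

Note that `edges` alone cannot tell whether "a", "b", and "c" wrote one paper together or three papers in pairs, which is the information loss discussed in the introduction.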

3 Formulating Similarities

Social scientists have long been involved in finding metrics that describe relations between entities in a network. It was Liben-Nowell, et al. Liben-Nowell and Kleinberg (2003) who first formally collected topological similarity scores from the network science literature and showed that they are good measures by themselves. These similarity functions range from the earliest works of Katz Katz (1953) and Adamic, et al. Adamic and Adar (2003) to the then recent works by Newman, et al. Newman (2004); and to date, topological similarity computation features have witnessed several advancements Martínez et al. (2016). Nevertheless, though there have been several works pertaining to hypergraphs and their applications, no work in the literature (with the exception of some works Li et al. (2013), which use uniform/heterogeneous hypergraphs) utilizes usual (non-uniform, homogeneous) hypergraphs for similarity computation.

In this section, we set out to formally extend similarity computation scores in the literature from the graph- to the hypergraph-domain. For the same, we define an end-to-end process of carefully constructing such scores from existing graph-topology based ones. We first generalize graph-based (esp., local neighborhood based) scores via defining set-similarity functions taken from well-known local similarity computation paradigms, and then extend them to graphs (which is a usual, adjacency-based topology) and finally to a hypergraph (incidence-based) topology.

Our ultimate goal is to be able to predict links between an unlinked pair of nodes given a hypergraph . But in the literature, we find a plethora of techniques that make use of the existing graph structure, given graph . Most such techniques are set-based, in that they take sets corresponding to the nodes in consideration, and assign a prediction score to the pair .

3.1 The Case of Common-Neighbors

For a pair of nodes , the common-neighbors (CN) technique takes sets and to be neighbors of and respectively, and computes the cardinality of their intersection. In essence, CN makes use of two major concepts: “neighborhood”, and “intersection”. Now, if one were to compute the hypergraph-equivalent to CN, one would have to use a concept equivalent to “neighborhood”. A simple option would be to consider “hyperneighborhood” instead (ref. Table 1). In other words, the hypergraph equivalent of CN for could be defined as the number of hyperedges incident on both and ; but since the nodes are unlinked in the first place, there would be no common hyperedges! Thus, this option fails. However, we still want to use the “common-neighbor” paradigm on hypergraphs.

Hence, first consider pairs of hyperedges , one incident to () and the other to (), and count their “intersection” . And since and are precisely the hyperneighbors () of and , we have combined the two concepts of “hyperneighborhood” and “intersection”, and thus, have extended CN to hypergraphs. But since each choice of would give a number , we would have an matrix of intersection-counts. A suitable matrix norm could then be used to convert this matrix into a single number, which would be used as a feature for similarity computation.

Extending this formalization to all local-neighborhood Guns (2014) based similarity computation scores is the ultimate goal of this section. In order to do that, we define the notions of a set similarity function , a node similarity function , and a node-similarity matrix-function . For ease of transference of any set-similarity notion to node pairs, we also define two functionals (functions that map functions to functions): an adjacency functional (to use set-similarity functions as similarity computers in graphs) and an incidence functional (to use them as similarity computers in hypergraphs). An intermediate concept: incidence matrix-functional has to be defined, so that it could be composed with a matrix norm to obtain incidence-based node-similarity measures.

3.2 Extending Similarities to Hypergraphs

Given a hypergraph , for each vertex pair , we define functions that quantify the proximity between vertices and .

Common Neighbor,
Jaccard Coefficient,
Association Strength,
Cosine Similarity,
NMeasure,
MinOverlap,
MaxOverlap,
Adamic Adar,
Pearson Correlation,
Preferential Attachment,
Table 2: Set Similarity Functions for a graph . Here, and represents the neighbors of a node .

Define a set similarity function as as a function that assigns to an unordered pair of vertex sets, , a real number corresponding to a measure of similarity between the sets. A list of known set-similarity functions has been included in Table 2. Let represent all set-similarity functions over . We then define a node similarity function as that assigns to a pair of nodes , a similarity score . Let denote the set of all node-similarity functions over . At this point, we also define an adjacency functional that maps each set-similarity function to a node-similarity function defined as .
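The adjacency functional described above is a higher-order function: it takes a set-similarity function and returns a node-similarity function. The sketch below assumes the standard textbook forms of four of the measures named in Table 2, which may differ in detail from the table's entries. The `neighbors` dictionary and all function names are illustrative.

```python
import math


# Set-similarity functions on two node sets (standard forms, assumed here).
def common_neighbors(a, b):
    return len(a & b)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a, b):
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


def preferential_attachment(a, b):
    return len(a) * len(b)


def adjacency_functional(set_similarity, neighbors):
    """Lift a set-similarity function to a node-similarity function
    by applying it to the neighbor sets of the two nodes."""
    def node_similarity(u, v):
        return set_similarity(neighbors[u], neighbors[v])
    return node_similarity


# Toy graph given as a neighbor dictionary (illustrative data).
neighbors = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cn = adjacency_functional(common_neighbors, neighbors)
jc = adjacency_functional(jaccard, neighbors)
pa = adjacency_functional(preferential_attachment, neighbors)
```

Calling `cn(1, 4)` then scores the unlinked pair (1, 4) by the usual common-neighbor count, and swapping in another set-similarity function changes the measure without touching the graph code.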

In order to extend this successfully to hypergraphs, we define a node-similarity matrix-function as to be one that assigns to a pair of nodes , multiple similarity scores arranged in a real matrix . Let the set of all such functions for be denoted by , where denotes the set of all real-valued finite-dimensional matrices. Then for a hypergraph, we define an incidence matrix-functional that maps each set-similarity function to a node-similarity matrix-function defined as

As discussed above, multiple matrix norms could be used to convert this matrix to a real number. Some of them are: (i) Max-norm: , (ii) Avg-norm: , (iii) L1-norm: , and (iv) L2-norm: .

Finally, the composition of a matrix norm with the incidence matrix-functional forms an incidence functional, defined as , which ultimately gives a functional , defined as , mapping pairs to incidence-based similarities .
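The composition described above (incidence matrix-functional followed by a matrix norm) can be sketched as follows. The norm definitions are assumptions: each is taken entrywise (maximum entry, mean entry, sum of absolute entries, and square root of the sum of squared entries), which is consistent with the behaviour stated in Lemma 1 but is not confirmed by the text. All names are hypothetical.

```python
import math


def incidence_matrix_functional(set_similarity, hyperedges):
    """Map a set-similarity function to a node-similarity matrix-function."""
    hyperedges = [frozenset(f) for f in hyperedges]

    def matrix_function(u, v):
        rows = [f for f in hyperedges if u in f]  # hyperneighbors of u
        cols = [g for g in hyperedges if v in g]  # hyperneighbors of v
        return [[set_similarity(f, g) for g in cols] for f in rows]

    return matrix_function


def _entries(matrix):
    return [abs(x) for row in matrix for x in row]


# Entrywise matrix norms (assumed definitions); empty matrices score 0.
def max_norm(matrix):
    return max(_entries(matrix), default=0.0)


def avg_norm(matrix):
    entries = _entries(matrix)
    return sum(entries) / len(entries) if entries else 0.0


def l1_norm(matrix):
    return sum(_entries(matrix))


def l2_norm(matrix):
    return math.sqrt(sum(x * x for x in _entries(matrix)))


NORMS = {"max": max_norm, "avg": avg_norm, "l1": l1_norm, "l2": l2_norm}


def incidence_functional(set_similarity, hyperedges, norm):
    """Compose a matrix norm with the incidence matrix-functional."""
    matrix_function = incidence_matrix_functional(set_similarity, hyperedges)
    return lambda u, v: NORMS[norm](matrix_function(u, v))


hyperedges = [{1, 2, 3}, {1, 4}, {3, 4, 5}, {2, 5}]
overlap = lambda f, g: len(f & g)  # common-neighbor style set similarity
scores = {name: incidence_functional(overlap, hyperedges, name)(1, 5) for name in NORMS}
```

Each norm yields one feature per node pair, so a single set-similarity function produces four hypergraph-based scores.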

Of the functionals defined above, the adjacency and the incidence functionals make use of neighbors and hyperneighbors respectively to transfer set-similarity functions to node-similarity in graphs and hypergraphs respectively.

3.2.1 Illustration with Common Neighbors

For the sake of further clarity, we demonstrate the case for Common Neighbors. We first pick (as defined in Table 2) as the set similarity function. The adjacency functional maps to a node similarity function , defined by , the usual common-neighbor criterion for similarity computation in graphs. Moving to the incidence (hypergraph) domain, we first use the incidence matrix-functional to map to a node-similarity matrix-function . If and , then is a matrix whose entry would be . If a matrix norm such as is chosen, we get the incidence functional, , that gives us the node similarity function as

where .

3.3 Sanity Check for Hypergraph-based Similarities

For the sake of establishing the sanity of the recently developed mechanism, we have the following lemma.

Lemma 1.

The Common-Neighbor set similarity function , when used to define an incidence-based node similarity function , for a graph , assigns to each pair , a similarity score that is proportional to a constant power of the original score. That is,

(1)

for at least one matrix norm , and for some scalars .

Proof.

Suppose we have as a usual undirected graph. For nodes s.t. , if and , we get hyperneighbors

(2)
(3)

Similarly,

(4)

Now, if we take the matrix norm to be L1 (), we have:

(5)

Taking different matrix norms, we get scores as shown in the table below (Table 3).

Norm
Table 3: Similarity scores between and () when hypergraph is actually a graph.

It could be observed that when is used as matrix norm, becomes the same for both hypergraphs and graphs (i.e., ). Scores from the other norms act as extra features that we get as a result of the “incidence matrix” interpretation of a graph. The same procedure when repeated for , , and other similarity scores gives us either the same graph score, or a scalar multiple of a power of it. ∎

Note: It needs to be understood that complete equality is not required, since ultimately, we use graph features along with hypergraph ones (macro/micro combinations for GH, WH, etc.). Also, the hypergraph based scores act as new features that come from the incidence matrix interpretation of the hypergraph (even if it is a graph).
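The statement of Lemma 1 can be checked numerically on a small example. The sketch below treats every edge of a toy graph as a two-node hyperedge and assumes entrywise L1 and L2 norms (an assumption about the norm definitions); under that reading, the L1 score equals the common-neighbor count and the L2 score equals its square root for every unlinked pair. The graph and function names are illustrative.

```python
import math
from itertools import combinations


def cn_graph(edges, u, v):
    """Usual common-neighbor count in a simple undirected graph."""
    nbrs = lambda x: {b if a == x else a for a, b in edges if x in (a, b)}
    return len(nbrs(u) & nbrs(v))


def cn_incidence_entries(edges, u, v):
    """Intersection sizes over all pairs (edge containing u, edge containing v),
    with every edge viewed as a two-node hyperedge."""
    hyper = [frozenset(e) for e in edges]
    return [len(f & g) for f in hyper if u in f for g in hyper if v in g]


edges = [(1, 2), (1, 3), (1, 4), (2, 5), (3, 5), (5, 6), (4, 6)]
nodes = sorted({x for e in edges for x in e})
linked = {frozenset(e) for e in edges}

checked = 0
for u, v in combinations(nodes, 2):
    if frozenset((u, v)) in linked:
        continue  # the lemma concerns unlinked pairs only
    entries = cn_incidence_entries(edges, u, v)
    graph_score = cn_graph(edges, u, v)
    # Entrywise L1 reproduces the graph score exactly.
    assert sum(entries) == graph_score
    # Entrywise L2 gives the graph score raised to the power 1/2.
    assert math.isclose(math.sqrt(sum(x * x for x in entries)), math.sqrt(graph_score))
    checked += 1
```

The check works because, for an unlinked pair, two edges incident on the two nodes intersect in exactly one node when they lead to a shared neighbor and in none otherwise, so every matrix entry is 0 or 1.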

4 Methodology

4.1 Data Preparation and Preprocessing

Input: Hypergraph ; split ratio
Output: Train hyperedges ; test links
for  do
    for  do
        if  then
sample(, )
cleanHyperedges(, )
return ,
Algorithm 1 structuralSplit(, ) for hypergraph data

Input: Set of hyperedges ; set of edges
Output: Cleaned-up hyperedges
for  do
for  do
    for  do
        for  do
            if  then
            if  then
return
Algorithm 2 cleanHyperedges(, ) to remove edge-information in from hyperedges

Given a hypergraph (temporal or non-temporal), we need to convert it into a form that is consumable in the similarity computation setting, so that we are able to calculate both graph- and hypergraph-based features readily. We prepare data separately for temporal and structural similarity computation settings.

4.1.1 Temporal Processing

In the temporal setting, we have a timed hypergraph, with us. Unweighted and weighted clique expansions of give and respectively.

In short, we have timed graphs and with us now. Now, a split-ratio parameter is selected. The graph timeline (image of under ) could be defined as a set of time-stamps for which, w.l.o.g., let . If a time-threshold index is now defined as , we could divide the timeline into train- and test-periods and respectively. (Please note that the terms ‘train period’ and ‘test period’ are akin to the formulation by Liben-Nowell, et al. Liben-Nowell and Kleinberg (2003) and should not be confused with train and test datasets in supervised learning.) This leads us to edge sets and , denoting edges formed in the train and test periods respectively. Similarly, we define as the hyperedges that were formed in the train period.
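The temporal split described above can be sketched as follows. Several details are assumptions rather than the paper's definitions: the split ratio is read as the fraction of distinct time-stamps assigned to the train period, the threshold index is obtained by truncation, and an edge is dated by the earliest hyperedge that contains it. The function name `temporal_split` and the toy data are hypothetical.

```python
from itertools import combinations


def temporal_split(timed_hyperedges, split_ratio):
    """Split timed hyperedges into train hyperedges and train/test edges.

    timed_hyperedges: list of (node_set, timestamp) pairs.
    split_ratio: assumed to be the fraction of distinct time-stamps
                 that make up the train period.
    """
    timeline = sorted({t for _, t in timed_hyperedges})
    threshold_index = max(1, int(split_ratio * len(timeline)))
    train_times = set(timeline[:threshold_index])

    first_seen = {}  # edge -> earliest time-stamp at which it appears
    for nodes, t in timed_hyperedges:
        for pair in combinations(sorted(nodes), 2):
            edge = frozenset(pair)
            first_seen[edge] = min(t, first_seen.get(edge, t))

    train_hyperedges = [frozenset(n) for n, t in timed_hyperedges if t in train_times]
    train_edges = {e for e, t in first_seen.items() if t in train_times}
    test_edges = {e for e, t in first_seen.items() if t not in train_times}
    return train_hyperedges, train_edges, test_edges


data = [({"a", "b"}, 1), ({"b", "c", "d"}, 2), ({"a", "c"}, 3), ({"a", "b", "d"}, 4)]
train_h, train_e, test_e = temporal_split(data, 0.5)
```

With this reading, an edge that already existed in the train period is never counted as a test edge, even if it reappears inside a later hyperedge.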

4.1.2 Structural Processing

Structural processing is easier than temporal, since hyperedges are non-timed. Formally, we start with a hypergraph , which gets converted into graphs and as before. A similar split-ratio is selected, and if , we randomly delete number of edges from the graph, which have to be predicted later. In other words, a random sample is selected such that . As a result, we get the set of test edges .

We now discuss the preparation of the train hypergraph, whose topology would be used while predicting links. In the temporal case, we simply ignored hyperedges from the test period and what remained was . But here, we have no temporal information, and the train-test split is done at random, which successfully separates from , but not from , since there are no well-defined concepts of “train period” or “test period” here.

Let us analyze the situation closely. Before continuing further, let us extend the hyperneighborhood function to edges: defined as . The question is: which hyperedges should be included in the train set so that information from them could be used while predicting test links ?

Choosing all hyperedges as would trivialize the very task of similarity computation and we would end up predicting all links with a 100% accuracy using only one feature: “common hyperneighbors”! And, on the other hand, using only those hyperedges that are not supersets of any test edge, i.e., would deprive us of many links that a “hyperedge minus a test edge” would have otherwise provided. We go with neither of the options and choose to “strip” each test edge off of a potential train hyperedge. A detailed procedure has been described in Algorithm 1, which in turn uses Algorithm 2 to clean away information about any test edge from the hyperedges, finally giving us a rich train hypergraph for similarity computation.

Finally, we get train hypergraph , and test edges . The similarity computation problem would be to predict new links (i.e., those not already present in ) using information from the hypergraph topology ; predictions will later be evaluated using test set .
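A possible reading of the structural split with "stripping" is sketched below. The stripping rule is an assumption, not a reproduction of Algorithm 2: a hyperedge containing a test edge {u, v} is replaced by the two sets obtained by dropping u and by dropping v, so that the rest of the hyperedge still contributes to the train topology. The split ratio is assumed to be the fraction of edges kept, and all names are hypothetical.

```python
import random
from itertools import combinations


def clean_hyperedges(hyperedges, test_edges):
    """Strip every test edge from every hyperedge containing it (assumed rule)."""
    cleaned = {frozenset(f) for f in hyperedges}
    for edge in test_edges:
        u, v = tuple(edge)
        next_round = set()
        for f in cleaned:
            if edge <= f:
                # Keep both halves so that u and v never co-occur in a
                # train hyperedge; discard leftovers with fewer than 2 nodes.
                next_round.update(g for g in (f - {u}, f - {v}) if len(g) >= 2)
            else:
                next_round.add(f)
        cleaned = next_round
    return cleaned


def structural_split(hyperedges, split_ratio, seed=0):
    """Sample test edges from the clique expansion, then clean the hyperedges."""
    edges = sorted(
        {frozenset(p) for f in hyperedges for p in combinations(sorted(f), 2)},
        key=sorted,
    )
    n_test = round((1 - split_ratio) * len(edges))
    test_edges = set(random.Random(seed).sample(edges, n_test))
    return clean_hyperedges(hyperedges, test_edges), test_edges


hyperedges = [{1, 2, 3, 4}, {3, 4, 5}, {1, 5}]
train_hyperedges, test_edges = structural_split(hyperedges, split_ratio=0.7)
```

Whatever the exact rule, the property to preserve is that no train hyperedge contains both endpoints of a test edge, since otherwise a "common hyperneighbors" feature would leak the answer.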

4.2 Computing Graph Features

We had earlier listed certain set-similarity functions in Table 2. Let us take the corresponding link predictors (let us call them base predictors) from the literature Liben-Nowell and Kleinberg (2003); Guns (2014) and hence get ten different similarity computation scores for each pair of nodes in a given dataset. More specifically, we take the adjacency node similarity function where and are used as per Section 3 (the base predictors being CN, JC, AS, Cos, NM, MnO, MxO, AA, Prn, and PA) to find scores for each pair. We repeat this exercise for the edge-weighted version of the graph (using weighted scoring functions defined in Guns (2014)). Finally, for each hypergraph, corresponding to each base predictor, we have two different graph-based topological scores per node pair, which we denote by G (for unweighted graph) and W (for weighted graph) respectively.

4.3 Computing Hypergraph Features

Similar to Section 4.2, we also compute scores for the hypergraph variations of the base predictors. This involves computing the node-similarity matrix-function for each of the set similarity functions mentioned in Table 2, followed by the application of the four matrix norms defined earlier to obtain a single numeric score for each pair. In summary, we compute the incidence node similarity function via , where , , and are as defined earlier and in Table 2. For each hypergraph, corresponding to each base predictor, we have four different hypergraph-based topological scores per node pair, which we denote by Hm, Ha, H1, and H2.

5 Related work

Computing similarity scores has a vast literature, and covering it in whole is beyond the scope of the present work. The reader is referred to some excellent review works Wang et al. (2015); Lu et al. (2010); Martínez et al. (2016), which provide an intelligible coverage of the similarity computation ecosystem. Although the concept wasn’t new to network scientists, and there have been vintage works on predicting new relations in networks (Katz (1953)), the first formal work on similarity computation could be credited to Liben-Nowell, et al. Liben-Nowell and Kleinberg (2003). They brought together multiple similarity scores to solve the problem, scores both new and existing Katz (1953); Adamic and Adar (2003); Newman (2001). Ever since, many interesting directions to solve the similarity computation problem in networks have been taken.

However, almost all works that use hypergraph networks (with the exception of Li et al. Li et al. (2013), who deal with heterogeneous, uniform hypergraphs only) do not consider the underlying hypergraph structure after the network gets expanded to a graph. Recently, there has been interest in the areas of hyperlink (or simplex, or merely hyperedge) prediction as well, which acknowledges the fact that there is a loss of information when a hypergraph is converted to a graph Xu et al. (2013); Benson et al. (2018).

6 Experiments

6.1 Datasets

We use a multitude of hypergraph datasets, mainly from Benson, et al. Benson et al. (2018), from where we pick six datasets. A brief account of all of them is as follows:

  • email-Enron: In an organization (Enron Corporation), an email communication between employee nodes represents a hyperedge Klimt and Yang (2004).

  • contact-high-school: In a high-school setting, nodes represent school students, and a hyperedge is formed between individuals that are spatially close to each other at a given time instance Mastrandrea et al. (2015).

  • NDC-substances: Nodes signify chemical substances, and a hyperedge represents a set of these substances used in a particular drug.

  • tags-math-sx: Nodes denote tags on a mathematics discussion forum (https://math.stackexchange.com/), and a hyperedge is formed over all tags that a particular question is associated with.

  • threads-math-sx: Users on the same mathematics discussion forum form nodes, and a group of users involved in a particular question thread forms a hyperedge.

  • coauth-DBLP: Nodes represent authors and a hyperedge, a group of all authors that wrote a paper together.

Refer to Benson et al. (2018) for more details.

6.2 Preprocessing data and computing scores

We perform a large number of link-prediction experiments on hypergraph datasets belonging to multiple real-world domains. Since data preparation is both a crucial step as well as one of our main contributions, it forms a major part of our methodology (Section 4.1) itself. We fix the split-ratio to be , and choose to randomly generate times as many negative samples (non-links) as positive samples (links). For each hypergraph, we perform both temporal and structural link prediction (ignoring the time information for the latter). We get train hyperedges , test links , and test non-links as defined above.

For each pair , we compute the ten base predictor scores, as mentioned in Section 4.2, taking (both weighted and unweighted) as information for edges, hence preparing our baselines. Then, as explained in Section 4.3, we compute the hypergraph-topology based scores that we have proposed. In the end, for each base predictor, we have a total of six different scores per node pair: graph (G), weighted-graph (W), hypergraph-max (Hm), hypergraph-avg (Ha), hypergraph-L1 (H1), and hypergraph-L2 (H2). And since there are a total of ten base predictors: AA, AS, CN, Cos, PA, JC, MxO, MnO, NM, and Prn, we finally get 10 × 6 = 60 different scores per node pair.

6.3 Calculating Mutual Information

Mutual information Shannon (1948) has been shown to play a major role in similarity computation Tan et al. (2014). But we use it here in the classical sense: for each dataset, we find the mutual information score of each individual feature. Since the feature values are continuous and power-law distributed rather than normal, we first bin them via a log-binning mechanism (where consecutive bins are assigned on the base-10 log scale). We monitored the MI scores for various numbers of bins and found that beyond a sufficiently large number of bins, the relative rank of the similarity computation features does not change. Hence, we fix the number of bins to be 2000.
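The log-binned mutual information computation can be sketched as below. The handling of non-positive scores (a dedicated bin 0) and the choice of bin edges (equal widths on the base-10 log scale between the smallest and largest positive score) are assumptions, as are the function names. Mutual information is reported in bits.

```python
import math
from collections import Counter


def log_bin(value, n_bins, lo, hi):
    """Index of the base-10 log-scale bin containing `value`.
    Non-positive values go to a separate bin 0 (an assumption)."""
    if value <= 0:
        return 0
    span = math.log10(hi) - math.log10(lo)
    frac = (math.log10(value) - math.log10(lo)) / span
    return 1 + min(n_bins - 1, max(0, int(frac * n_bins)))


def mutual_information(scores, labels, n_bins=2000):
    """Mutual information (bits) between a log-binned score and a binary label."""
    positive = [s for s in scores if s > 0]
    lo, hi = (min(positive), max(positive)) if positive else (1.0, 10.0)
    if lo == hi:
        hi = lo * 10.0  # avoid a zero-width log range
    bins = [log_bin(s, n_bins, lo, hi) for s in scores]
    n = len(scores)
    joint = Counter(zip(bins, labels))
    p_bin, p_label = Counter(bins), Counter(labels)
    # Sum of p(b, y) * log2( p(b, y) / (p(b) * p(y)) ) over observed cells.
    return sum(
        (c / n) * math.log2(c * n / (p_bin[b] * p_label[y]))
        for (b, y), c in joint.items()
    )


scores = [0.0, 0.0, 0.01, 0.02, 1.0, 2.0, 5.0, 8.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
mi = mutual_information(scores, labels, n_bins=4)
```

In this toy example the bins separate links from non-links perfectly, so the mutual information equals the label entropy of one bit, while a constant score would give zero.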

6.4 Performing Link Prediction

Finally, we perform similarity computation in three different modes, which have been described as follows:

  1. Standalone features: In this mode, we simply use the predictor scores (G, W, Hm, Ha, H1, H2) calculated in Section 6.2 for similarity computation, i.e., predict links via the unsupervised similarity computation paradigm similar to Liben-Nowell et al. Liben-Nowell and Kleinberg (2003). At the end, we would have a total of 60 standalone scores. Although we did not expect to do better than the baselines in this mode, we still observe decent performances.

  2. Micro-feature combination: Here, we take various feature combinations, treating each of the ten base predictors separately. We have a total of five different feature combinations per base predictor: mic-G, mic-W, mic-H, mic-GH, mic-WH, where the first two correspond to singleton features G and W, and the last three to taking H individually, G and H together, and W and H together respectively (ref. Sections 4.2 and 4.3). In all, we have 10 × 5 = 50 micro feature combinations for each dataset.

  3. Macro-feature combination: This is similar to micro-feature combination, except that all base predictors are taken together for each combination. That is, we take all graph-based features (mac-G), all weighted-graph-based features (mac-W), all hypergraph-based features (mac-H), and their combinations mac-GH and mac-WH. We have a total of 5 macro-feature combinations for each dataset.
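The three modes above amount to different groupings of the same 60 score columns, which the following sketch enumerates. It assumes that "H" stands for the four hypergraph scores (Hm, Ha, H1, H2) taken together; the `base:score` column naming and the dictionary keys are hypothetical.

```python
BASE_PREDICTORS = ["AA", "AS", "CN", "Cos", "PA", "JC", "MxO", "MnO", "NM", "Prn"]
HYPERGRAPH_SCORES = ["Hm", "Ha", "H1", "H2"]

# Which score types enter each feature combination (assumed reading of "H").
COMBINATIONS = {
    "G": ["G"],
    "W": ["W"],
    "H": HYPERGRAPH_SCORES,
    "GH": ["G"] + HYPERGRAPH_SCORES,
    "WH": ["W"] + HYPERGRAPH_SCORES,
}


def column(base, score):
    """Hypothetical column-naming scheme, e.g. 'CN:Hm'."""
    return f"{base}:{score}"


# Standalone mode: every (base predictor, score type) pair is its own predictor.
standalone = [column(b, s) for b in BASE_PREDICTORS for s in ["G", "W"] + HYPERGRAPH_SCORES]

# Micro mode: one feature set per (base predictor, combination).
micro = {
    f"mic-{name}/{base}": [column(base, s) for s in scores]
    for base in BASE_PREDICTORS
    for name, scores in COMBINATIONS.items()
}

# Macro mode: one feature set per combination, pooling all base predictors.
macro = {
    f"mac-{name}": [column(base, s) for base in BASE_PREDICTORS for s in scores]
    for name, scores in COMBINATIONS.items()
}
```

Each entry of `micro` or `macro` would then be the column list handed to one classifier.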

In case of micro and macro modes, we learn an XGBoost Chen and Guestrin (2016) classifier to predict links (and get one classifier per feature combination), and in the standalone mode, the scores themselves are used as predictions. For the classification, we randomly split the prepared data further into train and test, this time for classification. (Earlier, we had performed a train-test split in a temporal or a structural sense, which was a data preparation step. But here, the usual, supervised-learning oriented split of the prepared data into train and test has been performed.) Once we have the predictions by a feature combination, for evaluating performance, the predictions are compared with the labels (link/non-link) and ROC curves Davis and Goadrich (2006) are derived, which are finally summarized using Area Under ROC (AUC).
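For the standalone mode, where the scores themselves serve as predictions, the AUC can be computed directly from scores and labels. The sketch below uses the rank-sum (Mann-Whitney) identity rather than tracing the ROC curve; it is a plain-Python illustration with a hypothetical function name, not the evaluation code used in the experiments, and its quadratic cost suits only small examples.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a randomly chosen link (label 1) receives a
    higher score than a randomly chosen non-link (label 0), with ties
    counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one link and one non-link")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy similarity scores for three links and three non-links.
auc = roc_auc([3, 2, 2, 0, 1, 0], [1, 1, 0, 1, 0, 0])
```

The same function applies to classifier outputs in the micro and macro modes, since AUC depends only on how the predictions rank links against non-links.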

7 Results and Discussion

We perform the experiments listed in the previous section on all six datasets and all base predictors. For micro and macro modes, we get a total of 50 and 5 classifiers respectively (one per feature combination), and the same number of AUC scores, and for the standalone mode, we have 60 different AUC scores. Since space limitations make it difficult to show all the results here, we summarize our experiments using a handful of results. We run these experiments a total of five times, so as to monitor the variance across different runs, since each experiment has at least one random step, viz., sampling of non-links.

Figure 2: Mutual information scores denoting importance of six features (for all ten base predictors) in classifying links vs. non-links, computed on the coauth-DBLP hypergraph (dataset F)

7.1 Mutual Information for Link Prediction

Treating each standalone score as a feature in a supervised setting, we compute their mutual information (MI) w.r.t. the positive (links) and negative (non-links) classes. For the dataset coauth-DBLP, we plot MI scores for both temporal and structural similarity computation for each base predictor. As could be observed, in the temporal case, except for AA, PA, and CN, where graph or weighted-graph MI outperforms the others, at least two hypergraph MI scores are better than the graph ones. This only means that hypergraph based scores have the potential to better explain links vs. non-links. We chose this dataset since it is the largest hypergraph we have used.

7.2 Micro Feature Combination Performances

As per the description of the micro feature combination mode in Section 6.4, we report AUC scores for the contact-high-school data in Table 4, where each column corresponds to a micro-feature combination. As the table shows, except for Cos, JC, and MxO in the temporal similarity computation case (which perform best with mic-W), in all other cases, feature combinations involving hypergraphs (mic-H, mic-GH, mic-WH) work best.

Dataset mic-G mic-W mic-H mic-GH mic-WH
A-s 3.6±0.5 5.0±0.0 3.2±0.7 1.4±0.5 1.8±0.7
B-s 4.0±0.4 4.9±0.3 3.0±0.4 1.7±0.5 1.4±0.4
C-s 4.0±0.2 5.0±0.2 2.3±0.5 1.8±0.2 1.8±0.2
D-s 4.3±0.5 4.7±0.5 2.6±0.4 1.3±0.5 2.1±0.4
E-s 4.2±0.2 4.8±0.2 1.9±0.3 2.0±0.2 2.0±0.2
F-s 3.1±0.3 3.2±0.6 3.0±0.0 2.8±0.4 2.8±0.4
A-t 4.8±0.4 4.1±0.5 3.0±0.4 2.0±0.4 1.1±0.3
B-t 3.7±1.1 3.0±1.5 3.3±1.2 3.6±1.1 1.4±0.5
C-t 3.3±0.6 3.3±0.6 2.8±0.3 2.8±0.3 2.7±0.6
D-t 4.3±0.5 4.5±0.9 2.8±0.7 2.1±0.5 1.2±0.5
E-t 4.2±0.7 4.7±0.4 2.2±0.5 2.1±0.5 1.8±0.6
F-t 3.2±0.6 3.1±0.3 3.0±0.0 2.9±0.3 2.8±0.6
Table 5: Rank-performances w.r.t. AUC scores from Table 8 across all datasets.

A similar trend can be seen in the rank-performance table of the micro mode (Table 5), where at least one combination involving H ranks higher than the rest in each row. Compared with the standalone mode, where individual features are used, the micro mode gives better scores, more so when hypergraph features are involved.

7.3 Macro Feature Combination Performances

Finally, partitioning the features as per the macro mode in Section 6.4 gives us a total of five feature combinations, whose performances are listed in Table 6. The hypergraph-based features perform much better in these feature combinations. Even though mac-H underperforms the last two columns (mac-GH and mac-WH), it outperforms the purely graph-oriented feature combinations (mac-G and mac-W) in every row except B-s, B-t, and D-t.

Dataset mac-G mac-W mac-H mac-GH mac-WH
A-s 93.10±1.00 93.06±1.05 93.89±0.41 93.90±0.52 94.10±0.54
B-s 93.40±0.46 93.54±0.41 93.46±0.65 93.59±0.72 93.68±0.44
C-s 98.77±0.12 98.73±0.12 98.87±0.11 98.88±0.10 98.89±0.15
D-s 95.16±0.08 95.35±0.12 96.56±0.12 96.60±0.09 96.56±0.11
E-s 96.90±0.15 96.86±0.14 97.19±0.14 97.20±0.16 97.19±0.15
F-s 97.79±0.02 97.79±0.02 99.51±0.00 99.52±0.00 99.51±0.00
A-t 74.44±1.50 78.29±1.64 79.05±2.14 79.56±1.83 84.76±1.40
B-t 86.64±1.87 87.92±1.67 87.31±1.93 86.96±1.81 88.46±1.68
C-t 58.89±0.06 59.15±0.05 61.01±0.07 61.08±0.06 61.41±0.07
D-t 90.80±0.40 91.63±0.35 91.31±0.34 91.53±0.33 92.23±0.30
E-t 84.66±0.25 84.95±0.27 90.59±0.14 90.59±0.13 90.77±0.15
F-t 85.29±0.04 86.00±0.04 87.93±0.04 88.00±0.04 88.44±0.05
Table 6: XGBoost classification AUC scores for link prediction performed using various feature combinations: G, W, H, GH, WH
Predictor std-G std-W std-Hm std-Ha std-H1 std-H2
AA-s 93.0±0.5 92.8±0.3 89.1±0.3 92.1±0.3 92.4±0.4 92.6±0.4
AS-s 91.2±0.3 88.1±0.2 69.7±0.3 91.9±0.3 92.6±0.4 92.8±0.4
CN-s 92.8±0.5 92.4±0.3 77.3±0.4 92.0±0.3 92.2±0.4 92.2±0.4
Cos-s 92.8±0.4 92.8±0.2 77.9±0.3 92.0±0.3 92.4±0.4 92.6±0.4
PA-s 63.6±0.6 62.0±0.8 55.0±0.4 56.0±1.2 62.6±0.8 62.1±0.9
JC-s 92.8±0.4 92.8±0.3 77.9±0.3 92.0±0.3 92.5±0.4 92.7±0.4
MxO-s 92.6±0.4 92.6±0.3 77.7±0.3 92.0±0.3 92.5±0.4 92.7±0.4
MnO-s 92.4±0.3 91.2±0.1 77.3±0.4 92.0±0.3 92.3±0.4 92.4±0.4
NM-s 92.8±0.4 92.7±0.3 77.9±0.3 92.0±0.3 92.5±0.4 92.6±0.4
Prn-s 90.6±0.4 90.1±0.2 77.9±0.3 92.0±0.3 92.4±0.4 92.6±0.4
AA-t 87.9±0.3 87.8±0.3 83.5±0.2 88.0±0.2 86.7±0.3 87.0±0.3
AS-t 87.5±0.2 84.8±0.3 67.1±0.3 88.0±0.2 87.1±0.3 87.5±0.3
CN-t 87.7±0.3 87.4±0.3 72.2±0.3 87.9±0.2 86.5±0.3 86.5±0.3
Cos-t 88.5±0.2 88.5±0.2 73.3±0.3 88.0±0.2 86.8±0.3 87.1±0.3
PA-t 53.8±0.3 53.8±0.3 51.2±0.4 50.3±0.4 52.5±0.4 52.3±0.4
JC-t 88.4±0.2 88.4±0.2 73.2±0.3 88.0±0.2 86.9±0.3 87.2±0.3
MxO-t 87.9±0.2 88.0±0.2 72.6±0.3 88.0±0.2 86.9±0.3 87.2±0.3
MnO-t 88.2±0.2 87.1±0.2 72.2±0.3 88.0±0.2 86.7±0.3 86.8±0.3
NM-t 88.3±0.2 88.2±0.2 73.2±0.3 88.0±0.2 86.8±0.3 87.1±0.3
Prn-t 85.8±0.2 85.0±0.2 73.3±0.3 88.0±0.2 86.8±0.3 87.1±0.3
Table 7: AUC scores (%) for structural (-s) and temporal (-t) link prediction using standalone features for contact-high-school (i.e., dataset B). Row ids AA–Prn represent base predictors.
Predictor mic-G mic-W mic-H mic-GH mic-WH
AA-s 93.0±0.7 92.8±0.8 93.3±0.6 93.4±0.5 93.4±0.6
AS-s 91.5±0.8 88.3±0.6 93.3±0.4 93.5±0.4 93.4±0.5
CN-s 93.0±0.7 92.6±0.9 92.9±0.3 93.2±0.4 93.2±0.4
Cos-s 92.9±0.8 93.0±0.6 93.1±0.3 93.2±0.4 93.5±0.5
PA-s 62.3±0.9 60.9±1.6 62.3±1.5 62.6±1.2 63.7±1.5
JC-s 92.8±0.5 92.8±0.4 93.1±0.3 93.3±0.2 93.3±0.3
MxO-s 92.6±0.4 92.5±0.4 93.2±0.4 93.3±0.4 93.3±0.3
MnO-s 92.6±0.9 91.5±0.6 93.0±0.2 93.3±0.7 93.1±0.3
NM-s 92.8±0.5 92.5±0.3 93.2±0.3 93.3±0.3 93.4±0.4
Prn-s 90.9±0.6 90.8±0.8 93.2±0.3 93.2±0.3 93.3±0.4
AA-t 86.3±2.4 86.7±2.4 87.3±2.0 87.4±1.8 87.9±2.2
AS-t 85.9±1.3 84.0±1.4 87.5±1.9 86.9±1.8 87.2±1.9
CN-t 87.3±2.0 86.8±1.8 86.4±1.9 86.8±2.1 87.4±2.1
Cos-t 87.5±1.6 88.1±2.0 87.4±1.8 87.3±1.9 87.5±2.1
PA-t 53.6±2.1 54.0±2.3 52.4±3.6 55.0±2.7 57.2±3.2
JC-t 87.5±1.9 88.4±1.9 87.5±1.8 87.4±1.7 88.0±1.8
MxO-t 86.9±1.9 87.7±1.3 87.4±1.5 87.2±1.6 87.5±1.3
MnO-t 86.9±1.2 86.6±1.5 86.6±2.2 86.5±1.9 87.7±1.8
NM-t 86.6±2.0 87.6±1.4 87.3±1.8 87.2±1.7 87.9±1.8
Prn-t 84.0±2.6 83.7±2.2 87.3±2.0 87.1±2.2 87.5±2.1
Table 8: AUC scores (%) for structural (-s) and temporal (-t) link prediction using micro-feature-combination via XGBoost for contact-high-school (i.e., dataset B). Row ids AA–Prn represent base predictors.

7.4 Standalone Feature Performances

For link prediction experiments in the standalone mode (Section 6.4), we show results only for a single dataset: contact-high-school (dataset B) in Table 7.

Although we did not expect to do better than the baselines in the standalone mode, since individual hypergraph scores might not be powerful link predictors, we still observe decent performances in the last four columns (the only ones that correspond to hypergraph-based scores). Considering each base predictor individually (row-wise), the graph versions (std-G) of Adamic-Adar (AA) in the structural mode and Cosine Similarity (Cos) in the temporal mode perform best.

We consolidate these results for all datasets by computing the mean (over all base predictors) "rank" of each standalone mode (std-G, std-W, std-Hm, std-Ha, std-H1, std-H2) in Table 9. An entry is to be interpreted as follows: for dataset A in the structural mode, the rank of std-G being 1.3±0.6 means that, out of the six standalone modes, std-G stands at a mean position of 1.3 (with variance 0.6) when evaluated across all base predictors.

Dataset std-G std-W std-Hm std-Ha std-H1 std-H2
A-s 1.3±0.6 3.8±1.0 3.0±1.5 3.2±1.3 5.4±1.3 4.3±1.0
B-s 1.8±1.2 3.2±1.3 6.0±0.0 4.5±0.8 3.2±0.9 2.2±1.0
C-s 3.2±1.1 4.1±1.3 1.4±1.2 5.8±0.6 3.7±1.0 2.8±0.9
D-s 4.0±1.1 4.3±1.6 3.4±0.8 5.7±0.6 2.2±0.6 1.4±0.7
E-s 4.1±1.3 3.8±0.9 3.0±1.1 5.8±0.6 2.2±1.0 2.2±0.7
F-s 3.4±0.4 3.4±0.2 3.6±0.4 3.8±0.8 3.2±0.8 3.6±0.2
A-t 2.9±0.7 1.6±1.2 3.8±1.2 2.1±0.7 5.8±0.4 4.8±0.4
B-t 2.0±0.9 2.7±1.3 5.9±0.3 2.2±1.5 4.4±0.8 3.6±0.8
C-t 3.2±1.4 3.1±1.6 2.6±1.0 4.4±0.9 4.1±1.2 3.5±1.0
D-t 4.0±0.9 4.0±1.6 3.6±0.9 5.8±0.6 1.8±0.5 1.8±0.7
E-t 4.1±1.3 3.6±1.1 3.6±0.9 5.8±0.6 1.6±0.8 2.2±0.6
F-t 3.4±0.4 3.2±0.8 3.6±0.4 3.8±0.8 3.4±0.2 3.6±0.2
Table 9: Rank-performances w.r.t. AUC scores from Table 7 across all datasets. Row ids A–F represent dataset ids (ref. Section 6.1), where -s and -t refer to structural and temporal respectively.
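
The mean-rank consolidation described above can be sketched as follows: rank the modes within each base predictor's row (highest AUC gets rank 1), then average each mode's rank over the rows. How ties are broken in the paper is not stated, so the averaged ranks for ties, the population standard deviation as the spread, and the function names are assumptions. The three input rows are the AA-s, AS-s, and CN-s rows of Table 7, so the result is illustrative and is not expected to reproduce a row of Table 9.

```python
import statistics

def ranks_desc(values):
    """Rank values so that the highest gets rank 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mean_ranks(auc_rows):
    """Per-column (mean, spread) of ranks, taken over rows (one row per base predictor)."""
    per_row = [ranks_desc(row) for row in auc_rows]
    columns = list(zip(*per_row))
    return [(statistics.mean(c), statistics.pstdev(c)) for c in columns]

# Columns: std-G, std-W, std-Hm, std-Ha, std-H1, std-H2 (mean AUCs from Table 7).
rows = [
    [93.0, 92.8, 89.1, 92.1, 92.4, 92.6],  # AA-s
    [91.2, 88.1, 69.7, 91.9, 92.6, 92.8],  # AS-s
    [92.8, 92.4, 77.3, 92.0, 92.2, 92.2],  # CN-s
]
summary = mean_ranks(rows)
```

A lower mean rank means the mode tends to sit nearer the top across base predictors, and a small spread means that position is stable from one predictor to the next.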

8 Conclusion and Future Work

Structural (topological) node similarity scores have a long and successful history in similarity computation. Hypergraph networks, too, appear frequently in works involving similarity computation, yet their structure has not been exploited for the task per se. We set out to use the underlying hypergraph structure of networks to generate new features for similarity computation. Apart from establishing a strong theoretical foundation by devising functional templates that help translate standard similarity computation scores from graphs to hypergraphs, we are also able to elucidate hypergraphs' contribution to predicting links. We perform a number of experiments to show the importance of using hypergraph-based topological features for similarity computation, including a mutual-information-based perspective. A few take-away messages are:

  1. Higher-order structure does have richer information than graphs.

  2. When available, using the underlying hypergraph structure would prove fruitful in link prediction.

  3. Various matrix norms combine hyperedge information in different ways; the best bet is to use multiple norms and choose the best.

  4. Unless the similarity computation model overfits, all hypergraph features should be used, if possible.

As a next step, we would like to use the functional-formulation for global, random-walk based measures.

References

  • [1] L. A. Adamic and E. Adar (2003) Friends and neighbors on the web. Social networks 25 (3), pp. 211–230.
  • [2] S. Agarwal, K. Branson, and S. Belongie (2006) Higher order learning with graphs. In Proceedings of the 23rd International Conference on Machine Learning, ICML '06, New York, NY, USA, pp. 17–24. ISBN 1-59593-383-2.
  • [3] M. Al Hasan, V. Chaoji, S. Salem, and M. Zaki (2006) Link prediction using supervised learning. In SDM06: workshop on link analysis, counter-terrorism and security.
  • [4] J. I. F. Bass, A. Diallo, J. Nelson, J. M. Soto, C. L. Myers, and A. J. Walhout (2013) Using networks to measure similarity between genes: association index selection. Nature methods 10 (12), pp. 1169.
  • [5] A. R. Benson, R. Abebe, M. T. Schaub, A. Jadbabaie, and J. Kleinberg (2018) Simplicial closure and higher-order link prediction. arXiv preprint arXiv:1802.06916.
  • [6] C. Berge (1984) Hypergraphs: combinatorics of finite sets. Vol. 45, Elsevier.
  • [7] A. Bretto (2013) Hypergraph theory. An introduction. Mathematical Engineering. Cham: Springer.
  • [8] T. Chen and C. Guestrin (2016) Xgboost: a scalable tree boosting system. pp. 785–794.
  • [9] J. Davis and M. Goadrich (2006) The relationship between precision-recall and roc curves. In Proceedings of the 23rd international conference on Machine learning, pp. 233–240.
  • [10] F. Fouss, A. Pirotte, J. Renders, and M. Saerens (2007) Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transactions on knowledge and data engineering 19 (3), pp. 355–369.
  • [11] L. Gou, X. Zhang, H. Chen, J. Kim, and C. L. Giles (2010) Social network document ranking. pp. 313–322.
  • [12] R. Guns (2014) Link prediction. In Measuring scholarly impact, pp. 35–55.
  • [13] L. Katz (1953) A new status index derived from sociometric analysis. Psychometrika 18 (1), pp. 39–43. ISSN 1860-0980.
  • [14] B. Klimt and Y. Yang (2004) Introducing the enron corpus.
  • [15] E. Kwon, J. Kim, N. Heo, and S. Kang (2012) Personalized recommendation system using level of cosine similarity of emotion word from social network. Journal of Information Technology and Architecture 9 (3), pp. 333–344.
  • [16] D. Li, Z. Xu, S. Li, and X. Sun (2013) Link prediction in social networks based on hypergraph. In Proceedings of the 22nd International Conference on World Wide Web, pp. 41–42.
  • [17] D. Liben-Nowell and J. Kleinberg (2003) The link prediction problem for social networks. In Proceedings of the Twelfth International Conference on Information and Knowledge Management, CIKM '03, New York, NY, USA, pp. 556–559. ISBN 1-58113-723-0.
  • [18] Z. Lu, B. Savas, W. Tang, and I. S. Dhillon (2010) Supervised link prediction using multiple sources. In 2010 IEEE international conference on data mining, pp. 923–928.
  • [19] V. Martínez, F. Berzal, and J. Cubero (2016) A survey of link prediction in complex networks. ACM Comput. Surv. 49 (4), pp. 69:1–69:33. ISSN 0360-0300.
  • [20] R. Mastrandrea, J. Fournet, and A. Barrat (2015) Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys. PloS one 10 (9), pp. e0136497.
  • [21] M. E. J. Newman (2004) Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Sciences 101 (suppl 1), pp. 5200–5205. ISSN 0027-8424. https://www.pnas.org/content/101/suppl_1/5200.full.pdf
  • [22] M. E. Newman (2001) Clustering and preferential attachment in growing networks. Physical review E 64 (2), pp. 025102.
  • [23] C. E. Shannon (1948) A mathematical theory of communication. Bell system technical journal 27 (3), pp. 379–423.
  • [24] Y. Sun, R. Barber, M. Gupta, C. C. Aggarwal, and J. Han (2011) Co-author relationship prediction in heterogeneous bibliographic networks. pp. 121–128.
  • [25] F. Tan, Y. Xia, and B. Zhu (2014) Link prediction in complex networks: a mutual information perspective. PloS one 9 (9), pp. e107056.
  • [26] P. Wang, B. Xu, Y. Wu, and X. Zhou (2015) Link prediction in social networks: the state-of-the-art. Science China Information Sciences 58 (1), pp. 1–38. ISSN 1869-1919.
  • [27] Y. Xu, D. Rockmore, and A. M. Kleinbaum (2013) Hyperlink prediction in hypernetworks using latent social features. In International Conference on Discovery Science, pp. 324–339.
  • [28] D. Zhou, J. Huang, and B. Schölkopf (2006) Learning with hypergraphs: clustering, classification, and embedding. Cambridge, MA, USA, pp. 1601–1608.