A Hidden Challenge of Link Prediction: Which Pairs to Check?

02/15/2021
by Caleb Belth, et al.

The traditional setup of link prediction in networks assumes that a test set of node pairs, which is usually balanced, is available over which to predict the presence of links. However, in practice, there is no test set: the ground truth is not known, so the number of possible pairs to predict over is quadratic in the number of nodes in the graph. Moreover, because graphs are sparse, most of these possible pairs will not be links. Thus, link prediction methods, which often rely on proximity-preserving embeddings or heuristic notions of node similarity, face a vast search space, with many pairs that are in close proximity, but that should not be linked. To mitigate this issue, we introduce LinkWaldo, a framework for choosing from this quadratic, massively skewed search space of node pairs, a concise set of candidate pairs that, in addition to being in close proximity, also structurally resemble the observed edges. This allows it to ignore some high-proximity but low-resemblance pairs, and also identify high-resemblance, lower-proximity pairs. Our framework is built on a model that theoretically combines Stochastic Block Models (SBMs) with node proximity models. The block structure of the SBM maps out where in the search space new links are expected to fall, and the proximity identifies the most plausible links within these blocks, using locality-sensitive hashing to avoid expensive exhaustive search. LinkWaldo can use any node representation learning or heuristic definition of proximity, and can generate candidate pairs for any link prediction method, allowing the representation power of current and future methods to be realized for link prediction in practice. We evaluate LinkWaldo on 13 networks across multiple domains, and show that on average it returns candidate sets containing 7-33% more missing and future links than both embedding-based and heuristic baselines' sets.
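
To make the role of locality-sensitive hashing concrete, the following is a minimal sketch of how hashing can shortlist candidate pairs without scoring all n(n-1)/2 of them. It is not LinkWaldo's implementation: the function name `lsh_candidate_pairs`, the random-hyperplane hash, the dot-product proximity score, and the `num_planes` and `budget` parameters are all assumptions made for illustration, and the SBM block-structure step described in the abstract is omitted entirely.

```python
import itertools
from collections import defaultdict

import numpy as np


def lsh_candidate_pairs(embeddings, edges, num_planes=4, budget=10, seed=0):
    """Hypothetical helper: return up to `budget` unlinked node pairs that
    share an LSH bucket, ranked by dot-product proximity (highest first)."""
    rng = np.random.default_rng(seed)
    n, dim = embeddings.shape

    # Random-hyperplane LSH: a node's signature is the sign pattern of its
    # projections, so nearby embeddings tend to land in the same bucket.
    planes = rng.standard_normal((dim, num_planes))
    signatures = (embeddings @ planes) > 0

    buckets = defaultdict(list)
    for node in range(n):
        buckets[signatures[node].tobytes()].append(node)

    # Observed edges are stored as (smaller, larger) so lookups are order-free.
    observed = {tuple(sorted(e)) for e in edges}

    # Only pairs inside the same bucket are scored, not all n(n-1)/2 pairs.
    scored = []
    for members in buckets.values():
        for u, v in itertools.combinations(members, 2):  # u < v by construction
            if (u, v) not in observed:
                scored.append((float(embeddings[u] @ embeddings[v]), u, v))

    scored.sort(reverse=True)
    return [(u, v) for _, u, v in scored[:budget]]


# Toy usage with random embeddings (illustrative data only).
demo_rng = np.random.default_rng(1)
demo_embeddings = demo_rng.standard_normal((50, 8))
demo_edges = [(0, 1), (3, 2)]
pairs = lsh_candidate_pairs(demo_embeddings, demo_edges, budget=10)
```

In this sketch, more hyperplanes give smaller buckets and fewer scored pairs, at the cost of missing close pairs that fall on opposite sides of a plane; a single hash table is used here for brevity, whereas practical LSH schemes usually combine several.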

research
04/23/2019

Link Prediction in Multiplex Networks based on Interlayer Similarity

Some networked systems can be better modelled by multilayer structure wh...
research
08/03/2022

Link Prediction on Heterophilic Graphs via Disentangled Representation Learning

Link prediction is an important task that has wide applications in vario...
research
02/04/2020

ALPINE: Active Link Prediction using Network Embedding

Many real-world problems can be formalized as predicting links in a part...
research
10/25/2022

Line Graph Contrastive Learning for Link Prediction

Link prediction task aims to predict the connection of two nodes in the ...
research
02/02/2023

Causal Lifting and Link Prediction

Current state-of-the-art causal models for link prediction assume an und...
research
06/10/2021

Learning Based Proximity Matrix Factorization for Node Embedding

Node embedding learns a low-dimensional representation for each node in ...
research
05/25/2021

Graph Based Link Prediction between Human Phenotypes and Genes

Background: The learning of genotype-phenotype associations and history ...
