1 Introduction
Given a graph representing known relationships between a set of nodes, the goal of link prediction is to learn from the graph and infer novel or previously unknown relationships (LibenNowell:2003:LPP:956863.956972). For instance, in a social network we may use link prediction to power a friendship recommendation system (Aiello:2012:FPH:2180861.2180866), or in the case of biological network data we might use link prediction to infer possible relationships between drugs, proteins, and diseases (zitnik2017predicting). However, despite its popularity, previous work on link prediction generally focuses only on one particular problem setting: it generally assumes that link prediction is to be performed on a single large graph and that this graph is relatively complete, i.e., that at least 50% of the true edges are observed during training (e.g., see grover2016node2vec; kipf2016variational; LibenNowell:2003:LPP:956863.956972; lu2011link).
In this work, we consider the more challenging setting of few-shot link prediction, where the goal is to perform link prediction on multiple graphs that contain only a small fraction of their true, underlying edges. This task is inspired by applications where we have access to multiple graphs from a single domain but where each of these individual graphs contains only a small fraction of the true, underlying edges. For example, in the biological setting, high-throughput interactomics offers the possibility of estimating thousands of biological interaction networks from different tissues, cell types, and organisms (barrios2005high); however, these estimated relationships can be noisy and sparse, and we need learning algorithms that can leverage information across these multiple graphs in order to overcome this sparsity. Similarly, in the e-commerce and social network settings, link prediction can often have a large impact in cases where we must quickly make predictions on sparsely estimated graphs, such as when a service has been recently deployed to a new locale. In other words, link prediction for a new sparse graph can benefit from transferring knowledge from other, possibly more dense, graphs, assuming there is exploitable shared structure.

We term this problem of link prediction from sparsely estimated multi-graph data few-shot link prediction, analogous to the popular few-shot classification setting (miller2000learning; lake2011one; koch2015siamese). The goal of few-shot link prediction is to observe many examples of graphs from a particular domain and leverage this experience to enable fast adaptation and higher accuracy when predicting edges on a new, sparsely estimated graph from the same domain; a task that can also be viewed as a form of meta-learning, or learning to learn (bengio1990learning; bengio1992optimization; thrun2012learning; schmidhuber1987evolutionary), in the context of link prediction. This few-shot link prediction setting is particularly challenging, as current link prediction methods are generally ill-equipped to transfer knowledge between graphs in a multi-graph setting and are also unable to effectively learn from very sparse data.
Present work. We introduce a new framework called MetaGraph for few-shot link prediction, and we also introduce a series of benchmarks for this task. We adapt the classical gradient-based meta-learning formulation for few-shot classification (miller2000learning; lake2011one; koch2015siamese) to the graph domain. Specifically, we consider a distribution over graphs as the distribution over tasks from which a global set of parameters is learned, and we deploy this strategy to train graph neural networks (GNNs) that are capable of few-shot link prediction. To further bootstrap fast adaptation to new graphs, we also introduce a graph signature function, which learns how to map the structure of an input graph to an effective initialization point for a GNN link prediction model. We experimentally validate our approach on three link prediction benchmarks. We find that our MetaGraph approach not only achieves fast adaptation but also converges to a better overall solution in many experimental settings, with an improved average AUC at convergence over non-meta-learning baselines.
2 Preliminaries and Problem Definition
The basic setup for few-shot link prediction is as follows: we assume that we have a distribution $p(\mathcal{G})$ over graphs, from which we can sample training graphs $G_i \sim p(\mathcal{G})$, where each $G_i = (\mathcal{V}_i, \mathcal{E}_i, X_i)$ is defined by a set of nodes $\mathcal{V}_i$, edges $\mathcal{E}_i$, and a matrix of real-valued node attributes $X_i$. When convenient, we will also equivalently represent a graph as $G_i = (A_i, X_i)$, where $A_i$ is an adjacency-matrix representation of the edges in $\mathcal{E}_i$. We assume that each of these sampled graphs is a simple graph (i.e., contains a single type of relation and no self-loops) and that every node $v \in \mathcal{V}_i$ in the graph is associated with a real-valued attribute vector $x_v$ from a common vector space. We further assume that for each graph we have access to only a sparse subset $\mathcal{E}_i^{\text{train}} \subset \mathcal{E}_i$ of the true edges (with $|\mathcal{E}_i^{\text{train}}| \ll |\mathcal{E}_i|$) during training. In terms of distributional assumptions, we assume that $p(\mathcal{G})$ is defined over a set of related graphs (e.g., graphs drawn from a common domain or application setting).

Our goal is to learn a global or meta link prediction model from a set of sampled training graphs $G_1, \dots, G_n$, such that we can use this meta model to quickly learn an effective link prediction model on a newly sampled graph $G^* \sim p(\mathcal{G})$. More specifically, we wish to optimize a global set of parameters $\theta$, as well as a graph signature function $\psi$, which can be used together to generate an effective parameter initialization, $\phi_i$, for a local link prediction model on graph $G_i$.
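To make the sparse-edge setup concrete, the following sketch shows how a training split in this regime might be constructed: given a graph's full adjacency matrix, keep only a small fraction of the true edges for training and hold the rest out for evaluation. The function name and API are illustrative, not taken from the paper.

```python
import numpy as np

def sample_training_edges(adj, frac=0.1, seed=0):
    """Keep only a sparse fraction of a graph's true edges for training.

    `adj` is a symmetric {0,1} adjacency matrix. Returns a training adjacency
    matrix containing roughly `frac` of the true (undirected) edges, plus the
    list of held-out edges used for evaluation.
    """
    rng = np.random.default_rng(seed)
    # Undirected edges: take the upper triangle only, once per edge.
    rows, cols = np.triu_indices_from(adj, k=1)
    edge_mask = adj[rows, cols] > 0
    edges = np.stack([rows[edge_mask], cols[edge_mask]], axis=1)
    n_train = max(1, int(frac * len(edges)))
    perm = rng.permutation(len(edges))
    train_edges = edges[perm[:n_train]]
    held_out = edges[perm[n_train:]]
    adj_train = np.zeros_like(adj)
    adj_train[train_edges[:, 0], train_edges[:, 1]] = 1
    adj_train[train_edges[:, 1], train_edges[:, 0]] = 1  # keep symmetry
    return adj_train, held_out
```

Note that at the small fractions considered here (10-30%), the training adjacency matrix retains very little of the original graph structure, which is what makes the setting hard.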
Relationship to standard link prediction. Few-shot link prediction differs from standard link prediction in three important ways:

1. Rather than learning from a single graph $G$, we are learning from multiple graphs $G_1, \dots, G_n$ sampled from a common distribution or domain.

2. We presume access to only a very sparse sample of true edges. Concretely, we focus on settings where at most 30% of the edges in $\mathcal{E}_i$ are observed during training, i.e., where $|\mathcal{E}_i^{\text{train}}| / |\mathcal{E}_i| \leq 0.3$. (By "true edges" we mean the full set of ground-truth edges available in a particular dataset.)

3. We distinguish between the global parameters $\theta$, which are used to encode knowledge about the underlying distribution of graphs, and the local parameters $\phi_i$, which are optimized to perform link prediction on a specific graph $G_i$. This distinction allows us to leverage information from multiple graphs, while still allowing for individually tuned link prediction models on each specific graph.
Relationship to traditional meta-learning. Traditional meta-learning for few-shot classification generally assumes a distribution over classification tasks, with the goal of learning global parameters that can facilitate fast adaptation to a newly sampled task with few examples. We instead consider a distribution over graphs, with the goal of performing link prediction on a newly sampled graph. An important complication of this graph setting is that the individual predictions for each graph (i.e., the training edges) are not i.i.d. Furthermore, few-shot link prediction takes as training samples a sparse subset of true edges that represents a small percentage of all edges in a graph. Note that for very small percentages of training edges we effectively break all graph structure and recover the supervised setting for few-shot classification.
3 Proposed Approach
We now outline our proposed approach, MetaGraph, for the few-shot link prediction problem. We first describe how we define the local link prediction models, which are used to perform link prediction on each specific graph $G_i$. Next, we discuss our novel gradient-based meta-learning approach to define a global model that can learn from multiple graphs to generate effective parameter initializations for the local models. The key idea behind MetaGraph is that we use gradient-based meta-learning to optimize a shared parameter initialization for the local models, while also learning a parametric encoding of each graph that can be used to modulate this parameter initialization in a graph-specific way (Figure 1).
3.1 Local Link Prediction Model
In principle, our framework can be combined with a wide variety of GNN-based link prediction approaches, but here we focus on variational graph autoencoders (VGAEs) (kipf2016variational) as our base link prediction framework. Formally, given a graph $G = (A, X)$, the VGAE learns an inference model, $q_\phi$, that defines a distribution over node embeddings $Z$, where each row $z_v$ of $Z$ is a node embedding that can be used to score the likelihood of an edge existing between pairs of nodes. The parameters of the inference model are shared across all the nodes in $G$ to define the approximate posterior, where the parameters of the normal distribution are learned via GNNs:

$$q_\phi(Z \mid A, X) = \prod_{v \in \mathcal{V}} q_\phi(z_v \mid A, X), \quad q_\phi(z_v \mid A, X) = \mathcal{N}\big(z_v \mid \mu_v, \operatorname{diag}(\sigma_v^2)\big), \qquad (1)$$

where $\mu = \mathrm{GNN}_\mu(A, X)$ and $\log\sigma = \mathrm{GNN}_\sigma(A, X)$ are the outputs of the inference GNNs. The generative component of the VGAE is then defined as

$$p(A_{uv} = 1 \mid z_u, z_v) = \sigma(z_u^\top z_v), \qquad (2)$$

i.e., the likelihood of an edge existing between two nodes, $u$ and $v$, is proportional to the dot product of their node embeddings (with $\sigma$ denoting the logistic sigmoid). Given the above components, the inference GNNs can be trained to maximize the variational lower bound on the training data:

$$\mathcal{L} = \mathbb{E}_{q_\phi(Z \mid A, X)}\big[\log p(A \mid Z)\big] - \mathrm{KL}\big(q_\phi(Z \mid A, X) \,\|\, p(Z)\big), \qquad (3)$$

where a Gaussian prior is used for $p(Z)$.
We build upon VGAEs due to their strong performance on standard link prediction benchmarks (kipf2016variational), as well as the fact that they have a welldefined probabilistic interpretation that generalizes many embeddingbased approaches to link prediction (e.g., node2vec (grover2016node2vec)). We describe the specific GNN implementations we deploy for the inference model in Section 3.3.
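The dot-product decoder in Equation 2 can be sketched in a few lines; given any embedding matrix (e.g., a sample from the inference model's posterior), edge probabilities follow directly. The helper names below are illustrative, not from the paper.

```python
import numpy as np

def edge_probabilities(Z):
    """Dot-product decoder of a (V)GAE: p(A_uv = 1 | z_u, z_v) = sigmoid(z_u . z_v).

    `Z` is an (n_nodes, d) node-embedding matrix; returns the full matrix of
    pairwise edge probabilities.
    """
    logits = Z @ Z.T
    return 1.0 / (1.0 + np.exp(-logits))

def link_prediction_scores(Z, candidate_pairs):
    """Score a list of (u, v) candidate edges under the dot-product decoder."""
    probs = edge_probabilities(Z)
    return np.array([probs[u, v] for u, v in candidate_pairs])
```

Candidate edges can then be ranked by these scores, which is how AUC over held-out edges is computed in the experiments.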
3.2 Overview of MetaGraph
The key idea behind MetaGraph is that we use gradient-based meta-learning to optimize a shared parameter initialization for the inference models of a VGAE, while also learning a parametric encoding that modulates this parameter initialization in a graph-specific way. Specifically, given a sampled training graph $G_i \sim p(\mathcal{G})$, we initialize the inference model for a VGAE link prediction model using a combination of two learned components:

1. A global initialization, $\theta$, that is used to initialize all the parameters of the GNNs in the inference model. The global parameters $\theta$ are optimized via second-order gradient descent to provide an effective initialization point for any graph sampled from the distribution $p(\mathcal{G})$.

2. A graph signature $s_{G_i} = \psi(G_i)$ that is used to modulate the parameters of the inference model based on the history of observed training graphs. In particular, we assume that the inference model for each graph can be conditioned on the graph signature. That is, we augment the inference model to $q_\phi(Z \mid A, X, s_{G_i})$, where we also include the graph signature as a conditioning input. We use a $k$-layer graph convolutional network (GCN) (kipf2016semi), with sum pooling, to compute the signature:

$$s_G = \psi(G) = \mathrm{MLP}\Big(\sum_{v \in \mathcal{V}} \mathrm{GCN}(A, X)_v\Big), \qquad (4)$$

where GCN denotes a $k$-layer GCN (as defined in (kipf2016semi)), MLP denotes a densely connected neural network, and we sum over the node embeddings output by the GCN. As with the global parameters $\theta$, the graph signature model $\psi$ is optimized via second-order gradient descent.
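A minimal sketch of the signature computation in Equation 4 follows, assuming random weight matrices stand in for the learned GCN and MLP parameters; the function names and the single-dense-layer "MLP" are illustrative simplifications.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step with symmetric normalization (self-loops added)."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)  # ReLU

def graph_signature(A, X, gcn_weights, mlp_weight):
    """Sum-pool node embeddings from a k-layer GCN, then apply an MLP (Eq. 4).

    `gcn_weights` is a list of weight matrices (one per GCN layer) and
    `mlp_weight` a single dense layer standing in for the MLP; both are
    placeholders for parameters that would be learned via the outer loop.
    A tanh keeps the signature bounded, as the modulation variants require.
    """
    H = X
    for W in gcn_weights:
        H = gcn_layer(A, H, W)
    pooled = H.sum(axis=0)  # sum pooling over nodes
    return np.tanh(pooled @ mlp_weight)
```

Because the pooling is a sum over nodes, the signature is invariant to node ordering, which is what lets it act as a graph-level conditioning input.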
The overall MetaGraph architecture is detailed in Figure 1 and the core learning algorithm is summarized in the algorithm block below.
The basic idea behind the algorithm is that we (i) sample a batch of training graphs, (ii) initialize VGAE link prediction models for these training graphs using our global parameters and signature function, (iii) run $k$ steps of gradient descent to optimize each of these VGAE models, and (iv) use second-order gradient descent to update the global parameters and signature function based on a held-out validation set of edges. As depicted in Figure 1, this corresponds to updating the GCN-based encoder for the local link prediction parameters $\phi$ and the global parameters $\theta$, along with the graph signature function $\psi$, using second-order gradients. Note that since we are running $k$ steps of gradient descent within the inner loop of Algorithm 1, we are also "meta" optimizing for fast adaptation: $\theta$ and $\psi$ are trained via second-order gradient descent to optimize the local model performance after only $k$ gradient updates, where $k$ is generally small.
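The inner/outer loop structure can be sketched as follows. This is a first-order approximation (the paper uses second-order gradients), the signature modulation is reduced to a simple element-wise product, and all function arguments are illustrative stand-ins rather than the paper's implementation.

```python
import numpy as np

def meta_train_step(global_params, graphs, inner_loss_grad, outer_loss_grad,
                    signature_fn, k=5, inner_lr=0.1, outer_lr=0.01):
    """One outer-loop step of a MetaGraph-style training procedure (sketch).

    For each sampled graph we (i) initialize local parameters from the global
    initialization modulated by the graph signature, (ii) take k inner gradient
    steps on training edges, and (iii) accumulate the outer gradient on a
    held-out validation set of edges, then update the global parameters.
    """
    outer_grad = np.zeros_like(global_params)
    for graph in graphs:
        # Modulate the shared initialization with this graph's signature.
        local = global_params * signature_fn(graph)
        for _ in range(k):  # inner-loop adaptation on training edges
            local = local - inner_lr * inner_loss_grad(local, graph)
        # Outer gradient evaluated on held-out (validation) edges.
        outer_grad += outer_loss_grad(local, graph)
    return global_params - outer_lr * outer_grad / len(graphs)
```

Because the outer gradient is evaluated after the $k$ adaptation steps, the global initialization is explicitly optimized for post-adaptation performance rather than zero-shot performance.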
3.3 Variants of MetaGraph
We consider several concrete instantiations of the MetaGraph framework, which differ in terms of how the output of the graph signature function is used to modulate the parameters of the VGAE inference models. For all the MetaGraph variants, we build upon the standard GCN propagation rule (kipf2016semi) to construct the VGAE inference models. In particular, we assume that all the inference GNNs (Equation 1) are defined by stacking neural message-passing layers of the form:

$$h_v^{(l)} = \sigma\Bigg(\sum_{u \in \mathcal{N}(v) \cup \{v\}} \frac{m\big(s_G, W^{(l)} h_u^{(l-1)}\big)}{\sqrt{|\mathcal{N}(u)||\mathcal{N}(v)|}}\Bigg), \qquad (5)$$

where $h_v^{(l)}$ denotes the embedding of node $v$ at layer $l$ of the model, $\mathcal{N}(v)$ denotes the nodes in the graph neighborhood of $v$, and $W^{(l)}$ is a trainable weight matrix for layer $l$. The key difference between Equation 5 and the standard GCN propagation rule is that we add the modulation function $m$, which is used to modulate the message passing based on the graph signature $s_G$.
We describe different variations of this modulation below. In all cases, the intuition is that we want to compute a structural signature from the input graphs that can be used to condition the initialization of the local link prediction models. Intuitively, we expect this graph signature to encode structural properties of the sampled graphs in order to modulate the parameters of the local VGAE link prediction models and adapt them to the current graph.
GS-Modulation. Inspired by brockschmidt2019gnn, we experiment with basic feature-wise linear modulation (strub2018visual) to define the modulation function $m$:

$$m\big(s_G, W^{(l)} h_u^{(l-1)}\big) = \gamma_G \odot \big(W^{(l)} h_u^{(l-1)}\big) + \beta_G, \quad [\gamma_G, \beta_G] = s_G. \qquad (6)$$

Here, we restrict the modulation terms $\gamma_G$ and $\beta_G$ output by the signature function to lie in $[-1, 1]$ by applying a $\tanh$ nonlinearity after Equation 4.
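A minimal sketch of this feature-wise linear modulation follows; the signature is split into scale and shift halves, and both are squashed with tanh so the modulation stays in [-1, 1] as described above. The helper name and split convention are assumptions for illustration.

```python
import numpy as np

def film_modulation(h, signature):
    """Feature-wise linear modulation of a node representation (GS-Modulation).

    The signature vector is split into scale (gamma) and shift (beta) halves,
    squashed with tanh, and applied as gamma * h + beta. A hypothetical helper,
    not the exact parameterization from the paper.
    """
    d = h.shape[-1]
    gamma = np.tanh(signature[:d])
    beta = np.tanh(signature[d:2 * d])
    return gamma * h + beta
```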
GS-Gating. Feature-wise linear modulation of the GCN parameters (Equation 6) is an intuitive and simple choice that provides flexible modulation while still being relatively constrained. However, one drawback of the basic linear modulation is that it is "always on", and there may be instances where the modulation could actually be counterproductive to learning. To allow the model to adaptively learn when to apply modulation, we extend the feature-wise linear modulation with a sigmoid gating term, $g_G$ (with entries in $[0, 1]$), that gates in the influence of $\gamma_G$ and $\beta_G$:

$$m\big(s_G, W^{(l)} h_u^{(l-1)}\big) = g_G \odot \Big(\gamma_G \odot \big(W^{(l)} h_u^{(l-1)}\big) + \beta_G\Big) + (1 - g_G) \odot \big(W^{(l)} h_u^{(l-1)}\big).$$
GS-Weights. In the final variant of MetaGraph, we extend the gating and modulation idea by separately aggregating graph neighborhood information with and without modulation and then merging these two signals via a convex combination:

$$h_v^{(l)} = \lambda \odot \tilde{h}_v^{(l)} + (1 - \lambda) \odot \hat{h}_v^{(l)},$$

where $\hat{h}_v^{(l)}$ is the unmodulated aggregation, $\lambda$ is a sigmoid-constrained mixing term, and we use the basic linear modulation (Equation 6) to define the modulated aggregation $\tilde{h}_v^{(l)}$.
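The gating and mixing variants can be sketched as follows. In MetaGraph the gate, scale, shift, and mixing terms would all be derived from the graph signature; here they are passed as explicit arguments for clarity, and the function names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gs_gating(h, gamma, beta, gate_logits):
    """GS-Gating: a sigmoid gate decides how much linear modulation to apply."""
    g = sigmoid(gate_logits)
    return g * (gamma * h + beta) + (1.0 - g) * h

def gs_weights(h_modulated, h_plain, mix_logits):
    """GS-Weights: convex combination of modulated and unmodulated aggregations."""
    lam = sigmoid(mix_logits)
    return lam * h_modulated + (1.0 - lam) * h_plain
```

The gate lets the model fall back to an ordinary GCN update (gate near 0) when modulation would hurt, which is the motivation given for GS-Gating above.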
3.4 MAML for link prediction as a special case
Note that a simplification of MetaGraph, where the graph signature function is removed, can be viewed as an adaptation of model-agnostic meta-learning (MAML) (finn2017model) to the few-shot link prediction setting. As discussed in Section 2, there are important differences in the setup for few-shot link prediction compared to traditional few-shot classification. Nonetheless, the core idea of leveraging an inner and outer loop of training in Algorithm 1, as well as using second-order gradients to optimize the global parameters, can be viewed as an adaptation of MAML to the graph setting, and we provide comparisons to this simplified MAML approach in the experiments below. We formalize the key differences by depicting the graphical model of MAML, as first presented in (grant2018recasting), and contrasting it with the graphical model for MetaGraph in Figure 1. MAML, when reinterpreted for a distribution over graphs, maximizes the likelihood over all edges in the distribution. MetaGraph, when recast in a hierarchical Bayesian framework, adds a graph signature function that influences the global parameters $\theta$ to produce the modulated parameters $\phi_i$ from sampled edges. This explicit influence of the signature is captured by the term $p(\phi_i \mid \theta, s_{G_i})$ in Equation 7 below:

$$p(\mathcal{E} \mid \theta) = \prod_{i=1}^{n} \int p\big(\mathcal{E}_i \mid \phi_i\big)\, p\big(\phi_i \mid \theta, s_{G_i}\big)\, d\phi_i. \qquad (7)$$

For computational tractability, we take the likelihood of the modulated parameters as a point estimate, i.e., $p(\phi_i \mid \theta, s_{G_i}) = \delta(\phi_i - \hat{\phi}_i)$, where $\hat{\phi}_i$ are the parameters obtained from the inner-loop gradient updates.
4 Experiments
We design three novel benchmarks for the few-shot link prediction task. All of these benchmarks contain a set of graphs drawn from a common domain. In all settings, we use 80% of these graphs for training and 10% as validation graphs, where these training and validation graphs are used to optimize the global model parameters (for MetaGraph) or pretrain weights (for various baseline approaches). We then provide the remaining 10% of the graphs as test graphs, and our goal is to fine-tune or train a model on these test graphs to achieve high link prediction accuracy. Note that in this few-shot link prediction setting there are train/val/test splits at both the level of graphs and the level of edges: for every individual graph, we optimize a model using the training edges to predict the likelihood of the test edges, but we are also training on multiple graphs with the goal of facilitating fast adaptation to new graphs via the global model parameters.
Our goal is to use our benchmarks to investigate four key empirical questions:

1. How does the overall performance of MetaGraph compare to various baselines, including (i) a simple adaptation of MAML (finn2017model) (i.e., an ablation of MetaGraph where the graph signature function is removed), (ii) standard pretraining approaches, where we pretrain the VGAE model on the training graphs before fine-tuning on the test graphs, and (iii) naive baselines that do not leverage multi-graph information (i.e., a basic VGAE without pretraining, the Adamic-Adar heuristic (adamic2003friends), and DeepWalk (perozzi2014deepwalk))?

2. How well does MetaGraph perform in terms of fast adaptation? Is MetaGraph able to achieve strong performance after only a small number of gradient steps on the test graphs?

3. How necessary is the graph signature function for strong performance, and how do the different variants of the MetaGraph signature function compare across the various benchmark settings?

4. What is learned by the graph signature function? For example, do the learned graph signatures correlate with the structural properties of the input graphs, or are they more sensitive to node feature information?
Table 1: Dataset statistics.

Dataset  #Graphs  Avg. Nodes  Avg. Edges  #Node Feats 

PPI  24  2,331  64,596  50 
FirstMM DB  41  1,377  6,147  5 
EgoAMINER  72  462  2,245  300 
Datasets. Two of our benchmarks are derived from standard multi-graph datasets: protein-protein interaction (PPI) networks (zitnik2017predicting) and 3D point cloud data (FirstMM DB) (neumann2013graph). These benchmarks are traditionally used for node and graph classification, respectively, but we adapt them for link prediction. We also create a novel multi-graph dataset based upon the AMINER citation data (tang2008arnetminer), where each node corresponds to a paper and links represent citations. We construct individual graphs from the AMINER data by sampling ego networks around nodes and create node features using embeddings of the paper abstracts (see Appendix for details). We preprocess all graphs in each domain to enforce minimum and maximum numbers of nodes per graph. For all datasets, we perform link prediction by training on a small subset (i.e., a percentage) of the edges and then attempting to predict the unseen edges (with a portion of the held-out edges used for validation). Key dataset statistics are summarized in Table 1.
Baseline details. Several baselines correspond to modifications or ablations of MetaGraph, including the straightforward adaptation of MAML (termed MAML in the results) and a fine-tuning baseline where we pretrain a VGAE on the training graphs, observed in sequential order, and then fine-tune on the test graphs (termed Finetune). We also consider a VGAE trained individually on each test graph (termed No Finetune). For MetaGraph and all of these baselines, we employ Bayesian optimization with Thompson sampling (kandasamy2018parallelised) to perform hyperparameter selection using the validation sets. We use the recommended default hyperparameters for DeepWalk, and the Adamic-Adar baseline is hyperparameter-free.
(Code is included with our submission and will be made public after the review process.)

Table 2: Link prediction AUC at convergence.

PPI  FirstMM DB  EgoAMINER  

Edges  10%  20%  30%  10%  20%  30%  10%  20%  30% 
MetaGraph  0.795  0.833  0.845  0.782  0.786  0.783  0.626  0.738  0.786 
MAML  0.770  0.815  0.828  0.776  0.782  0.793  0.561  0.662  0.667 
Random  0.578  0.651  0.697  0.742  0.732  0.720  0.500  0.500  0.500 
No Finetune  0.738  0.786  0.801  0.740  0.710  0.734  0.548  0.621  0.673 
Finetune  0.752  0.801  0.821  0.752  0.735  0.723  0.623  0.691  0.723 
Adamic  0.540  0.623  0.697  0.504  0.519  0.544  0.515  0.549  0.597 
Deepwalk  0.664  0.673  0.694  0.487  0.473  0.510  0.602  0.638  0.672 
4.1 Results
Q1: Overall Performance. Table 2 shows the link prediction AUC for MetaGraph and the baseline models when trained to convergence using 10%, 20%, or 30% of the graph edges. In this setting, we adapt the link prediction models on the test graphs until learning converges, as determined by performance on the validation set of edges, and we report the average link prediction AUC over the test edges of the test graphs. Overall, we find that MetaGraph achieves the highest average AUC in all but one setting, with consistent relative improvements in AUC over both the MAML approach and the Finetune baseline. Notably, MetaGraph is able to maintain especially strong performance when using only 10% of the graph edges for training, highlighting how our framework can learn from very sparse samples of edges. Interestingly, on the EgoAMINER dataset, unlike PPI and FirstMM DB, we observe that the relative difference in performance between MetaGraph and MAML increases with the density of the training set. We hypothesize that this is due to the fickle nature of optimization with higher-order gradients in MAML (antoniou2018train), which is somewhat alleviated in GS-Gating by the gating mechanism. With respect to computational complexity, we observe only a slight overhead when comparing MetaGraph to MAML, which can be explained by the fact that the graph signature function is updated only in the outer loop, not in the inner-loop updates. In the Appendix, we provide additional results when using larger sets of training edges, and, as expected, we find that the relative gains of MetaGraph decrease as more and more training edges become available.
Table 3: Link prediction AUC after 5 gradient updates.

PPI  FirstMM DB  EgoAMINER  

Edges  10%  20%  30%  10%  20%  30%  10%  20%  30% 
MetaGraph  0.795  0.824  0.847  0.773  0.767  0.737  0.620  0.585  0.732 
MAML  0.728  0.809  0.804  0.763  0.750  0.750  0.500  0.504  0.500 
No Finetune  0.600  0.697  0.717  0.708  0.680  0.709  0.500  0.500  0.500 
Finetune  0.582  0.727  0.774  0.705  0.695  0.704  0.608  0.675  0.713 
Q2: Fast Adaptation. Table 3 highlights the average AUCs achieved by MetaGraph and the baselines after performing only 5 gradient updates on the batch of training edges. Note that in this setting we only compare to the MAML, Finetune, and No Finetune baselines, as fast adaptation is not well defined for the DeepWalk and Adamic-Adar baselines. In terms of fast adaptation, we again find that MetaGraph outperforms all the baselines in all but one setting, with consistent relative improvements over both MAML and the Finetune baseline, highlighting that MetaGraph can not only learn from sparse samples of edges but is also able to quickly learn on new data using only a small number of gradient steps. We also observe poor performance for MAML on the EgoAMINER dataset, which we hypothesize is due to the extremely low learning rates needed for any learning to occur; the addition of a graph signature alleviates this problem. Figure 2 shows the learning curves for the various models on the PPI and FirstMM DB datasets, where we can see that MetaGraph learns very quickly but can also begin to overfit after only a small number of gradient updates, making early stopping essential.
Q3: Choice of MetaGraph Architecture. We study the impact of the graph signature function and its variants, GS-Gating and GS-Weights, by performing an ablation study on the FirstMM DB dataset. Figure 3 shows the performance of the different model variants and baselines as training progresses. In addition to models that utilize different signature functions, we report a random baseline where parameters are initialized but never updated, allowing us to assess the inherent power of the VGAE model for few-shot link prediction. To better understand the utility of using a GCN-based inference network, we also report, as a baseline, a VGAE model that uses a simple MLP on the node features and is trained analogously to MetaGraph. As shown in Figure 3, many versions of the signature function start at a better initialization point or quickly achieve higher AUC scores in comparison to MAML and the other baselines, but simple modulation and GS-Gating are superior to GS-Weights after a few gradient steps.
Q4: What is learned by the graph signature? To gain further insight into what knowledge is transferable among graphs, we use the FirstMM DB and EgoAMINER datasets to probe and compare the output of the signature function with various graph heuristics. In particular, we treat the output of $\psi(G)$ as a vector and compute the cosine similarity between all pairs of graphs in the training set (i.e., we compute the pairwise cosine similarities between graph signatures). We similarly compute three pairwise graph statistics, namely the cosine similarity between average node features in the graphs, the difference in number of nodes, and the difference in number of edges, and we compute the Pearson correlation between the pairwise graph-signature similarities and these other pairwise statistics. As shown in Table 4, we find a strong positive Pearson correlation between node features and the output of the signature function for both datasets, indicating that the graph signature function is highly sensitive to feature information. This observation is not entirely surprising given that we use such sparse samples of edges, meaning that many structural graph properties are likely lost, making the meta-learning heavily reliant on node feature information. We also observe a moderate negative correlation with respect to the average difference in nodes and edges between pairs of graphs for the FirstMM DB dataset. For EgoAMINER, we observe a small positive correlation for the differences in nodes and edges.
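This probing analysis can be sketched as follows: compute pairwise cosine similarities between signature vectors, then correlate them with a pairwise graph statistic. An illustrative reproduction of the procedure, not the paper's actual analysis code.

```python
import numpy as np

def pairwise_cosine(vectors):
    """Cosine similarity for all pairs (i < j) of signature vectors."""
    V = np.asarray(vectors, dtype=float)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    sims = V @ V.T
    iu = np.triu_indices(len(V), k=1)
    return sims[iu]

def signature_statistic_correlation(signatures, graph_stats):
    """Pearson correlation between pairwise signature similarities and the
    pairwise absolute differences of a graph statistic (e.g., node counts)."""
    sims = pairwise_cosine(signatures)
    stats = np.asarray(graph_stats, dtype=float)
    diffs = np.abs(stats[:, None] - stats[None, :])
    iu = np.triu_indices(len(stats), k=1)
    return np.corrcoef(sims, diffs[iu])[0, 1]
```

A negative correlation here would mean that graphs of similar size receive more similar signatures, which is the pattern reported for FirstMM DB.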

Table 4: Pearson correlation between pairwise graph-signature similarities and pairwise graph statistics.

FirstMM DB  EgoAMINER  

% Edges  10%  20%  30%  10%  20%  30% 
Node Feats  0.928  0.950  0.761  0.473  0.385  0.448 
Diff Num. Nodes  0.093  0.196  0.286  0.095  0.086  0.085 
Diff Num. Edges  0.093  0.195  0.281  0.093  0.072  0.075 
5 Related Work
We now briefly highlight related work on link prediction, meta-learning, few-shot classification, and few-shot learning in knowledge graphs.

Link prediction considers the problem of predicting missing or future edges between pairs of nodes in a graph (LibenNowell:2003:LPP:956863.956972). Common successful applications of link prediction include friend and content recommendations (Aiello:2012:FPH:2180861.2180866), shopping and movie recommendation (Huang:2005:LPA:1065385.1065415), knowledge graph completion (nickel2015review), and even important social causes such as identifying criminals based on past activities (Hasan06linkprediction). Historically, link prediction methods have utilized topological graph features (e.g., neighborhood overlap), yielding strong baselines such as the Adamic/Adar measure (adamic2003friends). Other approaches include matrix factorization methods (Menon:2011:LPV:2034117.2034146) and, more recently, deep learning and graph neural network based approaches (grover2016node2vec; wang2015link; zhang2018link). A commonality among all the above approaches is that the link prediction problem is defined over a single dense graph, where the objective is to predict unknown/future links within that same graph. Unlike these previous approaches, our approach considers link prediction tasks over multiple sparse graphs drawn from a distribution over graphs, akin to real-world scenarios such as protein-protein interaction graphs, 3D point cloud data, and citation graphs from different communities.

In meta-learning, or learning to learn (bengio1990learning; bengio1992optimization; thrun2012learning; schmidhuber1987evolutionary), the objective is to learn from prior experiences to form inductive biases that enable fast adaptation to unseen tasks. Meta-learning has been particularly effective in few-shot learning tasks, with notable approaches broadly classified into metric-based approaches (vinyals2016matching; snell2017prototypical; koch2015siamese), augmented-memory approaches (santoro2016meta; kaiser2017learning; mishra2017simple), and optimization-based approaches (finn2017model; lee2018gradient). Recently, several works lie at the intersection of meta-learning for few-shot classification and graph-based learning. In Latent Embedding Optimization, rusu2018meta learn a graph between tasks in embedding space, while liu2019propagate introduce a message propagation rule between prototypes of classes. However, both of these methods are restricted to the image domain and do not consider meta-learning over a distribution of graphs as done here.

Another related line of work considers the task of few-shot relation prediction in knowledge graphs. xiong2018one developed the first method for this task, which leverages a learned matching metric using both a learned embedding and one-hop graph structures. More recently, chen2019meta introduce the Meta Relational Learning framework (MetaR), which seeks to transfer relation-specific meta information to new relation types in the knowledge graph. A key distinction between the few-shot relation setting and the one we consider in this work is that we assume a distribution over graphs, whereas in the knowledge graph setting there is only a single graph and the challenge is generalizing to new types of relations within that graph.
6 Discussion and Conclusion
We introduce the problem of few-shot link prediction, where the goal is to learn from multiple graph datasets to perform link prediction using small samples of graph data, and we develop the MetaGraph framework to address this task. Our framework adapts gradient-based meta-learning to optimize a shared parameter initialization for local link prediction models, while also learning a parametric encoding, or signature, of each graph, which can be used to modulate this parameter initialization in a graph-specific way. Empirically, we observed substantial gains using MetaGraph compared to strong baselines on three distinct few-shot link prediction benchmarks. In terms of limitations, one key constraint is that our graph signature function modulates the local link prediction model only through an encoding of the current graph, which does not explicitly capture the pairwise similarity between graphs in the dataset. Extending MetaGraph by learning a similarity metric or kernel between graphs, which could then be used to condition the meta-learning, is a natural direction for future work. Another interesting direction is extending the MetaGraph approach to multi-relational data and exploiting similarities between relation types through a suitable graph signature function.
Acknowledgements
The authors would like to thank Thang Bui, Maxime Wabartha, Nadeem Ward, Sebastien Lachapelle, and Zhaocheng Zhu for helpful feedback on earlier drafts of this work. In addition, the authors would like to thank the Uber AI team including other interns that helped shape earlier versions of this idea. Joey Bose is supported by the IVADO PhD fellowship and this work was done as part of his internship at Uber AI.
References
7 Appendix
7.1 A: EgoAMINER Dataset Construction
To construct the EgoAMINER dataset, we first create citation graphs from different fields of study. We then select the top graphs in terms of number of nodes for further preprocessing. Specifically, we take the k-core of each graph, ensuring that each node has a minimum number of edges. We then construct ego networks by randomly sampling a node from the core graph and taking its two-hop neighborhood. Finally, we remove graphs whose node counts fall outside minimum and maximum thresholds, which leads to a total of 72 graphs, as reported in Table 1.
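The two-hop ego-network extraction step can be sketched as follows; the k-core preprocessing and size filters described above are omitted, and the function is a minimal illustration rather than the actual dataset-construction code.

```python
import numpy as np

def two_hop_ego_network(adj, center):
    """Extract the two-hop ego network around `center` from an adjacency matrix.

    Returns the sorted node list of the ego network and the induced
    subgraph's adjacency matrix.
    """
    one_hop = np.flatnonzero(adj[center])
    # Any node adjacent to a one-hop neighbor is within two hops.
    two_hop = np.flatnonzero(adj[one_hop].sum(axis=0))
    nodes = np.unique(np.concatenate([[center], one_hop, two_hop]))
    return nodes, adj[np.ix_(nodes, nodes)]
```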
7.2 B: Additional Results
We list complete results when using larger sets of training edges for the PPI, FirstMM DB, and EgoAMINER datasets, reporting average AUC across all test graphs both at convergence and after 5 gradient updates. As expected, we find that the relative gains of MetaGraph decrease as more and more training edges become available.
PPI  

Convergence  

Edges  10%  20%  30%  40%  50%  60%  70% 
MetaGraph  0.795  0.831  0.846  0.853  0.848  0.853  0.855 
MAML  0.745  0.820  0.840  0.852  0.854  0.856  0.863 
Random  0.578  0.651  0.697  0.729  0.756  0.778  0.795 
No Finetune  0.738  0.786  0.801  0.817  0.827  0.837  0.836 
Finetune  0.752  0.801  0.821  0.832  0.818  0.856  0.841 
Adamic  0.540  0.623  0.697  0.756  0.796  0.827  0.849 
MAMLMLP  0.603  0.606  0.606  0.606  0.604  0.604  0.605 
Deepwalk  0.664  0.673  0.694  0.727  0.731  0.747  0.761 

PPI, 5 updates  

Edges  10%  20%  30%  40%  50%  60%  70% 

MetaGraph  0.795  0.829  0.847  0.853  0.848  0.854  0.856 
MAML  0.756  0.837  0.840  0.852  0.855  0.855  0.856 
No Finetune  0.600  0.697  0.717  0.784  0.814  0.779  0.822 
Finetune  0.582  0.727  0.774  0.702  0.804  0.718  0.766 
MAMLMLP  0.603  0.606  0.603  0.604  0.603  0.606  0.605 

FirstMM DB  

Convergence  

Edges  10%  20%  30%  40%  50%  60%  70% 
MetaGraph  0.782  0.786  0.783  0.781  0.760  0.746  0.739 
MAML  0.776  0.782  0.793  0.785  0.791  0.663  0.788 
Random  0.742  0.732  0.720  0.714  0.705  0.698  0.695 
No Finetune  0.740  0.710  0.734  0.722  0.712  0.710  0.698 
Finetune  0.752  0.735  0.723  0.734  0.749  0.700  0.695 
Adamic  0.504  0.519  0.544  0.573  0.604  0.643  0.678 
Deepwalk  0.487  0.473  0.510  0.608  0.722  0.832  0.911 
FirstMM DB  

5 updates  

Edges  10%  20%  30%  40%  50%  60%  70% 
MetaGraph  0.773  0.767  0.743  0.759  0.742  0.732  0.688 
MAML  0.763  0.750  0.624  0.776  0.759  0.663  0.738 
No Finetune  0.708  0.680  0.709  0.701  0.685  0.683  0.653 
Finetune  0.705  0.695  0.704  0.704  0.696  0.658  0.670 
EgoAMINER  

Convergence  

Edges  10%  20%  30%  40%  50%  60%  70% 
MetaGraph  0.626  0.738  0.786  0.791  0.792  0.817  0.786 
MAML  0.561  0.662  0.667  0.682  0.720  0.741  0.768 
Random  0.500  0.500  0.500  0.500  0.500  0.500  0.500 
No Finetune  0.548  0.621  0.673  0.702  0.652  0.746  0.769 
Finetune  0.623  0.691  0.723  0.764  0.767  0.792  0.781 
Adamic  0.515  0.549  0.597  0.655  0.693  0.744  0.772 
Deepwalk  0.602  0.638  0.672  0.686  0.689  0.711  0.731 
EgoAMINER  

5 updates  

Edges  10%  20%  30%  40%  50%  60%  70% 
MetaGraph  0.620  0.585  0.732  0.500  0.790  0.733  0.500 
MAML  0.500  0.504  0.500  0.500  0.519  0.500  0.500 
No Finetune  0.500  0.500  0.500  0.500  0.500  0.500  0.500 
Finetune  0.608  0.675  0.713  0.755  0.744  0.706  0.671 