Few-shot link prediction via graph neural networks for Covid-19 drug-repurposing

by Vassilis N. Ioannidis, et al.

Predicting interactions among heterogeneous graph-structured data has numerous applications such as knowledge graph completion, recommendation systems and drug discovery. Often, the links to be predicted belong to rare types, as is the case in repurposing drugs for novel diseases. This motivates the task of few-shot link prediction. Typically, GCNs are ill-equipped to learn such rare link types since the relation embedding is not learned in an inductive fashion. This paper proposes an inductive RGCN for learning informative relation embeddings even in the few-shot learning regime. The proposed inductive model significantly outperforms the RGCN and state-of-the-art KGE models in few-shot learning tasks. Furthermore, we apply our method on the drug-repurposing knowledge graph (DRKG) for discovering drugs for Covid-19. We pose the drug discovery task as link prediction and learn embeddings for the biological entities that partake in the DRKG. Our initial results corroborate that several drugs used in clinical trials were identified as possible drug candidates. The methods in this paper are implemented using the efficient deep graph learning (DGL) library.




1 Introduction

The timeline of the Covid-19 pandemic showcases the dire need for fast development of effective treatments for new diseases. Drug-repurposing is a drug discovery strategy that starts from existing drugs and significantly shortens the time and reduces the cost compared to de novo drug discovery (sertkaya2014examination; avorn20152; setoain2015nffinder). Drug-repurposing leverages the fact that common molecular pathways contribute to different diseases and hence some drugs may be reused (ashburn2004drug).

Drug-repurposing relies on identifying novel interactions among biological entities like genes and compounds and can be posed as a link prediction task over a biological network. Several machine learning approaches have been developed for addressing the drug-repurposing task for Covid-19; see e.g. (gramatica2014graph; zhou2020network; udrescu2016clustering; drkg2020). To assist such machine learning techniques, (drkg2020) created a comprehensive biological knowledge graph relating genes, compounds, diseases, biological processes, side effects and symptoms, termed the Drug Repurposing Knowledge Graph (DRKG).

However, for novel diseases like Covid-19 only a few interactions are available among viral proteins and possible chemical compounds that may inhibit the related genes. This motivates the framework of few-shot link prediction, where a certain edge type is rare and the model is called to make predictions on the particular edge type.

1.1 Related works

Link prediction has been addressed by several works in the context of knowledge-graph (KG) completion. These models rely on embedding the nodes and edges of the KG to a vector space and then train by maximizing the score for existing edges in the KGs; see e.g., (wang2017knowledge). An efficient implementation of these models in DGL is presented in (zheng2020dgl). Nevertheless, these KGE models do not naturally generalize to the few-shot scenario, where only a few edges are available for a rare edge type, which challenges learning the relation embedding. This was addressed in (chen2019meta), where a meta-learning model is proposed to learn the relation embeddings in an inductive fashion. However, this inductive-relation KGE model requires a specialized training scheme, cannot learn inductive node embeddings, and cannot incorporate node features if available.

Graph convolutional networks learn embeddings for nodes and edges in the graph by applying a sequence of nonlinear operations parametrized by the graph adjacency matrix and utilize node and edge features (kipf2016semi; schlichtkrull2018modeling). An inductive implementation of these models allows for learning node embeddings in an inductive fashion (hamilton2017inductive). The RGCN model (schlichtkrull2018modeling) has been successful in link prediction, where the RGCN is supervised by KGE models (wang2017knowledge). However, these GCN models for link prediction inherit the limitation of the KGE models, and are challenged in learning relation embeddings for rare edge types.

1.2 Contributions

This paper addresses the aforementioned limitation of GCN models by introducing a novel inductive RGCN (I-RGCN) that learns the relation and the node embeddings in an inductive fashion. The proposed I-RGCN naturally addresses few-shot link prediction and outperforms competing state-of-the-art models. I-RGCN is also tested on the DRKG for Covid-19 drug-repurposing. The drug discovery task is naturally formulated in a few-shot learning setting. The preliminary results indicate that several drugs used in clinical trials are discovered as possible drug candidates. While this study by no means recommends specific drugs, it demonstrates a powerful deep learning methodology to prioritize existing drugs for further investigation, which holds the potential of accelerating therapeutic development for Covid-19.

2 Few-shot link prediction formulation

Consider the heterogeneous graph with $T$ node types and $R$ relation types defined as $\mathcal{G} := \{\{\mathcal{V}_t\}_{t=1}^{T}, \{\mathcal{E}_r\}_{r=1}^{R}\}$. The $t$th node type is denoted by $\mathcal{V}_t$ and may represent Genes or Chemical compounds in the DRKG. The $r$th relation type $\mathcal{E}_r$ holds all interactions of a certain type among $\mathcal{V}_t$ and $\mathcal{V}_{t'}$ and may represent that a chemical compound inhibits a gene or that a disease is treated by a chemical compound.

Consider also that each node $n$ is associated with a feature vector $\mathbf{x}_n$. This feature may represent an embedding of the protein sequence associated with a gene (wang2017accurate). In KGs some node types may not have features; for these we use an embedding layer to represent their features.

Few-shot link prediction. Given sets of edges $\{\mathcal{E}_r\}_{r=1}^{R}$, a nodal attribute vector $\mathbf{x}_n$ per node $n$, and a small set of links in the few-shot relation $\mathcal{E}_{R+1}$ with $|\mathcal{E}_{R+1}| \ll |\mathcal{E}_r|$ for $r = 1, \ldots, R$, the few-shot link prediction amounts to inferring the missing links of the rare type $R+1$. In the DRKG, this few-shot relation is for example coronavirus treatment.

3 Learning inductive embedding for GNNs

The relational GCN (RGCN) (schlichtkrull2018modeling) extends the graph convolution operation (kipf2016semi) to heterogeneous graphs. An RGCN model comprises a sequence of RGCN layers. The $l$th layer computes the $n$th node representation $\mathbf{h}_n^{(l+1)}$ as follows

$$\mathbf{h}_n^{(l+1)} := \sigma\Big(\sum_{r} \sum_{n' \in \mathcal{N}_n^r} \mathbf{W}_r^{(l)} \mathbf{h}_{n'}^{(l)}\Big) \tag{1}$$

where $\mathcal{N}_n^r$ is the neighborhood of node $n$ under relation $r$, $\sigma$ is the rectified linear unit nonlinear function, and $\mathbf{W}_r^{(l)}$ is a learnable matrix associated with the $r$th relation. Essentially, the output of the RGCN layer for node $n$ is a nonlinear combination of the hidden representations of neighboring nodes weighted based on the relation type. The node features are the input of the first layer in the model, i.e., $\mathbf{h}_n^{(0)} = \mathbf{x}_n$, where $\mathbf{x}_n$ is the node feature for node $n$. For node types without features we use an embedding layer that takes as input a one-hot encoding of the node id.
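To make the layer computation concrete, the following is a minimal pure-Python sketch of one RGCN layer as described above: messages from neighbors are multiplied by a relation-specific matrix, summed, and passed through a ReLU. The node names, relation names, and weight values are illustrative assumptions, and the sketch omits the normalization constants and self-loop term that practical RGCN implementations often include.

```python
def relu(vec):
    return [max(0.0, v) for v in vec]

def matvec(matrix, vec):
    # matrix is a list of rows; returns the matrix-vector product
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def rgcn_layer(hidden, neighbors, weights):
    """One RGCN layer without normalization or self-loops.

    hidden:    {node: list[float]} representations from the previous layer
    neighbors: {relation: {node: [neighbor nodes]}}
    weights:   {relation: matrix} one learnable matrix per relation
    """
    out_dim = len(next(iter(weights.values())))
    new_hidden = {}
    for node in hidden:
        total = [0.0] * out_dim
        for rel, w_r in weights.items():
            for nb in neighbors.get(rel, {}).get(node, []):
                message = matvec(w_r, hidden[nb])
                total = [t + m for t, m in zip(total, message)]
        new_hidden[node] = relu(total)
    return new_hidden

# Toy inputs (hypothetical nodes, relations, and weights).
hidden = {"drug": [1.0, 2.0], "gene": [0.5, -1.0], "disease": [2.0, 0.0]}
neighbors = {
    "inhibits": {"gene": ["drug"]},
    "treats": {"disease": ["drug"]},
}
weights = {
    "inhibits": [[1.0, 0.0], [0.0, 1.0]],  # identity matrix
    "treats": [[0.0, -1.0], [1.0, 0.0]],   # 90-degree rotation
}
out = rgcn_layer(hidden, neighbors, weights)
print(out)
```

The same message from the drug node reaches the gene and the disease through different matrices, which is how the relation type shapes the resulting node representation.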

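The RGCN model in this paper is supervised by a DistMult model (yang2014embedding) for link prediction. The loss function is

$$\mathcal{L} = \sum_{(h,r,t) \in \mathcal{D}^{+} \cup \mathcal{D}^{-}} \log\big(1 + \exp(-y \, \mathbf{h}_h^{\top} \mathrm{diag}(\mathbf{r}_r) \mathbf{h}_t)\big) \tag{2}$$

where $^{\top}$ denotes the transpose of a matrix, $\mathrm{diag}(\mathbf{r}_r)$ denotes a diagonal matrix with $\mathbf{r}_r$ on its diagonal, $\mathbf{h}_h$, $\mathbf{r}_r$, $\mathbf{h}_t$ are the embeddings of the head entity $h$, relation $r$ and the tail entity $t$, respectively, $\mathcal{D}^{+}$ and $\mathcal{D}^{-}$ are the positive and negative sets of triplets, and $y = 1$ if the triplet corresponds to a positive example and $y = -1$ otherwise. The scalar $\mathbf{h}_h^{\top} \mathrm{diag}(\mathbf{r}_r) \mathbf{h}_t$ denotes the score of triplet $(h, r, t)$ as given by the DistMult model (yang2014embedding). The entity embeddings are obtained by the final layer of the RGCN. The relation type embeddings are trained directly from (2).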
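As a small illustration of the scoring and supervision described above, the sketch below computes a DistMult score and a logistic loss over labeled triples. It assumes labels of +1 for positive and -1 for negative triples; the embedding values are made up for the example.

```python
import math

def distmult_score(head, rel, tail):
    """h^T diag(r) t: elementwise triple product summed over dimensions."""
    return sum(h * r * t for h, r, t in zip(head, rel, tail))

def logistic_loss(scored_triples):
    """Sum of log(1 + exp(-y * score)) over (score, y) pairs, y in {+1, -1}."""
    return sum(math.log1p(math.exp(-y * s)) for s, y in scored_triples)

# Toy embeddings (illustrative values only).
head = [1.0, 2.0, 0.0]
rel = [0.5, 1.0, 3.0]
tail = [2.0, 1.0, 4.0]
corrupted_tail = [-2.0, -1.0, 4.0]

pos = distmult_score(head, rel, tail)            # positive triple
neg = distmult_score(head, rel, corrupted_tail)  # negative triple
loss = logistic_loss([(pos, 1), (neg, -1)])
print(pos, neg, loss)
```

A positive triple with a high score and a negative triple with a low score both contribute little to the loss, so minimizing it pushes the two score populations apart.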

Such a model (2) is vulnerable when only a few training edges are available for a certain relation type. The small number of edges will challenge the learning of the embedding vector for the rare relation.

3.1 Inductive RGCN

Certain relation types may be rare in the training set of links and require a specialized architecture. To address such a few-shot scenario, we introduce an MLP to learn the relation embeddings. Consider the node embeddings $\mathbf{h}_n$ and $\mathbf{h}_{n'}$ extracted from the ultimate layer of the RGCN model where $n \in \mathcal{V}_t$ and $n' \in \mathcal{V}_{t'}$. The proposed MLP learns an embedding for the $r$th relation as follows

$$\mathbf{r}_r = \mathrm{MLP}(\mathbf{h}_n \,\|\, \mathbf{h}_{n'}), \quad (n, n') \in \mathcal{E}_r \tag{3}$$

where $\|$ denotes the vector concatenation. Note that the relation embedding is calculated as a nonlinear function of the node embeddings for all node pairs participating in a certain relation type $r$. This allows the I-RGCN to learn relation embeddings in an inductive fashion. This model is supervised by the following loss


$$\mathcal{L} = \sum_{(n,r,n') \in \mathcal{D}^{+} \cup \mathcal{D}^{-}} \log\big(1 + \exp(-y \, f(n, r, n'))\big) \tag{4}$$

where $f$ denotes the triple score and $y$ is $-1$ for the negative triples and $1$ for the positive ones. We create negative triples by fixing the head node of a positive triple and randomly selecting a tail node of the same type as the original tail node. Differently from (2), the relation embedding $\mathbf{r}_r$ is learned in an inductive fashion from the participating node pairs ($\mathbf{h}_n$, $\mathbf{h}_{n'}$). Hence, upon learning the MLP parameters, the relation embedding is computed with a forward pass. This obviates the few-shot learning hurdle and enables the model to generalize to rare or even unseen relations.
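The inductive relation embedding and the negative sampling scheme can be sketched as follows. This is only an illustration under stated assumptions: the MLP is taken to have a single hidden ReLU layer, the relation embedding is computed per node pair, the triple score is assumed to be DistMult-style, and all node names, types, and weight values are hypothetical.

```python
import random

def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def relation_embedding(h_head, h_tail, w1, w2):
    """MLP(h_n || h_n'): one hidden ReLU layer, then a linear output layer."""
    concat = h_head + h_tail  # list concatenation plays the role of ||
    hidden = [max(0.0, v) for v in matvec(w1, concat)]
    return matvec(w2, hidden)

def triple_score(h_head, rel, h_tail):
    """DistMult-style score, assumed here for the triple score f."""
    return sum(h * r * t for h, r, t in zip(h_head, rel, h_tail))

def corrupt_tail(triple, node_type, rng):
    """Negative triple: same head and relation, random tail of the same type."""
    head, rel, tail = triple
    candidates = [n for n, t in node_type.items()
                  if t == node_type[tail] and n != tail]
    return (head, rel, rng.choice(candidates))

# Toy node embeddings and MLP weights (illustrative values only).
emb = {"drug_a": [1.0, -1.0], "gene_x": [0.5, 0.5], "gene_y": [-1.0, 2.0]}
node_type = {"drug_a": "drug", "gene_x": "gene", "gene_y": "gene"}
w1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
w2 = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]

positive = ("drug_a", "inhibits", "gene_x")
negative = corrupt_tail(positive, node_type, random.Random(0))

rel_vec = relation_embedding(emb["drug_a"], emb["gene_x"], w1, w2)
pos_score = triple_score(emb["drug_a"], rel_vec, emb["gene_x"])
print(rel_vec, pos_score, negative)
```

Note that no per-relation embedding table is looked up: the relation vector is produced by a forward pass over the node embeddings, which is what lets the model handle relations with very few training edges.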

4 Experiments

4.1 Few shot link prediction

Dataset  Nodes              Edges     Relation types
IMDB     movie: 4,278       17,106    12
         director: 2,081
         actor: 5,257
DBLP     author: 4,057      119,783   12
         paper: 14,328
         term: 7,723
         venue: 20
Table 1: Statistics of datasets.

Baselines. We consider the state-of-the-art KGE models RotatE (sun2019rotate) and ComplEx (trouillon2016complex), and the RGCN model (schlichtkrull2018modeling) as baselines for comparison. The parameters of these methods have been optimized via cross validation.

      MRR                               Hit 1                             Hit 10
K     ComplEx RotatE  RGCN    I-RGCN    ComplEx RotatE  RGCN    I-RGCN    ComplEx RotatE  RGCN    I-RGCN
10    6.88    11.80   1.32    33.56     1.75    6.74    0.13    25.32     14.25   18.35   1.07    53.55
50    8.48    12.56   15.26   53.24     3.34    7.76    7.42    45.14     15.70   19.12   28.88   69.32
100   8.61    12.57   18.78   53.63     3.44    7.86    9.59    40.27     15.71   18.40   36.84   77.38
1000  68.37   70.09   95.23   96.06     65.48   67.52   91.72   93.56     72.23   73.50   99.72   99.81
Table 2: Experiment results (%) on the IMDb dataset for k-shot link prediction.

We use the IMDB and DBLP datasets (fu2020magnn) detailed in Table 1. The total number of edges in the few-shot relation are for the IMDB and for the DBLP. In the experiments, we train with only K links from the few-shot relation and all the links from the other relations, and test on the remaining edges of the few-shot relation. The nodes in the IMDB and DBLP graphs are associated with feature vectors. Further information on the datasets is included in the Appendix.
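The K-shot training protocol described above can be sketched as a simple split routine: K links of the rare relation go to training together with all links of the other relations, and the remaining rare links are held out for testing. The function name, relation names, and toy edges below are assumptions made for illustration.

```python
import random

def k_shot_split(edges_by_relation, few_shot_relation, k, seed=0):
    """Keep k training links of the rare relation; hold out the rest for testing."""
    rng = random.Random(seed)
    rare_edges = list(edges_by_relation[few_shot_relation])
    rng.shuffle(rare_edges)
    # All links of the other relations stay in the training set.
    train = {rel: list(edges) for rel, edges in edges_by_relation.items()
             if rel != few_shot_relation}
    train[few_shot_relation] = rare_edges[:k]
    test = rare_edges[k:]
    return train, test

# Toy graph: node ids and relation names are made up for illustration.
edges_by_relation = {
    "directed_drama": [(f"director_{i}", f"movie_{i}") for i in range(30)],
    "played_drama": [(f"actor_{i}", f"movie_{i}") for i in range(50)],
}
train_edges, test_edges = k_shot_split(edges_by_relation, "directed_drama", k=10)
print(len(train_edges["directed_drama"]), len(test_edges))
```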

Tables 2 and 3 report the MRR, Hit-1 and Hit-10 scores of the baseline methods along with the inductive RGCN in the task of few-shot link prediction for the IMDB and DBLP datasets, respectively. The I-RGCN significantly outperforms the alternative methods in the task of few-shot link prediction. Specifically, for K=10 the MRR of the inductive method is roughly 3 to 25 times that of the baselines. This corroborates the advantage of the inductive relation learning for the few-shot learning task. As the number of training edges increases to K=1000, the RGCN performance approaches that of the I-RGCN. This suggests that the I-RGCN method also performs well in non-few-shot learning tasks. The worse performance of the KGE models is explained by the fact that they neither account for node features nor learn inductive relation embeddings.
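For readers unfamiliar with the reported metrics, the sketch below shows how MRR and Hit-k are commonly computed from the rank of the true entity among scored candidates. The evaluation protocol of the experiments (for example, how candidates are sampled or how ties are handled) is not specified in the text, so the pessimistic tie handling here is an assumption.

```python
def rank_of_true(scores, true_index):
    """1-based rank of the true candidate; ties are counted pessimistically."""
    true_score = scores[true_index]
    better_or_equal = sum(1 for i, s in enumerate(scores)
                          if i != true_index and s >= true_score)
    return 1 + better_or_equal

def mean_reciprocal_rank(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at(ranks, k):
    """Fraction of test triples whose true entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Three hypothetical test triples, each with scores for three candidate tails.
candidate_scores = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.5], [0.7, 0.6, 0.1]]
true_indices = [0, 2, 2]
ranks = [rank_of_true(s, i) for s, i in zip(candidate_scores, true_indices)]
print(ranks, mean_reciprocal_rank(ranks), hits_at(ranks, 1), hits_at(ranks, 10))
```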

To further validate the performance of the I-RGCN we conduct a general link prediction evaluation by splitting the links into training, validation, and testing sets at random, irrespective of their relation type. The results for different percentages of training links are reported in Table 4. Even in this training scenario, I-RGCN attains the highest MRR for every training percentage and the best Hit-1 and Hit-10 scores in most settings, which further corroborates the effectiveness of the model.

      MRR                               Hit 1                             Hit 10
K     ComplEx RotatE  RGCN    I-RGCN    ComplEx RotatE  RGCN    I-RGCN    ComplEx RotatE  RGCN    I-RGCN
10    5.97    7.41    7.42    27.95     1.37    2.13    2.95    13.78     11.81   15.36   12.51   60.51
50    6.32    7.87    17.25   83.42     1.55    2.26    10.95   72.54     12.59   16.64   26.84   96.82
100   7.24    10.66   32.45   90.00     1.99    4.43    23.46   85.27     14.96   20.92   49.85   97.61
1000  36.56   46.51   91.34   96.82     30.62   39.83   86.43   94.41     46.45   59.27   98.59   99.81
Table 3: Experiment results (%) on the DBLP dataset for k-shot link prediction.

                MRR                               Hit 1                             Hit 10
Training links  ComplEx RotatE  RGCN    I-RGCN    ComplEx RotatE  RGCN    I-RGCN    ComplEx RotatE  RGCN    I-RGCN
95%             94.15   93.97   93.38   95.12     93.75   93.48   89.31   92.75     94.74   94.56   99.29   98.24
90%             88.87   88.89   89.30   93.98     88.25   87.87   83.45   91.52     89.74   90.35   97.66   98.12
80%             78.55   78.90   83.46   90.03     77.45   76.83   76.59   86.70     79.96   81.94   94.72   95.93
70%             69.59   69.56   82.73   87.00     67.98   66.73   76.09   82.95     71.89   73.49   93.76   94.07
60%             60.40   60.90   78.16   81.53     58.49   57.60   70.63   77.01     62.92   65.57   91.14   90.27
Table 4: Experiment results (%) on the IMDb dataset for link prediction.

4.2 Drug-repurposing via I-RGCN

For this experiment we utilize the drug-repurposing knowledge graph (DRKG) constructed in (drkg2020). The DRKG aggregates interactions from several biological databases such as Drugbank (drugbank@2017), GNBR (percha2018global), Hetionet (hetionet@2017), STRING (string@2019), IntAct (intact) and DGIdb (dgidb@2017).

Drug-repurposing aims at discovering the most effective existing drugs to treat a certain disease. Drug-repurposing can be formulated as predicting direct links in the DRKG such as predicting whether a drug treats a disease or as predicting whether a compound inhibits a certain gene which is related to the target disease. Drug-repurposing can be viewed as a few-shot link prediction task since only a few edges are available related to novel diseases in the DRKG.

We use coronavirus-related diseases, including SARS, MERS and SARS-CoV-2, as target diseases representing Covid-19, as their functionality is similar. We aim at predicting links among gene entities associated with the target disease and drug entities. We select FDA-approved drugs in Drugbank as candidates, while we exclude for simplicity drugs with molecular weight less than 250 daltons, as many such drugs are actually health supplements. This amounts to 8104 candidate drugs.

We also obtain 442 Covid-19 related genes from the relations extracted from (gordon2020sars; zhou2020network). Similarly, we obtain the node embeddings for the genes and drugs, and the embeddings for the corresponding relations. Next, we score all triples and rank them per target gene. This way we obtain 442 ranked lists of drugs. Finally, to assess whether our predictions are on par with the drugs used for treatment, we check the overlap among the top 100 predicted drugs and the drugs used in clinical trials per gene. We used 32 clinical trial drugs for Covid-19 to validate our predictions (the clinical trial drugs were collected from http://www.covid19-trails.com/). Table 5 lists the clinical trial drugs included in the top-100 predicted drugs across all the genes with their corresponding number of hits for the RGCN and I-RGCN. It can be observed that several of the widely used drugs in clinical trials appear high on the predicted list, and that I-RGCN shows a higher hit rate than RGCN. Hence, the inductive relation prediction module is more appropriate for predicting links when information about the nodes is limited, as is the case with the novel Covid-19 disease node.
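The validation procedure above (rank drugs per gene, keep the top 100, and count how often each clinical trial drug appears) can be sketched as follows. The function name, the toy scores, and the placeholder gene and drug identifiers are assumptions for illustration; the example uses k=2 instead of 100 to keep it small.

```python
from collections import Counter

def count_top_k_hits(scores, trial_drugs, k=100):
    """Count, per clinical trial drug, the number of genes whose top-k list contains it.

    scores: {gene: {drug: score}} with higher scores meaning more likely links.
    """
    hits = Counter()
    for gene, drug_scores in scores.items():
        ranked = sorted(drug_scores, key=drug_scores.get, reverse=True)
        hits.update(d for d in ranked[:k] if d in trial_drugs)
    return hits

# Toy scores for two genes and three candidate drugs (placeholder names).
scores = {
    "gene_a": {"drug_1": 0.9, "drug_2": 0.1, "drug_3": 0.5},
    "gene_b": {"drug_1": 0.2, "drug_2": 0.8, "drug_3": 0.7},
}
trial_drugs = {"drug_1", "drug_3"}
hits = count_top_k_hits(scores, trial_drugs, k=2)
print(hits.most_common())
```

With 442 genes, the hit count of a drug can range from 0 to 442, which is the quantity tabulated per drug in Table 5.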

Drug name           # hits   Drug name           # hits
Dexamethasone       240      Chloroquine         69
Ribavirin           142      Colchicine          41
Colchicine          128      Tetrandrine         40
Chloroquine         115      Oseltamivir         37
Methylprednisolone  86       Azithromycin        36
Tofacitinib         75       Tofacitinib         33
Thalidomide         70       Ribavirin           32
Losartan            64       Methylprednisolone  30
Hydroxychloroquine  48       Deferoxamine        30
Oseltamivir         46       Thalidomide         25
Deferoxamine        34       Dexamethasone       24
Ruxolitinib         23       Bevacizumab         21
Azithromycin        23       Hydroxychloroquine  19
Nivolumab           11       Losartan            19
Tradipitant         11       Ruxolitinib         13
Bevacizumab         10       Eculizumab          12
Eculizumab          7        Tocilizumab         11
Baricitinib         6        Anakinra            11
Sarilumab           6        Sarilumab           8
Tetrandrine         6        Nivolumab           6
Table 5: Drug inhibits gene scores for Covid-19. Note that a random classifier would result in approximately 5.3 hits per drug. This suggests that the reported predictions are significantly better than random.

5 Conclusion

In this paper we develop a novel I-RGCN that learns inductive relation embeddings and can be applied to few-shot link prediction and drug repurposing. I-RGCN consistently outperforms baseline models on the IMDB and DBLP datasets for few-shot link prediction. We also formulate the Covid-19 drug-repurposing task as link prediction over the DRKG. I-RGCN successfully identifies a subset of clinical trial drugs for Covid-19 and can be used to assist researchers and prioritize existing drugs for further investigation in Covid-19 treatment.


Appendix A Datasets

We use the IMDB and DBLP datasets (fu2020magnn) detailed in Table 1, where the third column denotes the total size of edges in the few-shot relation that is . The nodes in the IMDB and DBLP graphs are associated with feature vectors. The original datasets in (fu2020magnn) are used for node classification. We adapt the datasets and create new edge types, where the edges are parametrized by the label of the associated nodes. For example, the edge type (director, directed, movie) becomes (director, directed_drama, movie) if the associated movie is in the drama genre, and the (actor, played, movie) relation undergoes the same transformation. Since there are 3 labels for movies, the original 4 edge types become 12. The same transformation is applied to the DBLP dataset.
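The edge-type transformation described above can be sketched as a small relabeling routine that appends the label of the associated node to the relation name. The function name, node ids, the "comedy" label, and the reverse relation name "directed_by" are assumptions for illustration; only the directed_drama example comes from the text.

```python
def relabel_edge_types(edges, node_label):
    """Append the label of the labeled endpoint (e.g. the movie genre) to each relation."""
    relabeled = []
    for head, relation, tail in edges:
        # Use the label of whichever endpoint carries one; the tail is checked first.
        label = node_label.get(tail, node_label.get(head))
        new_relation = relation if label is None else f"{relation}_{label}"
        relabeled.append((head, new_relation, tail))
    return relabeled

# Toy labels and edges (hypothetical ids and genres).
movie_genre = {"movie_1": "drama", "movie_2": "comedy"}
edges = [
    ("director_1", "directed", "movie_1"),
    ("actor_7", "played", "movie_2"),
    ("movie_1", "directed_by", "director_1"),
]
new_edges = relabel_edge_types(edges, movie_genre)
print(new_edges)
```

With 4 base edge types and 3 movie labels, this relabeling yields the 4 x 3 = 12 relation types listed in Table 1.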