1 Introduction
Advancements in low-cost high-throughput sequencing and data acquisition technologies have given rise to a massive proliferation of data describing biological systems. Biomedical knowledge graphs (KGs) are becoming increasingly popular as backbones for artificial intelligence tasks such as personalized medicine, predictive diagnosis, and drug discovery (Dörpinghaus and Jacobs, 2019). From a machine learning perspective, reasoning on biomedical KGs presents new challenges for existing approaches because of the unique structural characteristics of the graphs. One challenge arises due to the highly coupled nature of entities in biological systems that leads to many high-degree and densely interlinked entities. A second challenge is the requirement of information beyond second-order neighborhoods for reasoning about the relationship between two entities (Himmelstein et al., 2017) so that approaches where long-range interactions are incorporated only via node embeddings (e.g., RESCAL (Nickel et al., 2011), TransE (Bordes et al., 2013)) tend to underperform. Unfortunately, approaches that explicitly take the entire multi-hop neighborhoods into account (e.g., graph convolutional models such as R-GCN (Schlichtkrull et al., 2018)) often have diminishing performance beyond two-hop neighborhoods (i.e., more than two convolutional layers). Furthermore, high-degree entities can cause the aggregation operations to smooth out the signals. Alternatively, symbolic reasoning approaches (e.g., RuleN (Meilicke et al., 2018), AnyBURL (Meilicke et al., 2019)) learn logical rules and employ them during inference. However, due to the massive scale and diverse topologies of many real-world KGs, combinatorial complexity often prevents the usage of symbolic approaches. Also, logical inference has difficulties handling noise in the data. Recently, path-based reasoning methods have become popular, and they present a seemingly ideal balance for combining information over multi-hop neighborhoods.
We propose a novel neuro-symbolic KG reasoning approach that combines path-based approaches with representation learning and logical rules. These rules can be either mined from data or obtained from domain experts. Inspired by existing methods (Das et al., 2018; Lin et al., 2018; Hildebrandt et al., 2020a, b), we use reinforcement learning to train an agent to conduct policy-guided random walks on a KG. We propose a modification by introducing a reward function that allows the agent to leverage background knowledge formalized as metapaths. In summary, our paper makes the following contributions:

We propose a novel neuro-symbolic approach that combines neural multi-hop reasoning based on reinforcement learning with logical rules.

We conduct an empirical study of several state-of-the-art algorithms applied to a large biomedical KG.

We show that our proposed approach outperforms state-of-the-art alternatives on a highly relevant biomedical prediction task (drug repurposing).
As an application of our method, we focus on the drug repurposing problem, which is characterized by finding new treatment targets for existing drugs. By repurposing existing drugs, available knowledge about drug-disease interactions can be leveraged to significantly reduce the time and cost of developing new drugs. A recent example is the repositioning of the medication remdesivir for the novel coronavirus disease COVID-19. We aim at generating candidates for the drug repurposing task with machine learning reasoning methods and formulate the task as a link prediction problem, where both compounds and diseases correspond to entities in a KG.
2 Notation
Let $\mathcal{E}$ denote the set of entities in a KG and $\mathcal{R}$ the set of binary relations. Elements in $\mathcal{E}$ correspond to biomedical entities including, e.g., chemical compounds, diseases, and genes. Each entity belongs to a unique type in $\mathcal{T}$, defined by the mapping $\tau: \mathcal{E} \rightarrow \mathcal{T}$. For example, $\tau(\text{AURKC}) = \text{Gene}$ indicates that the entity AURKC has type Gene. We define a KG as a collection of triples $\mathcal{KG} \subset \mathcal{E} \times \mathcal{R} \times \mathcal{E}$ of the form $(h, r, t)$, which consists of head, relation, and tail. Head and tail entities correspond to nodes in the graph, while the relation indicates the type of edge between them. For any relation $r \in \mathcal{R}$, we denote the corresponding inverse relation with $r^{-1}$ (i.e., $(h, r, t)$ is equivalent to $(t, r^{-1}, h)$). Triples in $\mathcal{KG}$ are interpreted as true known facts. For example, the triple $(\text{sorafenib}, \textit{treats}, \text{liver cancer})$ in Figure 2 corresponds to the fact that the kinase inhibitor drug sorafenib is approved for the treatment of liver cancer.
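We further distinguish between two types of paths: instance paths and metapaths. An instance path of length $L \in \mathbb{N}$ on $\mathcal{KG}$ is given by a sequence
$$(e_1 \xrightarrow{r_1} e_2 \xrightarrow{r_2} \dots \xrightarrow{r_L} e_{L+1}),$$
where $(e_i, r_i, e_{i+1}) \in \mathcal{KG}$. Moreover, we call
$$(\tau(e_1) \xrightarrow{r_1} \tau(e_2) \xrightarrow{r_2} \dots \xrightarrow{r_L} \tau(e_{L+1}))$$
a metapath. For example,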
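$$(\text{sorafenib} \xrightarrow{\textit{treats}} \text{liver cancer} \xrightarrow{\textit{localizes}} \text{liver})$$
constitutes an instance path of length 2, where
$$(\text{Compound} \xrightarrow{\textit{treats}} \text{Disease} \xrightarrow{\textit{localizes}} \text{Anatomy})$$
is the corresponding metapath.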
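The relation between instance paths and metapaths can be illustrated with a minimal sketch. The data layout (a start entity plus a list of hops), the `^-1` suffix for inverse relations, and the helper names `to_metapath` and `inverse` are assumptions made for this illustration; only the facts that AURKC is a gene and that sorafenib treats liver cancer come from the text.

```python
from typing import Dict, List, Tuple

# Illustrative entity-to-type map; a real KG would provide this for all entities.
ENTITY_TYPE: Dict[str, str] = {
    "sorafenib": "Compound",
    "liver cancer": "Disease",
    "AURKC": "Gene",
}

# An instance path is stored as a start entity plus a list of (relation, entity) hops.
InstancePath = Tuple[str, List[Tuple[str, str]]]


def inverse(relation: str) -> str:
    """Return the name of the inverse relation (assumed '^-1' suffix convention)."""
    return relation[:-3] if relation.endswith("^-1") else relation + "^-1"


def to_metapath(path: InstancePath, entity_type: Dict[str, str]) -> Tuple[str, ...]:
    """Replace every entity on an instance path by its type and keep the relations."""
    start, hops = path
    metapath = [entity_type[start]]
    for relation, entity in hops:
        metapath.append(relation)
        metapath.append(entity_type[entity])
    return tuple(metapath)


# A length-2 instance path that walks the known treats edge forward and back again.
example = ("sorafenib", [("treats", "liver cancer"), (inverse("treats"), "sorafenib")])
print(to_metapath(example, ENTITY_TYPE))
```

Storing metapaths as tuples makes them hashable, so they can later serve as dictionary keys when rule bodies are looked up.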
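Logical rules (e.g., the commonly used Horn clauses) are usually written in the form $\text{head} \leftarrow \text{body}$. The head can be written out as a triple, and the body can be expressed as a metapath. Define $\mathrm{CtD} := (\text{Compound}, \textit{treats}, \text{Disease})$. Then, a rule with respect to edges of type treats is of the generic form
$$\mathrm{CtD} \leftarrow (\text{Compound} \xrightarrow{r_1} \text{Type}_2 \xrightarrow{r_2} \dots \xrightarrow{r_L} \text{Disease}).$$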
In particular, the body of a rule corresponds to a metapath starting at a compound and terminating at a disease. The goal is to find instance paths where the corresponding metapaths match the body of a rule to predict a new relation between the source and the target of the instance path. The confidence of a rule indicates how often a rule is correct and is defined as the rule support divided by the body support in the data.
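As a rough illustration of the confidence definition, the following sketch divides the rule support by the body support. It assumes that both supports are counted over distinct (compound, disease) endpoint pairs, which is one common convention and not necessarily the exact counting used here; the function name `rule_confidence` and the identifiers `c1`, `c2`, `d1`, `d2` are hypothetical.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (compound, disease)


def rule_confidence(body_pairs: Iterable[Pair], treats_edges: Set[Pair]) -> float:
    """Confidence = rule support / body support.

    body_pairs: endpoints (source, target) of instance paths matching the rule body.
    treats_edges: known (compound, disease) pairs connected by a treats edge.
    """
    body = set(body_pairs)          # body support: distinct pairs matched by the body
    if not body:
        return 0.0                  # a body without matches gets confidence 0
    support = body & treats_edges   # rule support: matches for which the head also holds
    return len(support) / len(body)


# Toy data with hypothetical identifiers: three distinct body matches, one known edge.
matches = [("c1", "d1"), ("c1", "d2"), ("c2", "d1"), ("c1", "d1")]
known = {("c1", "d1")}
print(rule_confidence(matches, known))
```

In the toy data, one of the three distinct body matches is a known treats edge, so the confidence is 1/3.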
3 Our Method
We pose the task of drug repurposing as a link prediction problem based on graph traversal. Starting at a query entity (e.g., a compound to be repurposed), an agent performs a walk on the graph by sequentially transitioning to a neighboring node. The decision of which transition to make is determined by a stochastic policy. Each subsequent transition is added to the current path, extending the reasoning chain, until a finite number of transitions is reached. The general approach is inspired by the reinforcement learning method MINERVA (Das et al., 2018), with our primary contribution coming from the incorporation of logical rules into the training process.
The state of the environment consists of the entity $e_t$ where the agent is located at time $t$, the source entity $e_c$, and the target entity $e_d$, where $e_c$ and $e_d$ correspond to the compound that we aim to repurpose and the target disease, respectively. Thus, a state $S_t$ for time $t$ is represented by $S_t := (e_t, e_c, e_d)$. The agent is given no information about the target disease so that the observed part of the state space is given by $(e_t, e_c)$. Let $\mathbf{e}$ denote the embedding of entity $e$ and $\mathbf{r}$ the embedding of relation $r$. The set of available actions $\mathcal{A}_{S_t}$ contains all outgoing edges from the node $e_t$ with the corresponding target nodes and the option to stay at the current node with no transition. We denote with $A_t \in \mathcal{A}_{S_t}$ the action that the agent performed at time $t$. The environment evolves deterministically by updating the state according to the previous action.
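The environment described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: inverse edges are materialized with a `^-1` suffix so that walks can traverse edges in both directions, the no-transition option is modeled as an action named `stay`, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]      # (head, relation, tail)
Action = Tuple[str, str]           # (relation, next entity)
State = Tuple[str, str, str]       # (current entity, source compound, target disease)

NO_OP = "stay"  # assumed name for the action that keeps the agent at the current node


def build_adjacency(triples: List[Triple]) -> Dict[str, List[Action]]:
    """Index the outgoing edges of every node, including assumed inverse edges."""
    adj: Dict[str, List[Action]] = defaultdict(list)
    for head, relation, tail in triples:
        adj[head].append((relation, tail))
        adj[tail].append((relation + "^-1", head))
    return adj


def available_actions(adj: Dict[str, List[Action]], current: str) -> List[Action]:
    """All outgoing edges of the current node plus the option to stay."""
    return [(NO_OP, current)] + list(adj.get(current, []))


def step(state: State, action: Action) -> State:
    """Deterministic transition: only the current entity changes."""
    _, source, target = state
    _, next_entity = action
    return (next_entity, source, target)


adj = build_adjacency([("sorafenib", "treats", "liver cancer")])
state = ("sorafenib", "sorafenib", "liver cancer")
print(available_actions(adj, state[0]))
print(step(state, ("treats", "liver cancer")))
```

Note that the target disease is part of the state only for computing the reward; a policy built on top of this sketch should read the first two components only.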
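The agent encodes previous actions via a multilayered LSTM (Hochreiter and Schmidhuber, 1997)
$$\mathbf{h}_t = \mathrm{LSTM}(\mathbf{h}_{t-1}, \mathbf{a}_{t-1}), \tag{1}$$
where $\mathbf{a}_{t-1} := [\mathbf{r}_{t-1}, \mathbf{e}_t]$ corresponds to the vector space embedding of the previous action (or the zero vector at the initial time step). The action distribution is given by
$$\mathbf{d}_t = \mathrm{softmax}\left(\mathbf{A}_t \left(\mathbf{W}_2 \, \mathrm{ReLU}\left(\mathbf{W}_1 [\mathbf{h}_t, \mathbf{e}_t]\right)\right)\right), \tag{2}$$
where $\mathbf{W}_1$ and $\mathbf{W}_2$ are weight matrices and the rows of $\mathbf{A}_t$ contain the latent representations of all admissible actions from $S_t$. An action is sampled according to $A_t \sim \mathrm{Categorical}(\mathbf{d}_t)$. Overall, $L$ transitions are sampled, resulting in a path denoted by
$$P := (e_c \xrightarrow{r_1} e_2 \xrightarrow{r_2} \dots \xrightarrow{r_L} e_{L+1}),$$
where $L$ is the maximum path length. Equations (1) and (2) induce a stochastic policy, represented by $\pi_\theta$, where $\theta$ denotes the set of all trainable parameters, including all entity and relation embeddings.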
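The scoring and sampling of actions can be sketched without any deep learning library. The sketch below follows the MINERVA-style pattern of scoring each admissible action embedding against the output of a two-layer network followed by a softmax; the exact network inputs (here the history encoding concatenated with the current entity embedding) are an assumption, the LSTM that produces the history vector is omitted, and all names and toy weights are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Vector) -> List[float]:
    """Matrix-vector product for plain nested sequences."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(v: Vector) -> List[float]:
    return [max(0.0, x) for x in v]


def softmax(scores: Vector) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def action_distribution(action_embs: Matrix, w1: Matrix, w2: Matrix,
                        history: Vector, current_emb: Vector) -> List[float]:
    """Compute softmax(A (W2 ReLU(W1 [h, e]))) over the admissible actions."""
    hidden = relu(matvec(w1, list(history) + list(current_emb)))
    query = matvec(w2, hidden)
    return softmax(matvec(action_embs, query))


def sample_action(probs: Vector, rng: random.Random) -> int:
    """Draw the index of one action from the categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy example: three admissible actions with 2-dimensional embeddings.
probs = action_distribution(
    action_embs=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    w1=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    w2=[[1.0, 0.0], [0.0, 1.0]],
    history=[2.0],
    current_emb=[0.0, 0.0],
)
chosen = sample_action(probs, random.Random(0))
```

Repeating the sampling step for a fixed number of transitions, and feeding each chosen action back into the history encoder, yields one rollout of the policy.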
Furthermore, let $\mathcal{M}$ be the set of metapaths, where each element corresponds to the body of a rule. For every metapath $M \in \mathcal{M}$, we assign a score $S(M)$ that indicates a quality measure of the corresponding rule, such as the confidence or the support with respect to making a correct prediction. For a path $P$, we denote with $\tilde{P}$ the corresponding metapath.
During training, a terminal reward is computed according to
$$R(P) = \mathbb{1}_{\{e_{L+1} = e_d\}} + \lambda \sum_{M \in \mathcal{M}} S(M) \, \mathbb{1}_{\{\tilde{P} = M\}}.$$
The first term indicates whether the agent has reached the correct target disease. The second term checks whether the metapath corresponds to the body of a rule and adds to the score accordingly. Heuristically speaking, we want to reward the agent with a higher score for extracting a metapath that corresponds to a body. The hyperparameter $\lambda \geq 0$ balances the two components of the reward. For $\lambda = 0$, we recover MINERVA.
We employ REINFORCE (Williams, 1992) to maximize the expected rewards. Thus, the agent's maximization problem is given by
$$\arg\max_{\theta} \; \mathbb{E}_{(e_c, e_d) \sim \mathcal{D}} \; \mathbb{E}_{A_1, \dots, A_L \sim \pi_\theta} \left[ R(P) \mid e_c, e_d \right], \tag{3}$$
where $\mathcal{D}$ denotes the true underlying distribution of chemical compounds and their target diseases.
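The terminal reward and its use in REINFORCE can be sketched as follows. The balancing hyperparameter is called `lam` here, the rule scores are stored in a dictionary keyed by metapath, and the optional `baseline` argument is an assumption borrowed from common REINFORCE practice; the example metapath and its score of 0.5 are purely illustrative.

```python
from typing import Dict, Sequence, Tuple

Metapath = Tuple[str, ...]


def terminal_reward(final_entity: str, target_entity: str, path_metapath: Metapath,
                    rule_scores: Dict[Metapath, float], lam: float) -> float:
    """Reward = [target reached] + lam * score of the matched rule body (0 if none)."""
    hit = 1.0 if final_entity == target_entity else 0.0
    rule_bonus = rule_scores.get(path_metapath, 0.0)
    return hit + lam * rule_bonus


def reinforce_loss(log_probs: Sequence[float], reward: float, baseline: float = 0.0) -> float:
    """Surrogate loss whose gradient is the REINFORCE estimate for one rollout.

    log_probs: log-probabilities of the sampled actions along the path.
    baseline: optional variance-reduction term (an assumption, not from the text).
    """
    return -(reward - baseline) * sum(log_probs)


# Illustrative rule body and score; real values would come from the mined rules.
body = ("Compound", "binds", "Gene", "associates^-1", "Disease")
scores = {body: 0.5}

print(terminal_reward("d", "d", body, scores, lam=2.0))   # target reached, rule matched
print(terminal_reward("x", "d", body, scores, lam=0.0))   # lam = 0: MINERVA-style reward
```

Because a path has exactly one metapath, the sum over all rule bodies reduces to a single dictionary lookup.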
4 Experiments
4.1 Dataset
Hetionet (Himmelstein et al., 2017) is a biomedical KG that integrates data from 29 highly reputable and cited public databases. It consists of 47,031 entities with 11 different types and 2,250,197 edges with 24 different types. We aim to predict edges with type treats between entities that correspond to compounds and diseases. The goal is to perform candidate ranking according to the likelihood of successful drug repurposing in a novel treatment application. There are 1,552 compounds and 137 diseases in Hetionet with 775 observed links of type treats between compounds and diseases.
4.2 Metapaths as Background Information
Himmelstein et al. (2017) compiled a list of 1,206 metapaths corresponding to various pharmacological efficacy mechanisms that connect entities of type Compound with entities of type Disease. Through hypothesis testing and domain expertise, they identified effective metapaths that served as features for a logistic regression model. Out of these metapaths, we select as background information the 10 metapaths that have a path length of at most 3 and exhibit positive regression coefficients, indicating their importance for predicting drug efficacy. The metapaths are included as rule bodies in $\mathcal{M}$, where the rule head is always (Compound, treats, Disease). We estimate the confidence score for each rule by sampling 10,000 paths whose metapaths correspond to the rule body and use the confidence for the score $S(M)$ (see Section 3). Table 1 shows the three metapaths with the highest confidences.

Table 1: The three metapaths with the highest confidences.
Metapath  Confidence
…  0.446
…  0.265
…  0.184
4.3 Experimental Setup
We apply our method, denoted by MINERVA+, to Hetionet and calculate hits@1, hits@3, hits@10, and the mean reciprocal rank (MRR). During inference, a beam search is carried out, and the entities are ranked by the probability of their corresponding paths. Moreover, we consider another evaluation scheme (MINERVA+ (pruned)) that retrieves and ranks only those paths from the test rollouts that correspond to one of the metapaths. All the other extracted paths are not considered in the ranking. We compare our approach with the path-based method MINERVA, the rule-based method AnyBURL, and the embedding-based methods TransE, RESCAL, and R-GCN.
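The evaluation protocol can be illustrated with a short sketch that ranks candidate entities from test rollouts and computes hits@k and MRR. Scoring an entity by the maximum probability over its paths is an assumption, as are the function names and the toy rollouts; passing a set of allowed metapaths mimics the pruned scheme, in which paths that match no rule body are discarded before ranking.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Metapath = Tuple[str, ...]
Rollout = Tuple[str, float, Metapath]  # (end entity, path probability, path metapath)


def rank_entities(rollouts: Sequence[Rollout],
                  allowed: Optional[Set[Metapath]] = None) -> List[str]:
    """Rank end entities by their best path probability, optionally pruning paths."""
    best: Dict[str, float] = {}
    for entity, prob, metapath in rollouts:
        if allowed is not None and metapath not in allowed:
            continue  # pruned scheme: ignore paths that match no rule body
        best[entity] = max(prob, best.get(entity, 0.0))
    return sorted(best, key=lambda entity: best[entity], reverse=True)


def rank_of(target: str, ranked: Sequence[str]) -> Optional[int]:
    """1-based rank of the target, or None if it was not retrieved."""
    return ranked.index(target) + 1 if target in ranked else None


def hits_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def mrr(ranks: Sequence[Optional[int]]) -> float:
    """Mean reciprocal rank; targets that were not retrieved contribute 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Toy rollouts with hypothetical entities and metapaths.
rollouts = [("d1", 0.2, ("A",)), ("d2", 0.5, ("B",)), ("d1", 0.6, ("B",)), ("d3", 0.1, ("A",))]
full = rank_entities(rollouts)
pruned = rank_entities(rollouts, allowed={("A",)})
ranks = [rank_of("d2", full), rank_of("d2", pruned)]
print(full, pruned, hits_at_k(ranks, 3), mrr(ranks))
```

In the toy data, the target `d2` is ranked second without pruning but is not retrieved under the pruned scheme, which shows that pruning trades recall for a ranking restricted to rule-conforming paths.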
4.4 Results
Method  Hits@1  Hits@3  Hits@10  MRR
AnyBURL  …  …  …  …
AnyBURL (metapaths)  …  …  …  …
TransE  …  …  …  …
RESCAL  …  …  …  …
R-GCN  …  …  …  …
MINERVA  …  …  …  …
MINERVA+  …  …  …  …
MINERVA+ (pruned)  …  …  …  …
Table 2 displays the test results for the experiments. The reported values for MINERVA and MINERVA+ correspond to the mean across five independent training runs. The small standard errors across these runs indicate that the reported performance gains are highly significant.
AnyBURL learns only one rule of length at least 2 for the relation treats. To see the effect of applying a larger number of rules, we try a setting where we use the metapaths for the prediction step, which leads to significantly improved results. TransE and R-GCN show similar performance, and RESCAL performs best among the embedding-based methods. Applying the modified ranking scheme, our method yields performance gains in hits@1, hits@3, hits@10, and MRR with respect to the best-performing baseline method.
4.5 Discussion
Our method can act as a generic mechanism to inject domain knowledge into reinforcement learning-based reasoning methods on KGs (Lin et al., 2018; Xiong et al., 2017). While we employ rules that are extracted in a data-driven fashion, our method is agnostic towards the source of background information. The additional reward for extracting a rule (see Equation (3)) can be considered as a regularization that encourages the agent to walk along metapaths that generalize to unseen instances.
AnyBURL is strictly outperformed by both MINERVA and our method. Most likely, the large number of high-degree nodes in Hetionet means that hardly any strong, predictive rules are extracted. Multi-hop reasoning methods contain a natural transparency mechanism by providing explicit inference paths. Surprisingly, our experimental findings show that path-based reasoning methods outperform existing black-box methods on the drug repurposing task without a trade-off between explainability and performance. Both TransE and RESCAL are trained to minimize the reconstruction error in the immediate first-order neighborhood, and our results indicate that these methods are not well suited for the drug repurposing task. R-GCN is in principle capable of modeling long-term dependencies due to the receptive field containing the entire set of nodes in the multi-hop neighborhood. However, the aggregation and combination step of R-GCN essentially acts as a low-pass filter on the incoming signals, and in the presence of many high-degree nodes, the center nodes may receive an uninformative signal that smooths over the neighborhood embeddings.
To illustrate the applicability of our method, consider the compound sorafenib from Figure 2. The three highest-ranked predictions of our model for new target diseases are hematologic cancer, breast cancer, and Barrett's esophagus. The database ClinicalTrials.gov (U.S. National Library of Medicine, 2000) lists 23 clinical studies for testing the effect of sorafenib on these three diseases, showing that the predictions are meaningful targets for further investigation.
5 Conclusion
We have proposed a novel neuro-symbolic knowledge graph reasoning approach that leverages path-based reasoning, representation learning, and logical rules. We apply our method to the highly relevant task of drug repurposing and compare our approach with both embedding-based and rule-based methods. Our method achieves better performance than popular baselines, with improvements in both hits@1 and the mean reciprocal rank.
Acknowledgements
This work has been supported by the German Federal Ministry for Economic Affairs and Energy (BMWi) as part of the project RAKI (no. 01MD19012C).
References
Bordes et al. (2013). Translating embeddings for modeling multi-relational data. In Proceedings of the 26th International Conference on Neural Information Processing Systems, NIPS'13, Vol. 2, pp. 2787–2795.
Das et al. (2018). Go for a walk and arrive at the answer: reasoning over paths in knowledge bases using reinforcement learning. In Proceedings of the 6th International Conference on Learning Representations.
Dörpinghaus and Jacobs (2019). Semantic knowledge graph embeddings for biomedical research: data integration using linked open data. In Proceedings of the Posters and Demo Track of the 15th International Conference on Semantic Systems (SEMANTiCS), CEUR Workshop Proceedings, Vol. 2451.
Hildebrandt et al. (2020a). Scene graph reasoning for visual question answering. arXiv:2007.01072.
Hildebrandt et al. (2020b). Reasoning on knowledge graphs with debate dynamics. In Proceedings of the 34th AAAI Conference on Artificial Intelligence.
Himmelstein et al. (2017). Systematic integration of biomedical knowledge prioritizes drugs for repurposing. eLife 6, e26726.
Hochreiter and Schmidhuber (1997). Long short-term memory. Neural Computation 9 (8), pp. 1735–1780.
Lin et al. (2018). Multi-hop knowledge graph reasoning with reward shaping. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3243–3253.
Meilicke et al. (2019). Anytime bottom-up rule learning for knowledge graph completion. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 3137–3143.
Meilicke et al. (2018). Fine-grained evaluation of rule- and embedding-based systems for knowledge graph completion. In The Semantic Web – ISWC 2018, Lecture Notes in Computer Science, Vol. 11136, pp. 3–20.
Nickel et al. (2011). A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning.
Schlichtkrull et al. (2018). Modeling relational data with graph convolutional networks. In The Semantic Web – ESWC 2018, Lecture Notes in Computer Science, Vol. 10843, pp. 593–607.
U.S. National Library of Medicine (2000). ClinicalTrials.gov.
Williams (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8 (3-4), pp. 229–256.
Xiong et al. (2017). DeepPath: a reinforcement learning method for knowledge graph reasoning. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 564–573.