Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study

by Rahul Nadkarni, et al.

Biomedical knowledge graphs (KGs) hold rich information on entities such as diseases, drugs, and genes. Predicting missing links in these graphs can boost many important applications, such as drug design and repurposing. Recent work has shown that general-domain language models (LMs) can serve as "soft" KGs, and that they can be fine-tuned for the task of KG completion. In this work, we study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction. We evaluate several domain-specific LMs, fine-tuning them on datasets centered on drugs and diseases that we represent as KGs and enrich with textual entity descriptions. We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance. Finally, we demonstrate the advantage of LM models in the inductive setting with novel scientific entities. Our datasets and code are made publicly available.
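The router method mentioned in the abstract can be illustrated with a toy sketch: a learned component assigns each input triple to either the KG-embedding (KGE) model or the LM-based model. The scorers, the routing feature (whether the tail entity was seen in training), and the decision rule below are illustrative assumptions, not the paper's actual implementation.

```python
def kge_score(triple, seen_entities):
    """Toy KGE scorer: strong on entities seen during training."""
    return 0.9 if triple["tail"] in seen_entities else 0.2

def lm_score(triple, seen_entities):
    """Toy LM scorer: relies on text, so degrades less on novel entities."""
    return 0.6

def train_router(train_triples, seen_entities):
    """Learn, per feature value, which model scores gold triples higher."""
    wins = {True: [0, 0], False: [0, 0]}  # feature -> [kge_wins, lm_wins]
    for t in train_triples:
        feat = t["tail"] in seen_entities
        if kge_score(t, seen_entities) >= lm_score(t, seen_entities):
            wins[feat][0] += 1
        else:
            wins[feat][1] += 1
    # Route each feature value to whichever model won more often.
    return {f: ("kge" if k >= l else "lm") for f, (k, l) in wins.items()}

def route(triple, policy, seen_entities):
    """Send the triple to the model the learned policy prefers."""
    model = policy[triple["tail"] in seen_entities]
    scorer = kge_score if model == "kge" else lm_score
    return model, scorer(triple, seen_entities)
```

Under these assumptions the router learns to send triples with seen entities to the KGE model and triples with novel entities to the LM, mirroring the inductive-setting advantage the abstract describes.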





Code Repositories


Using pretrained language models for biomedical knowledge graph completion.
