BioREx: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets

06/19/2023
by Po-Ting Lai, et al.

Biomedical relation extraction (RE) is the task of automatically identifying and characterizing relations between biomedical concepts from free text. RE is a central task in biomedical natural language processing (NLP) research and plays a critical role in many downstream applications, such as literature-based discovery and knowledge graph construction. State-of-the-art methods have primarily trained machine learning models on individual RE datasets, such as those for protein-protein interaction and chemical-induced disease relations. Manual dataset annotation, however, is expensive and time-consuming because it requires domain knowledge. Existing RE datasets are usually domain-specific or small, which limits the development of generalized and high-performing RE models. In this work, we present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset. Based on the framework and dataset, we report on BioREx, a data-centric approach for extracting relations. Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset, setting a new state of the art (SOTA) over the previous 74.4 F-1 measure on the recently released BioRED corpus. We further demonstrate that the combined dataset can improve performance for five different RE tasks. In addition, we show that on average BioREx compares favorably to current best-performing methods such as transfer learning and multi-task learning. Finally, we demonstrate BioREx's robustness and generalizability in two independent RE tasks not previously seen in training data: drug-drug N-ary combination and document-level gene-disease RE. The integrated dataset and optimized method have been packaged as a stand-alone tool available at https://github.com/ncbi/BioREx.

