BioRED: A Comprehensive Biomedical Relation Extraction Dataset

by Ling Luo et al.

Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for biomedical RE focus only on relations of a single type (e.g., protein-protein interactions) at the sentence level, greatly limiting the development of RE systems in biomedicine. In this work, we first review commonly used named entity recognition (NER) and RE datasets. We then present BioRED, a first-of-its-kind biomedical RE corpus built on a set of 600 PubMed articles, with multiple entity types (e.g., gene/protein, disease, chemical) and relation pairs (e.g., gene-disease; chemical-chemical). Further, we label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information. We assess the utility of BioRED by benchmarking several existing state-of-the-art methods, including BERT-based models, on the NER and RE tasks. Our results show that while existing approaches can reach high performance on the NER task (F-score of 89.3%), there is much room for improvement on the RE task, especially when extracting novel relations (F-score of 47.7%). Our experiments also demonstrate that such a comprehensive dataset can successfully facilitate the development of more accurate, efficient, and robust RE systems for biomedicine.
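To make the idea of novelty-labeled, document-level relations and their F-score evaluation more concrete, here is a minimal Python sketch. It is illustrative only: the `Relation` record, the `key` and `prf` helpers, the placeholder entity IDs, and the choice to treat entity pairs as unordered are all assumptions for this example, not BioRED's distributed file format or its official evaluation script. The relation label strings are likewise used only for illustration. The sketch scores predictions in two hypothetical settings: matching on entity pair and relation type only, and additionally requiring the novel/background label to match.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Relation:
    """Hypothetical document-level relation record (illustrative schema only)."""
    pmid: str       # PubMed ID of the article the relation comes from
    arg1: str       # normalized concept ID of one entity (placeholder IDs below)
    arg2: str       # normalized concept ID of the other entity
    rel_type: str   # relation label, e.g. "Association" (illustrative)
    novel: bool     # True = novel finding, False = background knowledge


def key(rel: Relation, with_novelty: bool) -> tuple:
    """Build a comparison key; the entity pair is treated as unordered (assumption)."""
    a, b = sorted((rel.arg1, rel.arg2))
    base = (rel.pmid, a, b, rel.rel_type)
    return base + (rel.novel,) if with_novelty else base


def prf(gold: Iterable[Relation], pred: Iterable[Relation], with_novelty: bool = False):
    """Micro-averaged precision, recall and F1 over exact key matches."""
    g = {key(r, with_novelty) for r in gold}
    p = {key(r, with_novelty) for r in pred}
    tp = len(g & p)  # true positives: predicted keys that also appear in gold
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy gold and predicted relations for one hypothetical article.
gold = [
    Relation("1", "gene:A", "disease:B", "Association", True),
    Relation("1", "chem:C", "disease:B", "Negative_Correlation", False),
]
pred = [
    # Correct pair and type (argument order swapped), but novelty label is wrong.
    Relation("1", "disease:B", "gene:A", "Association", False),
    # Correct pair, wrong relation type.
    Relation("1", "chem:C", "disease:B", "Positive_Correlation", False),
]

print(prf(gold, pred))                     # (0.5, 0.5, 0.5)
print(prf(gold, pred, with_novelty=True))  # (0.0, 0.0, 0.0)
```

The toy example shows why the novelty-aware setting is stricter: a prediction can identify the right entity pair and relation type yet still count as an error if it mislabels a novel finding as background knowledge.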
