MedDistant19: A Challenging Benchmark for Distantly Supervised Biomedical Relation Extraction

04/10/2022
by Saadullah Amin, et al.

Relation extraction in the biomedical domain is challenging because labeled data is scarce and annotation is costly, requiring domain experts. Distant supervision is commonly used to tackle the scarcity of annotated data by automatically pairing knowledge graph relationships with raw text. Distantly Supervised Biomedical Relation Extraction (Bio-DSRE) models can seemingly produce very accurate results on several benchmarks. However, given the challenging nature of the task, we set out to investigate the validity of such impressive results. We probed the datasets used by Amin et al. (2020) and Hogan et al. (2021) and found a significant overlap between training and evaluation relationships that, once resolved, reduced the accuracy of the models by up to 71%. We also found problems in the data construction process, such as the creation of negative samples and improper handling of redundant relationships. We mitigate these issues and present MedDistant19, a new benchmark dataset obtained by aligning MEDLINE abstracts with the widely used SNOMED Clinical Terms (SNOMED CT) knowledge base. We experimented with several state-of-the-art models, achieving an AUC of 55.4% [...], which leaves room for improvement.
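The train/evaluation overlap described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the function names `triple_overlap` and `remove_leaked`, the relation label `may_treat`, and the toy triples are all hypothetical, and a real check against SNOMED CT would likely also need to handle redundant or inverse relationships.

```python
from typing import Iterable, List, Set, Tuple

# A knowledge graph fact as (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def triple_overlap(train: Iterable[Triple], test: Iterable[Triple]) -> Set[Triple]:
    """Return the triples that occur in both the training and test splits."""
    return set(train) & set(test)


def remove_leaked(train: Iterable[Triple], test: Iterable[Triple]) -> List[Triple]:
    """Drop test triples that also appear in training, preserving test order."""
    seen = set(train)
    return [t for t in test if t not in seen]


# Toy, made-up triples for illustration only (not taken from SNOMED CT).
train_triples = [
    ("aspirin", "may_treat", "headache"),
    ("insulin", "may_treat", "diabetes"),
]
test_triples = [
    ("aspirin", "may_treat", "headache"),    # leaked: also present in training
    ("metformin", "may_treat", "diabetes"),  # unseen during training
]

leaked = triple_overlap(train_triples, test_triples)
clean_test = remove_leaked(train_triples, test_triples)

# Fraction of distinct test triples that were already seen in training.
leak_rate = len(leaked) / len(set(test_triples))
```

A high `leak_rate` would suggest that evaluation scores partly reflect memorized training facts rather than generalization to unseen relationships.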

research
10/24/2021

Abstractified Multi-instance Learning (AMIL) for Biomedical Relation Extraction

Relation extraction in the biomedical domain is a challenging task due t...
research
05/26/2020

A Data-driven Approach for Noise Reduction in Distantly Supervised Biomedical Relation Extraction

Fact triples are a common form of structured knowledge used within the b...
research
04/30/2020

TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task

TACRED (Zhang et al., 2017) is one of the largest, most widely used crow...
research
06/21/2015

Extreme Extraction: Only One Hour per Relation

Information Extraction (IE) aims to automatically generate a large knowl...
research
07/02/2019

Constructing large scale biomedical knowledge bases from scratch with rapid annotation of interpretable patterns

Knowledge base construction is crucial for summarising, understanding an...
research
04/13/2022

A Distant Supervision Corpus for Extracting Biomedical Relationships Between Chemicals, Diseases and Genes

We introduce ChemDisGene, a new dataset for training and evaluating mult...
research
09/24/2021

Separating Retention from Extraction in the Evaluation of End-to-end Relation Extraction

State-of-the-art NLP models can adopt shallow heuristics that limit thei...
