End-to-End Models for Chemical-Protein Interaction Extraction: Better Tokenization and Span-Based Pipeline Strategies

04/03/2023
by Xuguang Ai, et al.

End-to-end relation extraction (E2ERE) is an important task in information extraction, especially in biomedicine, where the scientific literature continues to grow exponentially. E2ERE typically involves identifying both the entities (named entity recognition, NER) and the relations between them, whereas most RE tasks simply assume that the entities are provided upfront and reduce to relation classification. E2ERE is inherently more difficult than RE alone because errors in NER can snowball into further errors in RE. A complex dataset in biomedical E2ERE is the ChemProt dataset (BioCreative VI, 2017), which identifies relations between chemical compounds and genes/proteins in scientific literature. ChemProt is included in all recent biomedical natural language processing benchmarks, including BLUE, BLURB, and BigBio. However, its treatment in these benchmarks and in other separate efforts is typically not end-to-end, with few exceptions. In this effort, we employ a span-based pipeline approach to achieve new state-of-the-art E2ERE performance on the ChemProt dataset, with a > 4% improvement in F1-score over the prior best effort. Our results indicate that a straightforward fine-grained tokenization scheme helps span-based approaches excel in E2ERE, especially with regard to handling complex named entities. Our error analysis also identifies a few key failure modes in E2ERE for ChemProt.
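The abstract does not spell out the tokenization scheme, so the sketch below assumes one plausible reading of why fine-grained tokenization helps a span-based model: a span enumerator can only predict entities whose boundaries fall on token boundaries, so an entity buried inside a larger whitespace token (a gene name inside a hyphenated modifier, or a chemical name with trailing punctuation) is unreachable unless punctuation is split out. The tokenizer regex, `enumerate_spans`, `span_is_reachable`, `max_width`, and the example sentence are all hypothetical illustrations, not the authors' code.

```python
import re
from typing import List, Tuple

Token = Tuple[str, int, int]  # (text, start_char, end_char)

# Hypothetical fine-grained scheme (assumption): runs of ASCII letters/digits
# stay together; every other non-space character becomes its own token.
FINE_TOKEN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]")


def fine_grained_tokenize(text: str) -> List[Token]:
    """Split text into alphanumeric runs and single punctuation marks, with offsets."""
    return [(m.group(), m.start(), m.end()) for m in FINE_TOKEN.finditer(text)]


def whitespace_tokenize(text: str) -> List[Token]:
    """Coarse baseline: split on whitespace only, with offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def enumerate_spans(tokens: List[Token], max_width: int) -> List[Tuple[int, int]]:
    """All inclusive token-index spans (i, j) whose width j - i + 1 is <= max_width."""
    n = len(tokens)
    return [(i, j) for i in range(n) for j in range(i, min(n, i + max_width))]


def span_is_reachable(tokens: List[Token], ent_start: int, ent_end: int,
                      max_width: int) -> bool:
    """True if some candidate span covers exactly the characters [ent_start, ent_end)."""
    return any(tokens[i][1] == ent_start and tokens[j][2] == ent_end
               for i, j in enumerate_spans(tokens, max_width))


if __name__ == "__main__":
    # Hypothetical sentence with a gene inside a hyphenated word and a chemical
    # followed directly by a period.
    text = "Inhibition of ERK1/2-dependent signaling by U0126."
    for mention in ("ERK1/2", "U0126"):
        s = text.index(mention)
        e = s + len(mention)
        coarse = span_is_reachable(whitespace_tokenize(text), s, e, max_width=8)
        fine = span_is_reachable(fine_grained_tokenize(text), s, e, max_width=8)
        print(mention, "whitespace:", coarse, "fine-grained:", fine)
        # prints False for whitespace and True for fine-grained in both cases
```

The trade-off in this sketch is that finer tokens lengthen the sequence, so the number of candidate spans (roughly sequence length times `max_width`) grows and `max_width` must be large enough to cover multi-token entities such as `ERK1 / 2`.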



04/08/2022

BioRED: A Comprehensive Biomedical Relation Extraction Dataset

Automated relation extraction (RE) from biomedical literature is critica...
12/20/2019

End-to-end Named Entity Recognition and Relation Extraction using Pre-trained Language Models

Named entity recognition (NER) and relation extraction (RE) are two impo...
10/05/2021

FoodChem: A food-chemical relation extraction model

In this paper, we present FoodChem, a new Relation Extraction (RE) model...
02/17/2021

Jointly Learning Clinical Entities and Relations with Contextual Language Models and Explicit Context

We hypothesize that explicit integration of contextual information into ...
03/29/2023

End-to-End n-ary Relation Extraction for Combination Drug Therapies

Combination drug therapies are treatment regimens that involve two or mo...
05/13/2019

Transfer Learning for Scientific Data Chain Extraction in Small Chemical Corpus with BERT-CRF Model

Computational chemistry develops fast in recent years due to the rapid g...
09/22/2020

Let's Stop Incorrect Comparisons in End-to-end Relation Extraction!

Despite efforts to distinguish three different evaluation setups (Bekoul...
