Parallel sequence tagging for concept recognition

03/16/2020
by Lenz Furrer, et al.

Motivation: Named Entity Recognition (NER) and Normalisation (NEN) are core components of any text-mining system for biomedical texts. In a traditional concept-recognition pipeline, these tasks are combined serially, which is inherently prone to error propagation from NER to NEN. We propose a parallel architecture in which NER and NEN are each modeled as a sequence-labeling task operating directly on the source text. We examine different harmonisation strategies for merging the predictions of the two classifiers into a single output sequence.

Results: We test our approach on the recent Version 4 of the CRAFT corpus. In all 20 annotation sets of the concept-annotation task, our system outperforms the pipeline system reported as a baseline in the CRAFT shared task 2019. Our analysis shows that the strengths of the two classifiers can be combined in a fruitful way. However, prediction harmonisation requires individual calibration on a development set for each annotation set. This calibration makes it possible to achieve a good trade-off between established knowledge (training set) and novel information (unseen concepts).

Availability and Implementation: Source code is freely available for download at https://github.com/OntoGene/craft-st. Supplementary data are available at arXiv online.
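To make the idea of prediction harmonisation more concrete, the following is a minimal sketch of one possible merging rule, not the strategy used by the authors. It assumes that each classifier emits one label and one confidence score per token, that the span tagger uses the labels `I` and `O`, and that the concept tagger emits a concept ID or `O`. The function name `harmonise`, the `span_bias` parameter, the `UNKNOWN` placeholder, and the example concept IDs are all hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

NIL = "O"            # assumed label for "token is not part of any concept"
UNKNOWN = "UNKNOWN"  # hypothetical placeholder: a mention without a predicted ID

Prediction = Tuple[str, float]  # (label, confidence score)


def harmonise(span_preds: List[Prediction],
              id_preds: List[Prediction],
              span_bias: float = 0.0) -> List[str]:
    """Merge per-token predictions of a span tagger and a concept-ID tagger.

    span_bias is added to the span tagger's score whenever the two taggers
    disagree; in this sketch it stands in for a value that would be tuned
    on a development set, separately for each annotation set.
    """
    if len(span_preds) != len(id_preds):
        raise ValueError("both taggers must label the same tokens")

    merged = []
    for (span_tag, span_score), (concept, id_score) in zip(span_preds, id_preds):
        span_is_entity = span_tag != NIL
        id_is_entity = concept != NIL
        if span_is_entity == id_is_entity:
            # The taggers agree on whether the token belongs to a concept.
            merged.append(concept)
        elif span_score + span_bias >= id_score:
            # Disagreement, span tagger wins: either no concept here, or a
            # mention whose ID still has to be resolved (e.g. by lookup).
            merged.append(UNKNOWN if span_is_entity else NIL)
        else:
            # Disagreement, concept-ID tagger wins.
            merged.append(concept)
    return merged


# Illustrative input for four tokens (scores and IDs are made up).
span_preds = [("O", 0.9), ("I", 0.8), ("I", 0.6), ("O", 0.7)]
id_preds = [("O", 0.95), ("GO:0005634", 0.9), ("O", 0.5), ("CL:0000540", 0.6)]

print(harmonise(span_preds, id_preds))
print(harmonise(span_preds, id_preds, span_bias=-0.2))
```

With `span_bias=0.0` the span tagger wins both disagreements in this toy input, while a negative bias shifts the decision towards the concept-ID tagger. Under the assumptions above, a single scalar like this is one simple way to picture a setting that is calibrated per annotation set.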
