NEREL-BIO: A Dataset of Biomedical Abstracts Annotated with Nested Named Entities

10/21/2022
by Natalia Loukachevitch, et al.

This paper describes NEREL-BIO, an annotation scheme and corpus of PubMed abstracts in Russian, with a smaller number of abstracts in English. NEREL-BIO extends the general-domain dataset NEREL by introducing domain-specific entity types. The NEREL-BIO annotation scheme covers both the general and biomedical domains, making it suitable for domain transfer experiments. NEREL-BIO provides annotation for nested named entities as an extension of the scheme employed for NEREL. Nested named entities may cross entity boundaries to connect to shorter entities nested within longer entities, making them harder to detect. NEREL-BIO contains annotations for 700+ Russian and 100+ English abstracts. All English PubMed annotations have corresponding Russian counterparts. Thus, NEREL-BIO has two distinguishing features: it annotates nested named entities, and it can serve as a benchmark for cross-domain (NEREL -> NEREL-BIO) and cross-language (English -> Russian) transfer. We experiment with both transformer-based sequence models and machine reading comprehension (MRC) models and report their results. The dataset is freely available at https://github.com/nerel-ds/NEREL-BIO.
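To make the notion of a nested named entity concrete, here is a minimal sketch of a shorter entity sitting inside a longer one. It assumes entities are stored as character-offset spans. The `Span` type, the `nested_pairs` helper, the example phrase, and the labels `DISEASE` and `ORGAN` are all hypothetical illustrations, not taken from the NEREL-BIO scheme or its file format.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # hypothetical entity type, not from the NEREL-BIO scheme


def nested_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return (outer, inner) pairs where inner lies inside a longer outer span."""
    pairs = []
    for outer in spans:
        for inner in spans:
            if inner is outer:
                continue
            contained = outer.start <= inner.start and inner.end <= outer.end
            shorter = (inner.end - inner.start) < (outer.end - outer.start)
            if contained and shorter:
                pairs.append((outer, inner))
    return pairs


# Hypothetical example: an organ mention nested inside a disease mention.
text = "chronic kidney disease"
spans = [
    Span(0, 22, "DISEASE"),  # "chronic kidney disease"
    Span(8, 14, "ORGAN"),    # "kidney"
]

for outer, inner in nested_pairs(spans):
    print(
        f"{text[inner.start:inner.end]!r} ({inner.label}) is nested in "
        f"{text[outer.start:outer.end]!r} ({outer.label})"
    )
```

A flat tagger that assigns one label per token can keep only one of these two spans, which is one reason nested annotation is harder to model than flat annotation.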

research
08/30/2021

NEREL: A Russian Dataset with Nested Named Entities, Relations and Events

In this paper, we present NEREL, a Russian dataset for named entity reco...
research
05/23/2022

RuNNE-2022 Shared Task: Recognizing Nested Named Entities

The RuNNE Shared Task approaches the problem of nested named entity reco...
research
04/08/2020

SIA: A Scalable Interoperable Annotation Server for Biomedical Named Entities

Recent years showed a strong increase in biomedical sciences and an inhe...
research
05/24/2021

DaN+: Danish Nested Named Entities and Lexical Normalization

This paper introduces DaN+, a new multi-domain corpus and annotation gui...
research
04/21/2022

TEAM-Atreides at SemEval-2022 Task 11: On leveraging data augmentation and ensemble to recognize complex Named Entities in Bangla

Many areas, such as the biological and healthcare domain, artistic works...
research
11/27/2019

NorNE: Annotating Named Entities for Norwegian

This paper presents NorNE, a manually annotated corpus of named entities...
research
12/03/2019

An Annotated Dataset of Coreference in English Literature

We present in this work a new dataset of coreference annotations for wor...
