Revised JNLPBA Corpus: A Revised Version of Biomedical NER Corpus for Relation Extraction Task

by Ming-Siang Huang, et al.

The advancement of biomedical named entity recognition (BNER) and biomedical relation extraction (BRE) research promotes the development of text mining in biological domains. As a cornerstone of BRE, a robust BNER system is required to identify the named entities (NEs) mentioned in plain text for the subsequent relation extraction stage. However, current BNER corpora, which play an important role in these tasks, have paid little attention to meeting the criteria of the BRE task. In this study, we present the Revised JNLPBA corpus, a revision of the JNLPBA corpus, to broaden the applicability of an NER corpus from BNER to the BRE task. We preserve the original entity types (protein, DNA, RNA, cell line and cell type), while all abstracts in the JNLPBA corpus are manually re-curated by domain experts based on a new annotation guideline that focuses on specific NEs instead of general terms. At the same time, several shortcomings of JNLPBA are identified and corrected in the new corpus. To compare the adaptability of different NER systems to the Revised JNLPBA and JNLPBA corpora, the F1-measure was measured for three open-source NER systems: BANNER, Gimli and NERSuite. Under the same conditions, all the systems perform on average 10% better in Revised JNLPBA than in JNLPBA. Moreover, a cross-validation test is carried out in which we train the NER systems on the JNLPBA/Revised JNLPBA corpora and assess their performance on both protein-protein interaction extraction (PPIE) and biomedical event extraction (BEE) corpora, to confirm that the newly refined Revised JNLPBA is a competent NER corpus for biomedical relation applications. The Revised JNLPBA corpus is freely available at
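The F1-measure comparison above depends on how predicted entities are matched against the gold annotation. As an illustration only, here is a minimal sketch of entity-level, exact-match scoring over IOB2 tag sequences (tags such as `B-protein`, `I-protein`, `O`, the convention used by the original JNLPBA shared-task data). The helper names `iob2_to_spans` and `prf1` are hypothetical, and exact span-and-type matching is an assumption: the abstract does not state which matching criterion the paper, BANNER, Gimli or NERSuite use.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def iob2_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence of IOB2 tags (hypothetical helper)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        # A B- tag, an O tag, or an I- tag of a different type ends the open span.
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != etype):
            if etype is not None:
                spans.add((etype, start, i))
            # Start a new span unless the tag is O (a stray I- is treated as B-).
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
        # An I- tag of the same type simply extends the open span.
    return spans


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 with exact span-and-type matching."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = iob2_to_spans(g_tags), iob2_to_spans(p_tags)
        tp += len(g & p)   # same boundaries and same type
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-protein", "I-protein", "O", "B-cell_type"]]
    pred = [["B-protein", "I-protein", "O", "B-cell_line"]]
    print(prf1(gold, pred))
```

In this toy example the protein mention matches but the cell type is predicted as a cell line, so `prf1` returns `(0.5, 0.5, 0.5)`. Under exact matching a type confusion counts as both a false positive and a false negative, which is one reason annotation consistency in a corpus can move F1 noticeably.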
