Biomedical Named Entity Recognition via Reference-Set Augmented Bootstrapping

06/01/2019
by   Joel Mathew, et al.

We present a weakly-supervised data augmentation approach to improve Named Entity Recognition (NER) in a challenging domain: extracting biomedical entities (e.g., proteins) from the scientific literature. First, we train a neural NER (NNER) model over a small seed of fully-labeled examples. Second, we use a reference set of entity names (e.g., proteins in UniProt) to identify entity mentions with high precision, but low recall, on an unlabeled corpus. Third, we use the NNER model to assign weak labels to the corpus. Finally, we iteratively retrain the NNER model on the augmented training set, which comprises the seed, the reference-set examples, and the weakly-labeled examples. We show empirically that this augmented bootstrapping process significantly improves NER performance, and we discuss the factors that affect the efficacy of the approach.
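The four steps in the abstract can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the reference-set matcher uses exact lowercase string matching (the abstract does not say how matching is done), and a toy tagger that memorizes the most frequent tag per token stands in for the neural NER model.

```python
from collections import Counter, defaultdict


def reference_set_labels(tokens, reference_set, max_len=3):
    """BIO-tag tokens by exact match against lowercase reference-set names.

    Longest match wins. Exact matching is an assumption made for this sketch;
    it gives high precision but misses any name not in the set (low recall).
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]).lower() in reference_set:
                tags[i] = "B"
                for j in range(i + 1, i + n):
                    tags[j] = "I"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


def train_tagger(examples):
    """Toy stand-in for the NNER model: keep the most frequent tag per token.

    Ties are broken in favour of entity tags (an arbitrary choice for the toy).
    """
    counts = defaultdict(Counter)
    for tokens, tags in examples:
        for tok, tag in zip(tokens, tags):
            counts[tok.lower()][tag] += 1
    return {
        tok: max(c, key=lambda tag: (c[tag], tag != "O"))
        for tok, c in counts.items()
    }


def predict(model, tokens):
    """Tag tokens with the toy model; unseen tokens default to 'O'."""
    return [model.get(tok.lower(), "O") for tok in tokens]


def bootstrap(seed, unlabeled, reference_set, rounds=2):
    """Run the seed -> reference set -> weak labels -> retrain loop."""
    model = train_tagger(seed)  # step 1: train on the labeled seed
    # Step 2: reference-set examples; keep sentences with at least one match.
    ref_examples = [
        (toks, reference_set_labels(toks, reference_set)) for toks in unlabeled
    ]
    ref_examples = [ex for ex in ref_examples if any(t != "O" for t in ex[1])]
    for _ in range(rounds):
        # Step 3: the current model assigns weak labels to the corpus.
        weak = [(toks, predict(model, toks)) for toks in unlabeled]
        # Step 4: retrain on seed + reference-set + weakly-labeled examples.
        model = train_tagger(seed + ref_examples + weak)
    return model


# Made-up example data, for illustration only.
seed = [(["BRCA1", "binds", "DNA"], ["B", "O", "O"])]
unlabeled = [["p53", "binds", "BRCA1"], ["insulin", "receptor", "signals"]]
reference_set = {"p53", "insulin receptor"}

model = bootstrap(seed, unlabeled, reference_set)
print(predict(model, ["p53", "binds", "BRCA1"]))
```

In the toy run, the seed-trained tagger knows only `BRCA1`, and the reference set contributes `p53` and `insulin receptor`. The retrained tagger recognizes all three. A real system would replace `train_tagger` and `predict` with a neural sequence tagger.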


research
03/27/2020

Comprehensive Named Entity Recognition on CORD-19 with Distant or Weak Supervision

We created this CORD-19-NER dataset with comprehensive named entity reco...
research
04/13/2021

GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition

Instead of using expensive manual annotations, researchers have proposed...
research
05/29/2023

Extrinsic Factors Affecting the Accuracy of Biomedical NER

Biomedical named entity recognition (NER) is a critical task that aims to...
research
10/26/2020

Using Unlabeled Texts for Named-Entity Recognition

Named Entity Recognition (NER) poses the problem of learning with multip...
research
03/21/2022

Leveraging Expert Guided Adversarial Augmentation For Improving Generalization in Named Entity Recognition

Named Entity Recognition (NER) systems often demonstrate great performan...
research
02/09/2023

Data Augmentation for Robust Character Detection in Fantasy Novels

Named Entity Recognition (NER) is a low-level task often used as a found...
research
02/14/2016

Exploiting Lists of Names for Named Entity Identification of Financial Institutions from Unstructured Documents

There is a wealth of information about financial systems that is embedde...
