
Biomedical Named Entity Recognition via Reference-Set Augmented Bootstrapping

by Joel Mathew, et al.

We present a weakly-supervised data augmentation approach to improve Named Entity Recognition (NER) in a challenging domain: extracting biomedical entities (e.g., proteins) from the scientific literature. First, we train a neural NER (NNER) model over a small seed of fully-labeled examples. Second, we use a reference set of entity names (e.g., proteins in UniProt) to identify entity mentions with high precision, but low recall, on an unlabeled corpus. Third, we use the NNER model to assign weak labels to the corpus. Finally, we retrain our NNER model iteratively over the augmented training set, including the seed, the reference-set examples, and the weakly-labeled examples, which improves model performance. We show empirically that this augmented bootstrapping process significantly improves NER performance, and discuss the factors impacting the efficacy of the approach.
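The four steps in the abstract can be sketched as a small loop. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the neural NER model is replaced by a hypothetical `ToyTagger` that memorizes the majority label per token, reference-set matching is plain exact token lookup, and the names `reference_set_labels`, `ToyTagger`, `bootstrap`, the `ENT`/`O` tag set, and the `rounds` parameter are all invented for illustration.

```python
from collections import Counter, defaultdict


def reference_set_labels(tokens, reference_set):
    """Step 2 (assumed form): exact lookup against a reference set.

    Matches are labeled "ENT"; everything else is left unlabeled (None),
    reflecting high precision but low recall.
    """
    return ["ENT" if tok in reference_set else None for tok in tokens]


class ToyTagger:
    """Hypothetical stand-in for the neural NER model.

    It tags each token with the majority label seen in training and
    abstains on tokens it has never seen.
    """

    def fit(self, examples):
        counts = defaultdict(Counter)
        for tokens, labels in examples:
            for tok, lab in zip(tokens, labels):
                if lab is not None:  # skip unlabeled positions
                    counts[tok][lab] += 1
        self.majority = {t: c.most_common(1)[0][0] for t, c in counts.items()}
        return self

    def predict(self, tokens, abstain=None):
        return [self.majority.get(tok, abstain) for tok in tokens]


def bootstrap(seed, unlabeled, reference_set, rounds=2):
    # Step 1: train on the small, fully labeled seed.
    model = ToyTagger().fit(seed)
    # Step 2: label the unlabeled corpus with the reference set.
    ref_examples = [(s, reference_set_labels(s, reference_set)) for s in unlabeled]
    for _ in range(rounds):
        # Step 3: the current model assigns weak labels to the corpus.
        weak_examples = [(s, model.predict(s)) for s in unlabeled]
        # Step 4: retrain on seed + reference-set + weakly labeled examples.
        model = ToyTagger().fit(seed + ref_examples + weak_examples)
    return model


seed = [(["BRCA1", "binds", "DNA"], ["ENT", "O", "O"])]
unlabeled = [["TP53", "binds", "BRCA1"]]
model = bootstrap(seed, unlabeled, reference_set={"TP53"})
print(model.predict(["TP53", "binds", "DNA"], abstain="O"))
```

In this toy setting the seed-only tagger has never seen "TP53", while the bootstrapped tagger picks it up from the reference-set match. A real system would swap `ToyTagger` for a trained sequence model that generalizes beyond memorized tokens.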




Comprehensive Named Entity Recognition on CORD-19 with Distant or Weak Supervision

We created this CORD-19-NER dataset with comprehensive named entity reco...

GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition

Instead of using expensive manual annotations, researchers have proposed...

Leveraging Expert Guided Adversarial Augmentation For Improving Generalization in Named Entity Recognition

Named Entity Recognition (NER) systems often demonstrate great performan...

Using Unlabeled Texts for Named-Entity Recognition

Named Entity Recognition (NER) poses the problem of learning with multip...

Data Augmentation for Robust Character Detection in Fantasy Novels

Named Entity Recognition (NER) is a low-level task often used as a found...

A Byte-sized Approach to Named Entity Recognition

In biomedical literature, it is common for entity boundaries to not alig...

Exploiting Lists of Names for Named Entity Identification of Financial Institutions from Unstructured Documents

There is a wealth of information about financial systems that is embedde...