Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data

06/16/2021
by   Haoming Jiang, et al.

Weak supervision has shown promising results in many natural language processing tasks, such as Named Entity Recognition (NER). Existing work mainly focuses on learning deep NER models with weak supervision only, i.e., without any human annotation, and shows that weakly labeled data alone can yield good performance, although such models still underperform fully supervised NER trained on manually (strongly) labeled data. In this paper, we consider a more practical scenario in which we have both a small amount of strongly labeled data and a large amount of weakly labeled data. Unfortunately, we observe that when deep NER models are trained on a simple or weighted combination of the strongly labeled and weakly labeled data, the weakly labeled data does not necessarily improve model performance and can even degrade it, owing to the extensive noise in the weak labels. To address this issue, we propose a new multi-stage computational framework, NEEDLE, with three essential ingredients: (1) weak label completion, (2) a noise-aware loss function, and (3) final fine-tuning over the strongly labeled data. Through experiments on E-commerce query NER and biomedical NER, we demonstrate that NEEDLE effectively suppresses the noise of the weak labels and outperforms existing methods. In particular, we achieve new SOTA F1-scores on three biomedical NER datasets: BC5CDR-chem 93.74, BC5CDR-disease 90.69, and NCBI-disease 92.28.
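To make ingredient (1) more concrete, here is a minimal sketch of what weak label completion could look like. The abstract does not spell out the completion rule, so the sketch rests on an assumption: tokens that the weak labeler left untagged ("O" in a BIO scheme) are filled in with the predictions of a model trained on the strongly labeled data, while existing weak tags are kept. The function name `complete_weak_labels`, the tag names, and the example sentence are hypothetical and are not taken from the paper or its repository.

```python
from typing import List

OUTSIDE = "O"  # assumed tag for "no entity" in a BIO tagging scheme


def complete_weak_labels(weak_tags: List[str], predicted_tags: List[str]) -> List[str]:
    """Fill untagged positions of a weakly labeled sentence.

    Assumption (not stated in the abstract): a weak tag of "O" means
    "unknown" rather than "definitely not an entity", so it is replaced
    by the prediction of a model trained on strongly labeled data.
    Any non-"O" weak tag is kept unchanged.
    """
    if len(weak_tags) != len(predicted_tags):
        raise ValueError("tag sequences must have the same length")
    return [
        pred if weak == OUTSIDE else weak
        for weak, pred in zip(weak_tags, predicted_tags)
    ]


# Hypothetical example: the weak labeler found one chemical mention and
# missed a disease mention that the model recovers.
weak = ["O", "B-CHEM", "O", "O"]
pred = ["O", "O", "B-DIS", "I-DIS"]
completed = complete_weak_labels(weak, pred)
print(completed)
```

Under these assumptions the completed sequence keeps the weak `B-CHEM` tag and adopts the model's `B-DIS`/`I-DIS` tags. The completed labels may still be noisy, which is presumably why the framework pairs this step with a noise-aware loss and a final fine-tuning pass on the strongly labeled data.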


04/20/2017

SwellShark: A Generative Model for Biomedical Named Entity Recognition without Labeled Data

We present SwellShark, a framework for building biomedical named entity ...
08/19/2021

QUEACO: Borrowing Treasures from Weakly-labeled Behavior Data for Query Attribute Value Extraction

We study the problem of query attribute value extraction, which aims to ...
08/05/2020

Trove: Ontology-driven weak supervision for medical entity classification

Motivation: Recognizing named entities (NER) and their associated attrib...
01/20/2022

Predictive Inference with Weak Supervision

The expense of acquiring labels in large-scale statistical machine learn...
04/13/2021

GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition

Instead of using expensive manual annotations, researchers have proposed...
03/20/2020

FedNER: Medical Named Entity Recognition with Federated Learning

Medical named entity recognition (NER) has wide applications in intellig...
03/20/2020

FedNER: Privacy-preserving Medical Named Entity Recognition with Federated Learning

Medical named entity recognition (NER) has wide applications in intellig...

Code Repositories

amazon-weak-ner-needle

Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data

