Named Entity Recognition without Labelled Data: A Weak Supervision Approach

04/30/2020
by Pierre Lison, et al.

The performance of Named Entity Recognition (NER) models often degrades rapidly when they are applied to target domains that differ from the texts observed during training. When in-domain labelled data is available, transfer learning techniques can be used to adapt existing NER models to the target domain. But what should one do when there is no hand-labelled data for the target domain? This paper presents a simple but powerful approach to learning NER models in the absence of labelled data through weak supervision. The approach relies on a broad spectrum of labelling functions to automatically annotate texts from the target domain. These annotations are then merged using a hidden Markov model which captures the varying accuracies and confusions of the labelling functions. Finally, a sequence labelling model can be trained on this unified annotation. We evaluate the approach on two English datasets (CoNLL 2003 and news articles from Reuters and Bloomberg) and demonstrate an improvement of about 7 percentage points in entity-level F1 scores compared to an out-of-domain neural NER model.
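As a rough illustration of the pipeline outlined in the abstract, the sketch below applies three toy labelling functions to a sentence and merges their votes. Everything in it is an assumption made for illustration: the gazetteer, the honorific list, the function names and the example sentence are invented, and the merge step uses a plain per-token majority vote as a much simpler stand-in for the hidden Markov model used in the paper.

```python
from collections import Counter

# Hypothetical gazetteer and honorifics, invented for this illustration.
KNOWN_ORGS = {"Reuters", "Bloomberg"}
HONORIFICS = {"Mr.", "Ms.", "Dr."}


def lf_gazetteer(tokens):
    """Vote ORG for tokens in the toy gazetteer; abstain (None) otherwise."""
    return ["ORG" if tok in KNOWN_ORGS else None for tok in tokens]


def lf_honorific(tokens):
    """Vote PER for a token that directly follows an honorific."""
    return [
        "PER" if i > 0 and tokens[i - 1] in HONORIFICS else None
        for i in range(len(tokens))
    ]


def lf_lowercase(tokens):
    """Vote O (outside any entity) for fully lowercase tokens."""
    return ["O" if tok.islower() else None for tok in tokens]


def aggregate(tokens, labelling_functions):
    """Merge the votes token by token with a majority vote.

    This is a deliberately simple stand-in for the hidden Markov model
    described in the abstract: it ignores the accuracy of each function
    and the dependencies between neighbouring tokens.
    """
    all_votes = [lf(tokens) for lf in labelling_functions]
    merged = []
    for votes in zip(*all_votes):
        counts = Counter(v for v in votes if v is not None)
        # Fall back to O when every function abstains.
        merged.append(counts.most_common(1)[0][0] if counts else "O")
    return merged


tokens = ["Mr.", "Lison", "spoke", "to", "Reuters", "yesterday"]
unified = aggregate(tokens, [lf_gazetteer, lf_honorific, lf_lowercase])
print(list(zip(tokens, unified)))
```

The unified labels produced this way could then serve as training targets for a sequence labelling model. A majority vote treats every labelling function as equally reliable, which is exactly the limitation the hidden Markov model is meant to address by estimating each function's accuracies and confusions.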


