Robust Named Entity Recognition with Truecasing Pretraining

12/15/2019
by Stephen Mayhew, et al.

Although modern named entity recognition (NER) systems show impressive performance on standard datasets, they perform poorly when presented with noisy data. In particular, capitalization is a strong signal for entities in many languages, and even state-of-the-art models overfit to this feature, with drastically lower performance on uncapitalized text. In this work, we address the robustness of NER systems on data with noisy or uncertain casing, using a pretraining objective that predicts casing in text (a truecaser) and leverages unlabeled data. The pretrained truecaser is combined with a standard BiLSTM-CRF model for NER by appending its output distributions to character embeddings. In experiments over several datasets of varying domain and casing quality, we show that our new model improves performance on uncased text, even adding value to uncased BERT embeddings. Our method achieves a new state of the art on the WNUT17 shared task dataset.
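
The two ideas in the abstract, deriving truecasing supervision from unlabeled cased text and appending the truecaser's output distribution to each character embedding, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: `make_truecasing_example`, `toy_truecaser`, `char_embedding`, and `augment_char_embeddings` are hypothetical helpers, the two-way `[p_lower, p_upper]` distribution is one plausible reading of "output distributions", and the heuristic truecaser merely stands in for the pretrained model. The BiLSTM-CRF that would consume the augmented vectors is not shown.

```python
from typing import List, Tuple


def make_truecasing_example(text: str) -> Tuple[str, List[int]]:
    """Build a (lowercased input, per-character label) pair from cased text.

    Label 1 means the original character was uppercase, 0 otherwise, so
    no manual annotation is needed. Characters whose lowercase form is not
    a single character are left unchanged to keep input and labels aligned.
    """
    labels = [1 if ch.isupper() else 0 for ch in text]
    lowered = "".join(ch.lower() if len(ch.lower()) == 1 else ch for ch in text)
    return lowered, labels


def toy_truecaser(lowered: str) -> List[List[float]]:
    """Hypothetical stand-in for a trained truecaser.

    Returns one [p_lower, p_upper] pair per character. The heuristic guesses
    "uppercase" for letters that start a token; a real truecaser would be a
    model trained on pairs from make_truecasing_example.
    """
    dists = []
    prev_is_space = True
    for ch in lowered:
        if ch.isalpha() and prev_is_space:
            dists.append([0.1, 0.9])
        else:
            dists.append([0.9, 0.1])
        prev_is_space = ch.isspace()
    return dists


def char_embedding(ch: str, dim: int = 4) -> List[float]:
    """Hypothetical fixed embedding; a real model learns a lookup table."""
    code = ord(ch)
    return [((code * (i + 1)) % 7) / 7.0 for i in range(dim)]


def augment_char_embeddings(lowered: str, dim: int = 4) -> List[List[float]]:
    """Append the truecaser distribution to each character embedding."""
    dists = toy_truecaser(lowered)
    return [char_embedding(ch, dim) + dist for ch, dist in zip(lowered, dists)]


if __name__ == "__main__":
    inp, labels = make_truecasing_example("Paris is nice")
    vectors = augment_char_embeddings(inp)
    # Each character vector has dim + 2 entries: embedding plus casing distribution.
    print(inp, labels, len(vectors), len(vectors[0]))
```

In this sketch the casing signal reaches the tagger as a soft feature rather than a hard rewrite of the input, so a downstream model could still discount it when the truecaser is unsure.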
