A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers

08/23/2019
by Aditi Chaudhary, et al.

Most state-of-the-art models for named entity recognition (NER) rely on the availability of large amounts of labeled data, making them challenging to extend to new, lower-resourced languages. However, there are now several proposed approaches involving either cross-lingual transfer learning, which learns from other highly resourced languages, or active learning, which efficiently selects effective training data based on model predictions. This paper poses the question: given this recent progress, and limited human annotation, what is the most effective method for efficiently creating high-quality entity recognizers in under-resourced languages? Based on extensive experimentation using both simulated and real human annotation, we find a dual-strategy approach works best: starting with a cross-lingually transferred model, then performing targeted annotation of only uncertain entity spans in the target language, minimizing annotator effort. Results demonstrate that cross-lingual transfer is a powerful tool when very little data can be annotated, but an entity-targeted annotation strategy can achieve competitive accuracy quickly, with just one-tenth of the training data.
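To make the dual strategy concrete: a tagger is first obtained via cross-lingual transfer, and the annotation budget is then spent only on the entity spans the transferred model is least confident about. The sketch below illustrates that span-selection step. It is a minimal illustration, not the authors' implementation; the CandidateSpan record, its confidence field, and the toy predictions are hypothetical stand-ins for whatever scores the transferred tagger exposes.

# Minimal sketch of entity-targeted selection for active learning in NER.
# Assumes a (hypothetical) transferred tagger that produces candidate
# entity spans with a confidence score; only the least-confident spans
# are routed to the human annotator.

from dataclasses import dataclass
from typing import List


@dataclass
class CandidateSpan:
    sent_id: int        # sentence the span belongs to
    start: int          # index of the first token in the span
    end: int            # index one past the last token
    label: str          # entity type predicted by the transferred model
    confidence: float   # model probability assigned to that prediction


def select_spans_for_annotation(
    candidates: List[CandidateSpan], budget: int
) -> List[CandidateSpan]:
    """Return the `budget` spans the model is least sure about.

    Confident spans keep their transferred labels; only uncertain spans
    are shown to the annotator, which is what keeps annotation cheap.
    """
    return sorted(candidates, key=lambda s: s.confidence)[:budget]


if __name__ == "__main__":
    # Toy predictions from the hypothetical transferred tagger.
    predictions = [
        CandidateSpan(0, 2, 4, "PER", 0.97),
        CandidateSpan(0, 7, 8, "LOC", 0.41),  # uncertain: worth annotating
        CandidateSpan(1, 0, 2, "ORG", 0.55),  # uncertain: worth annotating
        CandidateSpan(1, 5, 6, "LOC", 0.93),
    ]
    for span in select_spans_for_annotation(predictions, budget=2):
        print(span)

Running this prints the two low-confidence spans (the LOC at 0.41 and the ORG at 0.55), which is the kind of targeted query the abstract describes; in practice the confidence would come from, e.g., span marginals of the transferred model rather than hard-coded numbers.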
