MANER: Mask Augmented Named Entity Recognition for Extreme Low-Resource Languages

12/19/2022
by Shashank Sonkar, et al.

This paper investigates the problem of Named Entity Recognition (NER) for extreme low-resource languages with only a few hundred tagged data samples. NER is a fundamental task in Natural Language Processing (NLP). A critical driver of progress in NER systems is the existence of large-scale language corpora, which enable outstanding performance in languages with abundant training data, such as English and French. However, NER for low-resource languages remains relatively unexplored. In this paper, we introduce Mask Augmented Named Entity Recognition (MANER), a new methodology that leverages the distributional hypothesis of pre-trained masked language models (MLMs) for NER. The <mask> token in pre-trained MLMs encodes valuable semantic contextual information. MANER re-purposes the <mask> token for NER prediction. Specifically, we prepend the <mask> token to every word in a sentence for which we would like to predict the named entity tag. During training, we jointly fine-tune the MLM and a new NER prediction head attached to each <mask> token. We demonstrate that MANER is well-suited for NER in low-resource languages; our experiments show that for 100 languages with as few as 100 training examples, it improves on state-of-the-art methods by up to 48% and by 12% on average on F1 score. We also perform detailed analyses and ablation studies to understand the scenarios that are best-suited to MANER.
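The input construction described in the abstract (one <mask> token prepended to every word, with the NER tag predicted at that <mask> position) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_augment`, the integer tag ids, and the choice to exclude the original word positions from the loss with an ignore label are all assumptions made for illustration. Subword tokenization, which a real MLM pipeline would apply afterwards, is left out.

```python
from typing import List, Tuple

MASK = "<mask>"  # mask token string, as written in the abstract
IGNORE = -100    # assumption: label for positions excluded from the loss
                 # (-100 is the default ignore_index of PyTorch's CrossEntropyLoss)


def mask_augment(
    words: List[str], tag_ids: List[int], mask_token: str = MASK
) -> Tuple[List[str], List[int]]:
    """Prepend a mask token to every word and align one label per token.

    Each mask token carries the tag of the word that follows it; the word
    itself gets IGNORE so that only mask positions are scored (an assumption,
    the abstract only says the NER head is attached to each mask token).
    """
    if len(words) != len(tag_ids):
        raise ValueError("words and tag_ids must have the same length")

    tokens: List[str] = []
    labels: List[int] = []
    for word, tag in zip(words, tag_ids):
        tokens.append(mask_token)  # position where the NER head predicts
        labels.append(tag)
        tokens.append(word)        # original word kept as context
        labels.append(IGNORE)
    return tokens, labels


# Hypothetical tag ids: 0 = O, 1 = PER, 2 = LOC
tokens, labels = mask_augment(["Ada", "visited", "Paris"], [1, 0, 2])
print(" ".join(tokens))
print(labels)
```

In this sketch the augmented sequence is twice as long as the original sentence, which is worth keeping in mind when setting a maximum sequence length for the underlying model.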

