Distant Supervision and Noisy Label Learning for Low Resource Named Entity Recognition: A Study on Hausa and Yorùbá

03/18/2020
by David Ifeoluwa Adelani, et al.

The lack of labeled training data has limited the development of natural language processing tools, such as named entity recognition, for many languages spoken in developing countries. Techniques such as distant and weak supervision can be used to create labeled data in a (semi-)automatic way. Additionally, to alleviate some of the negative effects of the errors in automatic annotation, noise-handling methods can be integrated. Pretrained word embeddings are another key component of most neural named entity classifiers. With the advent of more complex contextual word embeddings, an interesting trade-off between model size and performance arises. While these techniques have been shown to work well in high-resource settings, we want to study how they perform in low-resource scenarios. In this work, we perform named entity recognition for Hausa and Yorùbá, two languages that are widely spoken in several developing countries. We evaluate different embedding approaches and show that distant supervision can be successfully leveraged in a realistic low-resource scenario where it can more than double a classifier's performance.

research
02/25/2021

ANEA: Distant Supervision for Low-Resource Named Entity Recognition

Distant supervision allows obtaining labeled training corpora for low-re...
research
08/26/2021

Data Augmentation for Low-Resource Named Entity Recognition Using Backtranslation

The state of art natural language processing systems relies on sizable t...
research
10/14/2019

Feature-Dependent Confusion Matrices for Low-Resource NER Labeling with Noisy Labels

In low-resource settings, the performance of supervised labeling models ...
research
09/26/2019

On the Importance of Subword Information for Morphological Tasks in Truly Low-Resource Languages

Recent work has validated the importance of subword information for word...
research
11/11/2022

Gradient Imitation Reinforcement Learning for General Low-Resource Information Extraction

Information Extraction (IE) aims to extract structured information from ...
research
06/17/2021

Denoising Distantly Supervised Named Entity Recognition via a Hypergeometric Probabilistic Model

Denoising is the essential step for distant supervision based named enti...
research
07/26/2018

Resource-Size matters: Improving Neural Named Entity Recognition with Optimized Large Corpora

This study improves the performance of neural named entity recognition b...
