Development of a Dataset and a Deep Learning Baseline Named Entity Recognizer for Three Low Resource Languages: Bhojpuri, Maithili and Magahi

09/14/2020
by   Rajesh Kumar Mundotiya, et al.

In Natural Language Processing (NLP) pipelines, Named Entity Recognition (NER) is one of the preliminary problems: it marks proper nouns and other named entities such as Location, Person, Organization, Disease, etc. Without an NER module, such entities adversely affect the performance of a machine translation system. NER helps overcome this problem by recognising such entities and handling them separately, and it is also useful in Information Extraction systems. Bhojpuri, Maithili and Magahi are low resource languages, usually known as Purvanchal languages. This paper focuses on the development of an NER benchmark dataset for the Machine Translation systems developed to translate from these languages to Hindi, built by annotating parts of their available corpora. Bhojpuri, Maithili and Magahi corpora of 228373, 157468 and 56190 tokens, respectively, were annotated using 22 entity labels. The annotation uses coarse-grained labels that follow the tagset used in one of the Hindi NER datasets. We also report a Deep Learning based baseline that uses an LSTM-CNNs-CRF model. The lower-baseline F1-scores, obtained from an NER tool that uses Conditional Random Fields models, are 96.73 for Bhojpuri, 93.33 for Maithili and 95.04 for Magahi. The Deep Learning-based technique (LSTM-CNNs-CRF) achieved 96.25 for Bhojpuri, 93.33 for Maithili and 95.44 for Magahi.
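To make the reported figures easier to compare, the following is a small illustrative sketch that only tabulates the numbers quoted in the abstract: corpus sizes per language and the gap between the LSTM-CNNs-CRF and CRF F1-scores. The variable names are hypothetical and nothing is re-measured; it is plain arithmetic on the stated values.

```python
# Figures copied from the abstract; variable names are illustrative only.
tokens = {"Bhojpuri": 228373, "Maithili": 157468, "Magahi": 56190}
crf_f1 = {"Bhojpuri": 96.73, "Maithili": 93.33, "Magahi": 95.04}
lstm_cnn_crf_f1 = {"Bhojpuri": 96.25, "Maithili": 93.33, "Magahi": 95.44}

# Total annotated tokens across the three corpora.
total_tokens = sum(tokens.values())

# F1 difference (LSTM-CNNs-CRF minus CRF), rounded to two decimals.
delta = {
    lang: round(lstm_cnn_crf_f1[lang] - crf_f1[lang], 2)
    for lang in crf_f1
}

for lang in tokens:
    share = 100 * tokens[lang] / total_tokens
    print(f"{lang}: {tokens[lang]} tokens ({share:.1f}% of total), "
          f"F1 delta {delta[lang]:+.2f}")
```

On these figures the three corpora total 442031 tokens, and the LSTM-CNNs-CRF score is 0.48 below the CRF score for Bhojpuri, equal for Maithili, and 0.40 above for Magahi.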


