L3Cube-MahaNER: A Marathi Named Entity Recognition Dataset and BERT models

04/12/2022
by Parth Patil, et al.

Named Entity Recognition (NER) is a fundamental NLP task with major applications in conversational and search systems. It identifies the key entities in a sentence that a downstream application relies on. NER and similar slot-filling systems for popular languages are heavily used in commercial applications. In this work, we focus on Marathi, an Indian language spoken prominently by the people of Maharashtra state. Marathi is a low-resource language and still lacks useful NER resources. We present L3Cube-MahaNER, the first major gold-standard named entity recognition dataset in Marathi. We also describe the manual annotation guidelines followed during the process. Finally, we benchmark the dataset on different CNN, LSTM, and Transformer-based models such as mBERT, XLM-RoBERTa, IndicBERT, and MahaBERT. MahaBERT provides the best performance among all the models. The data and models are available at https://github.com/l3cube-pune/MarathiNLP .
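To make concrete what "identifying key entities in a sentence" can look like, here is a minimal sketch of turning per-token tags into entity spans. It assumes a BIO-style tagging scheme, which the abstract does not specify. The function name `extract_entities`, the label names `PER` and `LOC`, and the English example sentence are all hypothetical and are not taken from the MahaNER dataset or its tag set.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumption: tags look like "B-TYPE", "I-TYPE", or "O". An "I-" tag that
    does not continue an open entity of the same type is treated as outside.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity currently being built, if there is one.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens = []
        current_type = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()  # a new entity begins, so finish any open one
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)  # continue the open entity
        else:
            flush()  # "O" tag or an inconsistent "I-" tag
    flush()
    return entities


# Hypothetical example; labels are illustrative only.
tokens = ["Sachin", "Tendulkar", "lives", "in", "Mumbai"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(extract_entities(tokens, tags))
```

In a real pipeline, the tags would come from one of the benchmarked models rather than being written by hand; this sketch only covers the span-grouping step.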


research
03/24/2022

Mono vs Multilingual BERT: A Case Study in Hindi and Marathi Named Entity Recognition

Named entity recognition (NER) is the process of recognising and classif...
research
07/17/2017

Neural Reranking for Named Entity Recognition

We propose a neural reranking system for named entity recognition (NER)....
research
04/28/2022

HiNER: A Large Hindi Named Entity Recognition Dataset

Named Entity Recognition (NER) is a foundational NLP task that aims to p...
research
01/02/2021

A Robust and Domain-Adaptive Approach for Low-Resource Named Entity Recognition

Recently, it has attracted much attention to build reliable named entity...
research
09/29/2022

Named Entity Recognition in Industrial Tables using Tabular Language Models

Specialized transformer-based models for encoding tabular data have gain...
research
04/11/2023

Exploring the Use of Foundation Models for Named Entity Recognition and Lemmatization Tasks in Slavic Languages

This paper describes Adam Mickiewicz University's (AMU) solution for the...
