Mono vs Multilingual BERT: A Case Study in Hindi and Marathi Named Entity Recognition

03/24/2022
by Onkar Litake, et al.

Named entity recognition (NER) is the process of recognising and classifying important information (entities) in text. Proper nouns, such as a person's name, an organization's name, or a location's name, are examples of entities. NER is an important module in applications such as human resources, customer support, search engines, content classification, and academia. In this work, we consider NER for the low-resource Indian languages Hindi and Marathi. Transformer-based models have been widely used for NER tasks. We consider different variations of BERT, such as base-BERT, RoBERTa, and ALBERT, and benchmark them on publicly available Hindi and Marathi NER datasets. We provide an exhaustive comparison of different monolingual and multilingual transformer-based models and establish simple baselines currently missing in the literature. We show that the monolingual MahaRoBERTa model performs the best for Marathi NER, whereas the multilingual XLM-RoBERTa performs the best for Hindi NER. We also perform cross-language evaluation and present mixed observations.
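
To make the task definition above concrete, the following is a minimal sketch of how per-token NER predictions are commonly turned into entities. It assumes a BIO tagging scheme (B- opens an entity, I- continues it, O is outside any entity); the abstract does not state which scheme the datasets use. The function name `bio_to_spans`, the tag names PER and LOC, and the example sentence are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumption: tags follow the BIO scheme, e.g. "B-PER", "I-PER", "O".
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def close_open_entity() -> None:
        # Emit the entity collected so far, if there is one.
        if current_tokens and current_type is not None:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            close_open_entity()
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type extends the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not match the open entity,
            # ends the open entity; the token itself is not kept.
            close_open_entity()
            current_tokens = []
            current_type = None

    close_open_entity()
    return spans


# Hypothetical example sentence with hand-written tags.
tokens = ["Sachin", "Tendulkar", "was", "born", "in", "Mumbai"]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-LOC"]
entities = bio_to_spans(tokens, tags)
print(entities)
```

In a benchmark like the one described, a model's predicted tag sequence would typically be decoded this way before being compared with the gold annotations at the entity level; whether the authors evaluate exactly like this is not stated in the abstract.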


research
06/10/2023

Enhancing Low Resource NER Using Assisting Language And Transfer Learning

Named Entity Recognition (NER) is a fundamental task in NLP that is used...
research
04/12/2022

L3Cube-MahaNER: A Marathi Named Entity Recognition Dataset and BERT models

Named Entity Recognition (NER) is a basic NLP task and finds major appli...
research
05/02/2020

Sources of Transfer in Multilingual Named Entity Recognition

Named-entities are inherently multilingual, and annotations in any given...
research
09/29/2022

Named Entity Recognition in Industrial Tables using Tabular Language Models

Specialized transformer-based models for encoding tabular data have gain...
research
04/08/2022

CyNER: A Python Library for Cybersecurity Named Entity Recognition

Open Cyber threat intelligence (OpenCTI) information is available in an ...
research
08/30/2022

MultiCoNER: A Large-scale Multilingual dataset for Complex Named Entity Recognition

We present MultiCoNER, a large multilingual dataset for Named Entity Rec...
research
03/25/2021

Bertinho: Galician BERT Representations

This paper presents a monolingual BERT model for Galician. We follow the...
