L3Cube-HindBERT and DevBERT: Pre-Trained BERT Transformer models for Devanagari based Hindi and Marathi Languages

11/21/2022
by   Raviraj Joshi, et al.
0

The monolingual Hindi BERT models currently available on the model hub do not perform better than the multi-lingual models on downstream tasks. We present L3Cube-HindBERT, a Hindi BERT model pre-trained on Hindi monolingual corpus. Further, since Indic languages, Hindi and Marathi share the Devanagari script, we train a single model for both languages. We release DevBERT, a Devanagari BERT model trained on both Marathi and Hindi monolingual datasets. We evaluate these models on downstream Hindi and Marathi text classification and named entity recognition tasks. The HindBERT and DevBERT-based models show superior performance compared to their multi-lingual counterparts. These models are shared at https://huggingface.co/l3cube-pune .

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/19/2022

Mono vs Multilingual BERT for Hate Speech Detection and Text Classification: A Case Study in Marathi

Transformers are the most eminent architectures used for a vast range of...
research
05/21/2022

Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese

Multilingual language models such as mBERT have seen impressive cross-li...
research
03/25/2021

Bertinho: Galician BERT Representations

This paper presents a monolingual BERT model for Galician. We follow the...
research
12/03/2020

GottBERT: a pure German Language Model

Lately, pre-trained language models advanced the field of natural langua...
research
06/02/2023

Data-Efficient French Language Modeling with CamemBERTa

Recent advances in NLP have significantly improved the performance of la...
research
01/28/2022

Towards a Broad Coverage Named Entity Resource: A Data-Efficient Approach for Many Diverse Languages

Parallel corpora are ideal for extracting a multilingual named entity (M...
research
05/22/2023

DUMB: A Benchmark for Smart Evaluation of Dutch Models

We introduce the Dutch Model Benchmark: DUMB. The benchmark includes a d...

Please sign up or login with your details

Forgot password? Click here to reset