IndicXTREME: A Multi-Task Benchmark For Evaluating Indic Languages

12/11/2022
by   Sumanth Doddapaneni, et al.
0

In this work, we introduce IndicXTREME, a benchmark consisting of nine diverse tasks covering 18 languages from the Indic sub-continent belonging to four different families. Across languages and tasks, IndicXTREME contains a total of 103 evaluation sets, of which 51 are new contributions to the literature. To maintain high quality, we only use human annotators to curate or translate our datasets. To the best of our knowledge, this is the first effort toward creating a standard benchmark for Indic languages that aims to test the zero-shot capabilities of pretrained language models. We also release IndicCorp v2, an updated and much larger version of IndicCorp that contains 20.9 billion tokens in 24 languages. We pretrain IndicBERT v2 on IndicCorp v2 and evaluate it on IndicXTREME to show that it outperforms existing multilingual language models such as XLM-R and MuRIL.

READ FULL TEXT

page 1

page 6

page 17

research
12/21/2022

SERENGETI: Massively Multilingual Language Models for Africa

Multilingual language models (MLMs) acquire valuable, generalizable ling...
research
05/19/2023

XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

Data scarcity is a crucial issue for the development of highly multiling...
research
07/01/2021

A Primer on Pretrained Multilingual Language Models

Multilingual Language Models (MLLMs) such as mBERT, XLM, XLM-R, etc. hav...
research
05/19/2023

Evaluating task understanding through multilingual consistency: A ChatGPT case study

At the staggering pace with which the capabilities of large language mod...
research
05/25/2023

IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages

India has a rich linguistic landscape with languages from 4 major langua...
research
10/13/2020

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models

Language models (LMs) have proven surprisingly successful at capturing f...
research
09/13/2021

On Language Models for Creoles

Creole languages such as Nigerian Pidgin English and Haitian Creole are ...

Please sign up or login with your details

Forgot password? Click here to reset