The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants

08/31/2023
by Lucas Bandarkar, et al.

We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. Significantly expanding the language coverage of natural language understanding (NLU) benchmarks, this dataset enables the evaluation of text models in high-, medium-, and low-resource languages. Each question is based on a short passage from the Flores-200 dataset and has four multiple-choice answers. The questions were carefully curated to discriminate between models with different levels of general language comprehension. The English dataset on its own proves difficult enough to challenge state-of-the-art language models. Being fully parallel, this dataset enables direct comparison of model performance across all languages. We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs). We present extensive results and find that despite significant cross-lingual transfer in English-centric LLMs, much smaller MLMs pretrained on balanced multilingual data still understand far more languages. We also observe that larger vocabulary size and conscious vocabulary construction correlate with better performance on low-resource languages. Overall, Belebele opens up new avenues for evaluating and analyzing the multilingual capabilities of NLP systems.
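
To make the task format concrete, below is a minimal sketch of how a causal language model might be scored on one Belebele language split. It assumes the dataset is distributed on the Hugging Face Hub as facebook/belebele with per-language configurations named by Flores-200 codes and fields named flores_passage, question, mc_answer1 through mc_answer4, and correct_answer_num; these names, the gpt2 stand-in model, and the log-likelihood scoring rule are illustrative assumptions, not the authors' evaluation code.

# Sketch: pick the answer choice with the highest summed token log-probability
# given the passage and question. Field names and the dataset ID are assumed;
# verify them against the actual Belebele release before relying on this.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"   # stand-in model; swap in any causal LM
SPLIT = "eng_Latn"    # assumed: configs follow Flores-200 language codes

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

data = load_dataset("facebook/belebele", SPLIT, split="test")

def choice_logprob(prompt: str, answer: str) -> float:
    """Summed log-probability of the answer tokens conditioned on the prompt.
    Note: tokenization at the prompt/answer boundary is treated approximately."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + " " + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    target = full_ids[0, 1:]
    token_scores = log_probs[torch.arange(target.shape[0]), target]
    # Keep only the scores of tokens belonging to the answer.
    return token_scores[prompt_ids.shape[1] - 1:].sum().item()

correct = 0
sample = data.select(range(50))  # small sample for illustration
for row in sample:
    prompt = f"Passage: {row['flores_passage']}\nQuestion: {row['question']}\nAnswer:"
    answers = [row[f"mc_answer{i}"] for i in range(1, 5)]
    scores = [choice_logprob(prompt, a) for a in answers]
    pred = scores.index(max(scores)) + 1
    correct += int(pred == int(row["correct_answer_num"]))

print(f"Accuracy on sample: {correct / len(sample):.2%}")

Because every passage and question is parallel across the 122 language variants, the same loop can be rerun with a different SPLIT value to compare accuracy across languages; treat this as an illustration of the multiple-choice structure rather than the paper's evaluation protocol.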


Related research

YORC: Yoruba Reading Comprehension dataset (08/18/2023)
In this paper, we create YORC: a new multi-choice Yoruba Reading Compreh...

Improving Low-resource Reading Comprehension via Cross-lingual Transposition Rethinking (07/11/2021)
Extractive Reading Comprehension (ERC) has made tremendous advances enab...

TinyStories: How Small Can Language Models Be and Still Speak Coherent English? (05/12/2023)
Language models (LMs) are powerful tools for natural language processing...

Are Pretrained Multilingual Models Equally Fair Across Languages? (10/11/2022)
Pretrained multilingual language models can help bridge the digital lang...

M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models (06/08/2023)
Despite the existence of various benchmarks for evaluating natural langu...

What's the Meaning of Superhuman Performance in Today's NLU? (05/15/2023)
In the last five years, there has been a significant focus in Natural La...

Wine is Not v i n. – On the Compatibility of Tokenizations Across Languages (09/13/2021)
The size of the vocabulary is a central design choice in large pretraine...
