GlobalBench: A Benchmark for Global Progress in Natural Language Processing

05/24/2023
by Yueqi Song, et al.

Despite major advances in NLP, significant disparities in NLP system performance across languages still exist. Arguably, these are due to uneven resource allocation and sub-optimal incentives to work on less-resourced languages. To track and further incentivize the global development of equitable language technology, we introduce GlobalBench. Prior multilingual benchmarks are static and have focused on a limited number of tasks and languages. In contrast, GlobalBench is an ever-expanding collection that aims to dynamically track progress on all NLP datasets in all languages. Rather than solely measuring accuracy, GlobalBench also tracks the estimated per-speaker utility and equity of technology across all languages, providing a multi-faceted view of how language technology is serving people of the world. Furthermore, GlobalBench is designed to identify the most under-served languages, and it rewards research efforts directed towards those languages. At present, the most under-served languages are those that have relatively large speaker populations but are nonetheless overlooked by composite multilingual benchmarks; examples include Punjabi, Portuguese, and Wu Chinese. Currently, GlobalBench covers 966 datasets in 190 languages, and has 1,128 system submissions spanning 62 languages.

