Some Languages are More Equal than Others: Probing Deeper into the Linguistic Disparity in the NLP World

10/16/2022
by Surangika Ranathunga, et al.

Linguistic disparity in the NLP world is a problem that has been widely acknowledged recently. However, the different facets of this problem, and the reasons behind the disparity, are seldom discussed within the NLP community. This paper provides a comprehensive analysis of the disparity that exists among the languages of the world. We show that simply categorising languages by data availability may not always be correct. Using an existing language categorisation based on speaker population and vitality, we analyse the distribution of language data resources, the amount of NLP/CL research, inclusion in multilingual web-based platforms, and inclusion in pre-trained multilingual models. We show that many languages are not covered by these resources or platforms, and that even among languages belonging to the same language group there is wide disparity. We analyse the impact of language family, geographical location, GDP and speaker population, and provide possible reasons for this disparity, along with some suggestions to overcome it.


research
04/20/2020

The State and Fate of Linguistic Diversity and Inclusion in the NLP World

Language technologies contribute to promoting multilingualism and lingui...
research
11/28/2022

Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources

While the NLP community is generally aware of resource disparities among...
research
10/11/2022

Are Pretrained Multilingual Models Equally Fair Across Languages?

Pretrained multilingual language models can help bridge the digital lang...
research
05/12/2022

Beyond Static Models and Test Sets: Benchmarking the Potential of Pre-trained Models Across Tasks and Languages

Although recent Massively Multilingual Language Models (MMLMs) like mBER...
research
05/24/2023

GlobalBench: A Benchmark for Global Progress in Natural Language Processing

Despite the major advances in NLP, significant disparities in NLP system...
research
05/28/2021

Bhāṣācitra: Visualising the dialect geography of South Asia

We present Bhāṣācitra, a dialect mapping system for South Asia built...
research
05/25/2022

Evaluating Inclusivity, Equity, and Accessibility of NLP Technology: A Case Study for Indian Languages

In order for NLP technology to be widely applicable and useful, it needs...
