Towards Bridging the Digital Language Divide

07/25/2023
by Gábor Bella, et al.

It is a well-known fact that current AI-based language technology – language models, machine translation systems, multilingual dictionaries and corpora – focuses on the world's 2-3% most widely spoken languages. Recent research efforts have attempted to expand the coverage of AI technology to 'under-resourced languages.' The goal of our paper is to bring attention to a phenomenon that we call linguistic bias: multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden, representational preference towards certain languages. Linguistic bias manifests as uneven per-language performance even under similar test conditions. We show that biased technology is often the result of research and development methodologies that do not do justice to the complexity of the languages being represented, and that can even become ethically problematic, as they disregard valuable aspects of diversity as well as the needs of the language communities themselves. As our attempt at building diversity-aware language resources, we present a new initiative that aims at reducing linguistic bias through both technological design and methodology, based on an eye-level collaboration with local communities.

research
07/25/2023

Diversity and Language Technology: How Techno-Linguistic Bias Can Cause Epistemic Injustice

It is well known that AI-based language technology – large language mode...
research
06/12/2023

Lost in Translation: Large Language Models in Non-English Content Analysis

In recent years, large language models (e.g., Open AI's GPT-4, Meta's LL...
research
10/11/2022

Multilingual BERT has an accent: Evaluating English influences on fluency in multilingual models

While multilingual language models can improve NLP performance on low-re...
research
10/14/2021

Designing Language Technologies for Social Good: The Road not Taken

Development of speech and language technology for social good (LT4SG), e...
research
04/19/2023

Revitalizing Endangered Languages: AI-powered language learning as a catalyst for language appreciation

According to UNESCO, there are nearly 7,000 languages spoken worldwide, ...
research
02/24/2023

Spanish Built Factual Freectianary (Spanish-BFF): the first AI-generated free dictionary

Dictionaries are one of the oldest and most used linguistic resources. B...
research
01/03/2023

Average Is Not Enough: Caveats of Multilingual Evaluation

This position paper discusses the problem of multilingual evaluation. Us...
