Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

10/05/2020
by   Wilhelmina Nekoto, et al.

Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem that goes beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role in information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT remains centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets and MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released at https://github.com/masakhane-io/masakhane-mt.

