Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

10/05/2020
by Wilhelmina Nekoto, et al.
Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role in information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets and MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released at https://github.com/masakhane-io/masakhane-mt.

08/03/2020

Lanfrica: A Participatory Approach to Documenting Machine Translation Research on African Languages

Over the years, there have been campaigns to include the African languag...
10/12/2020

It's not a Non-Issue: Negation as a Source of Error in Machine Translation

As machine translation (MT) systems progress at a rapid pace, questions ...
11/07/2021

Variance-Aware Machine Translation Test Sets

We release 70 small and discriminative test sets for machine translation...
09/03/2019

In Search of Lost Edges: A Case Study on Reconstructing Financial Networks

To capture the systemic complexity of international financial systems, n...
03/16/2022

BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages

Morphologically-rich polysynthetic languages present a challenge for NLP...
05/09/2022

Building Machine Translation Systems for the Next Thousand Languages

In this paper we share findings from our effort to build practical machi...
05/25/2022

Machine Translation Robustness to Natural Asemantic Variation

We introduce and formalize an under-studied linguistic phenomenon we cal...