Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

by   Wilhelmina Nekoto, et al.

Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), that plays a crucial role for information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released under


page 1

page 2

page 3

page 4


Lanfrica: A Participatory Approach to Documenting Machine Translation Research on African Languages

Over the years, there have been campaigns to include the African languag...

It's not a Non-Issue: Negation as a Source of Error in Machine Translation

As machine translation (MT) systems progress at a rapid pace, questions ...

Variance-Aware Machine Translation Test Sets

We release 70 small and discriminative test sets for machine translation...

In Search of Lost Edges: A Case Study on Reconstructing Financial Networks

To capture the systemic complexity of international financial systems, n...

BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages

Morphologically-rich polysynthetic languages present a challenge for NLP...

Building Machine Translation Systems for the Next Thousand Languages

In this paper we share findings from our effort to build practical machi...

Machine Translation Robustness to Natural Asemantic Variation

We introduce and formalize an under-studied linguistic phenomenon we cal...