Congolese Swahili Machine Translation for Humanitarian Response

03/19/2021
by Alp Öktem, et al.

In this paper we describe our efforts to build a bidirectional Congolese Swahili (SWC) to French (FRA) neural machine translation system, with the aim of improving humanitarian translation workflows. For training, we created a 25,302-sentence general-domain parallel corpus and combined it with publicly available data. Experimenting with low-resource methodologies such as cross-dialect transfer and semi-supervised learning, we recorded improvements of up to 2.4 and 3.5 BLEU points in the SWC-FRA and FRA-SWC directions, respectively. We performed human evaluations to assess the usability of our models in a COVID-domain chatbot that operates in the Democratic Republic of Congo (DRC). Direct assessment in the SWC-FRA direction yielded an average quality rating of 6.3 out of 10, with 75% of the target strings conveying the main message of the source text. For the FRA-SWC direction, our preliminary post-editing assessment showed the system's potential usefulness for machine-assisted translation. We make our models, datasets containing up to 1 million sentences, our development pipeline, and a translator web app available for public use.
