Building a Parallel Corpus and Training Translation Models Between Luganda and English

01/07/2023
by Richard Kimera, et al.

Neural machine translation (NMT) has achieved great success with large datasets, and has therefore mostly benefited high-resource languages. This continually disadvantages low-resource languages such as Luganda, which lack high-quality parallel corpora; even Google Translate did not support Luganda at the time of writing. In this paper, we build a parallel corpus of 41,070 sentence pairs for Luganda and English, based on three different open-source corpora. We then train NMT models on this dataset with a hyper-parameter search. Experiments yielded a BLEU score of 21.28 from Luganda to English and 17.47 from English to Luganda. Some translation examples show high translation quality. We believe that our model is the first Luganda-English NMT model. The bilingual dataset we built will be made publicly available.
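
The abstract does not say how the three source corpora were combined. As a rough illustration only, the sketch below shows one plausible way to merge several parallel corpora into a single list of sentence pairs, normalizing whitespace and dropping empty or duplicate pairs. The names `normalize` and `merge_corpora`, the case-insensitive duplicate check, and the placeholder sentences are all assumptions made for this example, not the authors' procedure.

```python
from typing import Iterable, List, Tuple

# Hypothetical pair layout: (Luganda sentence, English sentence).
Pair = Tuple[str, str]


def normalize(text: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())


def merge_corpora(*corpora: Iterable[Pair]) -> List[Pair]:
    """Merge parallel corpora, skipping empty and duplicate pairs.

    Duplicates are detected case-insensitively after whitespace
    normalization (an assumption of this sketch); the first
    occurrence of each pair is kept, in input order.
    """
    seen = set()
    merged: List[Pair] = []
    for corpus in corpora:
        for lg, en in corpus:
            pair = (normalize(lg), normalize(en))
            if not pair[0] or not pair[1]:
                continue  # one side is empty, so the pair is unusable
            key = (pair[0].lower(), pair[1].lower())
            if key in seen:
                continue  # already collected from an earlier corpus
            seen.add(key)
            merged.append(pair)
    return merged


if __name__ == "__main__":
    # Placeholder strings stand in for real sentences.
    corpus_a = [("lg sentence  one", "en sentence one")]
    corpus_b = [("LG sentence one", "EN sentence one"), ("lg two", "")]
    corpus_c = [("lg three", "en three")]
    for lg, en in merge_corpora(corpus_a, corpus_b, corpus_c):
        print(f"{lg}\t{en}")
```

Keeping the first occurrence preserves the ordering of the earliest source, which makes it easier to trace a merged pair back to the corpus it came from.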


research
04/15/2021

Simultaneous Multi-Pivot Neural Machine Translation

Parallel corpora are indispensable for training neural machine translati...
research
10/30/2021

How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus

This paper proposes a tool for efficiently constructing high-quality par...
research
09/09/2021

A Large-Scale Study of Machine Translation in the Turkic Languages

Recent advances in neural machine translation (NMT) have pushed the qual...
research
09/10/2018

Multilingual Extractive Reading Comprehension by Runtime Machine Translation

Existing end-to-end neural network models for extractive Reading Compreh...
research
10/17/2020

A Corpus for English-Japanese Multimodal Neural Machine Translation with Comparable Sentences

Multimodal neural machine translation (NMT) has become an increasingly i...
research
02/25/2020

MuST-Cinema: a Speech-to-Subtitles corpus

Growing needs in localising audiovisual content in multiple languages th...
research
03/10/2021

Majority Voting with Bidirectional Pre-translation For Bitext Retrieval

Obtaining high-quality parallel corpora is of paramount importance for t...
