Ngambay-French Neural Machine Translation (sba-Fr)

08/25/2023
by Sakayo Toadoum Sari, et al.

In Africa, and the world at large, there is an increasing focus on developing Neural Machine Translation (NMT) systems to overcome language barriers. NMT for low-resource languages is particularly compelling because it involves learning from limited labelled data. However, obtaining a well-aligned parallel corpus for low-resource languages can be challenging. The disparity between the technological advancement of a few global languages and the lack of research on NMT for local languages in Chad is striking. End-to-end NMT has not previously been attempted on low-resource Chadian languages. Additionally, unlike for some other African languages, there is a dearth of online, well-structured data collection for research in Natural Language Processing. However, a guided approach to data gathering can produce bitext data that pairs many Chadian languages with well-known languages that have ample data. In this project, we created the first sba-Fr Dataset, a corpus of Ngambay-to-French translations, and fine-tuned three pre-trained models on it. Our experiments show that the M2M100 model outperforms the other models, with high BLEU scores on both the original and the original+synthetic data. The bitext dataset is publicly available for research purposes.


research
04/19/2023

The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages

Efficiently and accurately translating a corpus into a low-resource lang...
research
02/17/2022

Improving English to Sinhala Neural Machine Translation using Part-of-Speech Tag

The performance of Neural Machine Translation (NMT) depends significantl...
research
11/02/2018

Bi-Directional Differentiable Input Reconstruction for Low-Resource Neural Machine Translation

We aim to better exploit the limited amounts of parallel text available ...
research
08/11/2020

Revisiting Low Resource Status of Indian Languages in Machine Translation

Indian language machine translation performance is hampered due to the l...
research
07/13/2021

On the Difficulty of Translating Free-Order Case-Marking Languages

Identifying factors that make certain languages harder to model than oth...
research
05/23/2023

LIMIT: Language Identification, Misidentification, and Translation using Hierarchical Models in 350+ Languages

Knowing the language of an input text/audio is a necessary first step fo...
research
05/15/2023

Beqi: Revitalize the Senegalese Wolof Language with a Robust Spelling Corrector

The progress of Natural Language Processing (NLP), although fast in rece...
