
MorisienMT: A Dataset for Mauritian Creole Machine Translation

by Raj Dabre, et al.

In this paper, we describe MorisienMT, a dataset for benchmarking the quality of machine translation for Mauritian Creole. Mauritian Creole (Morisien) is a French-based creole language and the lingua franca of the Republic of Mauritius. MorisienMT consists of English–Morisien and French–Morisien parallel corpora and a Morisien monolingual corpus. We first give an overview of Morisien and then describe the steps taken to create the corpora and, from them, the training and evaluation splits. Thereafter, we establish a variety of baseline models using the created parallel corpora as well as large French–English corpora for transfer learning. We release our datasets publicly for research purposes and hope that this spurs research on Morisien machine translation.



Extended Parallel Corpus for Amharic-English Machine Translation

This paper describes the acquisition, preprocessing, segmentation, and a...

An Evaluation of Persian-English Machine Translation Datasets with Transformers

Nowadays, many researchers are focusing their attention on the subject o...

Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation

Lectures translation is a case of spoken language translation and there ...

Sāmayik: A Benchmark and Dataset for English-Sanskrit Translation

Sanskrit is a low-resource language with a rich heritage. Digitized Sans...

A Parallel Corpus of Translationese

We describe a set of bilingual English--French and English--German paral...

Designing the Business Conversation Corpus

While the progress of machine translation of written text has come far i...

Neural machine translation, corpus and frugality

In machine translation field, in both academia and industry, there is a ...