Using Multiple Subwords to Improve English-Esperanto Automated Literary Translation Quality

11/28/2020
by Alberto Poncelas, et al.

Building Machine Translation (MT) systems for low-resource languages remains challenging. For many language pairs, parallel data are not widely available, and in such cases MT models do not achieve results comparable to those seen with high-resource languages. When data are scarce, it is essential to make the best possible use of the limited material available. To that end, in this paper we propose using the same parallel sentences multiple times, changing only the way the words are split each time. For this purpose we use several Byte Pair Encoding (BPE) models, each configured with a different number of merge operations. In our experiments, we use this technique to expand the available data and improve an MT system for a low-resource language pair, namely English-Esperanto. As an additional contribution, we make available a set of English-Esperanto parallel data in the literary domain.
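The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny BPE learner below is a simplified pure-Python version of the standard algorithm, the function names (`learn_bpe`, `segment`, `augment`), the merge settings `[0, 5, 50]`, the choice to learn separate BPE models for each language, and the two toy sentence pairs are all assumptions made for illustration only.

```python
from collections import Counter

END = "</w>"  # end-of-word marker used while learning merges


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of sentences."""
    vocab = Counter(tuple(w) + (END,) for line in corpus for w in line.split())
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties broken lexicographically for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(list(symbols), best))] += freq
        vocab = new_vocab
    return merges


def segment(sentence, merges):
    """Split a sentence into subwords, marking non-final pieces with '@@'."""
    pieces = []
    for word in sentence.split():
        symbols = list(word) + [END]
        for pair in merges:
            symbols = merge_pair(symbols, pair)
        symbols[-1] = symbols[-1][: -len(END)]  # drop the end-of-word marker
        symbols = [s for s in symbols if s]
        pieces.extend(s + "@@" for s in symbols[:-1])
        pieces.append(symbols[-1])
    return " ".join(pieces)


def augment(src, tgt, merge_settings):
    """Repeat the same sentence pairs once per merge setting, split differently each time."""
    augmented = []
    for n in merge_settings:
        # Assumption: one BPE model per language per setting.
        src_merges, tgt_merges = learn_bpe(src, n), learn_bpe(tgt, n)
        for s, t in zip(src, tgt):
            augmented.append((segment(s, src_merges), segment(t, tgt_merges)))
    return augmented


# Invented toy data, not taken from the released corpus.
english = ["the cat sleeps", "the dog sleeps"]
esperanto = ["la kato dormas", "la hundo dormas"]
augmented = augment(english, esperanto, merge_settings=[0, 5, 50])
for pair in augmented:
    print(pair)
```

With three merge settings, the two original sentence pairs become six training pairs whose surface text is identical once the `@@ ` separators are removed; only the subword boundaries differ. In practice one would likely rely on an established BPE toolkit rather than this toy learner.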

research
03/31/2020

Evaluating Amharic Machine Translation

Machine translation (MT) systems are now able to provide very accurate r...
research
03/31/2021

Domain-specific MT for Low-resource Languages: The case of Bambara-French

Translating to and from low-resource languages is a challenge for machin...
research
04/05/2018

Chinese-Portuguese Machine Translation: A Study on Building Parallel Corpora from Comparable Texts

Although there are increasing and significant ties between China and Por...
research
02/04/2019

Two New Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English

The vast majority of language pairs in the world are low-resource becaus...
research
11/06/2018

Off-the-Shelf Unsupervised NMT

We frame unsupervised machine translation (MT) in the context of multi-t...
research
06/09/2020

An Augmented Translation Technique for low Resource language pair: Sanskrit to Hindi translation

Neural Machine Translation (NMT) is an ongoing technique for Machine Tra...
research
05/24/2019

A Call for Prudent Choice of Subword Merge Operations

Most neural machine translation systems are built upon subword units ext...
