Textual Augmentation Techniques Applied to Low Resource Machine Translation: Case of Swahili

06/12/2023
by Catherine Gitau, et al.
In this work, we investigate the impact of applying textual data augmentation techniques to low-resource machine translation. There has been recent interest in approaches for training systems for languages with limited resources, and one popular approach is data augmentation, which aims to increase the quantity of data available to train the system. In machine translation, the majority of language pairs around the world are considered low resource because they have little parallel data available, and the quality of neural machine translation (NMT) systems depends heavily on the availability of sizable parallel corpora. We study and apply three simple data augmentation techniques popularly used in text classification tasks: synonym replacement, random insertion, and contextual data augmentation. We compare their performance with a baseline NMT system on English-Swahili (En-Sw) datasets and report results as BLEU, ChrF, and METEOR scores. Overall, the contextual data augmentation technique shows some improvements in both the EN → SW and SW → EN directions. We see potential for using these methods in neural machine translation once more extensive experiments are done with diverse datasets.
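As a rough illustration of two of the techniques named above, the following minimal Python sketch shows synonym replacement and random insertion on a tokenized sentence. It is not the authors' implementation: the `SYNONYMS` table, the function names, and the parameter `n` are hypothetical, and the abstract does not say which side of a sentence pair is augmented or which synonym source is used. Contextual data augmentation, which relies on a language model to propose substitutes, is not covered here.

```python
import random

# Hypothetical toy synonym table (an assumption for illustration only);
# a real setup would draw on a thesaurus or other lexical resource.
SYNONYMS = {
    "big": ["large", "huge"],
    "house": ["home", "dwelling"],
    "quick": ["fast", "rapid"],
}


def synonym_replacement(tokens, n, rng):
    """Replace up to n tokens that have an entry in SYNONYMS with a random synonym."""
    out = list(tokens)
    candidates = [i for i, tok in enumerate(out) if tok in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(tokens, n, rng):
    """Up to n times, insert a synonym of a randomly chosen token at a random position."""
    out = list(tokens)
    for _ in range(n):
        candidates = [tok for tok in out if tok in SYNONYMS]
        if not candidates:
            break  # nothing in the sentence has a known synonym
        word = rng.choice(candidates)
        position = rng.randint(0, len(out))  # len(out) means "append at the end"
        out.insert(position, rng.choice(SYNONYMS[word]))
    return out


rng = random.Random(0)  # seeded so repeated runs give the same augmentations
source = "the big house".split()
print(synonym_replacement(source, 1, rng))
print(random_insertion(source, 1, rng))
```

Both functions return a new token list and leave the input untouched, so an augmented sentence can be paired with the original reference translation to form an additional training example, assuming the substitution preserves the meaning closely enough.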

research
05/01/2017

Data Augmentation for Low-Resource Neural Machine Translation

The quality of a Neural Machine Translation system depends substantially...
research
03/27/2023

Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation

Neural machine translation (NMT) has progressed rapidly over the past se...
research
05/18/2022

Data Augmentation to Address Out-of-Vocabulary Problem in Low-Resource Sinhala-English Neural Machine Translation

Out-of-Vocabulary (OOV) is a problem for Neural Machine Translation (NMT...
research
05/16/2023

Data Augmentation for Conflict and Duplicate Detection in Software Engineering Sentence Pairs

This paper explores the use of text data augmentation techniques to enha...
research
06/10/2019

Generalized Data Augmentation for Low-Resource Translation

Translation to or from low-resource languages LRLs poses challenges for ...
research
07/13/2023

Data Augmentation for Machine Translation via Dependency Subtree Swapping

We present a generic framework for data augmentation via dependency subt...
research
01/14/2022

ExtraPhrase: Efficient Data Augmentation for Abstractive Summarization

Neural models trained with large amount of parallel data have achieved i...
