Sentence Concatenation Approach to Data Augmentation for Neural Machine Translation

04/17/2021
by   Seiichiro Kondo, et al.
0

Neural machine translation (NMT) has recently gained widespread attention because of its high translation accuracy. However, it shows poor performance in the translation of long sentences, which is a major issue in low-resource languages. It is assumed that this issue is caused by insufficient number of long sentences in the training data. Therefore, this study proposes a simple data augmentation method to handle long sentences. In this method, we use only the given parallel corpora as the training data and generate long sentences by concatenating two sentences. Based on the experimental results, we confirm improvements in long sentence translation by the proposed data augmentation method, despite its simplicity. Moreover, the translation quality is further improved by the proposed method, when combined with back-translation.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/22/2019

Corpus Augmentation by Sentence Segmentation for Low-Resource Neural Machine Translation

Neural Machine Translation (NMT) has been proven to achieve impressive r...
research
08/01/2018

Low-Latency Neural Speech Translation

Through the development of neural machine translation, the quality of ma...
research
09/08/2021

Rethinking Data Augmentation for Low-Resource Neural Machine Translation: A Multi-Task Learning Approach

In the context of neural machine translation, data augmentation (DA) tec...
research
09/13/2021

Multi-Sentence Resampling: A Simple Approach to Alleviate Dataset Length Bias and Beam-Search Degradation

Neural Machine Translation (NMT) is known to suffer from a beam-search p...
research
07/01/2021

Zero-pronoun Data Augmentation for Japanese-to-English Translation

For Japanese-to-English translation, zero pronouns in Japanese pose a ch...
research
04/05/2020

Incorporating Bilingual Dictionaries for Low Resource Semi-Supervised Neural Machine Translation

We explore ways of incorporating bilingual dictionaries to enable semi-s...
research
03/15/2022

Better Quality Estimation for Low Resource Corpus Mining

Quality Estimation (QE) models have the potential to change how we evalu...

Please sign up or login with your details

Forgot password? Click here to reset