Investigating Backtranslation in Neural Machine Translation

04/17/2018
by Alberto Poncelas, et al.

A prerequisite for training corpus-based machine translation (MT) systems -- whether Statistical MT (SMT) or Neural MT (NMT) -- is the availability of high-quality parallel data. This is arguably more important today than ever before, as NMT has been shown in many studies to outperform SMT, but mostly when large parallel corpora are available; in cases where data is limited, SMT can still outperform NMT. Recently, researchers have shown that back-translating monolingual data can be used to create synthetic parallel corpora, which in turn can be combined with authentic parallel data to train a high-quality NMT system. Given that large collections of new parallel text become available only rarely, backtranslation has become the norm when building state-of-the-art NMT systems, especially in resource-poor scenarios. However, we assert that there are many unknown factors regarding the actual effects of back-translated data on the translation capabilities of an NMT model. Accordingly, in this work we investigate how using back-translated data as a training corpus -- both as a standalone dataset and in combination with human-generated parallel data -- affects the performance of an NMT model. We use incrementally larger amounts of back-translated data to train a range of NMT systems for German-to-English, and analyse the resulting translation performance.
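The data-construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`back_translate`, `incremental_training_sets`), the `step` parameter, and the toy word-for-word `toy_english_to_german` stand-in for a trained English-to-German model are all assumptions made for illustration. For a German-to-English system, monolingual English sentences are translated into German by a reverse-direction model, and each synthetic German sentence is paired with its authentic English original.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    target_sentences: Sequence[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Pair each authentic target sentence with a synthetic source sentence.

    `reverse_translate` is any target-to-source translator; here it is a
    hypothetical stand-in for a trained reverse-direction MT model.
    """
    return [(reverse_translate(t), t) for t in target_sentences]


def incremental_training_sets(
    authentic: List[Pair],
    synthetic: List[Pair],
    step: int,
) -> List[List[Pair]]:
    """Build training sets with 0, step, 2*step, ... synthetic pairs added.

    Passing an empty `authentic` list gives the synthetic-only
    (standalone) configuration.
    """
    if step <= 0:
        raise ValueError("step must be a positive integer")
    return [authentic + synthetic[:n] for n in range(0, len(synthetic) + 1, step)]


def toy_english_to_german(sentence: str) -> str:
    """Toy word-for-word lookup; an assumption replacing a real MT model."""
    lexicon = {"the": "das", "house": "Haus", "is": "ist",
               "small": "klein", "old": "alt"}
    return " ".join(lexicon.get(word, word) for word in sentence.split())


# Authentic (human-translated) German-English pairs.
authentic_pairs = [("Guten Morgen", "good morning")]
# Monolingual English text to be back-translated into German.
monolingual_english = ["the house is small", "the house is old"]

synthetic_pairs = back_translate(monolingual_english, toy_english_to_german)
training_sets = incremental_training_sets(authentic_pairs, synthetic_pairs, step=1)
```

Each element of `training_sets` would then be used to train a separate NMT system, so that translation quality can be compared as the share of back-translated data grows. The amounts of data and the training toolkit are not given in the abstract and are deliberately left out of the sketch.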


