Using Monolingual Data in Neural Machine Translation: a Systematic Study

03/27/2019
by Franck Burlot, et al.

Neural Machine Translation (MT) has radically changed the way systems are developed. A major difference from the previous generation (Phrase-Based MT) lies in how monolingual target data, which is often abundant, is used. While Phrase-Based MT can seamlessly integrate very large language models trained on billions of sentences, the best option for Neural MT developers seems to be the generation of artificial parallel data through back-translation, a technique that fails to take full advantage of existing datasets. In this paper, we conduct a systematic study of back-translation, comparing alternative uses of monolingual data, as well as multiple data generation procedures. Our findings confirm that back-translation is very effective and give new explanations as to why this is the case. We also introduce new data simulation techniques that are almost as effective, yet much cheaper to implement.
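
As a rough illustration of the back-translation idea mentioned in the abstract, the sketch below pairs genuine target-language sentences with machine-generated source sentences to form artificial parallel data. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `back_translate` and `toy_reverse_model`, the toy French-to-English lexicon, and the choice of language pair are all hypothetical. In practice, the reverse model would be a trained target-to-source MT system.

```python
from typing import Callable, Iterable, List, Tuple


def back_translate(
    target_sentences: Iterable[str],
    translate_to_source: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build (synthetic source, genuine target) pairs from monolingual target data.

    `translate_to_source` is a hypothetical stand-in for any target-to-source
    translation system.
    """
    pairs = []
    for target in target_sentences:
        target = target.strip()
        if not target:
            continue  # skip empty lines in the monolingual corpus
        synthetic_source = translate_to_source(target)
        # The target side is human-written; only the source side is artificial.
        pairs.append((synthetic_source, target))
    return pairs


# Hypothetical toy lexicon standing in for a trained reverse (French-to-English) model.
TOY_LEXICON = {"le": "the", "chat": "cat", "dort": "sleeps"}


def toy_reverse_model(sentence: str) -> str:
    """Word-by-word lookup; unknown words are copied unchanged."""
    return " ".join(TOY_LEXICON.get(word, word) for word in sentence.split())


if __name__ == "__main__":
    monolingual_target = ["le chat dort", "  "]
    for source, target in back_translate(monolingual_target, toy_reverse_model):
        print(f"{source}\t{target}")
```

The resulting pairs would typically be mixed with genuine parallel data when training the source-to-target system; how best to generate and use them is the question the paper studies.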
