Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection

08/31/2018
by Wei Wang, et al.

Measuring the domain relevance of data and identifying or selecting well-fit domain data for machine translation (MT) is a well-studied topic, but denoising is not yet well studied. Denoising concerns a different type of data quality: it tries to reduce the negative impact of data noise on MT training, in particular neural MT (NMT) training. This paper generalizes methods for measuring and selecting data for domain MT and applies them to denoising NMT training. The proposed approach uses trusted data and a denoising curriculum realized by online data selection. Intrinsic and extrinsic evaluations show that the approach is significantly effective for training NMT on data with severe noise.
