Beyond Noise: Mitigating the Impact of Fine-grained Semantic Divergences on Neural Machine Translation

05/31/2021
by Eleftheria Briakou, et al.

While it has been shown that Neural Machine Translation (NMT) is highly sensitive to noisy parallel training samples, prior work treats all types of mismatches between source and target as noise. As a result, it remains unclear how samples that are mostly equivalent but contain a small number of semantically divergent tokens impact NMT training. To close this gap, we analyze the impact of different types of fine-grained semantic divergences on Transformer models. We show that models trained on synthetic divergences output degenerated text more frequently and are less confident in their predictions. Based on these findings, we introduce a divergent-aware NMT framework that uses factors to help NMT recover from the degradation caused by naturally occurring divergences, improving both translation quality and model calibration on EN-FR tasks.
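To make the calibration claim concrete, here is a minimal sketch of expected calibration error (ECE), a common way to measure how well prediction confidence matches accuracy. The abstract does not name the metric used, so ECE is an assumption here. The function name, the equal-width binning, and the toy values are illustrative and not taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins.

    confidences: per-token probabilities of the predicted token, each in [0, 1].
    correct: booleans, True where the predicted token matches the reference.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Assign each prediction to a bin [k/n_bins, (k+1)/n_bins);
    # a confidence of exactly 1.0 goes into the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, h in bucket if h) / len(bucket)
        # Each bin contributes in proportion to the share of predictions it holds.
        ece += (len(bucket) / total) * abs(accuracy - avg_conf)
    return ece


# Toy example: four predictions at 0.9 confidence, three of them correct.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

A lower value means confidence tracks accuracy more closely. In the toy example the model is overconfident: it reports 0.9 confidence but is right only 75% of the time.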


