Dual Skew Divergence Loss for Neural Machine Translation

08/22/2019
by Fengshun Xiao, et al.

For neural sequence model training, maximum likelihood (ML) has been commonly adopted to optimize model parameters with respect to the corresponding objective. However, in sequence prediction tasks such as neural machine translation (NMT), training with the ML-based cross-entropy loss often leads to models that overgeneralize and become trapped in local optima. In this paper, we propose an extended loss function called dual skew divergence (DSD), which aims to give a better tradeoff between generalization ability and error avoidance during NMT training. Our empirical study indicates that switching to the DSD loss after ML training has converged helps the model escape the local optimum and yields a stable performance improvement. Evaluations on the WMT 2014 English-German and English-French translation tasks demonstrate that the proposed loss indeed brings about better translation performance than several baselines.
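The abstract does not reproduce the paper's exact formulation, but the loss builds on Lee's skew divergence, KL(p || αq + (1−α)p), which smooths the second argument toward the first so the divergence stays finite even where q assigns zero probability. A minimal sketch of a dual (symmetrized) skew-divergence loss over discrete distributions is below; the function names, the interpolation weight `alpha`, and the mixing weight `beta` combining the forward and reverse directions are illustrative assumptions, not the authors' definitions.

```python
import numpy as np

def skew_divergence(p, q, alpha=0.99):
    """Skew divergence KL(p || alpha*q + (1-alpha)*p).

    Smoothing q toward p keeps the log argument positive wherever
    p > 0, so the value is finite even if q has zero entries.
    """
    m = alpha * q + (1.0 - alpha) * p
    mask = p > 0  # terms with p == 0 contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / m[mask])))

def dual_skew_divergence(p, q, alpha=0.99, beta=0.5):
    """Hypothetical dual form: a convex combination of the forward
    and reverse skew divergences, trading off mode-covering
    (forward KL-like) against mode-seeking (reverse KL-like)
    behavior via beta."""
    forward = skew_divergence(p, q, alpha)
    reverse = skew_divergence(q, p, alpha)
    return beta * forward + (1.0 - beta) * reverse
```

In an NMT setting, `p` would be the one-hot (or smoothed) target distribution and `q` the model's softmax output at each decoding step; the paper's training recipe switches to this loss only after cross-entropy training has converged.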


Related research

07/26/2021  Revisiting Negation in Neural Machine Translation
  In this paper, we evaluate the translation of negation both automaticall...

06/30/2017  Neural Sequence Model Training via α-divergence Minimization
  We propose a new neural sequence model training method in which the obje...

08/13/2018  Regularizing Neural Machine Translation by Target-bidirectional Agreement
  Although Neural Machine Translation (NMT) has achieved remarkable progre...

10/07/2020  Dual Reconstruction: a Unifying Objective for Semi-Supervised Neural Machine Translation
  While Iterative Back-Translation and Dual Learning effectively incorpora...

11/01/2016  Dual Learning for Machine Translation
  While neural machine translation (NMT) is making good progress in the pa...

06/15/2021  Sequence-Level Training for Non-Autoregressive Neural Machine Translation
  In recent years, Neural Machine Translation (NMT) has achieved notable r...

10/04/2018  AutoLoss: Learning Discrete Schedules for Alternate Optimization
  Many machine learning problems involve iteratively and alternately optim...
