Mixed Cross Entropy Loss for Neural Machine Translation

06/30/2021
by   Haoran Li, et al.

In neural machine translation, cross entropy (CE) is the standard loss function in the two main training methods for auto-regressive models, i.e., teacher forcing and scheduled sampling. In this paper, we propose mixed cross entropy loss (mixed CE) as a substitute for CE in both training approaches. In teacher forcing, a model trained with CE treats translation as a one-to-one mapping; with mixed CE this is relaxed to a one-to-many mapping. In scheduled sampling, we show that mixed CE has the potential to make the training and testing behaviours more similar to each other, mitigating the exposure bias problem more effectively. We demonstrate the superiority of mixed CE over CE on several machine translation datasets (WMT'16 Ro-En, WMT'16 Ru-En, and WMT'14 En-De) in both the teacher forcing and scheduled sampling setups. Furthermore, on WMT'14 En-De, mixed CE consistently outperforms CE on a multi-reference set as well as on a challenging paraphrased reference set. We also find that a model trained with mixed CE provides a better probability distribution over the translation output space. Our code is available at https://github.com/haorannlp/mix.
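The abstract does not spell out the loss itself; as a rough illustration, the sketch below implements one plausible reading of a mixed cross entropy objective in the teacher forcing setup: a convex combination of CE against the gold token and CE against the model's own argmax prediction, which lets part of the training signal follow the model's preferred token instead of forcing all probability mass onto the single reference token. The function name `mixed_ce_loss`, the mixing weight `lam`, and its fixed value are illustrative assumptions, not details confirmed by the paper; see the linked repository for the exact formulation.

```python
import torch
import torch.nn.functional as F

def mixed_ce_loss(logits: torch.Tensor, gold: torch.Tensor,
                  lam: float = 0.5, pad_id: int = 1) -> torch.Tensor:
    """Sketch of a mixed cross entropy loss for teacher forcing.

    Mixes CE on the reference token with CE on the model's own argmax
    prediction, relaxing the one-to-one mapping of plain CE.
    The mixing weight `lam` (and its value) is an assumption here;
    the paper's schedule may differ.

    logits: (batch, seq_len, vocab) raw decoder outputs
    gold:   (batch, seq_len)        reference token ids
    """
    log_probs = F.log_softmax(logits, dim=-1)   # (B, T, V)
    pred = log_probs.argmax(dim=-1)             # model's top-1 tokens, (B, T)
    mask = gold.ne(pad_id)                      # skip padding positions

    # Negative log-likelihood of the gold token and of the model's own choice.
    nll_gold = -log_probs.gather(-1, gold.unsqueeze(-1)).squeeze(-1)
    nll_pred = -log_probs.gather(-1, pred.unsqueeze(-1)).squeeze(-1)

    # Convex combination of the two signals, averaged over non-pad tokens.
    loss = lam * nll_gold + (1.0 - lam) * nll_pred
    return loss[mask].mean()

# Toy usage with random decoder outputs.
logits = torch.randn(2, 5, 100)
gold = torch.randint(2, 100, (2, 5))
print(mixed_ce_loss(logits, gold).item())
```

In the scheduled sampling setup, the same combination would presumably be applied to outputs produced when the decoder is conditioned on its own sampled prefixes rather than the gold prefix, which is how it could bring training and testing behaviours closer together.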


