Softmax Tempering for Training Neural Machine Translation Models

09/20/2020
by   Raj Dabre, et al.
Neural machine translation (NMT) models are typically trained with a softmax cross-entropy loss, where the softmax distribution is compared against smoothed gold labels. In low-resource scenarios, NMT models tend to overfit because the softmax distribution quickly approaches the gold label distribution. To address this issue, we propose dividing the logits by a temperature coefficient, prior to applying the softmax, during training. In our experiments on 11 language pairs from the Asian Language Treebank dataset and on the WMT 2019 English-to-German translation task, we observed significant improvements in translation quality of up to 3.9 BLEU points. Furthermore, softmax tempering makes greedy search as good as beam search decoding in terms of translation quality, enabling a 1.5 to 3.5 times speed-up. We also study the impact of softmax tempering on multilingual NMT and recurrently stacked NMT, both of which aim to reduce NMT model size through parameter sharing, thereby verifying the utility of temperature in developing compact NMT models. Finally, an analysis of softmax entropies and gradients reveals the impact of our method on the internal behavior of NMT models.
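The core idea in the abstract can be sketched numerically: dividing the logits by a temperature before the softmax flattens the resulting distribution, so it approaches the gold label distribution more slowly during training. The sketch below is an illustration of that mechanism only, not the paper's exact training setup; the temperature value and the one-hot loss are assumptions for demonstration.

```python
import numpy as np

def tempered_softmax(logits, temperature=2.0):
    """Divide logits by a temperature coefficient before applying softmax."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def tempered_cross_entropy(logits, gold_index, temperature=2.0):
    """Cross-entropy against a one-hot gold label, using the tempered softmax."""
    probs = tempered_softmax(logits, temperature)
    return -np.log(probs[gold_index])

# A temperature > 1 flattens the distribution relative to the standard softmax,
# so the model's predicted distribution stays further from the one-hot target.
p_standard = tempered_softmax([2.0, 1.0, 0.0], temperature=1.0)
p_tempered = tempered_softmax([2.0, 1.0, 0.0], temperature=2.0)
```

With a higher temperature, the peak probability shrinks and the entropy of the softmax rises, which is consistent with the abstract's observation that tempering counteracts the distribution collapsing onto the gold labels in low-resource training.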
