Softmax Tempering for Training Neural Machine Translation Models

09/20/2020
by Raj Dabre, et al.

Neural machine translation (NMT) models are typically trained with a softmax cross-entropy loss in which the softmax distribution is compared against smoothed gold labels. In low-resource scenarios, NMT models tend to overfit because the softmax distribution quickly approaches the gold-label distribution. To address this issue, we propose dividing the logits by a temperature coefficient before applying the softmax during training. In experiments on 11 language pairs from the Asian Language Treebank dataset and the WMT 2019 English-to-German translation task, we observed significant improvements in translation quality of up to 3.9 BLEU points. Furthermore, softmax tempering makes greedy search as good as beam search decoding in terms of translation quality, enabling a 1.5 to 3.5 times decoding speed-up. We also study the impact of softmax tempering on multilingual NMT and recurrently stacked NMT, both of which aim to reduce the NMT model size through parameter sharing, thereby verifying the utility of temperature in developing compact NMT models. Finally, an analysis of softmax entropies and gradients reveals the impact of our method on the internal behavior of NMT models.
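The abstract does not include code, but the core idea is compact enough to sketch. Below is a minimal PyTorch illustration, assuming a standard label-smoothed cross-entropy objective; the function name tempered_cross_entropy and the values temperature=2.0 and smoothing=0.1 are our own illustrative choices, not the paper's settings.

import torch
import torch.nn.functional as F

def tempered_cross_entropy(logits, targets, temperature=2.0, smoothing=0.1):
    """Label-smoothed cross-entropy with softmax tempering.

    Dividing the logits by a temperature > 1 before the softmax flattens
    the predicted distribution, so it approaches the smoothed gold-label
    distribution more slowly during training (the regularizing effect
    described in the abstract). The temperature and smoothing values here
    are illustrative defaults, not the paper's tuned settings.
    """
    return F.cross_entropy(logits / temperature, targets,
                           label_smoothing=smoothing)

# Toy usage: a batch of 4 decoder positions over a 10-token vocabulary.
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.randint(0, 10, (4,))
loss = tempered_cross_entropy(logits, targets)
loss.backward()  # gradients flow through the tempered softmax

Note that the abstract specifies the division is applied during training, so a standard (untempered) softmax would be used at inference time, whether decoding greedily or with beam search.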

Related research

Low Resource Neural Machine Translation: A Benchmark for Five African Languages (03/31/2020)
Recent advents in Neural Machine Translation (NMT) have shown improvemen...

Jam or Cream First? Modeling Ambiguity in Neural Machine Translation with SCONES (05/02/2022)
The softmax layer in neural machine translation is designed to model the...

Pointer-based Fusion of Bilingual Lexicons into Neural Machine Translation (09/17/2019)
Neural machine translation (NMT) systems require large amounts of high q...

DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding (12/08/2022)
Minimum Bayesian Risk Decoding (MBR) emerges as a promising decoding alg...

Unsupervised Neural Machine Translation with Weight Sharing (04/24/2018)
Unsupervised neural machine translation (NMT) is a recently proposed app...

Generalizing Back-Translation in Neural Machine Translation (06/17/2019)
Back-translation - data augmentation by translating target monolingual d...

Multi-Sentence Resampling: A Simple Approach to Alleviate Dataset Length Bias and Beam-Search Degradation (09/13/2021)
Neural Machine Translation (NMT) is known to suffer from a beam-search p...
