DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding

12/08/2022
by Jianhao Yan, et al.

Minimum Bayesian Risk Decoding (MBR) has emerged as a promising decoding algorithm in Neural Machine Translation. However, MBR performs poorly with label smoothing, which is surprising because label smoothing provides decent improvements with beam search and improves generalization in various tasks. In this work, we show that the issue arises from the inconsistent effect of label smoothing on the token-level and sequence-level distributions. We demonstrate that even though label smoothing causes only a slight change at the token level, the sequence-level distribution becomes highly skewed. We term this issue distributional over-smoothness. To address it, we propose a simple and effective method, Distributional Cooling MBR (DC-MBR), which manipulates the entropy of the output distributions by tuning down the Softmax temperature. We theoretically prove the equivalence between pre-tuning the label smoothing factor and distributional cooling. Experiments on NMT benchmarks validate that distributional cooling improves MBR's efficiency and effectiveness in various settings.
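The two ingredients named in the abstract, lowering the Softmax temperature and MBR selection, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy unigram-F1 utility, and the practice of using the sampled candidates as their own pseudo-references are all assumptions made for illustration only.

```python
import math
from collections import Counter


def softmax(logits, temperature=1.0):
    """Softmax with temperature; a temperature below 1 sharpens ("cools") the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def unigram_f1(hyp, ref):
    """Toy utility (an assumption): unigram-overlap F1 between two whitespace-tokenized strings."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, utility=unigram_f1):
    """Return the candidate with the highest average utility against the other candidates."""
    if len(candidates) < 2:
        return candidates[0]

    def expected_utility(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


# Hypothetical token logits: cooling lowers the entropy of the output distribution.
logits = [2.0, 1.0, 0.5, 0.1]
print(entropy(softmax(logits, temperature=1.0)))
print(entropy(softmax(logits, temperature=0.5)))

# Hypothetical samples drawn from a model; MBR picks the most "central" one.
samples = ["the cat sat", "the cat sat", "the cat sat down", "a dog ran"]
print(mbr_select(samples))
```

In a real system the candidates would be sampled from the cooled model distribution and the utility would be a translation metric; both are replaced by toy stand-ins here.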

research
05/02/2022

The Implicit Length Bias of Label Smoothing on Beam Search Decoding

Label smoothing is ubiquitously applied in Neural Machine Translation (N...
research
03/06/2022

Focus on the Target's Vocabulary: Masked Label Smoothing for Machine Translation

Label smoothing and vocabulary sharing are two widely used techniques in...
research
09/20/2020

Softmax Tempering for Training Neural Machine Translation Models

Neural machine translation (NMT) models are typically trained using a so...
research
04/11/2017

Later-stage Minimum Bayes-Risk Decoding for Neural Machine Translation

For extended periods of time, sequence generation models rely on beam se...
research
05/18/2021

Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation

Neural Machine Translation (NMT) currently exhibits biases such as produ...
research
03/22/2022

Learning Confidence for Transformer-based Neural Machine Translation

Confidence estimation aims to quantify the confidence of the model predi...
research
06/11/2021

DORO: Distributional and Outlier Robust Optimization

Many machine learning tasks involve subpopulation shift where the testin...
