The Implicit Length Bias of Label Smoothing on Beam Search Decoding

05/02/2022
by Bowen Liang, et al.

Label smoothing is ubiquitously applied in Neural Machine Translation (NMT) training. While label smoothing offers a desirable regularization effect during model training, in this paper we demonstrate that it nevertheless introduces a length bias in the beam search decoding procedure. Our analysis shows that label smoothing implicitly applies a length penalty term to the output sequence, causing a bias towards shorter translations. We also show that for a model fully optimized with label smoothing, translation length is implicitly upper bounded by a fixed constant independent of the input. We verify our theory by applying a simple rectification function at inference time to restore the unbiased distributions from the label-smoothed model predictions. This rectification method led to consistent quality improvements on WMT English-German, English-French, English-Czech and English-Chinese tasks, of up to +0.3 BLEU at beam size 4 and +2.8 BLEU at beam size 200.
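
The abstract does not give the rectification function itself, so the following is only a minimal sketch of the idea under stated assumptions: smoothing is assumed to mix the true distribution with a uniform distribution over a vocabulary of size V with weight eps, and the model is assumed to match that smoothed target exactly. The function names (smooth, rectify, max_token_prob) are hypothetical and are not taken from the paper. Under these assumptions no token can receive probability above 1 - eps + eps/V, so every additional token lowers the sequence log-probability by at least a fixed amount, which is one way to see a bias towards shorter outputs.

```python
import math


def smooth(p, eps):
    """Target a fully optimized model would predict, ASSUMING uniform label smoothing."""
    v = len(p)
    return [(1.0 - eps) * p_i + eps / v for p_i in p]


def rectify(q, eps):
    """Hypothetical rectification: invert the uniform mixture, clip, renormalize."""
    v = len(q)
    raw = [max((q_i - eps / v) / (1.0 - eps), 0.0) for q_i in q]
    total = sum(raw)
    return [r / total for r in raw]


def max_token_prob(eps, v):
    """Largest probability any single token can get under the assumed smoothing."""
    return 1.0 - eps + eps / v


# A confident true distribution over a toy 4-word vocabulary.
p_true = [0.9, 0.1, 0.0, 0.0]
eps = 0.1

q_smoothed = smooth(p_true, eps)       # what the smoothed model would output
p_restored = rectify(q_smoothed, eps)  # approximately recovers p_true

# Minimum log-probability cost paid for each extra token under smoothing.
per_token_cost = -math.log(max_token_prob(eps, len(p_true)))
```

In this sketch, rectify undoes smooth exactly because the mixture is known; with a real model the predictions only approximate the smoothed target, which is why the clipping and renormalization steps are included.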


