Characterizing and addressing the issue of oversmoothing in neural autoregressive sequence modeling

12/16/2021
by Ilia Kulikov, et al.

Neural autoregressive sequence models spread probability mass across many possible sequences, including degenerate ones such as empty or repetitive sequences. In this work, we tackle one specific case in which the model assigns high probability to unreasonably short sequences. We define the oversmoothing rate to quantify this issue. After confirming a high degree of oversmoothing in neural machine translation, we propose to explicitly minimize the oversmoothing rate during training. We conduct a set of experiments to study the effect of the proposed regularization on both the model distribution and decoding performance, using neural machine translation as the testbed and considering three datasets of varying size. Our experiments reveal three major findings. First, we can control the oversmoothing rate of the model by tuning the strength of the regularization. Second, increasing the contribution of the oversmoothing loss sharply lowers both the probability and the rank of the <eos> token at positions where it should not appear. Third, the proposed regularization affects the outcome of beam search, especially when a large beam is used: the degradation of translation quality (measured in BLEU) with a large beam lessens significantly at lower oversmoothing rates, although some degradation relative to smaller beams persists. From these observations, we conclude that a high degree of oversmoothing is the main reason behind the degenerate case of overly probable short sequences in neural autoregressive models.
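The abstract introduces an oversmoothing rate and a regularizer that minimizes it, but gives no formulas. Below is a minimal PyTorch sketch of one plausible formalization, assuming the rate is the fraction of non-final positions where the model assigns <eos> a higher log-probability than the remaining ground-truth suffix, and the regularizer is a hinge penalty on those positions. The function name, the margin parameter, and the tensor layout are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch, NOT the authors' exact formulation. Assumes:
#   log_probs: (T, V) per-step log-probabilities from the decoder
#   targets:   (T,)  ground-truth token ids, ending with eos_id
import torch

def oversmoothing_stats(log_probs: torch.Tensor,
                        targets: torch.Tensor,
                        eos_id: int,
                        margin: float = 1e-4):
    """Return (rate, loss):
    rate = fraction of non-final steps t where
           log p(<eos> | y_<t) > log p(y_t, ..., y_T | y_<t)
    loss = mean hinge penalty max(0, margin + lp_eos - lp_suffix)
    """
    T = targets.size(0)
    # log p(y_t | y_<t) for each step
    step_lp = log_probs[torch.arange(T), targets]
    # log-prob of the remaining suffix y_t..y_T at each step (reverse cumsum)
    suffix_lp = torch.flip(torch.cumsum(torch.flip(step_lp, [0]), 0), [0])
    # log p(<eos> | y_<t) for each step
    eos_lp = log_probs[:, eos_id]
    # exclude the final step, where <eos> is the correct token
    eos_lp, suffix_lp = eos_lp[:-1], suffix_lp[:-1]
    rate = (eos_lp > suffix_lp).float().mean()
    loss = torch.clamp(margin + eos_lp - suffix_lp, min=0).mean()
    return rate, loss
```

In training, such a penalty would typically be added to the usual negative log-likelihood with a weight, e.g. total = nll + alpha * os_loss; the first finding above, that the oversmoothing rate can be controlled by the strength of the regularization, corresponds to tuning alpha in this sketch.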

