Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation

05/20/2020
by Bryan Eikema, et al.

Recent studies have revealed a number of pathologies of neural machine translation (NMT) systems. Hypotheses explaining these mostly suggest that there is something fundamentally wrong with NMT as a model or with its training algorithm, maximum likelihood estimation (MLE). Most of this evidence was gathered using maximum a posteriori (MAP) decoding, a decision rule aimed at identifying the highest-scoring translation, i.e. the mode, under the model distribution. We argue that the evidence corroborates the inadequacy of MAP decoding more than it casts doubt on the model and its training algorithm. In this work, we criticise NMT models probabilistically, showing that stochastic samples following the model's own generative story do reproduce various statistics of the training data well, and that it is beam search that strays from such statistics. We show that some of the known pathologies of NMT are due to MAP decoding and not to NMT's statistical assumptions or to MLE. In particular, we show that the most likely translations under the model accumulate so little probability mass that the mode can be considered essentially arbitrary. We therefore advocate for the use of decision rules that take into account statistics gathered from the model distribution holistically. As a proof of concept, we show that a straightforward implementation of minimum Bayes risk (MBR) decoding gives good results, outperforming beam search with as few as 30 samples, which confirms that MLE-trained NMT models do capture important aspects of translation well in expectation.
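To make the contrast with MAP decoding concrete, here is a minimal sketch of sampling-based minimum Bayes risk decoding. Instead of returning the single highest-scoring translation, MBR returns the candidate with the highest expected utility, estimated by Monte Carlo against translations sampled from the model. Several assumptions apply. The sample strings below are hypothetical placeholders for outputs drawn by ancestral sampling from an NMT model. The candidate set is taken to be the samples themselves, which is one common choice. `unigram_f1` is a hypothetical stand-in utility, not the metric used in the paper; a real system would plug in a sentence-level MT metric.

```python
from collections import Counter
from typing import Callable, Sequence


def unigram_f1(hyp: str, ref: str) -> float:
    """Hypothetical utility: F1 over whitespace-token multisets.

    Stand-in for a proper sentence-level MT metric; not the paper's choice.
    """
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())  # clipped token matches (multiset intersection)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(
    candidates: Sequence[str],
    samples: Sequence[str],
    utility: Callable[[str, str], float] = unigram_f1,
) -> str:
    """Pick the candidate maximising the Monte Carlo estimate of expected utility.

    Each sample acts as a pseudo-reference drawn from the model distribution,
    so the score approximates E_{y ~ p(y|x)}[utility(candidate, y)].
    """
    def expected_utility(candidate: str) -> float:
        return sum(utility(candidate, s) for s in samples) / len(samples)

    return max(candidates, key=expected_utility)


# Hypothetical samples standing in for draws from an NMT model.
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a cat is sitting on the mat",
    "the dog sat on the mat",
]
print(mbr_decode(samples, samples))  # the cat sat on the mat
```

The chosen output is the translation that agrees most, on average, with everything the model considers plausible. This is why the decision rule depends on statistics of the whole distribution rather than on the probability of any single sequence.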
