Jam or Cream First? Modeling Ambiguity in Neural Machine Translation with SCONES

05/02/2022
by Felix Stahlberg, et al.

The softmax layer in neural machine translation is designed to model the distribution over mutually exclusive tokens. Machine translation, however, is intrinsically uncertain: the same source sentence can have multiple semantically equivalent translations. Therefore, we propose to replace the softmax activation with a multi-label classification layer that can model ambiguity more effectively. We call our loss function Single-label Contrastive Objective for Non-Exclusive Sequences (SCONES). We show that the multi-label output layer can still be trained on single-reference training data using the SCONES loss function. SCONES yields consistent BLEU score gains across six translation directions, particularly for medium-resource language pairs and small beam sizes. With smaller beam sizes we can speed up inference by a factor of 3.9 and still match or improve the BLEU score obtained with softmax. Furthermore, we demonstrate that SCONES can be used to train NMT models that assign the highest probability to adequate translations, thus mitigating the "beam search curse". Additional experiments on synthetic language pairs with varying levels of uncertainty suggest that the improvements from SCONES can be attributed to better handling of ambiguity.
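The core idea admits a compact sketch. Below is a minimal PyTorch illustration of a SCONES-style loss, assuming the natural multi-label formulation the abstract suggests: the reference token is treated as the single positive label, every other vocabulary entry as a negative, and a weight balances the two terms. The function name scones_loss, the hyperparameter alpha, and the tensor shapes are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def scones_loss(logits, targets, alpha=0.5, ignore_index=-100):
    """Sketch of a SCONES-style multi-label loss (assumptions noted above).

    logits:  (batch, seq_len, vocab) raw output-layer scores, without softmax.
    targets: (batch, seq_len) reference token ids; ignore_index marks padding.
    alpha:   hypothetical weight on the contrastive (negative) term.
    """
    mask = targets.ne(ignore_index)                  # valid target positions
    safe_targets = targets.clamp_min(0)              # keep gather() in range

    # Positive term: raise the independent sigmoid probability of the
    # reference token, log sigma(z_y).
    pos_logits = logits.gather(-1, safe_targets.unsqueeze(-1)).squeeze(-1)
    pos_term = F.logsigmoid(pos_logits)

    # Negative term: lower every other token's probability. Using
    # log(1 - sigma(z)) = log sigma(-z), sum over the vocabulary and
    # subtract the reference token's own contribution.
    neg_term = F.logsigmoid(-logits).sum(-1) - F.logsigmoid(-pos_logits)

    loss = -(pos_term + alpha * neg_term)
    return (loss * mask).sum() / mask.sum()

# Toy usage with made-up shapes: batch=2, seq_len=5, vocab=32000.
logits = torch.randn(2, 5, 32000)
targets = torch.randint(0, 32000, (2, 5))
print(scones_loss(logits, targets).item())
```

Because each token receives an independent sigmoid probability, the per-position scores no longer have to sum to one, so several semantically equivalent continuations can all be assigned high probability at once; this is the "non-exclusive" property the loss name refers to.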


Related research

09/20/2020 · Softmax Tempering for Training Neural Machine Translation Models
Neural machine translation (NMT) models are typically trained using a so...

01/07/2017 · Neural Machine Translation on Scarce-Resource Condition: A case-study on Persian-English
Neural Machine Translation (NMT) is a new approach for Machine Translati...

04/01/2019 · Learning to Stop in Structured Prediction for Neural Machine Translation
Beam search optimization resolves many issues in neural machine translat...

05/02/2022 · The Implicit Length Bias of Label Smoothing on Beam Search Decoding
Label smoothing is ubiquitously applied in Neural Machine Translation (N...

04/01/2022 · Uncertainty Determines the Adequacy of the Mode and the Tractability of Decoding in Sequence-to-Sequence Models
In many natural language processing (NLP) tasks the same input (e.g. sou...

07/17/2023 · Enhancing Supervised Learning with Contrastive Markings in Neural Machine Translation Training
Supervised learning in Neural Machine Translation (NMT) typically follow...

02/05/2016 · From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification
We propose sparsemax, a new activation function similar to the tradition...
