Attention Weights in Transformer NMT Fail Aligning Words Between Sequences but Largely Explain Model Predictions

09/13/2021
by   Javier Ferrando, et al.

This work presents an extensive analysis of the Transformer architecture in the Neural Machine Translation (NMT) setting. Focusing on the encoder-decoder attention mechanism, we show that attention weights systematically produce alignment errors by relying mainly on uninformative tokens from the source sequence. However, we observe that NMT models assign attention to these tokens in order to regulate the relative contribution of the two contexts, the source sequence and the prefix of the target sequence, to the prediction. We provide evidence of the influence of wrong alignments on model behavior, demonstrating that the encoder-decoder attention mechanism is well suited as an interpretability method for NMT. Finally, based on our analysis, we propose methods that largely reduce the word alignment error rate compared to alignments induced from attention weights in the standard way.
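The standard way of inducing word alignments from attention, which the abstract contrasts with the proposed methods, aligns each target token to the source token receiving the highest attention weight. A minimal sketch of that baseline (the function name and attention values below are illustrative, not from the paper):

```python
import numpy as np

def induce_alignments(attn):
    """Induce source-target word alignments from a cross-attention matrix.

    attn: array of shape (target_len, source_len); each row is the
    attention distribution of one target token over the source tokens.
    Returns a list of (target_idx, source_idx) pairs, aligning each
    target token to its most-attended source token (argmax).
    """
    return [(t, int(np.argmax(row))) for t, row in enumerate(attn)]

# Toy 3x4 cross-attention matrix (hypothetical values; rows sum to 1).
attn = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])
print(induce_alignments(attn))  # [(0, 0), (1, 1), (2, 3)]
```

The paper's observation is that this argmax heuristic fails precisely because high attention mass often falls on uninformative source tokens rather than the translationally aligned ones.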


Related research

10/17/2018
An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation
Recent work has shown that the encoder-decoder attention mechanisms in n...

08/17/2019
Hard but Robust, Easy but Sensitive: How Encoder and Decoder Perform in Neural Machine Translation
Neural machine translation (NMT) typically adopts the encoder-decoder fr...

07/18/2019
Understanding Neural Machine Translation by Simplification: The Case of Encoder-free Models
In this paper, we try to understand neural machine translation (NMT) via...

05/23/2022
Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer
In Neural Machine Translation (NMT), each token prediction is conditione...

10/31/2018
You May Not Need Attention
In NMT, how far can we get without attention and without separate encodi...

06/20/2019
Conflict as an Inverse of Attention in Sequence Relationship
Attention is a very efficient way to model the relationship between two ...

11/23/2022
Rank-One Editing of Encoder-Decoder Models
Large sequence to sequence models for tasks such as Neural Machine Trans...
