An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation

10/17/2018
by Gongbo Tang, et al.

Recent work has shown that the encoder-decoder attention mechanisms in neural machine translation (NMT) differ from the word alignments used in statistical machine translation. In this paper, we analyze encoder-decoder attention mechanisms in the context of word sense disambiguation (WSD) in NMT models. We hypothesize that attention mechanisms pay more attention to context tokens when translating ambiguous words, and we examine the attention distribution patterns that arise when translating ambiguous nouns. Counter-intuitively, we find that, compared to other nouns, attention mechanisms tend to distribute more attention to the ambiguous noun itself rather than to context tokens. We conclude that attention is not the primary means by which NMT models incorporate contextual information for WSD. Instead, the experimental results suggest that NMT models learn to encode the contextual information necessary for WSD in the encoder hidden states. For the attention mechanism in Transformer models, we reveal that the first few layers gradually learn to "align" source and target tokens, while the last few layers learn to extract features from related but unaligned context tokens.
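A simple way to operationalize the kind of analysis described above is to measure, for each target token, how much attention mass falls on its aligned source token versus how spread out the distribution is over context tokens. The sketch below is illustrative only: the function name, the toy attention row, and the aligned index are all assumptions, not the paper's actual code.

```python
import numpy as np

def attention_concentration(attn_row, aligned_idx):
    """Return the share of attention mass on the aligned source token
    and the entropy (in bits) of the full attention distribution.
    Higher entropy means attention is spread more over context tokens."""
    attn_row = np.asarray(attn_row, dtype=float)
    attn_row = attn_row / attn_row.sum()  # normalize defensively
    mass_on_aligned = attn_row[aligned_idx]
    entropy = -np.sum(attn_row * np.log2(attn_row + 1e-12))
    return mass_on_aligned, entropy

# Hypothetical attention row for one target token over 5 source tokens,
# where index 2 is the (ambiguous) source noun being translated.
row = [0.05, 0.10, 0.70, 0.10, 0.05]
mass, ent = attention_concentration(row, aligned_idx=2)
```

Under the paper's finding, one would expect `mass_on_aligned` to be higher (and entropy lower) for ambiguous nouns than for other nouns, rather than the reverse.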

Related research

09/13/2021
Attention Weights in Transformer NMT Fail Aligning Words Between Sequences but Largely Explain Model Predictions
This work proposes an extensive analysis of the Transformer architecture...

08/30/2019
Encoders Help You Disambiguate Word Senses in Neural Machine Translation
Neural machine translation (NMT) has achieved new state-of-the-art perfo...

05/30/2022
Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation
Unlike literal expressions, idioms' meanings do not directly follow from...

09/04/2018
A Novel Neural Sequence Model with Multiple Attentions for Word Sense Disambiguation
Word sense disambiguation (WSD) is a well researched problem in computat...

11/01/2018
Hybrid Self-Attention Network for Machine Translation
The encoder-decoder is the typical framework for Neural Machine Translat...

09/17/2020
Dissecting Lottery Ticket Transformers: Structural and Behavioral Study of Sparse Neural Machine Translation
Recent work on the lottery ticket hypothesis has produced highly sparse ...

10/05/2018
Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation
This paper demonstrates that word sense disambiguation (WSD) can improve...
