A GRU-Gated Attention Model for Neural Machine Translation

04/27/2017
by   Biao Zhang, et al.

Neural machine translation (NMT) relies heavily on an attention network to produce a context vector for each target-word prediction. In practice, we find that the context vectors for different target words are quite similar to one another and are therefore insufficient for discriminating among target words. A likely reason is that the context vectors produced by the vanilla attention network are merely weighted sums of source representations that are invariant to the decoder states. In this paper, we propose a novel GRU-gated attention model (GAtt) for NMT that makes context vectors more discriminative by allowing source representations to be sensitive to the partial translation already generated by the decoder. GAtt uses a gated recurrent unit (GRU) to combine two types of information: a source annotation vector originally produced by the bidirectional encoder, treated as the GRU's history state, and the corresponding previous decoder state, treated as the GRU's input. The GRU-combined information forms a new source annotation vector. In this way, we obtain translation-sensitive source representations, which are then fed into the attention network to generate discriminative context vectors. We further propose a variant that instead treats the source annotation vector as the current input and the previous decoder state as the history. Experiments on NIST Chinese-English translation tasks show that both GAtt-based models achieve significant improvements over vanilla attention-based NMT. Further analyses of attention weights and context vectors demonstrate the effectiveness of GAtt in improving the discriminative power of representations and in handling the challenging issue of over-translation.
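To make the mechanism concrete, the following is a minimal sketch of a GRU-gated attention layer in PyTorch. It is not the authors' released code: the class and parameter names (GAtt, annotation_dim, decoder_dim, attn_dim) are hypothetical, the dimensions are illustrative, and a standard additive (Bahdanau-style) scorer is assumed for the attention network. The GRU cell treats each source annotation as its history state and the previous decoder state as its input, matching the main GAtt configuration described above; the proposed variant would swap those two roles.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GAtt(nn.Module):
    """Sketch of GRU-gated attention: each source annotation is re-encoded
    with the previous decoder state before attention weights are computed."""

    def __init__(self, annotation_dim, decoder_dim, attn_dim):
        super().__init__()
        # GRU cell: source annotation acts as the hidden (history) state,
        # the previous decoder state acts as the input (main GAtt variant).
        self.gate = nn.GRUCell(decoder_dim, annotation_dim)
        # Additive attention scorer over the gated annotations.
        self.w_ann = nn.Linear(annotation_dim, attn_dim, bias=False)
        self.w_dec = nn.Linear(decoder_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, annotations, prev_state):
        # annotations: (src_len, batch, annotation_dim) from the bidirectional encoder
        # prev_state:  (batch, decoder_dim), previous decoder hidden state
        src_len, batch, _ = annotations.shape
        # Re-encode every annotation with the partial-translation information
        # carried by the previous decoder state.
        dec_rep = prev_state.unsqueeze(0).expand(src_len, -1, -1)
        gated = self.gate(
            dec_rep.reshape(src_len * batch, -1),
            annotations.reshape(src_len * batch, -1),
        ).view(src_len, batch, -1)
        # Additive attention over the translation-sensitive annotations.
        scores = self.v(torch.tanh(self.w_ann(gated) + self.w_dec(prev_state))).squeeze(-1)
        weights = F.softmax(scores, dim=0)                    # (src_len, batch)
        context = (weights.unsqueeze(-1) * gated).sum(dim=0)  # (batch, annotation_dim)
        return context, weights
```

Under this sketch, calling the layer with annotations of shape (src_len, batch, annotation_dim) and a previous decoder state of shape (batch, decoder_dim) returns a context vector of shape (batch, annotation_dim) together with the attention weights over source positions, which then feed the next decoder step as in the vanilla attention-based decoder.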


Related research

Improving Context-aware Neural Machine Translation with Target-side Context (09/02/2019)
Learning When to Attend for Neural Machine Translation (05/31/2017)
Syntax-Directed Attention for Neural Machine Translation (11/12/2017)
Neural Machine Translation with Word Predictions (08/05/2017)
Context-Dependent Word Representation for Neural Machine Translation (07/03/2016)
Modeling Source Syntax for Neural Machine Translation (05/02/2017)
Dynamic Context-guided Capsule Network for Multimodal Machine Translation (09/04/2020)
