Towards Bidirectional Hierarchical Representations for Attention-Based Neural Machine Translation

This paper proposes a hierarchical attentional neural translation model which focuses on enhancing source-side hierarchical representations by covering both local and global semantic information using a bidirectional tree-based encoder. To maximize the predictive likelihood of target words, a weighted variant of an attention mechanism is used to balance the attentive information between lexical and phrase vectors. Using a tree-based rare word encoding, the proposed model is extended to sub-word level to alleviate the out-of-vocabulary (OOV) problem. Empirical results reveal that the proposed model significantly outperforms sequence-to-sequence attention-based and tree-based neural translation models in English-Chinese translation tasks.
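The abstract's weighted attention variant, which balances attentive information between lexical and phrase vectors, can be illustrated with a small sketch. This is a hedged illustration, not the paper's exact model: the sigmoid gate, the weight vector `w_g`, and all dimensions are assumptions introduced for this example.

```python
import numpy as np

# Hedged sketch of balancing two attention contexts — one computed over
# word (lexical) annotations, one over phrase (subtree) annotations —
# with a learned scalar gate. Names and gate form are illustrative only.
rng = np.random.default_rng(1)

d = 4                              # hidden size (toy value)
c_word = rng.normal(size=(d,))     # context vector from word-level attention
c_phrase = rng.normal(size=(d,))   # context vector from phrase-level attention
s = rng.normal(size=(d,))          # current decoder state

w_g = rng.normal(size=(d,))        # hypothetical gate parameters
beta = 1.0 / (1.0 + np.exp(-(w_g @ s)))   # sigmoid gate in (0, 1)

# Final context interpolates the lexical and phrase information sources.
c = beta * c_word + (1.0 - beta) * c_phrase
```

With `beta` close to 1 the decoder relies on lexical information, and with `beta` close to 0 on phrase-level information; the gate is recomputed at every decoding step from the decoder state.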





1 Introduction

Neural machine translation (NMT) automatically learns the abstract features of, and the semantic relationships between, the source and target sentences, and has recently achieved state-of-the-art results on various translation tasks (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2015). The most widely used model is the encoder-decoder framework (Sutskever et al., 2014), in which the source sentence is encoded into a dense representation and a decoder then generates the target translation from it. By exploiting the attention mechanism (Bahdanau et al., 2015), the generation of each target word is conditioned on the source hidden states rather than on a single fixed context vector. From the perspective of model architecture, prior studies of the attentive encoder-decoder translation model fall mainly into two types.
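The attention step described above can be sketched concretely. The following is a minimal numpy illustration of additive (Bahdanau-style) attention, where each target word is conditioned on a weighted sum of source hidden states; the dimensions and parameter names (`W_a`, `U_a`, `v_a`) are toy assumptions for this sketch, not values from the paper.

```python
import numpy as np

# Minimal sketch of additive attention over source hidden states.
rng = np.random.default_rng(0)

d = 4                          # hidden size (toy value)
T = 5                          # source sentence length
H = rng.normal(size=(T, d))    # source hidden states h_1 .. h_T
s = rng.normal(size=(d,))      # previous decoder state s_{t-1}

W_a = rng.normal(size=(d, d))  # hypothetical attention parameters
U_a = rng.normal(size=(d, d))
v_a = rng.normal(size=(d,))

# Alignment scores: e_j = v_a^T tanh(W_a s + U_a h_j)
e = np.tanh(s @ W_a + H @ U_a) @ v_a

# Softmax over source positions gives the attention weights alpha_j.
alpha = np.exp(e - e.max())
alpha /= alpha.sum()

# Context vector: attention-weighted sum of the source hidden states.
c = alpha @ H
```

Because the weights `alpha` are recomputed at every decoding step, the decoder can focus on different source positions for different target words, which is the key difference from conditioning on one fixed context vector.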

[Figure: constituency parse tree of the sentence "I take up a position in the room" (S dominating PRP "I" and a VP; the VP contains VBP "take", PRT "up", an NP "a position", and a PP "in the room"), with an arc from the NP "a position" to the verb phrase "take up" and an arc from the PP "in the room" to the NP, illustrating dependencies among phrase-level nodes.]