Towards Bidirectional Hierarchical Representations for Attention-Based Neural Machine Translation

07/17/2017 · Baosong Yang et al. · Northeastern University, University of Macau

This paper proposes a hierarchical attention-based neural translation model that enhances source-side hierarchical representations by covering both local and global semantic information with a bidirectional tree-based encoder. To maximize the predictive likelihood of target words, a weighted variant of the attention mechanism balances the attentive information between lexical and phrase vectors. Using a tree-based rare-word encoding, the proposed model is further extended to the sub-word level to alleviate the out-of-vocabulary (OOV) problem. Empirical results show that the proposed model significantly outperforms sequence-to-sequence attention-based and tree-based neural translation models on English-Chinese translation tasks.


1 Introduction

Neural machine translation (NMT) automatically learns abstract features of, and semantic relationships between, the source and target sentences, and has recently achieved state-of-the-art results on various translation tasks (kalchbrenner2013recurrent; sutskever2014sequence; bahdanau2015neural). The most widely used model is the encoder-decoder framework (sutskever2014sequence), in which the source sentence is encoded into a dense representation, followed by a decoding process that generates the target translation. By exploiting the attention mechanism (bahdanau2015neural), each target word is conditioned on a dynamically weighted combination of the source hidden states, rather than on a single fixed context vector alone. From a model-architecture perspective, prior studies of the attentive encoder-decoder translation model mainly fall into two types.
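To make the attention step concrete, the following is a minimal NumPy sketch of how a decoder state attends over source hidden states. The bilinear scoring form and all function names here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_context(decoder_state, encoder_states, W):
    """Attend over source hidden states (bilinear scoring, an
    illustrative choice) and return the context vector and weights.

    decoder_state:  (d,)   current decoder hidden state
    encoder_states: (n, d) one hidden state per source position
    W:              (d, d) learned scoring matrix
    """
    scores = encoder_states @ W @ decoder_state   # one score per source state
    weights = softmax(scores)                     # normalized, sums to 1
    context = weights @ encoder_states            # weighted sum of states
    return context, weights
```

Each decoding step recomputes `weights`, so the context vector changes per target word; this is what distinguishes attentive decoding from conditioning on a single sentence vector.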

[Figure: constituency parse tree of the sentence "I take up a position in the room" (S → PRP VP; VP → [VBP take] [PRT up] NP PP), with an arrow from the PP "in the room" to the NP "a position" and an arrow from that NP to the verb phrase "take up".]
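The bottom-up half of a tree-based encoder over such a parse can be sketched as a recursive composition over a binarized tree. The single-layer tanh composition and all names below are illustrative assumptions, not the paper's actual recurrent units:

```python
import numpy as np

def encode_tree(node, embed, W):
    """Bottom-up encoding of a binarized parse tree.

    Leaves are word strings looked up in `embed` (dict of (d,) vectors);
    internal nodes are (left, right) pairs whose child vectors are
    composed with one tanh layer: h = tanh(W [h_left; h_right]),
    where W has shape (d, 2d).
    """
    if isinstance(node, str):                      # leaf: word embedding
        return embed[node]
    left = encode_tree(node[0], embed, W)          # encode left subtree
    right = encode_tree(node[1], embed, W)         # encode right subtree
    return np.tanh(W @ np.concatenate([left, right]))
```

A second, top-down pass over the same tree, initialized from the root vector, would supply the opposite direction of the "bidirectional" hierarchical representation the abstract refers to.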