
Multi-Granularity Self-Attention for Neural Machine Translation

by Jie Hao, et al.

Current state-of-the-art neural machine translation (NMT) uses a deep multi-head self-attention network with no explicit phrase information. However, prior work on statistical machine translation has shown that extending the basic translation unit from words to phrases yields substantial improvements, which suggests that NMT performance could also benefit from explicit modeling of phrases. In this work, we present multi-granularity self-attention (Mg-Sa): a neural network that combines multi-head self-attention and phrase modeling. Specifically, we train several attention heads to attend to phrases, defined either as n-grams or as syntactic constituents. Moreover, we exploit interactions among phrases to strengthen structure modeling, a commonly cited weakness of self-attention. Experimental results on the WMT14 English-to-German and NIST Chinese-to-English translation tasks show that the proposed approach consistently improves performance. Targeted linguistic analysis reveals that Mg-Sa indeed captures useful phrase information at various levels of granularity.
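The idea of having different heads attend at different granularities can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes non-overlapping n-gram phrases built by mean pooling, omits the learned query/key/value projections and the phrase-interaction component, and all function names (`mean_pool_ngrams`, `attend`, `multi_granularity_attention`) are hypothetical.

```python
import math
import random


def mean_pool_ngrams(tokens, n):
    """Split token vectors into consecutive n-grams and average each one.

    Assumption: phrases are non-overlapping n-grams; the last one may be shorter.
    """
    phrases = []
    for start in range(0, len(tokens), n):
        chunk = tokens[start:start + n]
        dim = len(chunk[0])
        phrases.append([sum(vec[i] for vec in chunk) / len(chunk) for i in range(dim)])
    return phrases


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, memory):
    """Scaled dot-product attention; memory serves as both keys and values.

    Assumption: no learned projections, to keep the sketch short.
    """
    scale = math.sqrt(len(queries[0]))
    dim = len(memory[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) / scale for m in memory]
        weights = softmax(scores)
        outputs.append([sum(w * m[i] for w, m in zip(weights, memory)) for i in range(dim)])
    return outputs


def multi_granularity_attention(tokens, granularities):
    """Run one head per granularity (1 = word level) and concatenate per token."""
    heads = []
    for n in granularities:
        memory = tokens if n == 1 else mean_pool_ngrams(tokens, n)
        heads.append(attend(tokens, memory))
    return [sum((head[t] for head in heads), []) for t in range(len(tokens))]


# Usage: 6 random token vectors of size 4, heads over words, bigrams, and trigrams.
random.seed(0)
tokens = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]
out = multi_granularity_attention(tokens, granularities=[1, 2, 3])
print(len(out), len(out[0]))  # 6 tokens, each with 3 heads x 4 dims = 12 values
```

Each token still produces one query per head, but the bigram and trigram heads attend over a shorter memory of pooled phrase vectors rather than over individual words. A syntactic variant would replace the fixed-size chunks with constituent spans from a parser.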




Towards Neural Phrase-based Machine Translation

In this paper, we present Neural Phrase-based Machine Translation (NPMT)...

Continuous Decomposition of Granularity for Neural Paraphrase Generation

While Transformers have had significant success in paragraph generation,...

From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions

We inspect the multi-head self-attention in Transformer NMT encoders for...

BattRAE: Bidimensional Attention-Based Recursive Autoencoders for Learning Bilingual Phrase Embeddings

In this paper, we propose a bidimensional attention based recursive auto...

Progressive Multi-Granularity Training for Non-Autoregressive Translation

Non-autoregressive translation (NAT) significantly accelerates the infer...

Linguistically-Informed Self-Attention for Semantic Role Labeling

The current state-of-the-art end-to-end semantic role labeling (SRL) mod...

Phrase-level Adversarial Example Generation for Neural Machine Translation

While end-to-end neural machine translation (NMT) has achieved impressiv...