Weighted Transformer Network for Machine Translation

11/06/2017
by Karim Ahmed, et al.

State-of-the-art results on neural machine translation often use attentional sequence-to-sequence models with some form of convolution or recursion. Vaswani et al. (2017) propose a new architecture that avoids recurrence and convolution completely. Instead, it uses only self-attention and feed-forward layers. While the proposed architecture achieves state-of-the-art results on several machine translation tasks, it requires a large number of parameters and training iterations to converge. We propose Weighted Transformer, a Transformer with modified attention layers, that not only outperforms the baseline network in BLEU score but also converges 15-40% faster. Specifically, we replace the multi-head attention by multiple self-attention branches that the model learns to combine during the training process. Our model improves the state-of-the-art performance by 0.5 BLEU points on the WMT 2014 English-to-German translation task and by 0.4 on the English-to-French translation task.
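The following is a minimal sketch, not the authors' released code, of the branched-attention idea described in the abstract: the standard multi-head attention is replaced by several self-attention branches whose outputs are combined with weights learned during training. It assumes PyTorch, and the module and parameter names (BranchedSelfAttention, branch_weights) are illustrative.

```python
import torch
import torch.nn as nn

class BranchedSelfAttention(nn.Module):
    """Sketch of attention with multiple branches and learned combination weights."""

    def __init__(self, d_model: int, num_branches: int = 8, dropout: float = 0.1):
        super().__init__()
        # One single-head self-attention module per branch.
        self.branches = nn.ModuleList(
            nn.MultiheadAttention(d_model, num_heads=1, dropout=dropout)
            for _ in range(num_branches)
        )
        # Learned scalar weights, one per branch; softmax keeps them normalized.
        self.branch_weights = nn.Parameter(torch.zeros(num_branches))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, batch, d_model)
        weights = torch.softmax(self.branch_weights, dim=0)
        outputs = [attn(x, x, x, need_weights=False)[0] for attn in self.branches]
        # Weighted sum of branch outputs replaces the usual concatenation
        # and output projection of standard multi-head attention.
        return sum(w * out for w, out in zip(weights, outputs))

# Example usage:
# layer = BranchedSelfAttention(d_model=512, num_branches=8)
# y = layer(torch.randn(10, 2, 512))  # (seq_len=10, batch=2, d_model=512)
```

Because the combination weights are part of the model, the network can emphasize useful branches and suppress redundant ones during training, which is the behavior the abstract attributes to the faster convergence.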


Related research

03/02/2020 · Transformer++
Recent advancements in attention mechanisms have replaced recurrent neur...

05/13/2020 · A Mixture of h-1 Heads is Better than h Heads
Multi-head attentive neural architectures have achieved state-of-the-art...

06/01/2018 · Scaling Neural Machine Translation
Sequence to sequence learning models still require several days to reach...

08/27/2019 · Multiresolution Transformer Networks: Recurrence is Not Essential for Modeling Hierarchical Structure
The architecture of Transformer is based entirely on self-attention, and...

03/06/2018 · Self-Attention with Relative Position Representations
Relying entirely on an attention mechanism, the Transformer introduced b...

03/03/2020 · Meta-Embeddings Based On Self-Attention
Creating meta-embeddings for better performance in language modelling ha...

10/17/2019 · Fully Quantized Transformer for Improved Translation
State-of-the-art neural machine translation methods employ massive amoun...