Parallel Attention Mechanisms in Neural Machine Translation

10/29/2018
by Julian Richard Medina, et al.

Recent papers in neural machine translation have proposed replacing previous standards such as recurrent and convolutional neural networks (RNNs and CNNs) with architectures built strictly on attention mechanisms. We propose that by running the traditionally stacked encoding branches of encoder-decoder, attention-focused architectures in parallel, even more sequential operations can be removed from the model, thereby decreasing training time. In particular, we modify the Transformer, the attention-based architecture recently published by Google, by replacing its sequential attention modules with parallel ones, reducing training time while substantially improving BLEU scores. Experiments on the English-to-German and English-to-French translation tasks show that our model establishes a new state of the art.
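To make the idea of "parallel rather than stacked" concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' implementation): it contrasts a conventional stacked Transformer encoder with a variant in which the encoder layers all read the same input and their outputs are merged, so the branches carry no layer-to-layer sequential dependency. The class name ParallelBranchEncoder, the branch count, and the mean-based merge rule are illustrative assumptions; the abstract does not specify how branch outputs are combined.

# Hypothetical sketch (not the authors' code): a stacked Transformer encoder
# versus a parallel-branch variant whose layers all consume the same input.
import torch
import torch.nn as nn


class ParallelBranchEncoder(nn.Module):
    """Runs `num_branches` encoder layers on the same input in parallel.

    The merge rule (a simple mean over branches) is an assumption made for
    illustration only; the paper's abstract does not state the combination.
    """

    def __init__(self, d_model=512, nhead=8, num_branches=6):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                       batch_first=True)
            for _ in range(num_branches)
        )

    def forward(self, x):
        # Each branch sees the original input, removing the layer-to-layer
        # sequential dependency of a stacked encoder.
        outputs = [branch(x) for branch in self.branches]
        return torch.stack(outputs, dim=0).mean(dim=0)


if __name__ == "__main__":
    src = torch.randn(2, 10, 512)           # (batch, sequence, d_model)
    stacked = nn.TransformerEncoder(         # conventional stacked baseline
        nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
        num_layers=6,
    )
    parallel = ParallelBranchEncoder()
    print(stacked(src).shape, parallel(src).shape)  # both: (2, 10, 512)

In the stacked baseline, layer k cannot start until layer k-1 finishes; in the parallel variant, the branch computations are independent and can in principle be scheduled concurrently, which is the source of the training-time reduction the abstract describes.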

