Training Deeper Neural Machine Translation Models with Transparent Attention

08/22/2018
by Ankur Bapna, et al.

While current state-of-the-art NMT models, such as RNN seq2seq and Transformers, possess a large number of parameters, they are still shallow in comparison to convolutional models used for both text and vision applications. In this work we attempt to train significantly (2-3x) deeper Transformer and Bi-RNN encoders for machine translation. We propose a simple modification to the attention mechanism that eases the optimization of deeper models, and results in consistent gains of 0.7-1.1 BLEU on the benchmark WMT'14 English-German and WMT'15 Czech-English tasks for both architectures.
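The abstract only names the proposed change; below is a minimal, non-authoritative sketch of one plausible reading of "transparent attention", in which each decoder layer attends to a learned softmax-weighted combination of all encoder layer outputs (plus the embeddings) instead of only the top encoder layer. The class and parameter names (TransparentAttentionCombiner, num_encoder_layers, num_decoder_layers) are illustrative assumptions, not the authors' reference implementation.

```python
# Sketch of a transparent-attention-style combiner, assuming the decoder
# attends to a per-decoder-layer convex combination of all encoder states.
# Names, shapes, and framework choice (PyTorch) are assumptions for illustration.
import torch
import torch.nn as nn


class TransparentAttentionCombiner(nn.Module):
    def __init__(self, num_encoder_layers: int, num_decoder_layers: int):
        super().__init__()
        # One scalar weight per (encoder state, decoder layer) pair;
        # encoder states include the embedding layer, hence the "+ 1".
        self.weights = nn.Parameter(
            torch.zeros(num_decoder_layers, num_encoder_layers + 1)
        )

    def forward(self, encoder_states: list[torch.Tensor]) -> list[torch.Tensor]:
        # encoder_states: [embeddings, layer_1_output, ..., layer_N_output],
        # each of shape (batch, src_len, d_model).
        stacked = torch.stack(encoder_states, dim=0)   # (N+1, B, S, D)
        probs = torch.softmax(self.weights, dim=-1)    # (M, N+1), rows sum to 1
        # Each decoder layer j attends to its own weighted mix of encoder states.
        combined = torch.einsum("mn,nbsd->mbsd", probs, stacked)
        return list(combined)                          # M tensors of (B, S, D)
```

Initializing the weights to zero makes the softmax start as a uniform average over encoder states, so gradients initially reach every encoder layer directly; this is one simple choice, not necessarily the one used in the paper.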


Related research

08/18/2020 · Very Deep Transformers for Neural Machine Translation
We explore the application of very deep Transformer models for Neural Ma...

07/03/2019 · Depth Growing for Neural Machine Translation
While very deep neural networks have shown effectiveness for computer vi...

10/08/2020 · Shallow-to-Deep Training for Neural Machine Translation
Deep encoders have been proven to be effective in improving neural machi...

04/29/2020 · Multiscale Collaborative Deep Models for Neural Machine Translation
Recent evidence reveals that Neural Machine Translation (NMT) models wit...

12/19/2018 · DTMT: A Novel Deep Transition Architecture for Neural Machine Translation
Past years have witnessed rapid developments in Neural Machine Translati...

11/06/2021 · Analyzing Architectures for Neural Machine Translation Using Low Computational Resources
With the recent developments in the field of Natural Language Processing...

07/24/2017 · Deep Architectures for Neural Machine Translation
It has been shown that increasing model depth improves the quality of ne...
