
Training Deeper Neural Machine Translation Models with Transparent Attention

08/22/2018 · by Ankur Bapna, et al. · Google

While current state-of-the-art NMT models, such as RNN seq2seq and Transformers, possess a large number of parameters, they are still shallow in comparison to convolutional models used for both text and vision applications. In this work we attempt to train significantly (2-3x) deeper Transformer and Bi-RNN encoders for machine translation. We propose a simple modification to the attention mechanism that eases the optimization of deeper models, and results in consistent gains of 0.7-1.1 BLEU on the benchmark WMT'14 English-German and WMT'15 Czech-English tasks for both architectures.
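
The abstract does not spell out the proposed mechanism, but the title points to "transparent attention": rather than attending only to the top encoder layer, each decoder layer attends to a learned, softmax-weighted combination of all encoder layer outputs, which shortens gradient paths through a deep encoder. Below is a minimal PyTorch sketch of such a layer-mixing scheme; the module name, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

# Minimal sketch of a transparent-attention style combiner (assumed behavior):
# each decoder layer gets its own softmax-weighted mix of all encoder layer
# outputs, controlled by learned scalar logits.
import torch
import torch.nn as nn


class TransparentAttentionCombiner(nn.Module):
    """Builds one attention "memory" per decoder layer as a learned
    weighted sum over all encoder layer outputs."""

    def __init__(self, num_encoder_layers: int, num_decoder_layers: int):
        super().__init__()
        # One scalar logit per (decoder layer, encoder layer) pair;
        # the +1 accounts for the embedding-layer output.
        self.mix_logits = nn.Parameter(
            torch.zeros(num_decoder_layers, num_encoder_layers + 1)
        )

    def forward(self, encoder_layer_outputs):
        # encoder_layer_outputs: list of (batch, src_len, d_model) tensors,
        # one per encoder layer plus the embedding output.
        stacked = torch.stack(encoder_layer_outputs, dim=0)   # (L+1, B, S, D)
        weights = torch.softmax(self.mix_logits, dim=-1)      # (J, L+1)
        # For each decoder layer j, a weighted sum over encoder layers.
        memories = torch.einsum("jl,lbsd->jbsd", weights, stacked)
        return [memories[j] for j in range(memories.size(0))]


# Usage sketch: a 12-layer encoder feeding a 6-layer decoder.
combiner = TransparentAttentionCombiner(num_encoder_layers=12, num_decoder_layers=6)
enc_outputs = [torch.randn(2, 7, 512) for _ in range(13)]  # 12 layers + embeddings
per_decoder_memories = combiner(enc_outputs)
assert len(per_decoder_memories) == 6 and per_decoder_memories[0].shape == (2, 7, 512)

Because the mixing weights start uniform and are trained jointly with the rest of the model, lower encoder layers receive a direct gradient signal from every decoder layer, which is the property that eases optimization of deeper encoders.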



Related research

08/18/2020
Very Deep Transformers for Neural Machine Translation
We explore the application of very deep Transformer models for Neural Ma...

07/03/2019
Depth Growing for Neural Machine Translation
While very deep neural networks have shown effectiveness for computer vi...

10/08/2020
Shallow-to-Deep Training for Neural Machine Translation
Deep encoders have been proven to be effective in improving neural machi...

04/29/2020
Multiscale Collaborative Deep Models for Neural Machine Translation
Recent evidence reveals that Neural Machine Translation (NMT) models wit...

12/19/2018
DTMT: A Novel Deep Transition Architecture for Neural Machine Translation
Past years have witnessed rapid developments in Neural Machine Translati...

07/24/2017
Deep Architectures for Neural Machine Translation
It has been shown that increasing model depth improves the quality of ne...

10/22/2020
Not all parameters are born equal: Attention is mostly what you need
Transformers are widely used in state-of-the-art machine translation, bu...