A Convolutional Encoder Model for Neural Machine Translation

by Jonas Gehring et al.

The prevalent approach to neural machine translation relies on bi-directional LSTMs to encode the source sentence. In this paper we present a faster and simpler architecture based on a succession of convolutional layers. This allows the network to encode the entire source sentence simultaneously, whereas computation in recurrent networks is constrained by temporal dependencies. On WMT'16 English-Romanian translation we achieve accuracy competitive with the state of the art, and we outperform several recently published results on the WMT'15 English-German task. Our models achieve almost the same accuracy as a very deep LSTM setup on WMT'14 English-French translation. The convolutional encoder speeds up CPU decoding by more than a factor of two at the same or higher accuracy than a strong bi-directional LSTM baseline.
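The core idea — a stack of convolutional layers that processes every source position in parallel, with residual connections around each layer — can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (the released code is the fairseq toolkit): the tanh nonlinearity and residual connections follow the paper's description, but all function and parameter names here are invented for the example.

```python
import numpy as np

def conv1d(x, w, b):
    """Same-padded 1D convolution over the time axis.

    x: (T, d_in) sequence of vectors, w: (k, d_in, d_out) filters, b: (d_out,).
    Every output position depends only on a fixed window of inputs, so all
    positions can be computed independently -- no temporal recurrence.
    """
    k, d_in, d_out = w.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    T = x.shape[0]
    out = np.empty((T, d_out))
    for t in range(T):
        window = xp[t:t + k]  # (k, d_in) slice centered on position t
        out[t] = np.tensordot(window, w, axes=([0, 1], [0, 1])) + b
    return out

def conv_encoder(tokens, emb, weights, biases):
    """Encode a source sentence with stacked convolutions.

    tokens: (T,) int token ids; emb: (V, d) embedding table;
    weights/biases: per-layer (k, d, d) filters and (d,) biases.
    Each layer applies tanh(conv(h)) plus a residual connection.
    """
    h = emb[tokens]  # (T, d) embedding lookup
    for w, b in zip(weights, biases):
        h = np.tanh(conv1d(h, w, b)) + h  # residual keeps gradients flowing
    return h
```

Because no step depends on the previous time step's hidden state, the per-position loop above can be replaced by a single batched matrix multiply on parallel hardware, which is the source of the speedup over bi-directional LSTMs.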





Code Repositories


Facebook AI Research Sequence-to-Sequence Toolkit
