
Recurrent Stacking of Layers for Compact Neural Machine Translation Models

by Raj Dabre, et al.
National Institute of Information and Communications Technology

In Neural Machine Translation (NMT), the most common practice is to stack a number of recurrent or feed-forward layers in the encoder and the decoder. Adding each new layer typically improves translation quality, but it also significantly increases the number of parameters. In this paper, we propose to share parameters across all the layers, leading to a recurrently stacked NMT model. We empirically show that the translation quality of a model that recurrently stacks a single layer 6 times is comparable to that of a model that stacks 6 separate layers. We also show that using back-translated parallel corpora as additional data leads to further significant improvements in translation quality.
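The core idea — one set of layer parameters reused at every depth instead of separate parameters per layer — can be sketched in a few lines. This is a minimal NumPy illustration with a toy feed-forward layer standing in for an NMT encoder layer, not the authors' actual implementation; the dimension, depth, and initialization are assumptions for the example:

```python
import numpy as np

d, depth = 8, 6                       # hidden size and stack depth (toy values)
rng = np.random.default_rng(0)

def make_layer():
    # One feed-forward "layer": a weight matrix and a bias vector
    return rng.standard_normal((d, d)) * 0.1, np.zeros(d)

def apply_layer(x, layer):
    W, b = layer
    return np.tanh(x @ W + b)

# Conventional stacking: 6 independent layers, 6x the layer parameters
stacked = [make_layer() for _ in range(depth)]

# Recurrent stacking: one shared layer applied 6 times
shared = make_layer()

def forward_stacked(x):
    for layer in stacked:
        x = apply_layer(x, layer)
    return x

def forward_recurrent(x):
    for _ in range(depth):
        x = apply_layer(x, shared)    # same parameters at every depth
    return x

def n_params(layers):
    return sum(W.size + b.size for W, b in layers)

print(n_params(stacked))      # 6 * (8*8 + 8) = 432 parameters
print(n_params([shared]))     # 8*8 + 8 = 72 parameters
```

Both models perform the same number of layer applications per input, so compute is unchanged; only the parameter count shrinks by a factor of the depth.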



