DeepAI AI Chat
Log In Sign Up

Multi-Layer Softmaxing during Training Neural Machine Translation for Flexible Decoding with Fewer Layers

by   Raj Dabre, et al.

This paper proposes a novel procedure for training an encoder-decoder based deep neural network which compresses NxM models into a single model enabling us to dynamically choose the number of encoder and decoder layers for decoding. Usually, the output of the last layer of the N-layer encoder is fed to the M-layer decoder, and the output of the last decoder layer is used to compute softmax loss. Instead, our method computes a single loss consisting of NxM losses: the softmax loss for the output of each of the M decoder layers derived using the output of each of the N encoder layers. A single model trained by our method can be used for decoding with an arbitrary fewer number of encoder and decoder layers. In practical scenarios, this (a) enables faster decoding with insignificant losses in translation quality and (b) alleviates the need to train NxM models, thereby saving space. We take a case study of neural machine translation and show the advantage and give a cost-benefit analysis of our approach.


page 1

page 2

page 3

page 4


Balancing Cost and Benefit with Tied-Multi Transformers

We propose and evaluate a novel procedure for training multiple Transfor...

Exploiting Deep Representations for Neural Machine Translation

Advanced neural machine translation (NMT) models generally implement enc...

Recurrent Stacking of Layers in Neural Networks: An Application to Neural Machine Translation

In deep neural network modeling, the most common practice is to stack a ...

BERT-JAM: Boosting BERT-Enhanced Neural Machine Translation with Joint Attention

BERT-enhanced neural machine translation (NMT) aims at leveraging BERT-e...

Layer-Wise Multi-View Learning for Neural Machine Translation

Traditional neural machine translation is limited to the topmost encoder...

VisCode: Embedding Information in Visualization Images using Encoder-Decoder Network

We present an approach called VisCode for embedding information into vis...

Training Neural Machine Translation To Apply Terminology Constraints

This paper proposes a novel method to inject custom terminology into neu...