Multi-Layer Softmaxing during Training Neural Machine Translation for Flexible Decoding with Fewer Layers

08/27/2019
by Raj Dabre, et al.

This paper proposes a novel procedure for training an encoder-decoder based deep neural network that compresses NxM models into a single model, enabling us to dynamically choose the number of encoder and decoder layers at decoding time. Usually, the output of the last layer of the N-layer encoder is fed to the M-layer decoder, and the output of the last decoder layer is used to compute the softmax loss. Instead, our method computes a single loss consisting of NxM losses: a softmax loss for the output of each of the M decoder layers, computed on top of the output of each of the N encoder layers. A single model trained with our method can then be used for decoding with an arbitrarily smaller number of encoder and decoder layers. In practical scenarios, this (a) enables faster decoding with insignificant losses in translation quality and (b) alleviates the need to train NxM separate models, thereby saving space. We take neural machine translation as a case study, show the advantages, and give a cost-benefit analysis of our approach.
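To make the loss computation concrete, the following is a minimal sketch in PyTorch, assuming a generic Transformer-style encoder-decoder whose layers are exposed as Python lists. The names encoder_layers, decoder_layers, output_projection, and pad_id are illustrative placeholders, not the authors' implementation; the sketch only shows how the NxM softmax losses could be accumulated into a single training loss.

    # Sketch of multi-layer softmaxing: sum one cross-entropy (softmax) loss
    # for the output of each decoder layer m computed on top of the output of
    # each encoder layer n. Layer modules and names below are hypothetical.
    import torch
    import torch.nn.functional as F

    def multi_layer_softmax_loss(src_emb, tgt_emb, tgt_labels,
                                 encoder_layers, decoder_layers,
                                 output_projection, pad_id=0):
        total_loss = 0.0
        enc_out = src_emb
        for enc_layer in encoder_layers:               # n = 1 .. N
            enc_out = enc_layer(enc_out)               # output of encoder layer n
            dec_out = tgt_emb                          # rerun decoder on this encoder depth
            for dec_layer in decoder_layers:           # m = 1 .. M
                dec_out = dec_layer(dec_out, enc_out)  # decoder layer m attends to encoder layer n
                logits = output_projection(dec_out)    # (batch, length, vocab)
                total_loss = total_loss + F.cross_entropy(
                    logits.reshape(-1, logits.size(-1)),
                    tgt_labels.reshape(-1),
                    ignore_index=pad_id)
        return total_loss

Because every (n, m) encoder-decoder combination is trained against a softmax loss, decoding can later use any prefix of the encoder layers together with any prefix of the decoder layers, which is what allows a single model to stand in for NxM separately trained models.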


Related research

02/20/2020
Balancing Cost and Benefit with Tied-Multi Transformers
We propose and evaluate a novel procedure for training multiple Transfor...

10/24/2018
Exploiting Deep Representations for Neural Machine Translation
Advanced neural machine translation (NMT) models generally implement enc...

06/18/2021
Recurrent Stacking of Layers in Neural Networks: An Application to Neural Machine Translation
In deep neural network modeling, the most common practice is to stack a ...

11/09/2020
BERT-JAM: Boosting BERT-Enhanced Neural Machine Translation with Joint Attention
BERT-enhanced neural machine translation (NMT) aims at leveraging BERT-e...

11/03/2020
Layer-Wise Multi-View Learning for Neural Machine Translation
Traditional neural machine translation is limited to the topmost encoder...

09/07/2020
VisCode: Embedding Information in Visualization Images using Encoder-Decoder Network
We present an approach called VisCode for embedding information into vis...

06/03/2019
Training Neural Machine Translation To Apply Terminology Constraints
This paper proposes a novel method to inject custom terminology into neu...
