Unfolding and Shrinking Neural Machine Translation Ensembles

04/11/2017
by Felix Stahlberg, et al.

Ensembling is a well-known technique in neural machine translation (NMT) to improve system performance. Instead of a single neural net, multiple neural nets with the same topology are trained separately, and the decoder generates predictions by averaging over the individual models. Ensembling often improves the quality of the generated translations drastically. However, it is not suitable for production systems because it is cumbersome and slow. This work aims to reduce the runtime to be on par with a single system without compromising the translation quality. First, we show that the ensemble can be unfolded into a single large neural network which imitates the output of the ensemble system. We show that unfolding can already improve the runtime in practice since more work can be done on the GPU. We proceed by describing a set of techniques to shrink the unfolded network by reducing the dimensionality of layers. On Japanese-English, we report that the resulting network has the size and decoding speed of a single NMT network but performs on the level of a 3-ensemble system.
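Below is a minimal sketch of the unfolding idea on a toy single-hidden-layer model, assuming plain feed-forward weights and logit averaging for simplicity; the paper applies the analogous block-wise stacking to the weight matrices of GRU-based NMT decoders, and all function and variable names here are illustrative rather than taken from the paper. The point of the construction is that concatenating the K hidden layers (input-side weights stacked side by side, output-side weights stacked and scaled by 1/K) gives one larger network whose output matches the averaged ensemble exactly.

```python
import numpy as np

def unfold_ensemble(weights_in, weights_out):
    """Unfold K single-hidden-layer nets (x -> h -> logits) into one net.

    weights_in:  list of K matrices, each (d_in, d_hid)   -- input-to-hidden
    weights_out: list of K matrices, each (d_hid, d_out)  -- hidden-to-output

    The unfolded hidden layer is the concatenation of the K hidden layers,
    so input-side weights stack along columns and output-side weights stack
    along rows, scaled by 1/K to average the individual models' outputs.
    """
    K = len(weights_in)
    W_in = np.concatenate(weights_in, axis=1)        # (d_in, K*d_hid)
    W_out = np.concatenate(weights_out, axis=0) / K  # (K*d_hid, d_out)
    return W_in, W_out

def ensemble_logits(x, weights_in, weights_out):
    """Reference computation: average the K individual models' outputs."""
    return np.mean([np.tanh(x @ Wi) @ Wo
                    for Wi, Wo in zip(weights_in, weights_out)], axis=0)

def unfolded_logits(x, W_in, W_out):
    """Same computation routed through the single unfolded network."""
    return np.tanh(x @ W_in) @ W_out

# Sanity check: the unfolded net reproduces the ensemble output exactly.
rng = np.random.default_rng(0)
K, d_in, d_hid, d_out = 3, 8, 16, 10
weights_in = [rng.standard_normal((d_in, d_hid)) for _ in range(K)]
weights_out = [rng.standard_normal((d_hid, d_out)) for _ in range(K)]
W_in, W_out = unfold_ensemble(weights_in, weights_out)
x = rng.standard_normal(d_in)
assert np.allclose(ensemble_logits(x, weights_in, weights_out),
                   unfolded_logits(x, W_in, W_out))
```

Shrinking then reduces the K-times-wider concatenated layers back toward single-model dimensionality; one generic way to do this, not necessarily the paper's exact procedure, is a low-rank projection of the stacked weight matrices, which is what recovers the decoding speed of a single NMT system.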
