Efficient Inference For Neural Machine Translation

10/06/2020
by Yi-Te Hsu, et al.

Large Transformer models have achieved state-of-the-art results in neural machine translation and have become standard in the field. In this work, we look for the optimal combination of known techniques to optimize inference speed without sacrificing translation quality. We conduct an empirical study that stacks various approaches and demonstrates that the combination of replacing decoder self-attention with simplified recurrent units, adopting a deep-encoder, shallow-decoder architecture, and pruning multi-head attention can achieve up to a 109% speedup while reducing the number of parameters by 25%, with no loss of translation quality as measured by BLEU.
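The abstract names three stacked techniques: replacing decoder self-attention with simplified recurrent units, using a deep encoder with a shallow decoder, and pruning attention heads. The sketch below illustrates only the deep-encoder / shallow-decoder idea using PyTorch's stock nn.Transformer; it is not the authors' implementation, the layer counts, model width, and toy tensors are illustrative assumptions, and the recurrent-unit replacement and head pruning are not shown.

import torch
import torch.nn as nn

# Illustrative sketch: shift model capacity into the encoder and keep the
# decoder to a single layer, so the autoregressive decoding loop, which
# dominates inference time, stays cheap.
model = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=12,  # deep encoder (assumed depth, for illustration)
    num_decoder_layers=1,   # shallow decoder
    dim_feedforward=2048,
    batch_first=True,
)

# Toy pre-embedded inputs: (batch, sequence length, d_model).
src = torch.randn(2, 20, 512)
tgt = torch.randn(2, 15, 512)
out = model(src, tgt)       # -> shape (2, 15, 512)
print(out.shape)

The intuition behind this configuration is that encoder layers run once per source sentence while decoder layers run once per generated token, so moving depth from the decoder to the encoder cuts per-token decoding cost with little impact on quality.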


Related research

Tensor2Tensor for Neural Machine Translation (03/16/2018)
Tensor2Tensor is a library for deep learning models that is well-suited ...

Hard-Coded Gaussian Attention for Neural Machine Translation (05/02/2020)
Recent work has questioned the importance of the Transformer's multi-hea...

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned (05/23/2019)
Multi-head self-attention is a key component of the Transformer, a state...

Multilingual Neural Machine Translation with Deep Encoder and Multiple Shallow Decoders (06/05/2022)
Recent work in multilingual translation advances translation quality sur...

Deep Architectures for Neural Machine Translation (07/24/2017)
It has been shown that increasing model depth improves the quality of ne...

Explaining and Generalizing Back-Translation through Wake-Sleep (06/12/2018)
Back-translation has become a commonly employed heuristic for semi-super...
