Hybrid-Regressive Neural Machine Translation

10/19/2022
by Qiang Wang, et al.

In this work, we empirically confirm that non-autoregressive translation with an iterative refinement mechanism (IR-NAT) suffers from poor acceleration robustness: it is more sensitive to decoding batch size and computing device than autoregressive translation (AT). Motivated by this observation, we investigate how to better combine the strengths of the autoregressive and non-autoregressive translation paradigms. To this end, we demonstrate through synthetic experiments that prompting with a small number of AT predictions can promote one-shot non-autoregressive translation to performance equivalent to IR-NAT. Following this line, we propose a new two-stage translation prototype called hybrid-regressive translation (HRT). Specifically, HRT first generates a discontinuous sequence autoregressively (e.g., making a prediction every k tokens, k > 1) and then fills in all previously skipped tokens at once in a non-autoregressive manner. We also propose a bag of techniques to train HRT effectively and efficiently without adding any model parameters. HRT achieves a state-of-the-art BLEU score of 28.49 on the WMT En-De task and is at least 1.5x faster than AT, regardless of batch size and device. As an additional bonus, HRT inherits the favorable characteristics of AT under the deep-encoder-shallow-decoder architecture: compared to vanilla HRT with a 6-layer encoder and a 6-layer decoder, HRT with a 12-layer encoder and a 1-layer decoder is a further two times faster on both GPU and CPU without any BLEU loss.
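To make the two-stage procedure concrete, below is a minimal Python sketch of HRT-style decoding. The interfaces `at_step` and `nat_fill`, the special tokens, and the toy stubs are illustrative assumptions, not the paper's actual implementation or API; a real system would run both stages with a shared Transformer decoder.

```python
# A minimal sketch of HRT's two-stage decoding (assumed interfaces, not
# the paper's code): stage 1 autoregressively predicts every k-th token,
# stage 2 fills all skipped positions at once non-autoregressively.

from typing import Callable, List

MASK = "<mask>"  # placeholder for a skipped position
EOS = "</s>"     # end-of-sequence marker


def hybrid_regressive_decode(
    at_step: Callable[[List[str]], str],         # hypothetical: next skeleton token given the prefix
    nat_fill: Callable[[List[str]], List[str]],  # hypothetical: fill every MASK slot in one shot
    k: int = 2,
    max_len: int = 64,
) -> List[str]:
    # Stage 1: skip-autoregressive decoding. Each AT step emits one token
    # and implicitly skips k-1 positions, so only ~L/k sequential steps run.
    skeleton: List[str] = []
    while len(skeleton) * k < max_len:
        tok = at_step(skeleton)
        skeleton.append(tok)
        if tok == EOS:
            break

    # Interleave the skeleton with MASK placeholders so the skeleton
    # tokens sit at positions k, 2k, 3k, ... of the draft.
    draft: List[str] = []
    for tok in skeleton:
        draft.extend([MASK] * (k - 1))
        draft.append(tok)

    # Stage 2: one-shot non-autoregressive filling of all MASK positions.
    return nat_fill(draft)


# Toy stubs so the sketch runs end-to-end.
toy_skeleton = iter(["B", "D", EOS])
filled = hybrid_regressive_decode(
    at_step=lambda prefix: next(toy_skeleton),
    nat_fill=lambda draft: [t if t != MASK else "x" for t in draft],
    k=2,
)
print(filled)  # ['x', 'B', 'x', 'D', 'x', '</s>']
```

The speedup intuition follows directly from the sketch: the sequential stage runs only about L/k steps instead of L, and the fill-in stage is a single parallel pass, so the total latency is dominated by the shortened autoregressive loop.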


