Log In Sign Up

Learning to Recover from Multi-Modality Errors for Non-Autoregressive Neural Machine Translation

by   Qiu Ran, et al.

Non-autoregressive neural machine translation (NAT) predicts the entire target sequence simultaneously and significantly accelerates inference process. However, NAT discards the dependency information in a sentence, and thus inevitably suffers from the multi-modality problem: the target tokens may be provided by different possible translations, often causing token repetitions or missing. To alleviate this problem, we propose a novel semi-autoregressive model RecoverSAT in this work, which generates a translation as a sequence of segments. The segments are generated simultaneously while each segment is predicted token-by-token. By dynamically determining segment length and deleting repetitive segments, RecoverSAT is capable of recovering from repetitive and missing token errors. Experimental results on three widely-used benchmark datasets show that our proposed model achieves more than 4× speedup while maintaining comparable performance compared with the corresponding autoregressive model.


page 1

page 2

page 3

page 4


Modeling Coverage for Non-Autoregressive Neural Machine Translation

Non-Autoregressive Neural Machine Translation (NAT) has achieved signifi...

AligNART: Non-autoregressive Neural Machine Translation by Jointly Learning to Estimate Alignment and Translate

Non-autoregressive neural machine translation (NART) models suffer from ...

LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding and Vocabulary Attention

Non-autoregressive translation (NAT) models generate multiple tokens in ...

Context-Aware Cross-Attention for Non-Autoregressive Translation

Non-autoregressive translation (NAT) significantly accelerates the infer...

Multi-Granularity Optimization for Non-Autoregressive Translation

Despite low latency, non-autoregressive machine translation (NAT) suffer...

Non-Autoregressive Neural Machine Translation: A Call for Clarity

Non-autoregressive approaches aim to improve the inference speed of tran...

Non-Autoregressive Machine Translation with Auxiliary Regularization

As a new neural machine translation approach, Non-Autoregressive machine...

1 Introduction

Although neural machine translation (NMT) has achieved state-of-the-art performance in recent years (cho2014learning; bahdanau2015neural; vaswani2017attention), most NMT models still suffer from the slow decoding speed problem due to their autoregressive property: the generation of a target token depends on all the previously generated target tokens, making the decoding process intrinsically nonparallelizable.

Recently, non-autoregressive neural machine translation (NAT) models (gu2018non; li2019hint; wang2019non; guo2019non; wei2019imitation) have been investigated to mitigate the slow decoding speed problem by generating all target tokens independently in parallel, speeding up the decoding process significantly. Unfortunately, these models suffer from the multi-modality problem (gu2018non), resulting in inferior translation quality compared with autoregressive NMT. To be specific, a source sentence may have multiple feasible translations, and each target token may be generated with respect to different feasible translations since NAT models discard the dependency among target tokens. This generally manifests as repetitive or missing tokens in the translations. Table LABEL:tab:multi-modality shows an example. The German phrase “viele Farmer” can be translated as either “ lots of farmers” or “ a lot of farmers”. In the first translation (Trans. 1), “ lots of” are translated w.r.t. “ lots of farmers” while “ of farmers” are translated w.r.t. “ a lot of farmers” such that two “of” are generated. Similarly, “of” is missing in the second translation (Trans. 2). Intuitively, the multi-modality problem has a significant negative effect on the translation quality of NAT.