Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation

07/17/2020
by Jinglin Liu, et al.

Non-autoregressive translation (NAT) achieves faster inference speed but at the cost of lower accuracy compared with autoregressive translation (AT). Since AT and NAT can share the same model structure, and AT is an easier task than NAT due to its explicit dependency on previous target-side tokens, a natural idea is to gradually shift model training from the easier AT task to the harder NAT task. To smooth this shift, in this paper we introduce semi-autoregressive translation (SAT) as a series of intermediate tasks. SAT contains a hyperparameter k, where each value of k defines a SAT task with a different degree of parallelism. Specifically, SAT covers AT and NAT as special cases: it reduces to AT when k = 1 and to NAT when k = N (where N is the length of the target sentence). We design curriculum schedules that gradually shift k from 1 to N, with different pacing functions and numbers of tasks trained at the same time. We call our method task-level curriculum learning for NAT (TCL-NAT). Experiments on the IWSLT14 De-En, IWSLT16 En-De, and WMT14 En-De and De-En datasets show that TCL-NAT achieves significant accuracy improvements over previous NAT baselines and reduces the performance gap between NAT and AT models to 1-2 BLEU points, demonstrating the effectiveness of the proposed method.
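To illustrate the idea of a task-level curriculum schedule, the sketch below shows one plausible way to map training progress to a parallelism degree k, moving from the AT case (k = 1) toward the NAT case (large k). This is a minimal sketch under stated assumptions: the set of k values, the choice of pacing curves, and all function names (e.g. `linear_pacing`, `current_k`) are illustrative, not the paper's released implementation.

```python
# Hypothetical sketch of task-level curriculum pacing for TCL-NAT-style
# training. The k values and pacing functions below are assumptions for
# illustration, not the authors' exact schedule.
import math

def linear_pacing(step, total_steps):
    """Fraction of training completed, clipped to [0, 1]."""
    return min(max(step / total_steps, 0.0), 1.0)

def exponential_pacing(step, total_steps):
    """Progress curve that spends more steps on easier (small-k) tasks early on."""
    p = linear_pacing(step, total_steps)
    return (math.exp(p) - 1.0) / (math.e - 1.0)

def current_k(step, total_steps, k_values=(1, 2, 4, 8), pacing=linear_pacing):
    """Map training progress to a parallelism degree k.

    Returns k_values[0] (the AT case, k = 1) at the start of training and
    k_values[-1] (approaching the NAT case, k = N) at the end.
    """
    p = pacing(step, total_steps)
    idx = min(int(p * len(k_values)), len(k_values) - 1)
    return k_values[idx]

# Example: the curriculum shifts from AT (k = 1) toward NAT as training proceeds.
for step in (0, 25_000, 50_000, 75_000, 100_000):
    print(step, current_k(step, total_steps=100_000))
```

A schedule like this could also return several neighboring k values at once when multiple SAT tasks are trained simultaneously, as the abstract describes.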


