Helping the Weak Makes You Strong: Simple Multi-Task Learning Improves Non-Autoregressive Translators

11/11/2022
by   Xinyou Wang, et al.

Recently, non-autoregressive (NAR) neural machine translation models have received increasing attention due to their efficient parallel decoding. However, the probabilistic framework of NAR models necessitates a conditional independence assumption on target sequences, which falls short of characterizing human language data. This drawback results in less informative learning signals for NAR models under conventional MLE training, yielding unsatisfactory accuracy compared to their autoregressive (AR) counterparts. In this paper, we propose a simple and model-agnostic multi-task learning framework that provides more informative learning signals. During the training stage, we introduce a set of sufficiently weak AR decoders that rely solely on the information provided by the NAR decoder to make predictions, forcing the NAR decoder to become stronger or else be unable to support its weak AR partners. Experiments on WMT and IWSLT datasets show that our approach consistently improves the accuracy of multiple NAR baselines without adding any decoding overhead.


