A Self-Paced Mixed Distillation Method for Non-Autoregressive Generation

05/23/2022
by   Weizhen Qi, et al.

Non-Autoregressive generation is a sequence generation paradigm that removes the dependency between target tokens. It can efficiently reduce text generation latency by replacing token-by-token sequential decoding with parallel decoding. However, due to the known multi-modality problem, Non-Autoregressive (NAR) models significantly under-perform Auto-regressive (AR) models on various language generation tasks. Among NAR models, BANG is the first model pre-trained at scale on an unlabeled English raw-text corpus. It treats different generation paradigms as pre-training tasks, covering Auto-regressive (AR), Non-Autoregressive (NAR), and semi-Non-Autoregressive (semi-NAR) information flows with a multi-stream strategy, and it achieves state-of-the-art performance without any distillation techniques. However, AR distillation has been shown to be a very effective way to improve NAR performance. In this paper, we propose a novel self-paced mixed distillation method to further improve the generation quality of BANG. First, we propose a mixed distillation strategy based on knowledge from the AR stream. Second, we use self-paced learning to encourage the model to focus on samples that share the same modality. The proposed self-paced mixed distillation algorithm improves generation quality without affecting inference latency. We carry out extensive experiments on summarization and question generation tasks to validate its effectiveness. To further illustrate the commercial value of our approach, we conduct experiments on three generation tasks in real-world advertising applications. Experimental results on commercial data show the effectiveness of the proposed model: compared with BANG it achieves a significant BLEU score improvement, and compared with the auto-regressive generation method it achieves a more than 7x speedup.
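The abstract keeps the training recipe at a high level; the sketch below is one plausible way to combine the two ingredients it names, not the authors' released code. The function self_paced_mixed_loss, the mixing weight alpha, and the loss threshold are hypothetical names and knobs, and the hard 0/1 self-paced weighting follows the standard self-paced-learning recipe rather than any rule confirmed by the paper. It assumes per-sample NLL losses computed by the NAR student on the gold references and on the AR teacher's distilled targets.

    # Minimal sketch (assumptions noted above): mix gold and AR-distilled
    # per-sample losses, then apply a hard self-paced weight that keeps
    # only samples whose mixed loss falls below the current threshold.
    import torch

    def self_paced_mixed_loss(loss_gold: torch.Tensor,
                              loss_distill: torch.Tensor,
                              threshold: float,
                              alpha: float = 0.5) -> torch.Tensor:
        # Mixed supervision signal per sample, shape [B].
        mixed = alpha * loss_gold + (1.0 - alpha) * loss_distill
        # 0/1 self-paced weights: easy (low-loss) samples are kept now,
        # harder ones are admitted later as the threshold is raised.
        weights = (mixed.detach() < threshold).float()
        # Average over the selected samples; clamp avoids dividing by zero
        # when no sample passes the threshold early in training.
        return (weights * mixed).sum() / weights.sum().clamp(min=1.0)

    # Toy usage: in practice these per-sample losses would come from the
    # NAR student evaluated on gold targets and on distilled targets.
    loss_gold = torch.rand(8, requires_grad=True)
    loss_distill = torch.rand(8, requires_grad=True)
    loss = self_paced_mixed_loss(loss_gold, loss_distill, threshold=0.7)
    loss.backward()

Under these assumptions, raising the threshold over the course of training gradually admits harder (more multi-modal) samples, while alpha trades off supervision from the gold references against the AR teacher's distilled outputs.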


