ELMER: A Non-Autoregressive Pre-trained Language Model for Efficient and Effective Text Generation

10/24/2022
by Junyi Li, et al.

We study text generation with pre-trained language models (PLMs). Typically, an autoregressive (AR) approach generates text token by token. Despite its many advantages, AR generation suffers from inefficient inference. Non-autoregressive (NAR) models have therefore been proposed to generate all target tokens simultaneously, but they usually produce text of lower quality because token dependencies in the output are not modeled. In this paper, we propose ELMER, an efficient and effective PLM for NAR text generation that explicitly models token dependencies during NAR generation. By leveraging an early-exit technique, ELMER lets tokens be generated at different layers according to their prediction confidence: a more confident token exits at a lower layer. We also propose a novel pre-training objective, Layer Permutation Language Modeling, which pre-trains ELMER by permuting the exit layer of each token in a sequence. Experiments on three text generation tasks show that ELMER significantly outperforms NAR models and narrows the performance gap with AR PLMs (ELMER 29.92 vs. BART 30.61 ROUGE-L on XSum), while achieving more than a 10x inference speedup.
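To make the early-exit idea concrete, here is a minimal PyTorch sketch, not the authors' implementation: the class name EarlyExitNARDecoder, the confidence threshold of 0.9, and all other parameters are illustrative assumptions. It shows how a single shared LM head can score every position at every decoder layer and freeze a token's prediction the first time its confidence crosses the threshold, so confident tokens effectively exit at lower layers.

```python
import torch
import torch.nn as nn

class EarlyExitNARDecoder(nn.Module):
    """Hypothetical sketch of confidence-based early exit in a NAR decoder."""

    def __init__(self, vocab_size, d_model=512, n_layers=6, n_heads=8, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_layers)]
        )
        # One shared LM head scores the hidden state at every layer,
        # so any layer can serve as a token's exit point.
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.threshold = threshold

    @torch.no_grad()
    def generate(self, hidden):
        # hidden: (batch, seq_len, d_model), e.g. embedded [MASK] placeholders
        # for all target positions, decoded in parallel (the NAR setting).
        batch, seq_len, _ = hidden.shape
        exited = torch.zeros(batch, seq_len, dtype=torch.bool, device=hidden.device)
        tokens = torch.zeros(batch, seq_len, dtype=torch.long, device=hidden.device)
        for layer in self.layers:
            hidden = layer(hidden)
            probs = self.lm_head(hidden).softmax(dim=-1)
            conf, pred = probs.max(dim=-1)
            # Freeze predictions for positions whose confidence first crosses
            # the threshold at this layer: confident tokens exit earlier.
            newly_exited = (conf >= self.threshold) & ~exited
            tokens[newly_exited] = pred[newly_exited]
            exited |= newly_exited
        # Positions that never became confident exit at the top layer.
        tokens[~exited] = pred[~exited]
        return tokens
```

For simplicity, every position still flows through all layers here; a compute-saving variant would stop updating (or copy forward) exited positions. The paper's Layer Permutation Language Modeling objective additionally permutes each token's exit layer during pre-training, so the model learns to predict well from any exit pattern.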


Related research

05/23/2022 · A Self-Paced Mixed Distillation Method for Non-Autoregressive Generation
Non-Autoregressive generation is a sequence generation paradigm, which r...

04/24/2023 · Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation
Non-AutoRegressive (NAR) text generation models have drawn much attentio...

05/01/2020 · POINTER: Constrained Text Generation via Insertion-based Generative Pre-training
Large-scale pre-trained language models, such as BERT and GPT-2, have ac...

06/04/2023 · Long Text Generation Challenge
We propose a shared task of human-like long text generation, LTG Challen...

06/18/2022 · Collocation2Text: Controllable Text Generation from Guide Phrases in Russian
Large pre-trained language models are capable of generating varied and f...

07/05/2023 · SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference
Autoregressive large language models (LLMs) have made remarkable progres...

12/20/2021 · Spiral Language Modeling
In almost all text generation applications, word sequences are construct...
