On Efficient Training, Controllability and Compositional Generalization of Insertion-based Language Generators

02/12/2021
by   Sidi Lu, et al.
0

Auto-regressive language models with the left-to-right generation order have been a predominant paradigm for language generation. Recently, out-of-order text generation beyond the traditional left-to-right paradigm has attracted extensive attention, with a notable variation of insertion-based generation, where a model is used to gradually extend the context into a complete sentence purely with insertion operations. However, since insertion operations disturb the position information of each token, it is often believed that each step of the insertion-based likelihood estimation requires a bi-directional re-encoding of the whole generated sequence. This computational overhead prohibits the model from scaling up to generate long, diverse texts such as stories, news articles, and reports. To address this issue, we propose InsNet, an insertion-based sequence model that can be trained as efficiently as traditional transformer decoders while maintaining the same performance as that with a bi-directional context encoder. We evaluate InsNet on story generation and CleVR-CoGENT captioning, showing the advantages of InsNet in several dimensions, including computational costs, generation quality, the ability to perfectly incorporate lexical controls, and better compositional generalization.

READ FULL TEXT
research
12/12/2021

Towards More Efficient Insertion Transformer with Fractional Positional Encoding

Auto-regressive neural sequence models have been shown to be effective a...
research
05/16/2023

AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation

Diffusion models have gained significant attention in the realm of image...
research
03/13/2023

Meet in the Middle: A New Pre-training Paradigm

Most language models (LMs) are trained and applied in an autoregressive ...
research
07/19/2023

Efficient Guided Generation for Large Language Models

In this article we describe an efficient approach to guiding language mo...
research
02/19/2021

Progressive Transformer-Based Generation of Radiology Reports

Inspired by Curriculum Learning, we propose a consecutive (i.e. image-to...
research
05/27/2022

Controllable Text Generation with Neurally-Decomposed Oracle

We propose a general and efficient framework to control auto-regressive ...
research
12/07/2021

Handwritten Mathematical Expression Recognition via Attention Aggregation based Bi-directional Mutual Learning

Handwritten mathematical expression recognition aims to automatically ge...

Please sign up or login with your details

Forgot password? Click here to reset