Copy Is All You Need

07/13/2023
by Tian Lan, et al.

The dominant text generation models compose their output by sequentially selecting words from a fixed vocabulary. In this paper, we instead formulate text generation as progressively copying text segments (e.g., words or phrases) from an existing text collection. We compute contextualized representations of meaningful text segments and index them using efficient vector search toolkits. The task of text generation is then decomposed into a series of copy-and-paste operations: at each time step, we seek a suitable text span in the text collection rather than selecting from a standalone vocabulary. Experiments on the standard language modeling benchmark (WikiText-103) show that our approach achieves better generation quality according to both automatic and human evaluations. Moreover, its inference efficiency is comparable to that of token-level autoregressive models, thanks to the reduced number of decoding steps. We also show that our approach allows for effective domain adaptation by simply switching to a domain-specific text collection, with no extra training. Finally, we observe that our approach attains additional performance gains by scaling up to larger text collections, again without further training. (Our source code is publicly available at <https://github.com/gmftbyGMFTBY/Copyisallyouneed>.)
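As a rough illustration of the copy-and-paste formulation (not the paper's actual implementation, which is in the repository above), the sketch below uses FAISS as the vector-search toolkit. The `embed_context` function and the random phrase vectors are hypothetical stand-ins for the trained prefix and phrase encoders.

```python
# Minimal sketch of copy-and-paste decoding, assuming a shared embedding
# space for prefixes and phrases. All names here are illustrative.
import numpy as np
import faiss

D = 768  # embedding dimension (assumption)

# 1) Offline: encode every candidate text segment from the collection and
#    index the vectors with an efficient vector-search toolkit (here FAISS).
phrases = ["the quick brown fox", "jumped over", "the lazy dog", "."]
phrase_vecs = np.random.rand(len(phrases), D).astype("float32")
# ^ stand-in for contextualized phrase representations from a trained encoder
faiss.normalize_L2(phrase_vecs)
index = faiss.IndexFlatIP(D)  # inner product = cosine after L2 normalization
index.add(phrase_vecs)

def embed_context(text: str) -> np.ndarray:
    """Stand-in for the model's prefix encoder: maps the current prefix to a
    query vector in the same space as the phrase representations."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.random((1, D), dtype=np.float32)
    faiss.normalize_L2(v)
    return v

# 2) Online: generation becomes a series of copy operations. Each step
#    retrieves the segment whose representation best matches the prefix and
#    appends ("pastes") it. Because one step can emit a multi-word span,
#    decoding needs fewer steps than token-level autoregression.
prefix = ""
for _ in range(4):
    query = embed_context(prefix)
    _, ids = index.search(query, 1)  # nearest phrase in the collection
    prefix = (prefix + " " + phrases[ids[0][0]]).strip()

print(prefix)
```

Viewed this way, the domain-adaptation and scaling results follow naturally: both amount to rebuilding the index over a different or larger text collection, leaving the trained encoders untouched.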
