Momentum Calibration for Text Generation

12/08/2022
by Xingxing Zhang, et al.

The input and output of most text generation tasks can be transformed into sequences of tokens, which can be modeled with sequence-to-sequence learning tools such as Transformers. These models are usually trained by maximizing the likelihood of the output text sequence, assuming that the input sequence and all preceding gold tokens are given during training; during inference, however, the model suffers from the exposure bias problem (i.e., during beam search it only has access to its previously predicted tokens rather than gold tokens). In this paper, we propose MoCa (Momentum Calibration) for text generation. MoCa is an online method that dynamically generates slowly evolving (but consistent) samples using a momentum moving average generator with beam search, and it learns to align the model scores of these samples with their actual qualities. Experiments on four text generation datasets (i.e., CNN/DailyMail, XSum, SAMSum and Gigaword) show that MoCa consistently improves strong pre-trained Transformers over vanilla fine-tuning, and it achieves state-of-the-art results on the CNN/DailyMail and SAMSum datasets.
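To make the two ingredients named in the abstract concrete, here is a minimal sketch of (1) a momentum moving average (EMA) copy of the model that would generate candidates with beam search, and (2) a calibration loss that aligns the online model's sequence scores with the candidates' measured qualities (e.g., ROUGE). The pairwise margin ranking loss, the length-normalized scoring, the model(src, candidate) interface, and the hyperparameters m and margin are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def momentum_update(online_model, momentum_model, m=0.999):
        """EMA update: the momentum generator evolves slowly but consistently."""
        for p_o, p_m in zip(online_model.parameters(), momentum_model.parameters()):
            p_m.data.mul_(m).add_(p_o.data, alpha=1.0 - m)

    def sequence_score(model, src, candidate):
        """Length-normalized log-probability of `candidate` under `model`.
        Assumes model(src, candidate) returns per-token logits of shape (T, vocab)."""
        logits = model(src, candidate)
        logp = F.log_softmax(logits, dim=-1)
        tok_logp = logp.gather(-1, candidate.unsqueeze(-1)).squeeze(-1)
        return tok_logp.mean()

    def calibration_loss(model, src, candidates, qualities, margin=0.01):
        """One plausible calibration objective: a candidate with higher quality
        (e.g., ROUGE against the reference) should receive a higher model score
        than a lower-quality one, enforced by a rank-scaled margin."""
        order = sorted(range(len(candidates)), key=lambda i: -qualities[i])
        scores = [sequence_score(model, src, candidates[i]) for i in order]
        loss = 0.0
        for i in range(len(scores)):
            for j in range(i + 1, len(scores)):
                loss = loss + F.relu(scores[j] - scores[i] + margin * (j - i))
        return loss

In a training loop one would presumably decode candidate outputs from the momentum copy with beam search, measure their qualities against the reference, compute the calibration loss with the online model, and interleave momentum_update steps (the momentum model being initialized as a deep copy of the online model); the exact objective, candidate scoring, and scheduling are specified in the paper rather than in this abstract.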


