A Simple Contrastive Learning Objective for Alleviating Neural Text Degeneration

05/05/2022
by Shaojie Jiang, et al.

The cross-entropy objective has proved to be an all-purpose training objective for autoregressive language models (LMs). However, because it does not penalize problematic tokens, LMs trained with cross-entropy exhibit text degeneration. To address this, unlikelihood training has been proposed to force unlikely tokens to be assigned a low probability by an LM. But unlikelihood does not consider the relationship between the label tokens and the unlikely token candidates, and thus shows only marginal improvements in reducing degeneration. We propose a new contrastive token learning objective that inherits the advantages of cross-entropy and unlikelihood training while avoiding their limitations. The key idea is to force an LM to assign high probabilities to label tokens and low probabilities to negative candidates at each step. Comprehensive experiments on language modeling and open-domain dialogue generation tasks show that the proposed contrastive token objective yields less repetitive text with higher generation quality than unlikelihood training, achieving new state-of-the-art performance.
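The contrast described above can be read as a per-step softmax over the label token and a set of negative candidates. The sketch below is a minimal PyTorch illustration of that reading, not the paper's exact formulation: the function name contrastive_token_loss, the choice of negatives (assumed here to be, for example, tokens repeated from the preceding context), and the way the term is combined with cross-entropy are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def contrastive_token_loss(logits, labels, negative_ids):
    """Contrast the label token against negative candidates at each step (sketch).

    logits:       (batch, seq_len, vocab) unnormalized LM scores.
    labels:       (batch, seq_len) gold next-token ids.
    negative_ids: (batch, seq_len, k) negative candidate ids per step,
                  e.g. tokens already emitted in the preceding context (assumed).
    """
    # Score of the label token at each step -> (batch, seq_len, 1)
    pos = logits.gather(-1, labels.unsqueeze(-1))
    # Scores of the k negative candidates   -> (batch, seq_len, k)
    neg = logits.gather(-1, negative_ids)
    # Softmax over the label and its negatives; maximizing the label's share
    # pushes label scores up and negative scores down jointly.
    scores = torch.cat([pos, neg], dim=-1)
    log_probs = F.log_softmax(scores, dim=-1)
    return -log_probs[..., 0].mean()

# In practice such a term would typically be added to the standard
# cross-entropy loss, e.g. loss = ce_loss + ct_weight * ct_loss (assumed weighting).
```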


