The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers

05/02/2023
by Ariel Gera, et al.

Applying language models to natural language processing tasks typically relies on the representations in the final model layer, as intermediate hidden layer representations are presumed to be less informative. In this work, we argue that due to the gradual improvement across model layers, additional information can be gleaned from the contrast between higher and lower layers during inference. Specifically, in choosing between the probable next token predictions of a generative model, the predictions of lower layers can be used to highlight which candidates are best avoided. We propose a novel approach that utilizes the contrast between layers to improve text generation outputs, and show that it mitigates degenerative behaviors of the model in open-ended generation, significantly improving the quality of generated texts. Furthermore, our results indicate that contrasting between model layers at inference time can yield substantial benefits to certain aspects of general language model capabilities, more effectively extracting knowledge during inference from a given set of model parameters.
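To make the core idea concrete, below is a minimal sketch of layer-contrastive next-token scoring. It is an illustration under stated assumptions, not the paper's exact method: early-layer predictions are read off an intermediate hidden state via the "logit lens" (reusing the model's final layer norm and LM head), and the exit layer, contrast weight alpha, and top-k plausibility filter are all illustrative choices.

```python
# Hypothetical sketch of contrasting higher- and lower-layer predictions
# at inference time, using a GPT-2-style model from Hugging Face
# transformers. The early layer, alpha, and top_k are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def contrastive_next_token_scores(input_ids, early_layer=6, alpha=1.0, top_k=50):
    with torch.no_grad():
        out = model(input_ids, output_hidden_states=True)
    final_logits = out.logits[:, -1, :]

    # "Logit lens": project an intermediate hidden state through the final
    # layer norm and the LM head to get an early-exit distribution.
    early_hidden = out.hidden_states[early_layer][:, -1, :]
    early_logits = model.lm_head(model.transformer.ln_f(early_hidden))

    final_logp = torch.log_softmax(final_logits, dim=-1)
    early_logp = torch.log_softmax(early_logits, dim=-1)

    # Keep only tokens the final layer already finds plausible, then
    # penalize those the early layer also rates highly ("bad advice").
    plausible = final_logp.topk(top_k, dim=-1).indices
    scores = torch.full_like(final_logp, float("-inf"))
    contrast = final_logp - alpha * early_logp
    scores.scatter_(-1, plausible, contrast.gather(-1, plausible))
    return scores

ids = tokenizer("The benefits of bad advice are", return_tensors="pt").input_ids
next_id = contrastive_next_token_scores(ids).argmax(dim=-1)
print(tokenizer.decode(next_id))
```

The plausibility filter matters: without it, tokens that are merely improbable everywhere could win on the contrast score alone, so the contrast is applied only within the final layer's top candidates.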


Related research:

- Implicit Unlikelihood Training: Improving Neural Text Generation with Reinforcement Learning (01/11/2021)
- Enhancing Speech Recognition Decoding via Layer Aggregation (03/21/2022)
- Spiral Language Modeling (12/20/2021)
- Deep Model Compression Also Helps Models Capture Ambiguity (06/12/2023)
- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models (09/07/2023)
- The Hydra Effect: Emergent Self-repair in Language Model Computations (07/28/2023)
- Jump to Conclusions: Short-Cutting Transformers With Linear Transformations (03/16/2023)
