Surfacing Biases in Large Language Models using Contrastive Input Decoding

05/12/2023
by Gal Yona, et al.

Ensuring that large language models (LMs) are fair, robust and useful requires an understanding of how different modifications to their inputs impact the model's behaviour. In the context of open-text generation tasks, however, such an evaluation is not trivial. For example, when presenting a model with an input text and a perturbed, "contrastive" version of it, meaningful differences in the next-token predictions may not be revealed with standard decoding strategies. With this motivation in mind, we propose Contrastive Input Decoding (CID): a decoding algorithm that generates text given two inputs, where the generated text is likely given one input but unlikely given the other. In this way, the contrastive generations surface potentially subtle differences in the LM's output for the two inputs in a simple and interpretable manner. We use CID to highlight context-specific biases that are hard to detect with standard decoding strategies and to quantify the effect of different input perturbations.
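The abstract describes CID only at a high level. The sketch below shows one plausible way such a decoding loop could look in practice: at each step, candidate tokens are scored so that continuations likely under one input but unlikely under the other are preferred. The greedy loop, the scoring rule log p(t | input A) − λ · log p(t | input B), the top-k plausibility cutoff, and the use of GPT-2 via Hugging Face transformers are all illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of contrastive input decoding (assumed scoring rule,
# not the paper's exact formulation): prefer tokens likely under input_a
# but unlikely under input_b.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def contrastive_input_decode(model, tokenizer, input_a, input_b,
                             lam=1.0, top_k=50, max_new_tokens=20):
    ids_a = tokenizer(input_a, return_tensors="pt").input_ids
    ids_b = tokenizer(input_b, return_tensors="pt").input_ids
    generated = []
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logp_a = model(ids_a).logits[0, -1].log_softmax(-1)
            logp_b = model(ids_b).logits[0, -1].log_softmax(-1)
        # Restrict to tokens that are at least plausible under input_a,
        # then pick the one with the largest contrastive score.
        top_logp_a, candidates = logp_a.topk(top_k)
        scores = top_logp_a - lam * logp_b[candidates]
        next_id = candidates[scores.argmax()]
        if next_id.item() == tokenizer.eos_token_id:
            break
        generated.append(next_id.item())
        # Append the chosen token to both inputs so the two continuations
        # always share the same generated prefix.
        next_col = next_id.view(1, 1)
        ids_a = torch.cat([ids_a, next_col], dim=-1)
        ids_b = torch.cat([ids_b, next_col], dim=-1)
    return tokenizer.decode(generated)

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
print(contrastive_input_decode(
    lm, tok,
    "Dan applied for the job. The recruiter thought he was",
    "Danielle applied for the job. The recruiter thought she was",
))
```

With a perturbation like the name swap above, continuations the model assigns high probability for one input but not the other are surfaced directly in the generated text, which is the kind of context-specific contrast the paper uses to audit biases.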


