Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets

06/18/2021
by Irene Solaiman, et al.

Language models can generate harmful and biased outputs and exhibit undesirable behavior. We propose a Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets, an iterative process to significantly change model behavior by crafting and fine-tuning on a dataset that reflects a predetermined set of target values. We evaluate our process using three metrics: two quantitative metrics, namely human evaluations that score how well outputs adhere to a target value and toxicity scoring of outputs, and one qualitative metric that analyzes the most common word associated with a given social category. In each iteration, we add training examples based on shortcomings observed in the evaluations. PALMS performs significantly better on all metrics than baseline and control models across a broad range of GPT-3 language model sizes, without compromising capability integrity. We find that the effectiveness of PALMS increases with model size. We show that significantly adjusting language model behavior is feasible with a small, hand-curated dataset.
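The qualitative metric mentioned above, the most common word associated with a given social category, can be illustrated with a simple frequency count. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `top_words`, the stopword list, and the toy outputs are all hypothetical, and the abstract does not say how words are selected or filtered.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only (not from the paper).
STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "and", "or", "of", "to", "in"}


def top_words(outputs_by_category, n=1):
    """Return the n most frequent non-stopword tokens for each category.

    outputs_by_category maps a category label to a list of model outputs
    (plain strings) generated for prompts about that category.
    """
    result = {}
    for category, outputs in outputs_by_category.items():
        counts = Counter()
        for text in outputs:
            # Lowercase and split into alphabetic tokens.
            tokens = re.findall(r"[a-z']+", text.lower())
            counts.update(t for t in tokens if t not in STOPWORDS)
        result[category] = [word for word, _ in counts.most_common(n)]
    return result


# Toy, invented outputs; real use would pass sampled model completions.
samples = {
    "category_a": ["caring and patient", "caring, tired", "patient but caring"],
}
print(top_words(samples))
```

Running the same count on outputs from a baseline model and from a fine-tuned model, and comparing the top words per category, would give a rough picture of how associations shift.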


research
06/16/2023

Data Selection for Fine-tuning Large Language Models Using Transferred Shapley Values

Although Shapley values have been shown to be highly effective for ident...
research
03/04/2022

Training language models to follow instructions with human feedback

Making language models bigger does not inherently make them better at fo...
research
07/27/2023

Backdoor Attacks for In-Context Learning with Language Models

Because state-of-the-art language models are expensive to train, most pr...
research
04/19/2021

Refining Targeted Syntactic Evaluation of Language Models

Targeted syntactic evaluation of subject-verb number agreement in Englis...
research
08/31/2019

Behavior Gated Language Models

Most current language modeling techniques only exploit co-occurrence, se...
research
06/08/2023

Mapping Brains with Language Models: A Survey

Over the years, many researchers have seemingly made the same observatio...
research
04/23/2021

Transfer training from smaller language model

Large language models have led to state-of-the-art accuracies across a r...
