Fine-tuning language models to find agreement among humans with diverse preferences

11/28/2022
by Michiel A. Bakker, et al.

Recent work in large language modeling (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's generated candidate consensus statements for agreement and quality. A reward model is then trained to predict individual preferences, enabling it to quantify and rank consensus statements in terms of their appeal to the overall group, defined according to different aggregation (social welfare) functions. The model produces consensus statements that are preferred by human users over those from prompted LLMs (>70%) and significantly outperforms a tight fine-tuned baseline that lacks the final ranking step. Further, our best model's consensus statements are preferred over the best human-generated opinions (>65%). We find that when we construct consensus statements from only a subset of group members, those who were excluded were more likely to dissent, revealing the sensitivity of the consensus to individual contributions. These results highlight the potential to use LLMs to help groups of humans align their values with one another.
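The final ranking step described above — a reward model predicts each member's approval of each candidate statement, and a social welfare function aggregates those predictions into a group-level score — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the candidate names, the per-member scores, and the two welfare functions (utilitarian mean and Rawlsian minimum) are illustrative assumptions standing in for the reward model's outputs.

```python
# Sketch: rank candidate consensus statements by aggregated group appeal.
# Per-member scores here are illustrative placeholders; in the paper they
# would come from a reward model trained on individual preference data.

def rank_candidates(scores, welfare):
    """Return candidates sorted by a social welfare function, best first.

    scores:  dict mapping candidate statement -> list of per-member
             predicted approval scores (one entry per group member).
    welfare: aggregation (social welfare) function over a score list.
    """
    return sorted(scores, key=lambda c: welfare(scores[c]), reverse=True)

# Two common social welfare functions:
utilitarian = lambda s: sum(s) / len(s)  # maximize mean approval
rawlsian = lambda s: min(s)              # maximize the worst-off member's approval

# Hypothetical predicted approvals for three candidates and four members:
scores = {
    "statement A": [0.9, 0.8, 0.2, 0.9],  # popular but alienates one member
    "statement B": [0.7, 0.7, 0.6, 0.7],  # broadly acceptable
    "statement C": [0.5, 0.5, 0.5, 0.5],  # uniformly lukewarm
}

print(rank_candidates(scores, utilitarian)[0])  # "statement A" (highest mean)
print(rank_candidates(scores, rawlsian)[0])     # "statement B" (highest minimum)
```

Note how the choice of aggregation function changes the winner: the utilitarian mean favors statement A despite one member's strong disapproval, while the Rawlsian minimum favors statement B, which no member strongly dislikes. This is exactly the sensitivity to excluded or dissenting members that the abstract highlights.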


research
05/19/2023

Self-Agreement: A Framework for Fine-tuning Language Models to Find Agreement among Diverse Opinions

Finding an agreement among diverse opinions is a challenging topic in mu...
research
05/24/2023

Aligning Language Models to User Opinions

An important aspect of developing LLMs that interact with humans is to a...
research
12/17/2021

WebGPT: Browser-assisted question-answering with human feedback

We fine-tune GPT-3 to answer long-form questions using a text-based web-...
research
02/07/2023

ConsRec: Learning Consensus Behind Interactions for Group Recommendation

Since group activities have become very common in daily life, there is a...
research
02/07/2023

Capturing Topic Framing via Masked Language Modeling

Differential framing of issues can lead to divergent world views on impo...
research
02/06/2023

Languages are Rewards: Chain of Hindsight Finetuning using Human Feedback

Learning from human preferences is important for language models to be h...
research
08/20/2020

Positionality-Weighted Aggregation Methods on Cumulative Voting

The issue in solving social problems is how to respect minority opinions...
