Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

06/02/2023
by Zeqiu Wu, et al.

Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF), where human preference judgments on LM outputs are transformed into a learning signal, has recently shown promise in addressing these issues. However, such holistic feedback conveys limited information about long text outputs; it does not indicate which aspects of the outputs influenced user preference, e.g., which parts contain what type(s) of errors. In this paper, we use fine-grained human feedback (e.g., which sentence is false, which sub-sentence is irrelevant) as an explicit training signal. We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects: (1) density, providing a reward after every segment (e.g., a sentence) is generated; and (2) incorporating multiple reward models associated with different feedback types (e.g., factual incorrectness, irrelevance, and information incompleteness). We conduct experiments on detoxification and long-form question answering to illustrate how learning with such reward functions leads to improved performance, supported by both automatic and human evaluation. Additionally, we show that LM behaviors can be customized using different combinations of fine-grained reward models. We release all data, collected human feedback, and code at https://FineGrainedRLHF.github.io.
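To make the two fine-grained aspects concrete, below is a minimal sketch of how dense, multi-type rewards could be combined: each feedback type gets its own reward model, and every generated segment receives a weighted sum of their scores. All names here (split_into_sentences, reward_models, weights) are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of fine-grained reward combination: one reward
# model per feedback type, combined into a reward for every segment
# (e.g., sentence) instead of a single scalar for the whole output.
# All names and signatures are illustrative assumptions.
from typing import Callable, List

def split_into_sentences(text: str) -> List[str]:
    # Naive sentence segmentation, for illustration only.
    return [s.strip() + "." for s in text.split(".") if s.strip()]

def fine_grained_reward(
    prompt: str,
    output: str,
    reward_models: List[Callable[[str, str], float]],  # one per feedback type
    weights: List[float],                               # per-type weights
) -> List[float]:
    """Return one combined reward per generated segment (dense signal)."""
    rewards = []
    for segment in split_into_sentences(output):
        # Weighted sum over feedback types (e.g., factuality, relevance,
        # completeness), each scored at the segment level.
        r = sum(w * rm(prompt, segment) for w, rm in zip(weights, reward_models))
        rewards.append(r)
    # These per-segment rewards would replace the single holistic reward
    # in an RL update such as PPO.
    return rewards
```

Under a scheme like this, reweighting the per-type terms (e.g., up-weighting a factuality model) is how one would customize LM behavior via different combinations of fine-grained reward models, as described in the abstract.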



Related research

05/23/2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Evaluating the factuality of long-form text generated by large language ...

05/31/2023
Boosting Text-to-Image Diffusion Models with Fine-Grained Semantic Rewards
Recent advances in text-to-image diffusion models have achieved remarkab...

05/17/2023
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Learning from human feedback has been shown to be effective at aligning ...

08/05/2022
Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback
Frozen models trained to mimic static datasets can never improve their p...

05/23/2023
Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA
Large language models (e.g., GPT-3.5) are uniquely capable of producing ...

08/11/2023
Detecting and Preventing Hallucinations in Large Vision Language Models
Instruction tuned Large Vision Language Models (LVLMs) have made signifi...

08/14/2023
Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation
Fine-grained, span-level human evaluation has emerged as a reliable and ...
