WebGPT: Browser-assisted question-answering with human feedback

12/17/2021
by Reiichiro Nakano, et al.

We fine-tune GPT-3 to answer long-form questions using a text-based web-browsing environment, which allows the model to search and navigate the web. By setting up the task so that it can be performed by humans, we are able to train models on the task using imitation learning, and then optimize answer quality with human feedback. To make human evaluation of factual accuracy easier, models must collect references while browsing in support of their answers. We train and evaluate our models on ELI5, a dataset of questions asked by Reddit users. Our best model is obtained by fine-tuning GPT-3 using behavior cloning, and then performing rejection sampling against a reward model trained to predict human preferences. This model's answers are preferred by humans 56% of the time to those of our human demonstrators, and 69% of the time to the highest-voted answer from Reddit.
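The final step described above, rejection sampling against a reward model (often called best-of-n sampling), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `sample_answer` and `reward` are hypothetical stand-ins for the fine-tuned GPT-3 policy and the learned reward model, the default `n` is arbitrary, and reading a reward difference as the log-odds that one answer is preferred (a Bradley-Terry style model) is an assumption of the sketch.

```python
import math
from typing import Callable


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Probability that answer A is preferred over answer B, assuming the
    reward difference is the log-odds of that preference (Bradley-Terry style)."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def best_of_n(
    question: str,
    sample_answer: Callable[[str], str],     # hypothetical policy sampler
    reward: Callable[[str, str], float],     # hypothetical reward model
    n: int = 16,                             # illustrative, not from the paper
) -> str:
    """Draw n candidate answers and keep the one the reward model scores highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample_answer(question) for _ in range(n)]
    # Ties go to the earliest candidate, since max() returns the first maximum.
    return max(candidates, key=lambda answer: reward(question, answer))
```

For example, with a toy sampler that cycles through a few drafts and a toy reward that favors cited answers, `best_of_n` returns the cited draft. The design trades extra inference compute (n samples per question) for higher answer quality without further fine-tuning of the policy.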

05/11/2023

WebCPM: Interactive Web Search for Chinese Long-form Question Answering

Long-form question answering (LFQA) aims at answering complex, open-ende...
09/08/2021

TruthfulQA: Measuring How Models Mimic Human Falsehoods

We propose a benchmark to measure whether a language model is truthful i...
06/15/2019

Technical Report: Optimizing Human Involvement for Entity Matching and Consolidation

An end-to-end data integration system requires human feedback in several...
11/28/2022

Fine-tuning language models to find agreement among humans with diverse preferences

Recent work in large language modeling (LLMs) has used fine-tuning to al...
05/02/2020

AVA: an Automatic eValuation Approach to Question Answering Systems

We introduce AVA, an automatic evaluation approach for Question Answerin...
09/22/2021

Recursively Summarizing Books with Human Feedback

A major challenge for scaling machine learning is training models to per...
03/09/2022

On the Evaluation of Answer-Agnostic Paragraph-level Multi-Question Generation

We study the task of predicting a set of salient questions from a given ...
