Let's Verify Step by Step

05/31/2023
by Hunter Lightman, et al.

In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each intermediate reasoning step. Given the importance of training reliable models, and given the high cost of human feedback, it is important to carefully compare both methods. Recent work has already begun this comparison, but many questions still remain. We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. Our process-supervised model solves 78% of problems from a representative subset of the MATH test set. Additionally, we show that active learning significantly improves the efficacy of process supervision. To support related research, we also release PRM800K, the complete dataset of 800,000 step-level human feedback labels used to train our best reward model.

research
11/25/2022

Solving math word problems with process- and outcome-based feedback

Recent work has shown that asking language models to generate reasoning ...
research
12/08/2022

Successive Prompting for Decomposing Complex Questions

Answering complex questions that require making latent decisions is a ch...
research
04/04/2023

REFINER: Reasoning Feedback on Intermediate Representations

Language models (LMs) have recently shown remarkable performance on reas...
research
06/30/2023

Look, Remember and Reason: Visual Reasoning with Grounded Rationales

Large language models have recently shown human level performance on a v...
research
06/23/2023

Neural Algorithmic Reasoning Without Intermediate Supervision

Neural Algorithmic Reasoning is an emerging area of machine learning foc...
research
08/22/2020

Supervision Levels Scale (SLS)

We propose a three-dimensional discrete and incremental scale to encode ...
research
06/21/2023

Which Spurious Correlations Impact Reasoning in NLI Models? A Visual Interactive Diagnosis through Data-Constrained Counterfactuals

We present a human-in-the-loop dashboard tailored to diagnosing potentia...
