REFINER: Reasoning Feedback on Intermediate Representations

04/04/2023
by Debjit Paul, et al.

Language models (LMs) have recently shown remarkable performance on reasoning tasks by explicitly generating intermediate inferences, e.g., chain-of-thought prompting. However, these intermediate inference steps may be inappropriate deductions from the initial context and lead to incorrect final predictions. Here we introduce REFINER, a framework for finetuning LMs to explicitly generate intermediate reasoning steps while interacting with a critic model that provides automated feedback on the reasoning. Specifically, the critic provides structured feedback that the reasoning LM uses to iteratively improve its intermediate arguments. Empirical evaluations of REFINER on three diverse reasoning tasks show significant improvements over baseline LMs of comparable scale. Furthermore, when using GPT3.5 as the reasoner, the trained critic significantly improves reasoning without finetuning the reasoner. Finally, our critic model is trained without expensive human-in-the-loop data but can be substituted with humans at inference time.
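The interaction described above, in which a reasoner proposes intermediate steps and a critic returns feedback that the reasoner uses to revise them, can be pictured as a simple loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `refine`, `toy_generate`, `toy_critique`, and `NO_HINT`, the stopping rule, and the iteration cap are all hypothetical, and the toy functions stand in for the finetuned LM and the trained critic.

```python
from typing import Callable, List, Tuple

# Hypothetical sentinel meaning "the critic found nothing to fix".
NO_HINT = "No hint"


def refine(
    question: str,
    generate: Callable[[str, str], str],
    critique: Callable[[str, str], str],
    max_iters: int = 3,
) -> Tuple[str, List[str]]:
    """Alternate between a reasoner and a critic until the critic is satisfied.

    `generate(question, feedback)` returns intermediate reasoning steps;
    `critique(question, steps)` returns textual feedback or NO_HINT.
    Returns the last set of steps and the full revision history.
    """
    feedback = ""  # the first attempt is made without any feedback
    steps = ""
    history: List[str] = []
    for _ in range(max_iters):
        steps = generate(question, feedback)
        history.append(steps)
        feedback = critique(question, steps)
        if feedback == NO_HINT:
            break  # assumed stopping rule: critic has no further objections
    return steps, history


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_generate(question: str, feedback: str) -> str:
    # Deliberately wrong on the first try, corrected once feedback arrives.
    return "3 - 2" if not feedback else "3 + 2"


def toy_critique(question: str, steps: str) -> str:
    # Flags the wrong operator; otherwise reports nothing to fix.
    if "-" in steps:
        return "The operator in step 1 is incorrect."
    return NO_HINT


if __name__ == "__main__":
    q = "Ann has 3 apples and buys 2 more. How many apples does she have?"
    final, history = refine(q, toy_generate, toy_critique)
    print(final)
    print(history)
```

Because `critique` is just a callable, a human could supply the feedback at inference time in place of the trained critic, which mirrors the substitution mentioned in the abstract.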

research
04/07/2023

Why think step-by-step? Reasoning emerges from the locality of experience

Humans have a powerful and mysterious capacity to reason. By working thr...
research
05/24/2023

Reasoning with Language Model is Planning with World Model

Large language models (LLMs) have shown remarkable reasoning capabilitie...
research
09/08/2023

Towards Reliable and Fluent Large Language Models: Incorporating Feedback Learning Loops in QA Systems

Large language models (LLMs) have emerged as versatile tools in various ...
research
02/16/2023

Empirical Investigation of Neural Symbolic Reasoning Strategies

Neural reasoning accuracy improves when generating intermediate reasonin...
research
05/31/2023

Let's Verify Step by Step

In recent years, large language models have greatly improved in their ab...
research
08/31/2023

Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection

Stance detection aims to identify the attitude expressed in a document t...
research
11/25/2022

Solving math word problems with process- and outcome-based feedback

Recent work has shown that asking language models to generate reasoning ...
