On the Advance of Making Language Models Better Reasoners

06/06/2022
by   Yifei Li, et al.
0

Large language models such as GPT-3 and PaLM have shown remarkable performance in few-shot learning. However, they still struggle with reasoning tasks such as the arithmetic benchmark GSM8K. Recent advances deliberately guide the language model to generate a chain of reasoning steps before producing the final answer, successfully boosting the GSM8K benchmark from 17.9 new approach, DiVeRSe (Diverse Verifier on Reasoning Step), to further advance their reasoning capability. DiVeRSe first explores different prompts to enhance the diversity in reasoning paths. Second, DiVeRSe introduces a verifier to distinguish good answers from bad answers for a better weighted voting. Finally, DiVeRSe verifies the correctness of each single step rather than all the steps in a whole. We conduct extensive experiments using the latest language model code-davinci-002 and demonstrate that DiVeRSe can achieve new state-of-the-art performance on six out of eight reasoning benchmarks (e.g., GSM8K 74.4

READ FULL TEXT
research
11/18/2022

PAL: Program-aided Language Models

Large language models (LLMs) have recently demonstrated an impressive ab...
research
05/23/2023

Improving Factuality and Reasoning in Language Models through Multiagent Debate

Large language models (LLMs) have demonstrated remarkable capabilities i...
research
08/15/2023

Forward-Backward Reasoning in Large Language Models for Verification

Chain-of-Though (CoT) prompting has shown promising performance in vario...
research
04/28/2023

Explainable Verbal Reasoner Plus (EVR+): A Natural Language Reasoning Framework that Supports Diverse Compositional Reasoning

Languages models have been successfully applied to a variety of reasonin...
research
03/28/2022

STaR: Bootstrapping Reasoning With Reasoning

Generating step-by-step "chain-of-thought" rationales improves language ...
research
05/24/2023

Discriminator-Guided Multi-step Reasoning with Language Models

In the context of multi-step reasoning, language models (LMs) probabilit...
research
12/15/2022

ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning

Large language models show improved downstream task performance when pro...

Please sign up or login with your details

Forgot password? Click here to reset