Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning

05/19/2022
by Antonia Creswell, et al.

Large language models (LLMs) have been shown to be capable of impressive few-shot generalisation to new tasks. However, they still tend to perform poorly on multi-step logical reasoning problems. Here we carry out a comprehensive evaluation of LLMs on 50 tasks that probe different aspects of logical reasoning. We show that language models tend to perform fairly well at single-step inference or entailment tasks, but struggle to chain together multiple reasoning steps to solve more complex problems. In light of this, we propose a Selection-Inference (SI) framework that exploits pre-trained LLMs as general processing modules, and alternates between selection and inference to generate a series of interpretable, causal reasoning steps leading to the final answer. We show that a 7B parameter LLM used within the SI framework in a 5-shot generalisation setting, with no fine-tuning, yields a performance improvement of over 100% compared to an equivalent vanilla baseline on a suite of 10 logical reasoning tasks. The same model in the same setting even outperforms a significantly larger 280B parameter baseline on the same suite of tasks. Moreover, answers produced by the SI framework are accompanied by a causal natural-language-based reasoning trace, which has important implications for the safety and trustworthiness of the system.
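As a rough illustration of the alternating loop the abstract describes, here is a minimal Python sketch of SI-style reasoning. It is not the authors' implementation: the `llm` callable, the prompt strings, and the fixed step cap are assumptions made for this example; the paper's actual prompts, fine-tuned selection/inference modules, and halting criterion differ.

```python
# Minimal sketch of a Selection-Inference (SI) style loop. Illustrative only:
# the `llm` callable, the prompt strings, and the fixed step cap are
# assumptions for this example, not the paper's actual prompts or halting
# criterion.
from typing import Callable, List


def selection_inference(
    llm: Callable[[str], str],
    context: List[str],
    question: str,
    max_steps: int = 5,  # assumed fixed cap; the paper handles halting differently
) -> List[str]:
    """Alternate selection and inference, returning the reasoning trace."""
    trace: List[str] = []
    facts = list(context)
    for _ in range(max_steps):
        # Selection step: ask the LLM to pick the subset of facts relevant
        # to the next deduction.
        selection_prompt = (
            "Select the facts needed for the next reasoning step.\n"
            + "\n".join(facts)
            + f"\nQuestion: {question}\nSelection:"
        )
        selected = llm(selection_prompt)

        # Inference step: the LLM sees ONLY the selected facts, so each new
        # fact is entailed by an explicit, inspectable subset of the context.
        new_fact = llm(f"{selected}\nTherefore:")

        trace.append(f"{selected} -> {new_fact}")
        facts.append(new_fact)  # deductions become available to later steps
    return trace


# Toy usage with a canned "LLM" so the sketch runs end to end.
def fake_llm(prompt: str) -> str:
    if "Selection:" in prompt:
        return "All men are mortal. Socrates is a man."
    return "Socrates is mortal."


trace = selection_inference(
    fake_llm,
    context=["All men are mortal.", "Socrates is a man."],
    question="Is Socrates mortal?",
    max_steps=1,
)
print(trace)  # ['All men are mortal. Socrates is a man. -> Socrates is mortal.']
```

The key design point the sketch tries to capture is that the inference prompt contains only the selected facts, which is what makes each step of the trace causal and inspectable rather than a post-hoc rationalisation.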

Related research:

- Faithful Reasoning Using Large Language Models (08/30/2022)
- Explicit Planning Helps Language Models in Logical Reasoning (03/28/2023)
- MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure (10/22/2022)
- ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning (12/15/2022)
- Textbooks Are All You Need II: phi-1.5 technical report (09/11/2023)
- ThinkSum: Probabilistic reasoning over sets using large language models (10/04/2022)
- Show Your Work: Scratchpads for Intermediate Computation with Language Models (11/30/2021)
