Measuring Systematic Generalization in Neural Proof Generation with Transformers

09/30/2020
by Nicolas Gontier, et al.

We are interested in understanding how well Transformer language models (TLMs) can perform reasoning tasks when trained on knowledge encoded as natural language. We investigate systematic generalization abilities on an inductive logical reasoning task in natural language, which involves reasoning over relationships between entities grounded in first-order logical proofs. Specifically, we perform soft theorem proving by leveraging TLMs to generate logical proofs represented in natural language. We systematically test proof-generation capabilities, along with inference capabilities that leverage the generated proofs. We observe length-generalization issues in both proof generation and inference when models are evaluated on longer-than-trained sequences. However, TLMs improve their generalization performance after being exposed to longer, exhaustive proofs. In addition, we find that TLMs generalize better with backward-chaining proofs than with their forward-chaining counterparts, even though they find forward-chaining proofs easier to generate. We also observe that models that are not trained to generate proofs are better at generalizing to problems based on longer proofs. This result suggests that Transformers have efficient, yet not interpretable, internal reasoning strategies. Together, these results highlight systematic generalization issues in TLMs in the context of logical reasoning, and we believe this work will motivate deeper inspection of their underlying reasoning strategies.
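To make the forward- vs. backward-chaining contrast concrete, here is a minimal illustrative sketch (not the paper's code or data) over a toy kinship knowledge base with a single hypothetical composition rule. Forward chaining derives new facts from the knowns until a fixed point; backward chaining starts from the goal and recurses toward known facts:

```python
# Toy facts as (head, relation, tail) triples -- hypothetical example data.
FACTS = {("anne", "mother", "bob"), ("bob", "father", "carol")}
# One toy composition rule: mother(x, y) AND father(y, z) -> grandmother(x, z).
RULES = [(("mother", "father"), "grandmother")]


def forward_chain(facts, rules):
    """Derive new facts until a fixed point; the derivation order is the proof."""
    derived = set(facts)
    proof = []
    changed = True
    while changed:
        changed = False
        for (r1, r2), head in rules:
            for (a, p, b) in list(derived):
                if p != r1:
                    continue
                for (b2, q, c) in list(derived):
                    if b2 == b and q == r2 and (a, head, c) not in derived:
                        derived.add((a, head, c))
                        proof.append(((a, p, b), (b, q, c), (a, head, c)))
                        changed = True
    return derived, proof


def backward_chain(goal, facts, rules):
    """Work from the goal back to known facts; return a proof chain or None."""
    if goal in facts:
        return [goal]
    a, rel, c = goal
    for (r1, r2), head in rules:
        if head != rel:
            continue
        # Pick the intermediate entity from facts matching the first premise.
        for (x, p, b) in facts:
            if x == a and p == r1:
                sub = backward_chain((b, r2, c), facts, rules)
                if sub is not None:
                    return [(a, r1, b)] + sub + [goal]
    return None


derived, fwd_proof = forward_chain(FACTS, RULES)
bwd_proof = backward_chain(("anne", "grandmother", "carol"), FACTS, RULES)
print(("anne", "grandmother", "carol") in derived)  # True
print(bwd_proof is not None)  # True
```

The paper's proofs are expressed in natural language rather than triples; this sketch only shows why the two proof directions traverse the same facts in opposite orders, which is what the TLMs are trained to reproduce.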


Related research

- ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language (12/24/2020)
  Transformers have been shown to emulate logical deduction over natural l...
- LAMBADA: Backward Chaining for Automated Reasoning in Natural Language (12/20/2022)
  Remarkable progress has been made on automated reasoning with knowledge ...
- Exploring Length Generalization in Large Language Models (07/11/2022)
  The ability to extrapolate from short problem instances to longer ones i...
- CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text (08/16/2019)
  The recent success of natural language understanding (NLU) systems has b...
- FaiRR: Faithful and Robust Deductive Reasoning over Natural Language (03/19/2022)
  Transformers have been shown to be able to perform deductive reasoning o...
- LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI (12/04/2021)
  Natural Language Inference (NLI) is considered a representative task to ...
- Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability (12/16/2021)
  Investigating the reasoning abilities of transformer models, and discove...
