ORCHARD: A Benchmark For Measuring Systematic Generalization of Multi-Hierarchical Reasoning

11/28/2021
by Bill Tuck Weng Pung, et al.

The ability to reason with multiple hierarchical structures is a desirable property of sequential inductive biases for natural language processing. Do state-of-the-art Transformer and LSTM architectures implicitly encode these biases? To answer this question, we propose ORCHARD, a diagnostic dataset for systematically evaluating hierarchical reasoning in state-of-the-art neural sequence models. While prior evaluation frameworks such as ListOps and Logical Inference exist, our work presents a novel and more natural setting in which models learn to reason with multiple explicit hierarchical structures instead of only one; that is, the task requires long-term sequence memorization and relational reasoning in addition to reasoning with hierarchical structure. Backed by a set of rigorous experiments, we show that (1) Transformer and LSTM models surprisingly fail at systematic generalization, and (2) with increased references between hierarchies, the Transformer performs no better than random.
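
For context on the single-hierarchy setting that the abstract contrasts ORCHARD with, the sketch below evaluates a ListOps-style bracketed expression. It is a minimal illustration only: it covers a simplified subset of operators (MAX, MIN, and SM for sum modulo 10), the function name `evaluate` is hypothetical, and it does not reproduce ORCHARD's data format, which is not described in the abstract.

```python
def evaluate(expr: str) -> int:
    """Evaluate a ListOps-style expression such as "[MAX 2 9 [MIN 4 7 ] 0 ]".

    Assumes whitespace-separated tokens; supports a simplified operator subset.
    """
    ops = {
        "MAX": max,
        "MIN": min,
        "SM": lambda xs: sum(xs) % 10,  # sum modulo 10
    }
    stack = []  # each entry is (operator name, list of evaluated arguments)
    for tok in expr.split():
        if tok.startswith("["):
            # Opening a new nested list, e.g. "[MAX"
            stack.append((tok[1:], []))
        elif tok == "]":
            # Closing the innermost list: reduce it to a single value
            op, args = stack.pop()
            value = ops[op](args)
            if not stack:
                return value  # outermost list is closed
            stack[-1][1].append(value)
        else:
            # A digit argument of the innermost open list
            stack[-1][1].append(int(tok))
    raise ValueError("unbalanced expression")


print(evaluate("[MAX 2 9 [MIN 4 7 ] 0 ]"))  # inner MIN gives 4, outer MAX gives 9
```

Solving such an expression requires tracking one nesting structure; the abstract's claim is that ORCHARD additionally requires reasoning across several such structures that reference one another.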


