STEPS: A Benchmark for Order Reasoning in Sequential Tasks

06/07/2023
by Weizhi Wang, et al.

Many human activities, e.g., cooking, repairing, and manufacturing, can be abstracted into a sequence of actions described in natural language. Such action sequences depend heavily on execution order: a disordered sequence causes further task execution by robots or AI agents to fail. Therefore, to verify the order-reasoning capability of current neural models on sequential tasks, we propose a challenging benchmark named STEPS. STEPS involves two subtask settings: determining whether a given next step in a recipe is reasonable, and selecting the reasonable step from a multiple-choice question. We describe the data construction and task formulations, and benchmark most of the prominent Large Language Models (LLMs). The experimental results demonstrate that 1) commonsense reasoning about action order in sequential tasks is challenging for LLMs to resolve via zero-shot prompting or few-shot in-context learning, and 2) prompting methods still lag significantly behind tuning-based methods on STEPS.
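To make the two subtask settings concrete, the sketch below shows how each might be posed to an LLM as a zero-shot prompt. This is an illustrative assumption, not the official STEPS data format: the field names, the recipe text, and the build_rationality_prompt / build_multi_choice_prompt helpers are hypothetical.

```python
# Minimal sketch (assumed format, not the official STEPS release):
# illustrative examples of the two subtask settings, posed as zero-shot prompts.

# Subtask 1: judge whether a candidate next step is reasonable given prior steps.
step_rationality_example = {
    "prior_steps": [
        "Preheat the oven to 180C.",
        "Mix flour, sugar, and eggs into a batter.",
    ],
    "candidate_next_step": "Serve the cake on a plate.",
    "label": "unreasonable",  # the cake has not been baked yet
}

# Subtask 2: pick the reasonable next step from multiple choices.
multi_choice_example = {
    "prior_steps": [
        "Preheat the oven to 180C.",
        "Mix flour, sugar, and eggs into a batter.",
    ],
    "choices": [
        "A. Serve the cake on a plate.",
        "B. Pour the batter into a pan and bake.",
        "C. Wash the oven.",
    ],
    "answer": "B",
}


def build_rationality_prompt(ex: dict) -> str:
    """Format subtask 1 as a yes/no zero-shot prompt."""
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(ex["prior_steps"]))
    return (
        f"Steps so far:\n{steps}\n"
        f"Proposed next step: {ex['candidate_next_step']}\n"
        "Is the proposed next step reasonable? Answer yes or no."
    )


def build_multi_choice_prompt(ex: dict) -> str:
    """Format subtask 2 as a multiple-choice zero-shot prompt."""
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(ex["prior_steps"]))
    choices = "\n".join(ex["choices"])
    return (
        f"Steps so far:\n{steps}\n"
        f"Which of the following is the most reasonable next step?\n{choices}\n"
        "Answer with the letter only."
    )


if __name__ == "__main__":
    print(build_rationality_prompt(step_rationality_example))
    print()
    print(build_multi_choice_prompt(multi_choice_example))
```

Under this framing, a model's prediction (yes/no for subtask 1, a choice letter for subtask 2) can be scored directly against the gold label, which is how the zero-shot and few-shot prompting baselines described in the abstract would typically be evaluated.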


