TaskLAMA: Probing the Complex Task Understanding of Language Models

08/29/2023
by Quan Yuan, et al.

Structured Complex Task Decomposition (SCTD) is the problem of breaking down a complex real-world task (such as planning a wedding) into a directed acyclic graph over individual steps that contribute to achieving the task, with edges specifying temporal dependencies between them. SCTD is an important component of assistive planning tools, and a challenge for commonsense reasoning systems. We probe how accurately SCTD can be done with the knowledge extracted from Large Language Models (LLMs). We introduce a high-quality human-annotated dataset for this problem and novel metrics to fairly assess the performance of LLMs against several baselines. Our experiments reveal that LLMs are able to decompose complex tasks into individual steps effectively, with a relative improvement of at least 15% over the baselines. We also propose approaches to further improve their performance, with a relative improvement of at least 7%. However, LLMs still struggle to predict pairwise temporal dependencies, which reveals a gap in their understanding of complex tasks.
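To make the graph structure described above concrete, here is a minimal sketch of a task decomposition stored as a directed acyclic graph, using Python's standard `graphlib` module. The step names and the `valid_order` helper are hypothetical illustrations; they are not taken from the dataset, and this is not the authors' method or metric.

```python
from graphlib import TopologicalSorter

# Hypothetical steps for "planning a wedding" (illustrative only, not from the dataset).
# Each key is a step; its value is the set of steps that must be completed first.
dependencies = {
    "set a budget": set(),
    "choose a date": set(),
    "book a venue": {"set a budget", "choose a date"},
    "hire a caterer": {"set a budget", "book a venue"},
    "send invitations": {"book a venue"},
}


def valid_order(deps):
    """Return one ordering of the steps that respects every temporal dependency.

    Raises graphlib.CycleError if the dependency graph contains a cycle,
    i.e. if it is not a valid directed acyclic graph.
    """
    return list(TopologicalSorter(deps).static_order())


order = valid_order(dependencies)
print(order)
```

Any ordering returned this way places each step after all of its prerequisites. Several orderings may be valid for the same graph, which is why the dependencies are better described as a graph than as a single linear script.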

