See, Plan, Predict: Language-guided Cognitive Planning with Video Prediction

10/07/2022
by Maria Attarian, et al.

Cognitive planning is the structural decomposition of complex tasks into a sequence of future behaviors. In the computational setting, performing cognitive planning entails grounding plans and concepts in one or more modalities in order to leverage them for low-level control. Since real-world tasks are often described in natural language, we devise a cognitive planning algorithm via language-guided video prediction. Current video prediction models do not support conditioning on natural language instructions, so we propose a new video prediction architecture that leverages the power of pre-trained transformers. The network is endowed with the ability to ground concepts based on natural language input, with generalization to unseen objects. We demonstrate the effectiveness of this approach on a new simulation dataset in which each task is defined by a high-level action described in natural language. Our experiments compare our method against a video generation baseline without planning or action grounding and show significant improvements. Our ablation studies highlight the improved generalization to unseen objects that natural language embeddings bring to concept grounding, as well as the importance of planning for the visual "imagination" of a task.
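As a rough illustration of what language-conditioned video prediction can look like, the sketch below pairs a frozen pre-trained transformer text encoder with a small convolutional frame predictor through FiLM-style conditioning. The class name, encoder choice, and layer sizes are hypothetical assumptions for illustration only, not the authors' architecture.

```python
# Minimal sketch of a language-conditioned video predictor (hypothetical names;
# not the paper's exact model). A frozen pre-trained transformer embeds the
# instruction; the embedding conditions a convolutional next-frame predictor.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel  # any pre-trained text encoder


class LanguageConditionedVideoPredictor(nn.Module):
    def __init__(self, text_model="bert-base-uncased", hidden=64):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(text_model)
        self.text_encoder = AutoModel.from_pretrained(text_model)
        self.text_encoder.requires_grad_(False)   # keep the language model frozen
        self.text_encoder.eval()
        txt_dim = self.text_encoder.config.hidden_size

        # Encode the last observed frame into a spatial feature map.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Project the sentence embedding to FiLM-style scale/shift parameters.
        self.film = nn.Linear(txt_dim, 2 * hidden)
        # Decode the conditioned features back into a predicted next frame.
        self.frame_decoder = nn.Sequential(
            nn.ConvTranspose2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(hidden, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame, instruction):
        # frame: (B, 3, H, W); instruction: list of B strings.
        tokens = self.tokenizer(instruction, return_tensors="pt",
                                padding=True, truncation=True)
        with torch.no_grad():
            txt = self.text_encoder(**tokens).last_hidden_state[:, 0]  # [CLS] token
        feats = self.frame_encoder(frame)
        scale, shift = self.film(txt).chunk(2, dim=-1)
        feats = feats * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
        return self.frame_decoder(feats)           # predicted next frame


# Usage: predict the next 64x64 frame for one observation and one instruction.
model = LanguageConditionedVideoPredictor()
frame = torch.rand(1, 3, 64, 64)
next_frame = model(frame, ["pick up the red block"])
print(next_frame.shape)  # torch.Size([1, 3, 64, 64])
```

Freezing the text encoder is one plausible way to retain the generalization of pre-trained language embeddings to unseen object names while only the visual pathway is trained; rolling the predictor out autoregressively over its own outputs would yield a multi-step visual plan.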


Related research

04/04/2022  Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Large language models can encode a wealth of semantic knowledge about th...

11/23/2020  Action Concept Grounding Network for Semantically-Consistent Video Generation
Recent works in self-supervised video prediction have mainly focused on ...

09/17/2021  Grounding Natural Language Instructions: Can Large Language Models Capture Spatial Information?
Models designed for intelligent process automation are required to be ca...

08/03/2020  Action sequencing using visual permutations
Humans can easily reason about the sequence of high level actions needed...

05/01/2019  From Abstractions to "Natural Languages" for Planning Agents
Despite our unique ability to use natural languages, we know little abou...

04/21/2017  Accurately and Efficiently Interpreting Human-Robot Instructions of Varying Granularities
Humans can ground natural language commands to tasks at both abstract an...

04/17/2023  Pretrained Language Models as Visual Planners for Human Assistance
To make progress towards multi-modal AI assistants which can guide users...
