PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks

09/10/2021
by Jiankai Sun, et al.

In this work, we study how to leverage instructional videos to facilitate the understanding of human decision-making processes, focusing on training a model that can plan a goal-directed procedure from real-world videos. Learning structured and plannable state and action spaces directly from unstructured videos is the key technical challenge of our task. Two problems arise: first, the appearance gap between the training and validation datasets can be large for unstructured videos; second, this gap leads to decision errors that compound over successive steps. We address these limitations with Planning Transformer (PlaTe), which has the advantage of circumventing the compounding prediction errors that occur with single-step models during long model-based rollouts. Our method simultaneously learns the latent state and action information of assigned tasks and the representations of the decision-making process from human demonstrations. Experiments conducted on real-world instructional videos and an interactive environment show that our method reaches the indicated goal more reliably than previous algorithms. We also validate the feasibility of executing procedural tasks on a UR-5 platform.

research
10/05/2021

Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning

Learning new skills by observing humans' behaviors is an essential capab...
research
07/02/2019

Procedure Planning in Instructional Videos

We propose a new challenging task: procedure planning in instructional v...
research
03/26/2023

PDPP: Projected Diffusion for Procedure Planning in Instructional Videos

In this paper, we study the problem of procedure planning in instruction...
research
05/14/2022

GoalNet: Inferring Conjunctive Goal Predicates from Human Plan Demonstrations for Robot Instruction Following

Our goal is to enable a robot to learn how to sequence its actions to pe...
research
03/05/2022

Boosting human decision-making with AI-generated decision aids

Human decision-making is plagued by many systematic errors. Many of thes...
research
05/04/2022

P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision

In this paper, we study the problem of procedure planning in instruction...
research
06/15/2019

Delving into 3D Action Anticipation from Streaming Videos

Action anticipation, which aims to recognize the action with a partial o...
