Procedure Planning in Instructional Videos

07/02/2019
by Chien-Yi Chang, et al.

We propose a new challenging task: procedure planning in instructional videos. Unlike existing planning problems, where both the state and the action spaces are well-defined, the key challenge of planning in instructional videos is that both the state and the action spaces are open-vocabulary. We address this challenge with latent space planning, where we propose to explicitly leverage the constraints imposed by the conjugate relationships between states and actions in a learned plannable latent space. We evaluate both procedure planning and walkthrough planning on large-scale real-world instructional videos. Our experiments show that we are able to learn plannable semantic representations without explicit supervision. This enables sequential reasoning on real-world videos and leads to stronger generalization compared to existing planning approaches and neural network policies.
