Procedure Planning in Instructional Videos

07/02/2019
by   Chien-Yi Chang, et al.
0

We propose a new challenging task: procedure planning in instructional videos. Unlike existing planning problems, where both the state and the action spaces are well-defined, the key challenge of planning in instructional videos is that both the state and the action spaces are open-vocabulary. We address this challenge with latent space planning, where we propose to explicitly leverage the constraints imposed by the conjugate relationships between states and actions in a learned plannable latent space. We evaluate both procedure planning and walkthrough planning on large-scale real-world instructional videos. Our experiments show that we are able to learn plannable semantic representations without explicit supervision. This enables sequential reasoning on real-world videos and leads to stronger generalization compared to existing planning approaches and neural network policies.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset