PDPP: Projected Diffusion for Procedure Planning in Instructional Videos

03/26/2023
by Hanlin Wang, et al.

In this paper, we study the problem of procedure planning in instructional videos, which aims to make goal-directed plans given the current visual observations in unstructured real-life videos. Previous works cast this problem as a sequence planning problem and leverage either heavy intermediate visual observations or natural language instructions as supervision, resulting in complex learning schemes and expensive annotation costs. In contrast, we treat this problem as a distribution fitting problem: we model the distribution of the whole intermediate action sequence with a diffusion model (PDPP), and thus transform the planning problem into a sampling process from this distribution. In addition, we remove the expensive intermediate supervision and simply use task labels from instructional videos as supervision instead. Our model is a U-Net based diffusion model that directly samples action sequences from the learned distribution, given the start and end observations. Furthermore, we apply an efficient projection method to provide accurate conditional guides for our model during learning and sampling. Experiments on three datasets of different scales show that our PDPP model achieves state-of-the-art performance on multiple metrics, even without the task supervision. Code and trained models are available at https://github.com/MCG-NJU/PDPP.
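
The idea of sampling a plan while projecting the known conditions back in can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The row layout (action scores followed by observation features), the names `project`, `toy_denoiser`, and `sample_plan`, and the shrink-toward-zero denoiser standing in for the learned U-Net are all assumptions made for illustration.

```python
import random


def project(x, start_obs, goal_obs, action_dim):
    """Overwrite the observation slots of the first and last steps
    with the given start and goal observations (assumed layout:
    each row is [action scores..., observation features...])."""
    out = [row[:] for row in x]
    out[0][action_dim:] = list(start_obs)
    out[-1][action_dim:] = list(goal_obs)
    return out


def toy_denoiser(x, t):
    """Hypothetical stand-in for the learned U-Net: it only shrinks
    every value toward zero and ignores the step index t."""
    return [[0.9 * v for v in row] for row in x]


def sample_plan(start_obs, goal_obs, horizon, action_dim, steps=10, seed=0):
    """Start from Gaussian noise, alternate a denoising step with the
    projection, then read one action index per step via argmax."""
    rng = random.Random(seed)
    obs_dim = len(start_obs)
    x = [[rng.gauss(0.0, 1.0) for _ in range(action_dim + obs_dim)]
         for _ in range(horizon)]
    x = project(x, start_obs, goal_obs, action_dim)
    for t in reversed(range(steps)):
        x = toy_denoiser(x, t)
        # Re-impose the conditions so denoising cannot drift away from them.
        x = project(x, start_obs, goal_obs, action_dim)
    plan = [max(range(action_dim), key=lambda a: row[a]) for row in x]
    return plan, x


plan, x = sample_plan([1.0, 2.0], [3.0, 4.0], horizon=3, action_dim=4)
```

Because the projection runs after every denoising step, the start and goal observations in the final sample are exactly the ones supplied, whatever the denoiser does to the other entries.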


research
05/04/2022

P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision

In this paper, we study the problem of procedure planning in instruction...
research
09/14/2023

Masked Diffusion with Task-awareness for Procedure Planning in Instructional Videos

A key challenge with procedure planning in instructional videos lies in ...
research
08/17/2023

Event-Guided Procedure Planning from Instructional Videos with Text Supervision

In this work, we focus on the task of procedure planning from instructio...
research
09/10/2021

PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks

In this work, we study the problem of how to leverage instructional vide...
research
10/05/2021

Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning

Learning new skills by observing humans' behaviors is an essential capab...
research
01/02/2023

Diffusion Probabilistic Models for Scene-Scale 3D Categorical Data

In this paper, we learn a diffusion model to generate 3D data on a scene...
research
04/06/2022

Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks

The field of Natural Language Processing (NLP) has experienced a dramati...
