Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations

03/31/2023
by Yiwu Zhong, et al.

The abundance of instructional videos and their narrations over the Internet offers an exciting avenue for understanding procedural activities. In this work, we propose to learn a video representation that encodes both action steps and their temporal ordering, based on a large-scale dataset of web instructional videos and their narrations, without using human annotations. Our method jointly learns a video representation to encode individual step concepts, and a deep probabilistic model to capture both temporal dependencies and immense individual variations in the step ordering. We empirically demonstrate that learning temporal ordering not only enables new capabilities for procedure reasoning, but also reinforces the recognition of individual steps. Our model significantly advances the state-of-the-art results on step classification (+2.8 …) and step forecasting (+7.4 …). It also attains promising results in … inference for step classification and forecasting, as well as in predicting diverse and plausible steps for incomplete procedures. Our code is available at https://github.com/facebookresearch/ProcedureVRL.
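The abstract describes a probabilistic model of step ordering that captures both temporal dependencies and variation across individuals, and uses it to predict diverse, plausible steps for incomplete procedures. As a rough intuition aid only, the sketch below swaps in a much simpler technique: a first-order Markov chain over step labels, fitted on a few hypothetical step sequences. The function names `fit_transitions` and `sample_completion` and the coffee-making steps are illustrative assumptions; this is not the authors' model and not part of the ProcedureVRL code.

```python
import random
from collections import Counter, defaultdict

# Simplified stand-in (assumption): a first-order Markov chain over step labels.
# It is NOT the paper's model; it only shows how a probabilistic ordering model
# can encode typical step dependencies while still allowing varied orderings.

def fit_transitions(sequences):
    """Estimate P(next step | previous step) from observed step sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Add explicit start/end markers so beginnings and endings are modeled too.
        path = ["<start>"] + list(seq) + ["<end>"]
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1
    # Normalize each row of transition counts into probabilities.
    return {
        prev: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for prev, ctr in counts.items()
    }

def sample_completion(probs, prefix, rng, max_len=10):
    """Sample one plausible continuation of an incomplete procedure."""
    current = prefix[-1] if prefix else "<start>"
    completion = []
    while len(completion) < max_len:
        options = probs.get(current)
        if not options:  # unseen step: no evidence to continue from
            break
        steps, weights = zip(*options.items())
        current = rng.choices(steps, weights=weights, k=1)[0]
        if current == "<end>":
            break
        completion.append(current)
    return completion

# Hypothetical demonstrations with individual variation in step ordering.
demos = [
    ["grind beans", "boil water", "brew", "pour"],
    ["boil water", "grind beans", "brew", "pour"],
    ["grind beans", "boil water", "brew", "add milk", "pour"],
]
probs = fit_transitions(demos)
rng = random.Random(0)
# Repeated sampling yields different, yet plausible, completions of the same prefix.
for _ in range(3):
    print(sample_completion(probs, ["grind beans"], rng))
```

Sampling several completions from the same prefix, rather than taking only the most likely next step, is what lets even this toy model produce diverse continuations; the paper's learned model plays that role with far richer, video-conditioned dependencies.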


research
03/31/2023

Procedure-Aware Pretraining for Instructional Video Understanding

Our goal is to learn a video representation that is useful for downstrea...
research
01/26/2022

Learning To Recognize Procedural Activities with Distant Supervision

In this paper we consider the problem of classifying fine-grained, multi...
research
04/05/2020

Deep Multimodal Feature Encoding for Video Ordering

True understanding of videos comes from a joint analysis of all its moda...
research
03/07/2019

COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis

There are substantial instructional videos on the Internet, which enable...
research
03/20/2020

Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation

Thanks to the substantial and explosively increased instructional video...
research
03/23/2023

Learning and Verification of Task Structure in Instructional Videos

Given the enormous number of instructional videos available online, lear...
research
05/27/2023

Non-Sequential Graph Script Induction via Multimedia Grounding

Online resources such as WikiHow compile a wide range of scripts for per...
