Towards Automatic Learning of Procedures from Web Instructional Videos

03/28/2017
by Luowei Zhou, et al.

Agents, whether embodied or software, have rich potential to learn by observing other agents perform procedures involving objects and actions. Current research on automatic procedure learning relies heavily on action labels or video subtitles, even during the evaluation phase, which makes these methods infeasible in real-world scenarios. This leads to our question: can the human-consensus structure of a procedure be learned from a large set of long, unconstrained videos (e.g., instructional videos from YouTube) with only visual evidence? To answer this question, we introduce the problem of procedure segmentation: segmenting a video procedure into category-independent procedure segments. Given that no large-scale dataset is available for this problem, we collect a large-scale procedure segmentation dataset with procedure segments temporally localized and described; we use cooking videos and name the dataset YouCook2. We propose a segment-level recurrent network that generates procedure segments by modeling the dependencies across segments. The generated segments can be used as pre-processing for other tasks, such as dense video captioning and event parsing. We show in our experiments that the proposed model outperforms competitive baselines in procedure segmentation.
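To make the task output concrete, the following is a minimal sketch of how temporally localized procedure segments might be represented and compared. It is not taken from the paper: the `Segment` type, the function names, and the choice of temporal intersection-over-union as the overlap measure are all assumptions for illustration, since the abstract does not state which evaluation metric is used.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """A hypothetical procedure segment, given by start/end times in seconds."""
    start: float
    end: float


def temporal_iou(a: Segment, b: Segment) -> float:
    """Temporal intersection-over-union of two segments (assumed overlap measure)."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


def best_iou_per_truth(predicted: List[Segment], truth: List[Segment]) -> List[float]:
    """For each annotated segment, the highest IoU achieved by any predicted segment."""
    return [
        max((temporal_iou(p, t) for p in predicted), default=0.0)
        for t in truth
    ]
```

For example, `best_iou_per_truth(predicted, truth)` returns one score per annotated segment, which could then be averaged or thresholded to summarize how well a set of generated segments covers the annotated steps of a video.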


