Video-Mined Task Graphs for Keystep Recognition in Instructional Videos

07/17/2023
by Kumar Ashutosh, et al.

Procedural activity understanding requires perceiving human actions in terms of a broader task, where multiple keysteps are performed in sequence across a long video to reach a final goal state – such as the steps of a recipe or a DIY fix-it task. Prior work largely treats keystep recognition in isolation from this broader structure, or else rigidly confines keysteps to align with a predefined sequential script. We propose discovering a task graph automatically from how-to videos to represent probabilistically how people tend to execute keysteps, and then leveraging this graph to regularize keystep recognition in novel videos. On multiple datasets of real-world instructional videos, we show the impact: more reliable zero-shot keystep localization and improved video representation learning, exceeding the state of the art.
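As a rough illustration of the idea, the sketch below estimates keystep transition probabilities from observed keystep sequences and then uses them to regularize per-segment keystep scores with Viterbi decoding. It is not the authors' method: the abstract does not specify how the graph is built or applied, so the function names (`mine_task_graph`, `decode`), the additive smoothing, the Viterbi step, and the toy data are all assumptions made for this example.

```python
import math
from collections import defaultdict


def mine_task_graph(sequences, smoothing=1e-3):
    """Estimate transition probabilities between keysteps (hypothetical helper).

    `sequences` is a list of keystep-label lists, one per video.
    Returns graph[a][b] = estimated probability that keystep b follows a.
    """
    keysteps = sorted({k for seq in sequences for k in seq})
    counts = defaultdict(float)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1.0
    graph = {}
    for a in keysteps:
        # Additive smoothing keeps every transition possible (assumption).
        total = sum(counts[(a, b)] + smoothing for b in keysteps)
        graph[a] = {b: (counts[(a, b)] + smoothing) / total for b in keysteps}
    return graph


def decode(scores, graph):
    """Viterbi decoding: pick the keystep path that best balances the
    per-segment scores against the graph's transition probabilities.

    `scores` is a list of dicts (one per video segment) mapping each
    keystep to a strictly positive score.
    """
    keysteps = list(graph)
    best = {k: math.log(scores[0][k]) for k in keysteps}
    back = []
    for obs in scores[1:]:
        new_best, pointers = {}, {}
        for k in keysteps:
            prev = max(keysteps, key=lambda p: best[p] + math.log(graph[p][k]))
            new_best[k] = best[prev] + math.log(graph[prev][k]) + math.log(obs[k])
            pointers[k] = prev
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final keystep.
    path = [max(keysteps, key=lambda k: best[k])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy data (invented for illustration only).
videos = [["crack", "whisk", "fry"]] * 3
graph = mine_task_graph(videos)
segment_scores = [
    {"crack": 0.8, "whisk": 0.1, "fry": 0.1},
    {"crack": 0.1, "whisk": 0.4, "fry": 0.5},  # ambiguous segment
    {"crack": 0.1, "whisk": 0.1, "fry": 0.8},
]
path = decode(segment_scores, graph)
print(path)
```

In this toy case, picking the top-scoring keystep independently for each segment would label the second segment "fry", whereas decoding with the mined transitions prefers "whisk" because "crack" is always followed by "whisk" in the example videos.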


