A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos

05/02/2020
by   Frank F. Xu, et al.

Procedural knowledge, which we define as concrete information about the sequence of actions that go into performing a particular procedure, plays an important role in understanding real-world tasks and actions. Humans often learn this knowledge from instructional text and video, and in this paper we aim to extract it automatically in a similar way. As a concrete step in this direction, we propose the new task of inferring procedures in a structured form (a data structure containing verbs and arguments) from multimodal instructional video content and the corresponding transcripts. We first create a large, manually annotated evaluation dataset of over 350 instructional cooking videos, along with over 15,000 English sentences in transcripts, spanning over 89 recipes. We then analyze the challenges posed by this task and dataset through experiments with baselines based on unsupervised segmentation, semantic role labeling, and visual action detection. The dataset and code will be publicly available at https://github.com/frankxu2004/cooking-procedural-extraction.
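
To make the idea of a structured form more concrete, the following is a minimal sketch of one way a procedure could be held as an ordered list of verbs with their arguments. The class names (`Step`, `Procedure`), the field names, and the sample steps are all assumptions made for illustration; they are not the annotation schema of the dataset, which the abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One action: a verb plus its arguments (hypothetical schema)."""
    verb: str
    arguments: List[str] = field(default_factory=list)


@dataclass
class Procedure:
    """An ordered sequence of steps for one video (hypothetical schema)."""
    steps: List[Step] = field(default_factory=list)

    def add(self, verb: str, *arguments: str) -> None:
        # Append a step, preserving the order in which actions occur.
        self.steps.append(Step(verb, list(arguments)))

    def render(self) -> List[str]:
        # Format each step as verb(arg1, arg2, ...).
        return [f"{s.verb}({', '.join(s.arguments)})" for s in self.steps]


# Invented example steps, not taken from the dataset.
procedure = Procedure()
procedure.add("crack", "eggs", "bowl")
procedure.add("whisk", "eggs")
print(procedure.render())
```

Keeping the steps in a list preserves the sequence of actions, which is the core of procedural knowledge as defined above; a richer schema could add fields such as timestamps or argument roles.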


