WhyAct: Identifying Action Reasons in Lifestyle Vlogs

by Oana Ignat et al.
University of Michigan

We aim to automatically identify human action reasons in online videos. We focus on the widespread genre of lifestyle vlogs, in which people perform actions while verbally describing them. We introduce and make publicly available the WhyAct dataset, consisting of 1,077 visual actions manually annotated with their reasons. We describe a multimodal model that leverages visual and textual information to automatically infer the reasons corresponding to an action presented in the video.
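The abstract does not specify the model's architecture, but a common way to combine visual and textual information for this kind of reason-ranking task is late fusion: encode each modality separately, fuse the features, and score candidate reasons by similarity. The sketch below is purely illustrative; the function name, feature dimensions, and random projection are hypothetical placeholders, not the paper's actual method.

```python
import numpy as np

def fuse_and_score(video_feat, text_feat, reason_feats, rng=None):
    """Late-fusion scorer (illustrative, not the WhyAct model):
    concatenate video and transcript features, project into the
    reason-embedding space, and rank candidate reasons by cosine
    similarity."""
    rng = rng if rng is not None else np.random.default_rng(0)
    fused = np.concatenate([video_feat, text_feat])       # (dv + dt,)
    # Hypothetical learned projection, stubbed here with random weights.
    W = rng.standard_normal((reason_feats.shape[1], fused.shape[0]))
    projected = W @ fused                                 # (dr,)
    # Cosine similarity against each candidate reason embedding.
    sims = reason_feats @ projected / (
        np.linalg.norm(reason_feats, axis=1) * np.linalg.norm(projected) + 1e-8
    )
    return sims

# Toy usage: 512-d video features, 300-d text features, 5 candidate reasons.
video = np.ones(512)
text = np.ones(300)
reasons = np.random.default_rng(1).standard_normal((5, 64))
scores = fuse_and_score(video, text, reasons)
print(scores.shape)
```

In practice the projection `W` would be learned jointly with the encoders; the fixed random matrix here only demonstrates the data flow from per-modality features to a ranked list of reasons.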




Related Papers
Identifying Visible Actions in Lifestyle Vlogs

We consider the task of identifying human actions visible in online vide...

Human Action Co-occurrence in Lifestyle Vlogs using Graph Link Prediction

We introduce the task of automatic human action co-occurrence identifica...

A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos

Procedural knowledge, which we define as concrete information about the ...

Online Action Detection

In online action detection, the goal is to detect the start of an action...

Learning Action Changes by Measuring Verb-Adverb Textual Relationships

The goal of this work is to understand the way actions are performed in ...

Video Caption Dataset for Describing Human Actions in Japanese

In recent years, automatic video caption generation has attracted consid...

Describing Common Human Visual Actions in Images

Which common human actions and interactions are recognizable in monocula...

Code Repositories


Identifying reasons for human actions in lifestyle vlogs.

