From Lifestyle Vlogs to Everyday Interactions

12/06/2017
by David F. Fouhey et al.

A major stumbling block to progress in understanding basic human interactions, such as getting out of bed or opening a refrigerator, is the lack of good training data. Most past efforts have gathered this data explicitly: starting with a laundry list of action labels and then querying search engines for videos tagged with each label. In this work, we do the reverse and search implicitly: we start with a large collection of interaction-rich video data and then annotate and analyze it. We use Internet Lifestyle Vlogs as a source of surprisingly large and diverse interaction data. We show that by collecting the data first, we achieve greater scale and far greater diversity in terms of actions and actors. Additionally, our data exposes biases built into common explicitly gathered datasets. We make sense of our data by analyzing the central component of interaction: hands. We benchmark two tasks: identifying semantic object contact at the video level and non-semantic contact state at the frame level. We additionally demonstrate future prediction of hands.


