Forecasting Action through Contact Representations from First Person Video

02/01/2021
by Eadom Dessalene, et al.

Human actions involving hand manipulations are structured according to the making and breaking of hand-object contact, and human visual understanding of action relies on the anticipation of contact, as demonstrated by pioneering work in cognitive science. Taking inspiration from this, we introduce representations and models centered on contact, which we then use in action prediction and anticipation. We annotate a subset of the EPIC Kitchens dataset to include time-to-contact between hands and objects, as well as segmentations of hands and objects. Using these annotations, we train the Anticipation Module, which produces Contact Anticipation Maps and Next Active Object Segmentations - novel low-level representations providing temporal and spatial characteristics of anticipated near-future action. On top of the Anticipation Module we apply Egocentric Object Manipulation Graphs (Ego-OMG), a framework for action anticipation and prediction. Ego-OMG models longer-term temporal semantic relations through a graph of transitions between contact-delineated action states. Using the Anticipation Module within Ego-OMG achieves 1st and 2nd place on the unseen and seen test sets, respectively, of the EPIC Kitchens Action Anticipation Challenge, and state-of-the-art results on the tasks of action anticipation and action prediction over EPIC Kitchens. We perform ablation studies over characteristics of the Anticipation Module to evaluate their utility.

research
06/05/2020

Egocentric Object Manipulation Graphs

We introduce Egocentric Object Manipulation Graphs (Ego-OMG) - a novel r...
research
10/03/2016

Prediction of Manipulation Actions

Looking at a person's hands one often can tell what the person is going ...
research
04/06/2023

Therbligs in Action: Video Understanding through Motion Primitives

In this paper we introduce a rule-based, compositional, and hierarchical...
research
04/10/2021

Object Priors for Classifying and Localizing Unseen Actions

This work strives for the classification and localization of human actio...
research
09/17/2023

CaSAR: Contact-aware Skeletal Action Recognition

Skeletal Action recognition from an egocentric view is important for app...
research
12/06/2017

From Lifestyle Vlogs to Everyday Interactions

A major stumbling block to progress in understanding basic human interac...
research
06/06/2021

Transformed ROIs for Capturing Visual Transformations in Videos

Modeling the visual changes that an action brings to a scene is critical...
