Forecasting Human Object Interaction: Joint Prediction of Motor Attention and Egocentric Activity

11/25/2019
by Miao Liu, et al.

We address the challenging task of anticipating human-object interaction in first-person videos. Most existing methods either ignore how the camera wearer interacts with objects or treat body motion as a separate modality. In contrast, we observe that intentional hand movement reveals critical information about the future activity. Motivated by this, we adopt intentional hand movement as a future representation and propose a novel deep network that jointly models and predicts egocentric hand motion, interaction hotspots, and the future action. Specifically, we consider future hand motion as motor attention, and model this attention using latent variables in our deep model. The predicted motor attention is further used to characterise discriminative spatio-temporal visual features for predicting actions and interaction hotspots. We present extensive experiments demonstrating the benefits of the proposed joint model. Importantly, our model produces new state-of-the-art results for action anticipation on both the EGTEA Gaze+ and EPIC-Kitchens datasets. At the time of submission, our method ranked first on the unseen test set of the EPIC-Kitchens Action Anticipation Challenge (Phase 2).
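To illustrate the general idea of using a predicted attention map to weight visual features, here is a minimal NumPy sketch. This is not the authors' implementation; the function name, tensor shapes, and the simple normalized weighted pooling are assumptions chosen only to show how a motor-attention map could emphasize spatio-temporal feature locations before action prediction.

```python
import numpy as np

def attention_weighted_pool(features, attention):
    """Pool spatio-temporal features under an attention map (illustrative only).

    features : (T, H, W, C) array of visual features.
    attention: (T, H, W) non-negative attention map (e.g. predicted motor attention).
    Returns a (C,) descriptor emphasizing attended space-time locations.
    """
    # Normalize the attention map so the weights sum to 1 over space and time.
    w = attention / attention.sum()
    # Weighted sum over all spatio-temporal locations -> one C-dim descriptor.
    return np.tensordot(w, features, axes=([0, 1, 2], [0, 1, 2]))

rng = np.random.default_rng(0)
feats = rng.random((4, 7, 7, 16))  # T=4 frames, 7x7 spatial grid, 16 channels
attn = rng.random((4, 7, 7))       # unnormalized attention map
desc = attention_weighted_pool(feats, attn)
print(desc.shape)  # (16,)
```

In an actual model the attention map would come from a learned branch (here, sampled at random just to exercise the function), and the pooled descriptor would feed a classifier head.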


Related research

- 05/31/2020: In the Eye of the Beholder: Gaze and Actions in First Person Video
  We address the task of jointly determining what a person is doing and wh...

- 09/12/2022: Graphing the Future: Activity and Next Active Object Prediction using Graph-based Activity Representations
  We present a novel approach for the visual prediction of human-object in...

- 12/16/2019: Predicting the Future: A Jointly Learnt Model for Action Anticipation
  Inspired by human neurological structures for action anticipation, we pr...

- 02/07/2023: Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos
  Object affordance is an important concept in hand-object interaction, pr...

- 05/26/2017: Predicting Human Interaction via Relative Attention Model
  Predicting human interaction is challenging as the on-going activity has...

- 03/03/2017: Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression
  We design a new approach that allows robot learning of new activities fr...

- 06/11/2022: Precise Affordance Annotation for Egocentric Action Video Datasets
  Object affordance is an important concept in human-object interaction, p...
