Learning Asynchronous and Sparse Human-Object Interaction in Videos

03/03/2021
by Romero Morais, et al.

Human activities can be learned from video. With effective modeling, it is possible to discover not only the action labels but also the temporal structure of the activities, such as the progression of their sub-activities. Automatically recognizing this structure from the raw video signal is a new capability that promises more faithful modeling and more successful recognition of human-object interactions. Toward this goal, we introduce Asynchronous-Sparse Interaction Graph Networks (ASSIGN), a recurrent graph network that automatically detects the structure of interaction events associated with entities in a video scene. ASSIGN pioneers learning the autonomous behavior of video entities, including their dynamic structure and their interactions with coexisting neighbors. In our model, each entity's life is asynchronous to those of the others, which makes the model more flexible in adapting to complex scenarios. Interactions between entities are sparse in time, which is more faithful to their true underlying nature and more robust in inference and learning. ASSIGN is tested on human-object interaction recognition and shows superior performance in segmenting and labeling human sub-activities and object affordances from raw videos. The model's native ability to discover temporal structure also eliminates the dependence on external segmentation that was previously mandatory for this task.
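
To make the two ideas of asynchronous entities and temporally sparse interactions more concrete, here is a toy sketch in Python. It is not the ASSIGN architecture: the `Entity` class, the scalar state, the change `threshold`, and the averaging update rule are all hypothetical stand-ins chosen for illustration. Each entity decides on its own when a new segment begins (asynchrony), and only entities that just hit a boundary read from their neighbors (sparsity).

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Toy stand-in for a video entity (a human or an object)."""
    name: str
    threshold: float              # hypothetical: change needed to start a new segment
    state: float = 0.0            # toy scalar in place of a learned hidden state
    last_obs: float = 0.0
    boundaries: list = field(default_factory=list)  # frames where a segment starts


def run(entities, observations):
    """observations maps an entity name to a list of floats, one per frame."""
    num_frames = len(next(iter(observations.values())))
    for t in range(num_frames):
        # Asynchrony: every entity checks for its own segment boundary.
        active = []
        for e in entities:
            x = observations[e.name][t]
            if abs(x - e.last_obs) > e.threshold:
                active.append(e)
                e.boundaries.append(t)
            e.last_obs = x

        # Sparsity: only entities at a boundary exchange messages this frame.
        snapshot = {e.name: e.state for e in entities}
        for e in active:
            others = [s for n, s in snapshot.items() if n != e.name]
            message = sum(others) / len(others) if others else 0.0
            # Assumed update rule: blend own observation with neighbor message.
            e.state = 0.5 * observations[e.name][t] + 0.5 * message
    return {e.name: e.boundaries for e in entities}


human = Entity("human", threshold=0.5)
cup = Entity("cup", threshold=0.5)
segments = run(
    [human, cup],
    {"human": [0, 0, 1, 1, 1, 3], "cup": [0, 0, 0, 0, 2, 2]},
)
print(segments)
```

In this sketch the human and the cup end up with different boundary frames, and no message passing happens on frames where nothing changes. A learned model would replace the fixed threshold and the averaging rule with trained components.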

research
10/04/2012

Learning Human Activities and Object Affordances from RGB-D Videos

Understanding human activities and object affordances are two very impor...
research
05/13/2019

VideoGraph: Recognizing Minutes-Long Human Activities in Videos

Many human activities take minutes to unfold. To represent them, related...
research
08/04/2012

Human Activity Learning using Object Affordances from RGB-D Videos

Human activities comprise several sub-activities performed in a sequence...
research
12/11/2018

Grounded Human-Object Interaction Hotspots from Video

Learning how to interact with objects is an important step towards embod...
research
11/22/2017

Temporal Relational Reasoning in Videos

Temporal relational reasoning, the ability to link meaningful transforma...
research
09/29/2021

The Object at Hand: Automated Editing for Mixed Reality Video Guidance from Hand-Object Interactions

In this paper, we concern with the problem of how to automatically extra...
research
04/16/2020

Asynchronous Interaction Aggregation for Action Detection

Understanding interaction is an essential part of video action detection...