Spatio-Temporal Interaction Graph Parsing Networks for Human-Object Interaction Recognition

08/19/2021
by Ning Wang, et al.

For a given video-based Human-Object Interaction scene, modeling the spatio-temporal relationship between humans and objects is an important cue for understanding the contextual information presented in the video. With effective spatio-temporal relationship modeling, it is possible not only to uncover contextual information in each frame but also to directly capture inter-time dependencies. Capturing the position changes of humans and objects over the spatio-temporal dimension becomes even more critical when their appearance features do not change significantly over time. Making full use of appearance features, spatial location, and semantic information is also key to improving video-based Human-Object Interaction recognition performance. In this paper, Spatio-Temporal Interaction Graph Parsing Networks (STIGPN) are constructed, which encode each video as a graph composed of human and object nodes. These nodes are connected by two types of relations: (i) spatial relations, which model the interactions between the human and the interacted objects within each frame; and (ii) inter-time relations, which capture the long-range dependencies between the human and the interacted objects across frames. With this graph, STIGPN learn spatio-temporal features directly from whole video-based Human-Object Interaction scenes. Multi-modal features and a multi-stream fusion strategy are used to enhance the reasoning capability of STIGPN. Two Human-Object Interaction video datasets, CAD-120 and Something-Else, are used to evaluate the proposed architectures, and the state-of-the-art performance achieved demonstrates the superiority of STIGPN.
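
To make the graph structure described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It builds a toy graph with one human node and several object nodes per frame, and adds the two relation types named above. The function name `build_st_graph`, the node encoding `(frame, entity)`, and the choice to link each entity to itself in every other frame for the inter-time relations are all assumptions made for illustration; the abstract does not specify the exact connectivity.

```python
from itertools import combinations

HUMAN = 0  # assumed convention: entity index 0 in every frame is the human


def build_st_graph(num_frames, num_objects):
    """Build a toy spatio-temporal graph (hypothetical helper).

    Nodes are (frame, entity) pairs. Entity 0 is the human and
    entities 1..num_objects are objects.
    """
    nodes = [(t, k) for t in range(num_frames) for k in range(num_objects + 1)]

    # Spatial relations: human <-> each object within the same frame.
    spatial = set()
    for t in range(num_frames):
        for k in range(1, num_objects + 1):
            spatial.add(((t, HUMAN), (t, k)))

    # Inter-time relations (assumption): the same entity is linked across
    # every pair of frames, so distant frames are connected directly.
    temporal = set()
    for k in range(num_objects + 1):
        for t1, t2 in combinations(range(num_frames), 2):
            temporal.add(((t1, k), (t2, k)))

    return nodes, spatial, temporal


nodes, spatial, temporal = build_st_graph(num_frames=3, num_objects=2)
print(len(nodes), len(spatial), len(temporal))
```

Linking every pair of frames, rather than only adjacent ones, is one simple way to expose long-range dependencies in a single step; a real model would attach appearance, location, and semantic features to each node.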

research
06/07/2022

Spatial Parsing and Dynamic Temporal Pooling networks for Human-Object Interaction detection

The key of Human-Object Interaction (HOI) recognition is to infer the rel...
research
06/05/2018

Videos as Space-Time Region Graphs

How do humans recognize the action "opening a book"? We argue that ther...
research
10/15/2019

Being the center of attention: A Person-Context CNN framework for Personality Recognition

This paper proposes a novel study on personality recognition using video...
research
05/18/2021

Non-contact Pain Recognition from Video Sequences with Remote Physiological Measurements Prediction

Automatic pain recognition is paramount for medical diagnosis and treatm...
research
08/28/2019

Explainable Video Action Reasoning via Prior Knowledge and State Transitions

Human action analysis and understanding in videos is an important and ch...
research
12/11/2020

Spatio-attentive Graphs for Human-Object Interaction Detection

We address the problem of detecting human–object interactions in images ...
research
07/29/2019

Seeing Things in Random-Dot Videos

The human visual system correctly groups features and interprets videos ...
