LIGHTEN: Learning Interactions with Graph and Hierarchical TEmporal Networks for HOI in videos

Analyzing human-object interactions in a video involves identifying the relationships between the humans and the objects present in it. The task can be viewed as a specialized form of Visual Relationship Detection in which one of the interacting entities must be a human. While traditional methods formulate the problem as inference over a sequence of video segments, we present a hierarchical approach, LIGHTEN, that learns visual features capturing spatio-temporal cues at multiple granularities in a video. Unlike current approaches, LIGHTEN does not rely on ground-truth data such as depth maps or 3D human pose, which also improves its generalization to non-RGBD datasets. Furthermore, it does so using only visual features, rather than the commonly used hand-crafted spatial features. We achieve state-of-the-art results in the human-object interaction detection and anticipation tasks of CAD-120 (88.9% and 92.6%) and competitive results on image-based HOI detection in the V-COCO dataset, setting a new benchmark for approaches based on visual features. Code for LIGHTEN is available at https://github.com/praneeth11009/LIGHTEN-Learning-Interactions-with-Graphs-and-Hierarchical-TEmporal-Networks-for-HOI
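The idea of capturing spatio-temporal cues at multiple granularities can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not LIGHTEN's architecture: the paper uses learned graph and temporal networks, whereas here a fixed mean-aggregation step over a hypothetical human-object graph stands in for the graph network, and plain average pooling stands in for the segment-level and video-level temporal networks. All names (`graph_step`, `pool_nodes`, `hierarchical_features`, `segment_len`) are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Frame = List[Vector]  # one feature vector per node (human or object) in a frame


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized feature vectors."""
    return [sum(vals) / len(vectors) for vals in zip(*vectors)]


def graph_step(nodes: Frame, edges: List[Tuple[int, int]]) -> Frame:
    """One round of mean-aggregation message passing on an undirected graph.

    Each node's new feature is the mean of its own feature and its
    neighbours' features (a fixed stand-in for a learned graph layer)."""
    neighbours: Dict[int, List[int]] = {i: [i] for i in range(len(nodes))}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return [mean([nodes[j] for j in neighbours[i]]) for i in range(len(nodes))]


def pool_nodes(frames: List[Frame]) -> Frame:
    """Average each node's feature over a list of frames (or segments)."""
    return [mean([frame[k] for frame in frames]) for k in range(len(frames[0]))]


def hierarchical_features(frames: List[Frame], edges: List[Tuple[int, int]],
                          segment_len: int) -> Tuple[List[Frame], Frame]:
    # Frame level: spatial context from the human-object graph in each frame.
    refined = [graph_step(frame, edges) for frame in frames]
    # Segment level: per-node average over consecutive frames.
    segments = [pool_nodes(refined[i:i + segment_len])
                for i in range(0, len(refined), segment_len)]
    # Video level: per-node average over segments (unweighted by segment length).
    video = pool_nodes(segments)
    return segments, video


if __name__ == "__main__":
    # Node 0 = human, node 1 = object, one human-object edge, 4 frames, 1-D features.
    frames = [[[float(t)], [float(t) + 2.0]] for t in range(4)]
    segments, video = hierarchical_features(frames, edges=[(0, 1)], segment_len=2)
    print(segments)  # [[[1.5], [1.5]], [[3.5], [3.5]]]
    print(video)     # [[2.5], [2.5]]
```

Keeping features per node at every level reflects that HOI labels attach to individual humans and objects; a learned model would replace both the fixed averaging in `graph_step` and the pooling in `pool_nodes` with trainable layers.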
