Spatial Parsing and Dynamic Temporal Pooling networks for Human-Object Interaction detection

06/07/2022
by Hongsheng Li, et al.

The key to Human-Object Interaction (HOI) recognition is inferring the relationship between humans and objects. Image-based HOI detection has made significant progress recently, but video HOI detection still has room for improvement. Existing one-stage methods use carefully designed end-to-end networks that take a video segment and directly predict an interaction, which makes model learning and further optimization of the network more complex. This paper introduces the Spatial Parsing and Dynamic Temporal Pooling (SPDTP) network, which takes the entire video as input in the form of a spatio-temporal graph with human and object nodes. Unlike existing methods, the proposed network first distinguishes interactive from non-interactive pairs through explicit spatial parsing and then performs interaction recognition. We also propose a learnable and differentiable Dynamic Temporal Module (DTM) that emphasizes the keyframes of the video and suppresses redundant frames. Experimental results show that SPDTP pays more attention to active human-object pairs and valid keyframes. Overall, we achieve state-of-the-art performance on the CAD-120 and Something-Else datasets.
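
The abstract does not say how the DTM is built. One common way to make temporal pooling learnable and differentiable is to score each frame, normalize the scores with a softmax, and take the weighted sum of the frame features. The sketch below illustrates that general idea only and is not the authors' module. The names `temporal_pool` and `scoring_weights` are hypothetical, and the linear frame scorer is an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_pool(frame_features, scoring_weights):
    """Pool T per-frame feature vectors into a single vector.

    frame_features: list of T vectors, each a list of D floats.
    scoring_weights: list of D floats; a hypothetical learnable
        vector that assigns one scalar score to each frame.
    Returns (pooled_vector, frame_weights).
    """
    # One scalar score per frame (assumed linear scorer).
    scores = [
        sum(w * x for w, x in zip(scoring_weights, frame))
        for frame in frame_features
    ]
    # Softmax keeps the weighting differentiable and sums to 1,
    # so high-scoring frames dominate and the rest are damped.
    frame_weights = softmax(scores)
    dim = len(frame_features[0])
    pooled = [
        sum(a * frame[d] for a, frame in zip(frame_weights, frame_features))
        for d in range(dim)
    ]
    return pooled, frame_weights


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    pooled, weights = temporal_pool(frames, [10.0, 0.0])
    print(pooled, weights)
```

With all-zero scoring weights this reduces to plain average pooling. As the scorer learns larger scores for some frames, the pooled vector moves toward those frames, which is the keyframe-emphasis behavior the abstract describes.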
