HOTR: End-to-End Human-Object Interaction Detection with Transformers

04/28/2021
by Bumsoo Kim, et al.

Human-Object Interaction (HOI) detection is the task of identifying "a set of interactions" in an image, which involves i) the localization of the subject (i.e., humans) and target (i.e., objects) of interaction, and ii) the classification of the interaction labels. Most existing methods address this task indirectly, by first detecting human and object instances and then inferring the interaction for every pair of detected instances individually. In this paper, we present a novel framework, referred to as HOTR, which directly predicts a set of <human, object, interaction> triplets from an image based on a transformer encoder-decoder architecture. Through set prediction, our method effectively exploits the inherent semantic relationships in an image and does not require time-consuming post-processing, which is the main bottleneck of existing methods. Our proposed algorithm achieves state-of-the-art performance on two HOI detection benchmarks with an inference time under 1 ms after object detection.
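
To make the contrast in the abstract concrete, the following is a minimal sketch of the two output styles, not the HOTR implementation. The names `HOITriplet`, `enumerate_pairs`, and `filter_set_predictions`, the fixed-size prediction set, and the 0.5 score threshold are all assumptions introduced for illustration; the abstract does not specify how predictions are represented or filtered.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this layout is an assumption.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class HOITriplet:
    """Hypothetical container for one <human, object, interaction> prediction."""
    human_box: Box
    object_box: Box
    interaction: str
    score: float


def enumerate_pairs(humans: List[Box], objects: List[Box]) -> List[Tuple[Box, Box]]:
    """Pairwise style: every detected human is paired with every detected
    object, so the number of candidates grows as len(humans) * len(objects)."""
    return list(product(humans, objects))


def filter_set_predictions(
    predictions: List[HOITriplet], threshold: float = 0.5
) -> List[HOITriplet]:
    """Set-prediction style: the model emits a set of triplets directly, and
    only a score cut (an assumed step) is applied, with no per-pair inference."""
    return [p for p in predictions if p.score >= threshold]


if __name__ == "__main__":
    humans = [(0.0, 0.0, 10.0, 20.0), (5.0, 0.0, 15.0, 20.0)]
    objects = [(1.0, 1.0, 2.0, 2.0), (3.0, 3.0, 4.0, 4.0), (6.0, 6.0, 7.0, 7.0)]
    print(len(enumerate_pairs(humans, objects)), "candidate pairs to classify")

    predictions = [
        HOITriplet(humans[0], objects[0], "hold", 0.9),
        HOITriplet(humans[1], objects[2], "no_interaction", 0.1),
    ]
    for triplet in filter_set_predictions(predictions):
        print(triplet.interaction, triplet.score)
```

In the pairwise style the cost of the interaction stage scales with the product of the two detection counts, whereas in the set-prediction style the number of outputs is bounded by the size of the predicted set regardless of how many instances were detected.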


research
08/20/2023

HODN: Disentangling Human-Object Feature for HOI Detection

The task of Human-Object Interaction (HOI) detection is to detect humans...
research
04/20/2022

Human-Object Interaction Detection via Disentangled Transformer

Human-Object Interaction Detection tackles the problem of joint localiza...
research
03/28/2022

MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection

Human-Object Interaction (HOI) detection is the task of identifying a se...
research
04/11/2023

Relational Context Learning for Human-Object Interaction Detection

Recent state-of-the-art methods for HOI detection typically build on tra...
research
05/05/2021

Visual Composite Set Detection Using Part-and-Sum Transformers

Computer vision applications such as visual relationship detection and h...
research
02/24/2022

Effective Actor-centric Human-object Interaction Detection

While Human-Object Interaction (HOI) Detection has achieved tremendous ad...
research
07/20/2022

HTNet: Anchor-free Temporal Action Localization with Hierarchical Transformers

Temporal action localization (TAL) is a task of identifying a set of act...
