Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer

12/03/2021
by   Frederic Z. Zhang, et al.
5

Recent developments in transformer models for visual data have led to significant improvements in recognition and detection tasks. In particular, using learnable queries in place of region proposals has given rise to a new class of one-stage detection models, spearheaded by the Detection Transformer (DETR). Variations on this one-stage approach have since dominated human-object interaction (HOI) detection. However, the success of such one-stage HOI detectors can largely be attributed to the representation power of transformers. We discovered that when equipped with the same transformer, their two-stage counterparts can be more performant and memory-efficient, while taking a fraction of the time to train. In this work, we propose the Unary-Pairwise Transformer, a two-stage detector that exploits unary and pairwise representations for HOIs. We observe that the unary and pairwise parts of our transformer network specialise, with the former preferentially increasing the scores of positive examples and the latter decreasing the scores of negative examples. We evaluate our method on the HICO-DET and V-COCO datasets, and significantly outperform state-of-the-art approaches. At inference time, our model with ResNet50 approaches real-time performance on a single GPU.

READ FULL TEXT

page 1

page 7

page 8

page 11

page 12

page 13

research
06/13/2022

Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection

Recent high-performing Human-Object Interaction (HOI) detection techniqu...
research
05/05/2021

Visual Composite Set Detection Using Part-and-Sum Transformers

Computer vision applications such as visual relationship detection and h...
research
08/11/2021

Mining the Benefits of Two-stage and One-stage HOI Detection

Two-stage methods have dominated Human-Object Interaction (HOI) detectio...
research
09/22/2021

T6D-Direct: Transformers for Multi-Object 6D Pose Direct Regression

6D pose estimation is the task of predicting the translation and orienta...
research
03/20/2022

Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows

This paper presents a new vision Transformer, named Iwin Transformer, wh...
research
04/11/2023

Relational Context Learning for Human-Object Interaction Detection

Recent state-of-the-art methods for HOI detection typically build on tra...
research
06/10/2020

Condensing Two-stage Detection with Automatic Object Key Part Discovery

Modern two-stage object detectors generally require excessively large mo...

Please sign up or login with your details

Forgot password? Click here to reset