End-to-End Human Object Interaction Detection with HOI Transformer

03/08/2021
by Cheng Zou, et al.

We propose HOI Transformer to tackle human-object interaction (HOI) detection in an end-to-end manner. Current approaches either decouple the HOI task into separate stages of object detection and interaction classification or introduce a surrogate interaction problem. In contrast, HOI Transformer streamlines the HOI pipeline by eliminating the need for many hand-designed components. It reasons about the relations between objects and humans from the global image context and directly predicts HOI instances in parallel. A quintuple matching loss is introduced to supervise HOI predictions in a unified way. Our method is conceptually much simpler and demonstrates improved accuracy. Without bells and whistles, HOI Transformer achieves 26.61% AP on HICO-DET and 52.9% AP_role on V-COCO, surpassing previous methods while being much simpler. We hope our approach will serve as a simple and effective alternative for HOI tasks. Code is available at https://github.com/bbepoch/HoiTransformer.
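The abstract names a quintuple matching loss without spelling it out. The sketch below shows one way a DETR-style set-matching step over HOI quintuples could look, assuming each query predicts human, object, and interaction class probabilities plus a human box and an object box. The cost weights, the L1-only box term, the box format, and all names (`HOIPrediction`, `HOITarget`, `quintuple_cost`, `match`) are hypothetical, and an exhaustive search over assignments stands in for the Hungarian algorithm a real implementation would use. This is not the authors' code; the linked repository has that.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Sequence, Tuple

# (cx, cy, w, h), normalized to [0, 1]; this box format is an assumption
Box = Tuple[float, float, float, float]


@dataclass
class HOIPrediction:
    """One query's output: class probabilities plus two boxes (hypothetical layout)."""
    human_prob: Sequence[float]   # probabilities over human-slot classes
    object_prob: Sequence[float]  # probabilities over object classes
    verb_prob: Sequence[float]    # probabilities over interaction classes
    human_box: Box
    object_box: Box


@dataclass
class HOITarget:
    """One ground-truth HOI quintuple: three labels and two boxes."""
    human_label: int
    object_label: int
    verb_label: int
    human_box: Box
    object_box: Box


def l1(a: Box, b: Box) -> float:
    """L1 distance between two boxes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def quintuple_cost(p: HOIPrediction, t: HOITarget,
                   w_cls: float = 1.0, w_box: float = 5.0) -> float:
    """Matching cost over all five quintuple elements (weights are hypothetical)."""
    # Higher predicted probability on the target labels gives a lower cost.
    cls_cost = -(p.human_prob[t.human_label]
                 + p.object_prob[t.object_label]
                 + p.verb_prob[t.verb_label])
    box_cost = l1(p.human_box, t.human_box) + l1(p.object_box, t.object_box)
    return w_cls * cls_cost + w_box * box_cost


def match(preds: List[HOIPrediction],
          targets: List[HOITarget]) -> Tuple[List[Tuple[int, int]], float]:
    """Lowest-cost one-to-one assignment of targets to predictions.

    Exhaustive search, for illustration only; assumes len(preds) >= len(targets).
    Returns (pred_index, target_index) pairs in target order and the total cost.
    """
    best, best_cost = (), float("inf")
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(quintuple_cost(preds[i], targets[j]) for j, i in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return [(i, j) for j, i in enumerate(best)], best_cost


# Toy example: three queries, two ground-truth HOI instances.
preds = [
    HOIPrediction([0.9, 0.1], [0.1, 0.8, 0.1], [0.7, 0.3],
                  (0.3, 0.5, 0.2, 0.4), (0.6, 0.5, 0.2, 0.2)),
    HOIPrediction([0.2, 0.8], [0.3, 0.3, 0.4], [0.5, 0.5],
                  (0.5, 0.5, 1.0, 1.0), (0.5, 0.5, 1.0, 1.0)),
    HOIPrediction([0.85, 0.15], [0.1, 0.1, 0.8], [0.2, 0.8],
                  (0.7, 0.5, 0.2, 0.4), (0.2, 0.5, 0.2, 0.2)),
]
targets = [
    HOITarget(0, 2, 1, (0.7, 0.5, 0.2, 0.4), (0.2, 0.5, 0.2, 0.2)),
    HOITarget(0, 1, 0, (0.3, 0.5, 0.2, 0.4), (0.6, 0.5, 0.2, 0.2)),
]
pairs, total = match(preds, targets)
print(pairs)  # [(2, 0), (0, 1)]
```

Under this reading, matched queries would then receive classification and box losses against their assigned targets, while unmatched queries (query 1 above) would be pushed toward a "no interaction" class, as in DETR. A practical version would build a cost matrix and call `scipy.optimize.linear_sum_assignment` rather than enumerating permutations, which grows factorially with the number of queries.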

research
05/26/2020

End-to-End Object Detection with Transformers

We present a new method that views object detection as a direct set pred...
research
04/11/2022

Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection

Human-Object Interaction detection is a holistic visual recognition task...
research
12/16/2021

QAHOI: Query-Based Anchors for Human-Object Interaction Detection

Human-object interaction (HOI) detection as a downstream of object detec...
research
06/13/2022

Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection

Recent high-performing Human-Object Interaction (HOI) detection techniqu...
research
05/16/2022

CONSENT: Context Sensitive Transformer for Bold Words Classification

We present CONSENT, a simple yet effective CONtext SENsitive Transformer...
research
04/10/2022

Fashionformer: A simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition

Human fashion understanding is one important computer vision task since ...
research
11/28/2022

Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries

We address 2D floorplan reconstruction from 3D scans. Existing approache...