Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection

04/11/2022
by   Jihwan Park, et al.

Human-Object Interaction (HOI) detection is a holistic visual recognition task that entails object detection as well as interaction classification. Previous work has addressed HOI detection through various compositions of subset predictions, e.g., Image -> HO -> I and Image -> HI -> O. Recently, transformer-based architectures for HOI detection have emerged, which directly predict the HOI triplets in an end-to-end fashion (Image -> HOI). Motivated by these various inference paths, we propose cross-path consistency (CPC) learning, a novel end-to-end learning strategy that improves HOI detection for transformers by leveraging augmented decoding paths. CPC learning enforces all the possible predictions from permuted inference sequences to be consistent. This simple scheme makes the model learn consistent representations, thereby improving generalization without increasing model capacity. Our experiments demonstrate the effectiveness of our method, which achieves significant improvement on V-COCO and HICO-DET over the baseline models. Our code is available at https://github.com/mlvlab/CPChoi.
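
To make the idea of enforcing consistency across decoding paths concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say which consistency measure CPC uses, so the mean squared difference below is an assumption. The function name `pairwise_consistency_loss` and the probability values are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def pairwise_consistency_loss(path_predictions):
    """Mean squared disagreement between every pair of decoding paths.

    path_predictions maps a path name to the probability vector that the
    path predicts for the same HOI candidate. The squared-difference
    measure is an assumption made for this sketch.
    """
    names = sorted(path_predictions)
    pairs = list(combinations(names, 2))
    if not pairs:
        # A single path has nothing to be consistent with.
        return 0.0
    total = 0.0
    for a, b in pairs:
        pa, pb = path_predictions[a], path_predictions[b]
        if len(pa) != len(pb):
            raise ValueError("paths must predict over the same classes")
        # Average the squared differences over the classes.
        total += sum((x - y) ** 2 for x, y in zip(pa, pb)) / len(pa)
    # Average over all pairs of paths.
    return total / len(pairs)


# Made-up interaction-class probabilities from three decoding paths.
preds = {
    "Image->HOI": [0.7, 0.2, 0.1],
    "Image->HO->I": [0.6, 0.3, 0.1],
    "Image->HI->O": [0.7, 0.2, 0.1],
}
loss = pairwise_consistency_loss(preds)
```

In a training setup, a term like this would be added to the usual detection loss so that paths sharing the same model parameters are pushed toward agreeing predictions. The loss is zero only when all paths agree exactly.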
