Focusing on what to decode and what to train: Efficient Training with HOI Split Decoders and Specific Target Guided DeNoising

07/05/2023
by Junwen Chen, et al.

Recent one-stage transformer-based methods achieve notable gains in the Human-Object Interaction (HOI) detection task by leveraging the detection capability of DETR. However, current methods redirect the detection target of the object decoder, and the box target is not explicitly separated from the query embeddings, which leads to long and difficult training. Furthermore, matching predicted HOI instances with the ground truth is more challenging than in object detection; simply adapting training strategies from object detection makes training more difficult. To clear the ambiguity between human and object detection and share the prediction burden, we propose a novel one-stage framework (SOV), which consists of a subject decoder, an object decoder, and a verb decoder. Moreover, we propose a novel Specific Target Guided (STG) DeNoising strategy, which leverages learnable object and verb label embeddings to guide training and accelerate convergence. In addition, at inference, label-specific information is fed directly into the decoders by initializing the query embeddings from the learnable label embeddings. Without additional features or prior language knowledge, our method (SOV-STG) achieves higher accuracy than the state-of-the-art method in one-third of the training epochs. The code is available at <https://github.com/cjw2021/SOV-STG>.
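
The abstract gives no implementation details, so the following is only a rough sketch of the two ideas it names: building denoising queries from noised object and verb label embeddings, and initializing inference queries from the same label embeddings. Every function name, the random label-flipping scheme, the single verb per instance, and the per-query mixing weights are assumptions made for illustration; they are not taken from the SOV-STG code.

```python
import random


def make_label_embeddings(num_labels, dim, rng):
    """Stand-in for a learnable embedding table: one vector per label."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_labels)]


def noise_labels(gt_labels, num_labels, flip_prob, rng):
    """With probability flip_prob, replace a ground-truth label by a uniformly
    drawn label (which may coincide with the original)."""
    noised = []
    for label in gt_labels:
        if rng.random() < flip_prob:
            noised.append(rng.randrange(num_labels))
        else:
            noised.append(label)
    return noised


def denoising_queries(gt_objects, gt_verbs, obj_table, verb_table, flip_prob, rng):
    """One query per ground-truth HOI instance: the sum of the (possibly noised)
    object and verb label embeddings. Assumes a single verb per instance."""
    noisy_objs = noise_labels(gt_objects, len(obj_table), flip_prob, rng)
    noisy_verbs = noise_labels(gt_verbs, len(verb_table), flip_prob, rng)
    return [
        [o + v for o, v in zip(obj_table[i], verb_table[j])]
        for i, j in zip(noisy_objs, noisy_verbs)
    ]


def mix_queries(weights, table):
    """Inference-time initialization (assumed form): each query is a weighted
    sum of all label embeddings, one row of weights per query."""
    dim = len(table[0])
    return [
        [sum(w * table[k][d] for k, w in enumerate(row)) for d in range(dim)]
        for row in weights
    ]


rng = random.Random(0)
obj_table = make_label_embeddings(num_labels=5, dim=4, rng=rng)
verb_table = make_label_embeddings(num_labels=3, dim=4, rng=rng)

# Two ground-truth HOI instances: (object 1, verb 0) and (object 4, verb 2).
dn = denoising_queries([1, 4], [0, 2], obj_table, verb_table, flip_prob=0.2, rng=rng)

# Three inference queries, each mixing the five object label embeddings equally.
init = mix_queries([[0.2] * 5 for _ in range(3)], obj_table)
```

In a real model the embedding tables and mixing weights would be trained parameters, and the denoising queries would be decoded alongside the regular queries with a loss that reconstructs the clean ground-truth targets.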

research
08/08/2023

FocalFormer3D : Focusing on Hard Instance for 3D Object Detection

False negatives (FN) in 3D object detection, e.g., missing predictions o...
research
06/13/2022

Featurized Query R-CNN

The query mechanism introduced in the DETR method is changing the paradi...
research
03/26/2022

GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection

The task of Human-Object Interaction (HOI) detection could be divided in...
research
07/24/2023

Exposing the Troublemakers in Described Object Detection

Detecting objects based on language descriptions is a popular task that ...
research
02/11/2023

Metaphor Detection with Effective Context Denoising

We propose a novel RoBERTa-based model, RoPPT, which introduces a target...
research
07/12/2022

Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection

Human-Object Interaction (HOI) detection is a core task for high-level i...
research
03/11/2023

Diffusion-Based Hierarchical Multi-Label Object Detection to Analyze Panoramic Dental X-rays

Due to the necessity for precise treatment planning, the use of panorami...
