ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection

04/17/2023
by Jeeseung Park, et al.

Human-Object Interaction (HOI) detection, which localizes humans and objects and infers the relationships between them, plays an important role in scene understanding. Although two-stage HOI detectors have the advantage of high efficiency in training and inference, they perform worse than one-stage methods for two reasons: they rely on outdated backbone networks, and their interaction classifiers do not take into account how humans perceive HOI. In this paper, we propose the Vision Transformer based Pose-Conditioned Self-Loop Graph (ViPLO) to resolve these problems. First, we propose a novel feature extraction method suited to the Vision Transformer backbone, called the masking with overlapped area (MOA) module. The MOA module uses the overlapped area between each patch and the given region in the attention function, which addresses the quantization problem that arises when using a Vision Transformer backbone. In addition, we design a graph with a pose-conditioned self-loop structure, which updates the human node encoding with local features of human joints. Motivated by the human perception process for HOI, this lets the classifier focus on specific human joints to effectively identify the type of interaction. As a result, ViPLO achieves state-of-the-art results on two public benchmarks, notably a +2.07 mAP performance gain on the HICO-DET dataset. The source code is available at https://github.com/Jeeseung-Park/ViPLO.
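To make the MOA idea more concrete, here is a minimal, hypothetical sketch in plain Python. It assumes that "utilizing the overlapped area in the attention function" means computing, for each ViT patch, the fraction of the patch covered by the region box, and adding the logarithm of that fraction to the attention logits, so patches outside the box are masked out and partially covered patches are down-weighted instead of being snapped to the patch grid. The function names `patch_overlap_ratios` and `moa_attention`, the box format, and the log-additive masking are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # hypothetical format: (x1, y1, x2, y2) in pixels


def patch_overlap_ratios(box: Box, image_size: int, patch_size: int) -> List[float]:
    """Fraction of each square patch's area covered by `box`, in row-major patch order."""
    x1, y1, x2, y2 = box
    n = image_size // patch_size
    ratios = []
    for row in range(n):
        for col in range(n):
            px1, py1 = col * patch_size, row * patch_size
            px2, py2 = px1 + patch_size, py1 + patch_size
            # Intersection width/height, clamped at zero when there is no overlap.
            w = max(0.0, min(x2, px2) - max(x1, px1))
            h = max(0.0, min(y2, py2) - max(y1, py1))
            ratios.append(w * h / (patch_size * patch_size))
    return ratios


def moa_attention(query: Sequence[float],
                  keys: Sequence[Sequence[float]],
                  values: Sequence[Sequence[float]],
                  ratios: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Single-query scaled dot-product attention with log(overlap ratio) added to
    the logits (assumed masking scheme). Zero-overlap patches get weight 0."""
    if not any(r > 0 for r in ratios):
        raise ValueError("region does not overlap any patch")
    d = len(query)
    logits = []
    for k, r in zip(keys, ratios):
        score = sum(q * kk for q, kk in zip(query, k)) / math.sqrt(d)
        logits.append(score + math.log(r) if r > 0 else float("-inf"))
    m = max(logits)  # finite, since at least one ratio is positive
    exps = [math.exp(l - m) if r > 0 else 0.0 for l, r in zip(logits, ratios)]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights


if __name__ == "__main__":
    # 32x32 image, 16x16 patches -> 2x2 grid; the box covers a quarter of each top patch.
    r = patch_overlap_ratios((8, 8, 24, 16), image_size=32, patch_size=16)
    print(r)
    out, w = moa_attention([1.0, 0.0], [[1.0, 0.0]] * 4,
                           [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [7.0, 7.0]], r)
    print(w, out)
```

In this toy example the box touches only the two top patches, so the bottom patches receive zero attention weight even though their keys match the query equally well. A hard crop to patch boundaries would instead either include or drop each patch entirely, which is the kind of quantization the MOA module is described as avoiding.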

research
12/16/2021

QAHOI: Query-Based Anchors for Human-Object Interaction Detection

Human-object interaction (HOI) detection as a downstream of object detec...
research
06/13/2022

Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection

Recent high-performing Human-Object Interaction (HOI) detection techniqu...
research
08/05/2020

Pose-based Modular Network for Human-Object Interaction Detection

Human-object interaction(HOI) detection is a critical task in scene unde...
research
08/10/2023

Double-chain Constraints for 3D Human Pose Estimation in Images and Videos

Reconstructing 3D poses from 2D poses lacking depth information is parti...
research
08/11/2021

Mining the Benefits of Two-stage and One-stage HOI Detection

Two-stage methods have dominated Human-Object Interaction (HOI) detectio...
research
03/15/2023

Local Region Perception and Relationship Learning Combined with Feature Fusion for Facial Action Unit Detection

Human affective behavior analysis plays a vital role in human-computer i...
research
02/24/2022

Phrase-Based Affordance Detection via Cyclic Bilateral Interaction

Affordance detection, which refers to perceiving objects with potential ...