Human-Object Interaction Detection via Weak Supervision

12/01/2021
by Mert Kilickaya, et al.

The goal of this paper is Human-Object Interaction (HO-I) detection. HO-I detection aims to find interacting human-object regions in an image and classify their interaction. Researchers have obtained significant improvements in recent years by relying on strong HO-I alignment supervision from [5]. HO-I alignment supervision pairs humans with the objects they interact with, and then aligns each human-object pair with its interaction categories. Since collecting such annotation is expensive, in this paper we propose to detect HO-I without alignment supervision. We instead rely on image-level supervision that only enumerates the interactions present in the image without pointing to where they happen. Our paper makes three contributions: i) We propose Align-Former, a visual-transformer based CNN that can detect HO-I with only image-level supervision. ii) Align-Former is equipped with an HO-I align layer that can learn to select appropriate targets to allow detector supervision. iii) We evaluate Align-Former on HICO-DET [5] and V-COCO [13], and show that Align-Former outperforms existing image-level supervised HO-I detectors by a large margin (4.71
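
To make the difference between the two supervision regimes concrete, the following is a minimal sketch, not the authors' code. The `AlignedHOI` record, the `(x1, y1, x2, y2)` box format, the `to_image_level` helper, and the example labels are all hypothetical names chosen for illustration. It shows only that image-level supervision keeps the set of interaction categories and discards which human-object pair each category belongs to.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class AlignedHOI:
    """One alignment-supervised annotation: a human box paired with the
    object box it interacts with, plus the interaction category."""
    human_box: Box
    object_box: Box
    interaction: str


def to_image_level(annotations: List[AlignedHOI]) -> Set[str]:
    """Reduce alignment supervision to image-level supervision.

    Only the set of interaction categories survives; the boxes and the
    human-object pairing are dropped.
    """
    return {a.interaction for a in annotations}


# Illustrative (made-up) annotations for a single image.
aligned = [
    AlignedHOI((10, 20, 110, 220), (90, 150, 200, 260), "ride bicycle"),
    AlignedHOI((300, 40, 380, 230), (310, 10, 350, 50), "wear helmet"),
    AlignedHOI((400, 30, 480, 240), (390, 160, 500, 270), "ride bicycle"),
]

image_level = to_image_level(aligned)
print(sorted(image_level))  # ['ride bicycle', 'wear helmet']
```

In this sketch two different people riding bicycles collapse into a single "ride bicycle" label, which is the information a detector trained with image-level supervision has to recover on its own.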

research
08/20/2023

HODN: Disentangling Human-Object Feature for HOI Detection

The task of Human-Object Interaction (HOI) detection is to detect humans...
research
03/09/2023

Weakly-Supervised HOI Detection from Interaction Labels Only and Language/Vision-Language Priors

Human-object interaction (HOI) detection aims to extract interacting hum...
research
01/13/2020

Classifying All Interacting Pairs in a Single Shot

In this paper, we introduce a novel human interaction detection approach...
research
04/12/2022

X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks

In this paper, we study the challenging instance-wise vision-language ta...
research
07/07/2022

Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

Existing open-vocabulary object detectors typically enlarge their vocabu...
research
10/02/2020

DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection

Recent years, human-object interaction (HOI) detection has achieved impr...
research
05/27/2023

Self-Supervised Learning of Action Affordances as Interaction Modes

When humans perform a task with an articulated object, they interact wit...
