Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions

10/07/2021
by   Shuang Li, et al.
11

We introduce the task of weakly supervised learning for detecting human and object interactions in videos. Our task poses unique challenges as a system does not know what types of human-object interactions are present in a video or the actual spatiotemporal location of the human and the object. To address these challenges, we introduce a contrastive weakly supervised training loss that aims to jointly associate spatiotemporal regions in a video with an action and object vocabulary and encourage temporal continuity of the visual appearance of moving objects as a form of self-supervision. To train our model, we introduce a dataset comprising over 6.5k videos with human-object interaction annotations that have been semi-automatically curated from sentence captions associated with the videos. We demonstrate improved performance over weakly supervised baselines adapted to our task on our video dataset.

READ FULL TEXT

page 1

page 8

page 10

research
11/10/2020

Detecting Human-Object Interaction with Mixed Supervision

Human object interaction (HOI) detection is an important task in image u...
research
12/11/2018

Grounded Human-Object Interaction Hotspots from Video

Learning how to interact with objects is an important step towards embod...
research
03/02/2023

Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning

Human object interaction (HOI) detection plays a crucial role in human-c...
research
04/14/2022

Weakly Supervised Attended Object Detection Using Gaze Data as Annotations

We consider the problem of detecting and recognizing the objects observe...
research
03/14/2017

Discriminate-and-Rectify Encoders: Learning from Image Transformation Sets

The complexity of a learning task is increased by transformations in the...
research
07/28/2021

Spot What Matters: Learning Context Using Graph Convolutional Networks for Weakly-Supervised Action Detection

The dominant paradigm in spatiotemporal action detection is to classify ...
research
07/30/2017

Discover and Learn New Objects from Documentaries

Despite the remarkable progress in recent years, detecting objects in a ...

Please sign up or login with your details

Forgot password? Click here to reset