Discovering Multi-Label Actor-Action Association in a Weakly Supervised Setting

01/21/2021
by Sovan Biswas, et al.

Since collecting and annotating data for spatio-temporal action detection is very expensive, there is a need for approaches that learn with less supervision. Weakly supervised approaches do not require any bounding box annotations and can be trained only from labels that indicate whether an action occurs in a video clip. Current approaches, however, cannot handle the case where multiple persons in a video perform multiple actions at the same time. In this work, we address this very challenging task for the first time. We propose a baseline based on multi-instance and multi-label learning (MIML). Furthermore, we propose a novel approach that uses sets of actions as the representation instead of modeling individual action classes. Since computing the probabilities for the full power set becomes intractable as the number of action classes increases, we assign an action set to each detected person under the constraint that the assignment is consistent with the annotation of the video clip. We evaluate the proposed approach on the challenging AVA dataset, where it outperforms the MIML baseline and is competitive with fully supervised approaches.
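The constrained assignment described above can be illustrated with a small sketch. This is not the authors' implementation: it is a brute-force toy that assumes per-person action probabilities are already available, assumes actions are scored independently, and assumes every person receives a non-empty subset of the clip-level labels. The function names (`assign_action_sets`, `set_log_score`, `nonempty_subsets`) and the example probabilities are hypothetical.

```python
from itertools import combinations, product
from math import log


def nonempty_subsets(labels):
    """Enumerate all non-empty subsets of the clip-level label set."""
    labels = sorted(labels)
    return [frozenset(c)
            for r in range(1, len(labels) + 1)
            for c in combinations(labels, r)]


def set_log_score(probs, action_set, clip_labels):
    """Log-score of one person performing exactly `action_set`.

    Assumption: each clip-level action is an independent Bernoulli
    variable with probability probs[action].
    """
    eps = 1e-9
    score = 0.0
    for action in clip_labels:
        p = min(max(probs[action], eps), 1.0 - eps)
        score += log(p) if action in action_set else log(1.0 - p)
    return score


def assign_action_sets(person_probs, clip_labels):
    """Pick one action set per person, maximizing the total log-score.

    Consistency constraint: the union of all assigned sets must equal
    the clip-level annotation. Only subsets of the clip's labels are
    enumerated, not the power set over all action classes. Exhaustive
    search like this is only feasible for a handful of persons/labels.
    """
    clip_labels = frozenset(clip_labels)
    candidates = nonempty_subsets(clip_labels)
    best, best_score = None, float("-inf")
    for assignment in product(candidates, repeat=len(person_probs)):
        if frozenset().union(*assignment) != clip_labels:
            continue  # some annotated action is not explained by anyone
        score = sum(set_log_score(p, s, clip_labels)
                    for p, s in zip(person_probs, assignment))
        if score > best_score:
            best, best_score = assignment, score
    return best


if __name__ == "__main__":
    # Hypothetical probabilities for two detected persons.
    persons = [
        {"stand": 0.9, "talk": 0.8, "sit": 0.1},
        {"stand": 0.2, "talk": 0.3, "sit": 0.7},
    ]
    print(assign_action_sets(persons, {"stand", "talk", "sit"}))
```

The sketch restricts candidates to subsets of the clip's own labels, which keeps the search space small for a single clip; a practical system would need a more efficient assignment procedure than exhaustive enumeration.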

