Spatio-Temporal Instance Learning: Action Tubes from Class Supervision

07/08/2018
by   Pascal Mettes, et al.

The goal of this paper is spatio-temporal localization of human actions from their class labels only. The state of the art casts the problem as Multiple Instance Learning, where the instances are action proposals computed a priori. Rather than disconnecting the localization from the learning, we propose a variant of Multiple Instance Learning that integrates the spatio-temporal localization into the learning. We make three contributions. First, we define model assumptions tailored to actions and propose a latent instance learning objective that allows optimization at the box level. Second, we propose a spatio-temporal box linking algorithm, suitable for weakly-supervised learning, that exploits box proposals from off-the-shelf person detectors. Third, we introduce tube- and video-level refinements at inference time to integrate long-term spatio-temporal action characteristics. Our experiments on three video datasets show the benefits of our contributions, as well as competitive results compared to state-of-the-art alternatives that localize actions from their class label only. Finally, our algorithm can incorporate point and box supervision, making it possible to benchmark, mix, and balance action localization performance against annotation time.
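
To make the idea of linking per-frame person boxes into action tubes concrete, here is a minimal sketch. It is not the linking algorithm proposed in the paper, whose details the abstract does not give; it assumes a simple greedy rule that extends a tube with the best-overlapping box in the next frame. The names `iou`, `link_boxes`, and the `min_iou` threshold are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout
Tube = List[Tuple[int, Box]]             # list of (frame_index, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_boxes(frames: List[List[Box]], min_iou: float = 0.5) -> List[Tube]:
    """Greedily link per-frame boxes into tubes (illustrative assumption).

    A tube is extended with the unused box in the next frame that overlaps
    its last box the most, provided the overlap is at least `min_iou`.
    Tubes that cannot be extended end; unmatched boxes start new tubes.
    """
    tubes: List[Tube] = []
    active: List[Tube] = []  # tubes that reached the previous frame
    for t, boxes in enumerate(frames):
        unused = list(boxes)
        next_active: List[Tube] = []
        for tube in active:
            last_box = tube[-1][1]
            best = max(unused, key=lambda b: iou(last_box, b), default=None)
            if best is not None and iou(last_box, best) >= min_iou:
                tube.append((t, best))
                unused.remove(best)
                next_active.append(tube)
        for b in unused:  # every unmatched box starts a new tube
            new_tube: Tube = [(t, b)]
            tubes.append(new_tube)
            next_active.append(new_tube)
        active = next_active
    return tubes


if __name__ == "__main__":
    # Toy detections: one person drifting slowly, one box that flickers.
    frames = [
        [(0, 0, 10, 10), (50, 50, 60, 60)],
        [(1, 1, 11, 11)],
        [(2, 2, 12, 12), (50, 50, 60, 60)],
    ]
    for tube in link_boxes(frames):
        print([t for t, _ in tube])
```

In this toy input the drifting box forms a single tube across all three frames, while the flickering box yields two separate one-frame tubes because this sketch does not bridge missing detections. A method aimed at weak supervision would likely need to handle such gaps and score tubes with a learned model rather than overlap alone.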
