Weakly Supervised Action Selection Learning in Video

05/06/2021
by Junwei Ma, et al.

Localizing actions in video is a core task in computer vision. The weakly supervised temporal localization problem investigates whether this task can be adequately solved with only video-level labels, significantly reducing the amount of expensive and error-prone annotation that is required. A common approach is to train a frame-level classifier where frames with the highest class probability are selected to make a video-level prediction. Frame-level activations are then used for localization. However, the absence of frame-level annotations causes the classifier to impart class bias on every frame. To address this, we propose the Action Selection Learning (ASL) approach to capture the general concept of action, a property we refer to as "actionness". Under ASL, the model is trained with a novel class-agnostic task to predict which frames will be selected by the classifier. Empirically, we show that ASL outperforms leading baselines on two popular benchmarks, THUMOS-14 and ActivityNet-1.2, with 10.3 [...]. We further analyze the properties of ASL and demonstrate the importance of actionness. Full code for this work is available here: https://github.com/layer6ai-labs/ASL.
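
The selection step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes that the video-level score for each class is the mean of the k highest frame logits, and that a frame counts as an "action" target when the classifier selected it for at least one ground-truth class. The function names, the parameter k, and the toy data are all hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def topk_video_prediction(frame_logits, k):
    """Aggregate frame-level logits of shape (T, C) into a video-level prediction.

    Assumption: for each class, the k highest-scoring frames are selected
    and their logits are averaged. Returns the class probabilities (C,)
    and a boolean mask (T, C) marking which frames were selected per class.
    """
    num_frames, num_classes = frame_logits.shape
    k = max(1, min(k, num_frames))
    # Row indices of the k largest logits in each class column, shape (k, C).
    top_idx = np.argsort(-frame_logits, axis=0)[:k]
    selected = np.zeros((num_frames, num_classes), dtype=bool)
    selected[top_idx, np.arange(num_classes)] = True
    video_logits = np.take_along_axis(frame_logits, top_idx, axis=0).mean(axis=0)
    return softmax(video_logits), selected


def actionness_targets(selected, video_labels):
    """Class-agnostic targets of shape (T,).

    Assumption: a frame is a positive target if the classifier selected it
    for any class that is present in the video-level label.
    """
    present = np.asarray(video_labels, dtype=bool)
    return selected[:, present].any(axis=1).astype(float)


# Toy video: 4 frames, 2 classes, only class 0 is present in the label.
toy_logits = np.array([[5.0, 0.0],
                       [4.0, 0.0],
                       [0.0, 3.0],
                       [0.0, 1.0]])
toy_probs, toy_selected = topk_video_prediction(toy_logits, k=2)
toy_targets = actionness_targets(toy_selected, video_labels=[1, 0])
```

In this sketch, a separate class-agnostic head would be trained to predict `toy_targets` from frame features, for example with a binary cross-entropy loss. The exact loss and selection rule used in the paper may differ.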

research
11/22/2019

Background Suppression Network for Weakly-supervised Temporal Action Localization

Weakly-supervised temporal action localization is a very challenging pro...
research
06/12/2020

Background Modeling via Uncertainty Estimation for Weakly-supervised Action Localization

Weakly-supervised temporal action localization aims to detect intervals ...
research
05/17/2018

NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning

Video learning is an important task in computer vision and has experienc...
research
03/27/2020

Weakly-Supervised Action Localization by Generative Attention Modeling

Weakly-supervised temporal action localization is a problem of learning ...
research
05/12/2022

Weakly-Supervised Action Detection Guided by Audio Narration

Videos are more well-organized curated data sources for visual concept l...
research
01/03/2021

A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization

Weakly supervised temporal action localization is a challenging vision t...
research
08/25/2022

Enabling Weakly-Supervised Temporal Action Localization from On-Device Learning of the Video Stream

Detecting actions in videos have been widely applied in on-device applic...
