Deep Motion Prior for Weakly-Supervised Temporal Action Localization

08/12/2021
by Meng Cao, et al.

Weakly-Supervised Temporal Action Localization (WSTAL) aims to localize actions in untrimmed videos using only video-level labels. Most current state-of-the-art WSTAL methods follow a Multi-Instance Learning (MIL) pipeline: they first produce snippet-level predictions and then aggregate them into a video-level prediction. However, we argue that existing methods overlook two important drawbacks: 1) inadequate use of motion information and 2) the incompatibility of the prevailing cross-entropy training loss. In this paper, our analysis shows that the motion cues behind optical flow features carry complementary information. Inspired by this, we propose to build a context-dependent motion prior, termed motionness. Specifically, a motion graph is introduced to model motionness based on a local motion carrier (e.g., optical flow). In addition, to highlight the more informative video snippets, a motion-guided loss is proposed to modulate network training conditioned on the motionness scores. Extensive ablation studies confirm that motionness effectively models the actions of interest and that the motion-guided loss leads to more accurate results. Moreover, the motion-guided loss is plug-and-play and can be applied to existing WSTAL methods. Without loss of generality, building on the standard MIL pipeline, our method achieves new state-of-the-art performance on three challenging benchmarks: THUMOS'14, ActivityNet v1.2, and ActivityNet v1.3.
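
The MIL pipeline mentioned in the abstract (snippet-level predictions aggregated into a video-level prediction) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration rather than the paper's implementation: top-k mean pooling is one common aggregation choice, and `motionness_weighted_scores` is a hypothetical stand-in that only shows how per-snippet scores in [0, 1] could reweight the aggregation. The abstract does not specify the motion graph or the exact form of the motion-guided loss, so neither is reproduced.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def mil_video_scores(snippet_logits, k):
    """Aggregate (T, C) snippet logits into (C,) video logits.

    Assumption: top-k mean pooling over the temporal axis, a common
    MIL aggregation; the abstract does not name the pooling used.
    """
    num_snippets = snippet_logits.shape[0]
    k = max(1, min(k, num_snippets))
    # Sort each class column ascending and keep the k largest values.
    topk = np.sort(snippet_logits, axis=0)[-k:]
    return topk.mean(axis=0)


def motionness_weighted_scores(snippet_logits, motionness):
    """Hypothetical helper: weight snippets by a per-snippet score.

    `motionness` is a (T,) array in [0, 1]; how the paper computes or
    applies it is not described in the abstract.
    """
    weights = motionness / (motionness.sum() + 1e-8)
    return (weights[:, None] * snippet_logits).sum(axis=0)


def video_cross_entropy(video_logits, labels):
    """Cross-entropy between softmax scores and a multi-hot (C,) label."""
    probs = softmax(video_logits)
    target = labels / labels.sum()
    return float(-(target * np.log(probs + 1e-8)).sum())


# Toy usage: 4 snippets, 2 classes, video labeled with class 1 only.
logits = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]])
video_logits = mil_video_scores(logits, k=2)
loss = video_cross_entropy(video_logits, np.array([0.0, 1.0]))
```

Because only the video-level label enters the loss, the snippet-level logits are supervised indirectly; localization would then be read off those snippet scores, for example by thresholding them.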


