Weakly-supervised Action Localization via Hierarchical Mining

06/22/2022
by   Jia-Chang Feng, et al.
0

Weakly-supervised action localization aims to localize and classify action instances in the given videos temporally with only video-level categorical labels. Thus, the crucial issue of existing weakly-supervised action localization methods is the limited supervision from the weak annotations for precise predictions. In this work, we propose a hierarchical mining strategy under video-level and snippet-level manners, i.e., hierarchical supervision and hierarchical consistency mining, to maximize the usage of the given annotations and prediction-wise consistency. To this end, a Hierarchical Mining Network (HiM-Net) is proposed. Concretely, it mines hierarchical supervision for classification in two grains: one is the video-level existence for ground truth categories captured by multiple instance learning; the other is the snippet-level inexistence for each negative-labeled category from the perspective of complementary labels, which is optimized by our proposed complementary label learning. As for hierarchical consistency, HiM-Net explores video-level co-action feature similarity and snippet-level foreground-background opposition, for discriminative representation learning and consistent foreground-background separation. Specifically, prediction variance is viewed as uncertainty to select the pairs with high consensus for proposed foreground-background collaborative learning. Comprehensive experimental results show that HiM-Net outperforms existing methods on THUMOS14 and ActivityNet1.3 datasets with large margins by hierarchically mining the supervision and consistency. Code will be available on GitHub.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/28/2021

ACSNet: Action-Context Separation Network for Weakly Supervised Temporal Action Localization

The object of Weakly-supervised Temporal Action Localization (WS-TAL) is...
research
06/20/2021

Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling

Weakly-Supervised Temporal Action Localization (WS-TAL) task aims to rec...
research
05/01/2022

Convex Combination Consistency between Neighbors for Weakly-supervised Action Localization

In weakly-supervised temporal action localization (WS-TAL), the methods ...
research
08/14/2021

Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization

As a challenging task of high-level video understanding, weakly supervis...
research
08/22/2019

3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization

Temporal action localization is a challenging computer vision problem wi...
research
05/04/2023

Weakly-supervised Micro- and Macro-expression Spotting Based on Multi-level Consistency

Most micro- and macro-expression spotting methods in untrimmed videos su...
research
12/19/2022

Distilling Vision-Language Pre-training to Collaborate with Weakly-Supervised Temporal Action Localization

Weakly-supervised temporal action localization (WTAL) learns to detect a...

Please sign up or login with your details

Forgot password? Click here to reset