Cross-Video Contextual Knowledge Exploration and Exploitation for Ambiguity Reduction in Weakly Supervised Temporal Action Localization

08/24/2023
by   Songchun Zhang, et al.

Weakly supervised temporal action localization (WSTAL) aims to localize actions in untrimmed videos using only video-level labels. Despite recent advances, existing approaches mainly follow a localization-by-classification pipeline and generally process each segment individually, thereby exploiting only limited contextual information. As a result, the model lacks a comprehensive understanding (e.g., of appearance and temporal structure) of the various action patterns, which leads to ambiguity in classification learning and temporal localization. Our work addresses this from a novel perspective, by exploring and exploiting the cross-video contextual knowledge within the dataset to recover the dataset-level semantic structure of action instances from weak labels only, thereby indirectly improving the holistic understanding of fine-grained action patterns and alleviating the aforementioned ambiguities. Specifically, we propose an end-to-end framework that includes a Robust Memory-Guided Contrastive Learning (RMGCL) module and a Global Knowledge Summarization and Aggregation (GKSA) module. First, the RMGCL module explores the contrast and consistency of cross-video action features, which helps learn a more structured and compact embedding space and thus reduces ambiguity in classification learning. Second, the GKSA module efficiently summarizes and propagates the cross-video representative action knowledge in a learnable manner to promote a holistic understanding of action patterns, which in turn allows the generation of high-confidence pseudo-labels for self-learning and thus alleviates ambiguity in temporal localization. Extensive experiments on THUMOS14, ActivityNet1.3, and FineAction demonstrate that our method outperforms state-of-the-art methods and can be easily plugged into other WSTAL methods.
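
For readers unfamiliar with the localization-by-classification pipeline that the abstract contrasts itself with, the following toy sketch may help. It is not the authors' method and involves no RMGCL or GKSA component. It assumes per-segment scores for a single action class are already available, pools them into a video-level score with top-k averaging (a common but here assumed choice), and thresholds them to obtain temporal spans. The function names, the value of `k`, and the threshold are all hypothetical.

```python
from typing import List, Tuple


def video_level_score(segment_scores: List[float], k: int) -> float:
    """Top-k mean pooling: average the k highest per-segment scores.

    This is the video-level prediction that a video-level label can supervise.
    """
    top = sorted(segment_scores, reverse=True)[:max(1, k)]
    return sum(top) / len(top)


def localize(segment_scores: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Group consecutive segments scoring >= threshold into (start, end) spans.

    Indices are segment indices; `end` is exclusive.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(segment_scores):
        if score >= threshold and start is None:
            start = i  # a new span opens here
        elif score < threshold and start is not None:
            spans.append((start, i))  # the open span closes before segment i
            start = None
    if start is not None:
        spans.append((start, len(segment_scores)))  # span runs to the end
    return spans


# Hypothetical per-segment scores for one action class in one video.
scores = [0.1, 0.2, 0.9, 0.8, 0.3, 0.7, 0.6, 0.1]
print(video_level_score(scores, k=3))
print(localize(scores, threshold=0.5))  # [(2, 4), (5, 7)]
```

Because each segment is scored and thresholded on its own, a borderline segment such as the one scoring 0.3 splits what may be a single action into two spans. This is the kind of ambiguity that, according to the abstract, additional contextual knowledge is meant to reduce.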

research
01/21/2020

Weakly Supervised Temporal Action Localization Using Deep Metric Learning

Temporal action localization is an important step towards video understa...
research
03/31/2022

Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization

We target at the task of weakly-supervised action localization (WSAL), w...
research
03/22/2023

Weakly-Supervised Temporal Action Localization by Inferring Snippet-Feature Affinity

Weakly-supervised temporal action localization aims to locate action reg...
research
08/19/2023

Weakly-Supervised Action Localization by Hierarchically-structured Latent Attention Modeling

Weakly-supervised action localization aims to recognize and localize act...
research
03/30/2021

CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning

Weakly-supervised temporal action localization (WS-TAL) aims to localize...
research
04/25/2023

Weakly-Supervised Temporal Action Localization with Bidirectional Semantic Consistency Constraint

Weakly Supervised Temporal Action Localization (WTAL) aims to classify a...
research
08/18/2020

Equivalent Classification Mapping for Weakly Supervised Temporal Action Localization

Weakly supervised temporal action localization is a newly emerging yet w...
