Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization

03/31/2022
by   Junyu Gao, et al.
0

We target at the task of weakly-supervised action localization (WSAL), where only video-level action labels are available during model training. Despite the recent progress, existing methods mainly embrace a localization-by-classification paradigm and overlook the fruitful fine-grained temporal distinctions between video sequences, thus suffering from severe ambiguity in classification learning and classification-to-localization adaption. This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in WSAL and helps identify coherent action instances. Specifically, under a differentiable dynamic programming formulation, two complementary contrastive objectives are designed, including Fine-grained Sequence Distance (FSD) contrasting and Longest Common Subsequence (LCS) contrasting, where the first one considers the relations of various action/background proposals by using match, insert, and delete operators and the second one mines the longest common subsequences between two videos. Both contrasting modules can enhance each other and jointly enjoy the merits of discriminative action-background separation and alleviated task gap between classification and localization. Extensive experiments show that our method achieves state-of-the-art performance on two popular benchmarks. Our code is available at https://github.com/MengyuanChen21/CVPR2022-FTCL.

READ FULL TEXT

page 2

page 4

research
11/22/2019

Background Suppression Network for Weakly-supervised Temporal Action Localization

Weakly-supervised temporal action localization is a very challenging pro...
research
06/17/2022

CDNet: Contrastive Disentangled Network for Fine-Grained Image Categorization of Ocular B-Scan Ultrasound

Precise and rapid categorization of images in the B-scan ultrasound moda...
research
08/24/2023

Cross-Video Contextual Knowledge Exploration and Exploitation for Ambiguity Reduction in Weakly Supervised Temporal Action Localization

Weakly supervised temporal action localization (WSTAL) aims to localize ...
research
08/03/2022

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos

Understanding human emotions is a crucial ability for intelligent robots...
research
07/10/2021

Weakly-Supervised Classification and Detection of Bird Sounds in the Wild. A BirdCLEF 2021 Solution

It is easier to hear birds than see them, however, they still play an es...
research
07/31/2023

DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization

Weakly-supervised temporal action localization (WTAL) is a practical yet...
research
08/11/2021

Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization

We tackle the problem of localizing temporal intervals of actions with o...

Please sign up or login with your details

Forgot password? Click here to reset