Can Temporal Information Help with Contrastive Self-Supervised Learning?

11/25/2020
by   Yutong Bai, et al.
10

Leveraging temporal information has been regarded as essential for developing video understanding models. However, how to properly incorporate temporal information into the recent successful instance discrimination based contrastive self-supervised learning (CSL) framework remains unclear. As an intuitive solution, we find that directly applying temporal augmentations does not help, or even impair video CSL in general. This counter-intuitive observation motivates us to re-design existing video CSL frameworks, for better integration of temporal knowledge. To this end, we present Temporal-aware Contrastive self-supervised learningTaCo, as a general paradigm to enhance video CSL. Specifically, TaCo selects a set of temporal transformations not only as strong data augmentation but also to constitute extra self-supervision for video understanding. By jointly contrasting instances with enriched temporal transformations and learning these transformations as self-supervised signals, TaCo can significantly enhance unsupervised video representation learning. For instance, TaCo demonstrates consistent improvement in downstream classification tasks over a list of backbones and CSL approaches. Our best model achieves 85.1 (UCF-101) and 51.6 improvement over the previous state-of-the-art.

READ FULL TEXT

page 3

page 7

research
12/07/2021

Time-Equivariant Contrastive Video Representation Learning

We introduce a novel self-supervised contrastive learning method to lear...
research
04/01/2021

Composable Augmentation Encoding for Video Representation Learning

We focus on contrastive methods for self-supervised video representation...
research
02/10/2022

Using Navigational Information to Learn Visual Representations

Children learn to build a visual representation of the world from unsupe...
research
06/16/2022

iBoot: Image-bootstrapped Self-Supervised Video Representation Learning

Learning visual representations through self-supervision is an extremely...
research
08/24/2023

Motion-Guided Masking for Spatiotemporal Representation Learning

Several recent works have directly extended the image masked autoencoder...
research
05/04/2022

TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition

Recognizing transformation types applied to a video clip (RecogTrans) is...
research
10/21/2019

Self-supervised classification of dynamic obstacles using the temporal information provided by videos

Nowadays, autonomous driving systems can detect, segment, and classify t...

Please sign up or login with your details

Forgot password? Click here to reset