TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition

05/04/2022
by Haodong Duan, et al.

Recognizing the type of transformation applied to a video clip (RecogTrans) is a long-established paradigm for self-supervised video representation learning, yet in recent works it performs much worse than instance discrimination approaches (InstDisc). However, based on a thorough comparison of representative RecogTrans and InstDisc methods, we observe the great potential of RecogTrans on both semantic-related and temporal-related downstream tasks. Because they rely on hard-label classification, existing RecogTrans approaches suffer from noisy supervision signals in pre-training. To mitigate this problem, we develop TransRank, a unified framework for recognizing Transformations in a Ranking formulation. TransRank provides accurate supervision signals by recognizing transformations relatively, and it consistently outperforms the classification-based formulation. Meanwhile, the unified framework can be instantiated with an arbitrary set of temporal or spatial transformations, demonstrating good generality. With a ranking-based formulation and several empirical practices, we achieve competitive performance on video retrieval and action recognition. Under the same setting, TransRank surpasses the previous state-of-the-art method by 6.4 [...] HMDB51 for action recognition (Top1 Acc); it improves video retrieval on UCF101 by 20.4 [...] exploring paradigm for video self-supervised learning. Code will be released at https://github.com/kennymckormick/TransRank.
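
To make the contrast between hard-label classification and relative (ranking-based) recognition concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the loss, so the margin-based pairwise form, the function names `cross_entropy_loss` and `ranking_loss`, the `margin` value, and the toy scores are all assumptions chosen for illustration. The toy scores imitate a video that inherently "looks fast", so every clip gets a high score on the fastest class regardless of the transformation actually applied.

```python
import math

def cross_entropy_loss(scores, labels):
    """Hard-label classification: each clip is judged on its own."""
    total = 0.0
    for row, label in zip(scores, labels):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[label]
    return total / len(scores)

def ranking_loss(scores, labels, margin=1.0):
    """Relative recognition (assumed margin-based form): for transformation k,
    the clip that actually received k should score higher on k than sibling
    clips from the same video that received a different transformation."""
    total, pairs = 0.0, 0
    for i, k in enumerate(labels):
        for j in range(len(scores)):
            if j == i or labels[j] == k:
                continue
            total += max(0.0, margin - (scores[i][k] - scores[j][k]))
            pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical scores for three clips cut from ONE video.
# Columns are transformation classes (e.g. three playback speeds).
# Each row is a shared per-video bias [0, 1, 4] plus 2 on the true class.
labels = [0, 1, 2]
scores = [
    [2.0, 1.0, 4.0],  # clip with transformation 0
    [0.0, 3.0, 4.0],  # clip with transformation 1
    [0.0, 1.0, 6.0],  # clip with transformation 2
]

ce = cross_entropy_loss(scores, labels)
rank = ranking_loss(scores, labels)
print(f"cross-entropy: {ce:.3f}, ranking: {rank:.3f}")
```

In this toy case the ranking loss is zero, because every clip scores highest on its own transformation relative to its siblings, while the cross-entropy stays large, because the per-video bias makes two of the three clips look like the fastest class in absolute terms. This is one way to read the abstract's claim that relative recognition yields cleaner supervision than hard labels.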


