Self-supervised Video Representation Learning by Pace Prediction

08/13/2020
by   Jiangliu Wang, et al.
0

This paper addresses the problem of self-supervised video representation learning from a new perspective – by video pace prediction. It stems from the observation that human visual system is sensitive to video pace, e.g., slow motion, a widely used technique in film making. Specifically, given a video played in natural pace, we randomly sample training clips in different paces and ask a neural network to identify the pace for each video clip. The assumption here is that the network can only succeed in such a pace reasoning task when it understands the underlying video content and learns representative spatio-temporal features. In addition, we further introduce contrastive learning to push the model towards discriminating different paces by maximizing the agreement on similar video content. To validate the effectiveness of the proposed method, we conduct extensive experiments on action recognition and video retrieval tasks with several alternative network architectures. Experimental evaluations show that our approach achieves state-of-the-art performance for self-supervised video representation learning across different network architectures and different benchmarks. The code and pre-trained models are available at <https://github.com/laura-wang/video-pace>.

READ FULL TEXT

page 2

page 13

page 22

research
05/26/2022

Cross-Architecture Self-supervised Video Representation Learning

In this paper, we present a new cross-architecture contrastive learning ...
research
09/26/2021

Self-Supervised Video Representation Learning by Video Incoherence Detection

This paper introduces a novel self-supervised method that leverages inco...
research
08/31/2020

Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics

This paper proposes a novel pretext task to address the self-supervised ...
research
05/10/2023

Self-Supervised Video Representation Learning via Latent Time Navigation

Self-supervised video representation learning aimed at maximizing simila...
research
01/20/2022

Self-supervised Video Representation Learning with Cascade Positive Retrieval

Self-supervised video representation learning has been shown to effectiv...
research
09/03/2021

Revisiting 3D ResNets for Video Recognition

A recent work from Bello shows that training and scaling strategies may ...
research
11/30/2022

Spatio-Temporal Crop Aggregation for Video Representation Learning

We propose Spatio-temporal Crop Aggregation for video representation LEa...

Please sign up or login with your details

Forgot password? Click here to reset