Self-Supervised Spatio-Temporal Representation Learning Using Variable Playback Speed Prediction

03/05/2020
by Hyeon Cho, et al.

We propose a self-supervised learning method that predicts the variable playback speeds of a video. Without semantic labels, we learn a spatio-temporal representation of the video by exploiting how its visual appearance varies with playback speed, under the assumption of temporal coherence. To capture the spatio-temporal variations across the entire video, we do not merely predict a single playback speed; we generate clips at various playback speeds with randomized starting points. We then train a 3D convolutional network to sort the shuffled clips by their playback speed. The playback speed covers both forward and reverse directions, so the visual representation is also learned from the directional dynamics of the video. We also propose a novel layer-dependable temporal group normalization method for 3D convolutional networks that improves representation learning performance: we divide the temporal features into several groups and normalize each group with its own parameters. We validate the proposed method by fine-tuning the pretrained network on the action recognition task. The experimental results show that the proposed method outperforms state-of-the-art self-supervised learning methods in action recognition.
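
The clip-generation and sorting setup described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes that a playback speed is modeled as a signed integer frame stride (negative values meaning reverse playback) and that the sorting target is the permutation that orders the shuffled clips by ascending speed. The function names `sample_speed_clips` and `make_sorting_sample`, the speed set, and the clip length are all hypothetical choices made for the example.

```python
import numpy as np


def sample_speed_clips(video, speeds, clip_len, rng):
    """Cut one clip per playback speed from a video of shape (T, H, W, C).

    Assumption: a speed is a nonzero signed integer frame stride;
    a negative value means the clip is played in reverse.
    """
    num_frames = video.shape[0]
    clips = []
    for speed in speeds:
        stride = abs(speed)
        span = (clip_len - 1) * stride + 1  # number of source frames covered
        if stride == 0 or span > num_frames:
            raise ValueError(f"speed {speed} is not usable for this video")
        # Randomized starting point, chosen independently for every clip.
        start = rng.integers(0, num_frames - span + 1)
        idx = start + stride * np.arange(clip_len)
        if speed < 0:
            idx = idx[::-1]  # reverse playback
        clips.append(video[idx])
    return np.stack(clips)  # (num_speeds, clip_len, H, W, C)


def make_sorting_sample(video, speeds=(-2, -1, 1, 2), clip_len=8, seed=0):
    """Build one training sample for the (assumed) speed-sorting pretext task."""
    rng = np.random.default_rng(seed)
    clips = sample_speed_clips(video, speeds, clip_len, rng)
    order = rng.permutation(len(speeds))       # shuffle the clips
    shuffled_clips = clips[order]
    shuffled_speeds = np.asarray(speeds)[order]
    # Target: indices that put the shuffled clips back in ascending speed order.
    target = np.argsort(shuffled_speeds)
    return shuffled_clips, shuffled_speeds, target


if __name__ == "__main__":
    # Toy video whose pixel value equals its frame index.
    toy_video = np.arange(64, dtype=np.int64).reshape(64, 1, 1, 1)
    clips, clip_speeds, target = make_sorting_sample(toy_video)
    print(clips.shape, clip_speeds, target)
```

In a sketch like this, a 3D convolutional network would take `shuffled_clips` as input and be trained to predict `target`; how the paper actually encodes the sorting objective is not stated in the abstract.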


research
06/16/2018

Two Stream Self-Supervised Learning for Action Recognition

We present a self-supervised approach using spatio-temporal signals betw...
research
10/14/2020

Back to the Future: Cycle Encoding Prediction for Self-supervised Contrastive Video Representation Learning

In this paper we show that learning video feature spaces in which tempor...
research
01/02/2020

Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

We propose a novel self-supervised method, referred to as Video Cloze Pr...
research
08/05/2020

Self-supervised learning using consistency regularization of spatio-temporal data augmentation for action recognition

Self-supervised learning has shown great potentials in improving the dee...
research
09/26/2021

Self-Supervised Video Representation Learning by Video Incoherence Detection

This paper introduces a novel self-supervised method that leverages inco...
research
04/10/2022

SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition

Learning an egocentric action recognition model from video data is chall...
research
07/27/2020

Representation Learning with Video Deep InfoMax

Self-supervised learning has made unsupervised pretraining relevant agai...
