Self-supervised Spatiotemporal Feature Learning by Video Geometric Transformations

11/28/2018
by Longlong Jing, et al.

To reduce the high cost of data collection and annotation, many self-supervised learning methods have been proposed to learn image representations without human-labeled annotations. Self-supervised learning for video representations, however, is not yet well addressed. In this paper, we propose a novel 3DConvNet-based, fully self-supervised framework that learns spatiotemporal video features without using any human-labeled annotations. First, a set of pre-designed geometric transformations (e.g., rotations of 0, 90, 180, and 270 degrees) is applied to each video. A pretext task is then defined as "recognizing the pre-designed geometric transformations." The network learns spatiotemporal video features in the process of solving this pretext task, so no human-labeled annotations are needed. The learned representations can then serve as pretrained features for different video-related applications. The proposed geometric transformations (e.g., rotations) prove effective for learning representative spatiotemporal features in our framework. With spatiotemporal features pretrained on two large video datasets, action recognition performance improves significantly, by up to 20.4%, compared with the same model trained from scratch. Furthermore, our framework outperforms state-of-the-art fully self-supervised methods on both the UCF101 and HMDB51 datasets, reaching up to 62.9% accuracy.
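The rotation pretext task described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the (frames, height, width, channels) clip layout, and the clip size are all assumptions, and the rotation direction is simply the one `np.rot90` uses. Each clip is rotated by one of the four angles, and the index of that angle becomes the label a 3D ConvNet would be trained to predict.

```python
import numpy as np

# The four pre-designed rotations; the class label is the index into this tuple.
ROTATIONS = (0, 90, 180, 270)


def rotate_clip(clip, label):
    """Rotate every frame of a (T, H, W, C) clip by 90 * label degrees.

    The same rotation is applied to all frames, so the temporal order
    of the clip is left unchanged.
    """
    # axes=(1, 2) selects the height-width plane of each frame.
    return np.rot90(clip, k=label, axes=(1, 2))


def make_pretext_samples(clip):
    """Return the four rotated copies of a clip and their rotation labels.

    A list is returned instead of a stacked array because 90- and
    270-degree rotations swap height and width for non-square frames.
    """
    labels = np.arange(len(ROTATIONS))
    clips = [rotate_clip(clip, int(k)) for k in labels]
    return clips, labels


# Hypothetical 16-frame RGB clip with 112x112 frames (sizes are assumptions).
clip = np.random.default_rng(0).random((16, 112, 112, 3))
clips, labels = make_pretext_samples(clip)
```

The labels come for free from the transformation itself, which is what makes the task self-supervised: a classifier trained on `(clips, labels)` pairs never sees a human annotation.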


