Skip-Clip: Self-Supervised Spatiotemporal Representation Learning by Future Clip Order Ranking

10/28/2019
by Alaaeldin El-Nouby, et al.

Deep neural networks require collecting and annotating large amounts of data to train successfully. To alleviate this annotation bottleneck, we propose a novel self-supervised representation learning approach for spatiotemporal features extracted from videos. We introduce Skip-Clip, a method that exploits temporal coherence in videos by training a deep model for future clip order ranking, conditioned on a context clip, as a surrogate objective for video future prediction. We show that features learned using our method generalize and transfer strongly to downstream tasks. For action recognition on the UCF101 dataset, we obtain 51.8 and outperform models initialized using inflated ImageNet parameters. Skip-Clip also achieves results competitive with state-of-the-art self-supervision methods.
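To make the idea of ranking future clips against a context clip concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the scoring function or the loss, so the dot-product score, the pairwise hinge loss, the margin value, and the function names `dot` and `pairwise_ranking_loss` are all hypothetical choices. The sketch assumes that future clips closer in time to the context clip should receive higher scores than clips further away.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def pairwise_ranking_loss(
    context: Sequence[float],
    future_clips: Sequence[Sequence[float]],
    margin: float = 1.0,
) -> float:
    """Hinge-style ranking loss over future clip embeddings.

    Assumption (not from the source): `future_clips` is ordered by temporal
    distance from the context clip, nearest first, and a nearer clip should
    score at least `margin` higher than any clip that comes later.
    """
    # Score each future clip by its similarity to the context embedding.
    scores = [dot(context, clip) for clip in future_clips]

    total = 0.0
    pairs = 0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            # Clip i is nearer in time than clip j, so penalize the pair
            # whenever score_i does not exceed score_j by the margin.
            total += max(0.0, margin - (scores[i] - scores[j]))
            pairs += 1

    # Average over pairs; an input with fewer than two clips has no pairs.
    return total / pairs if pairs else 0.0


# Toy embeddings: in practice these would come from a video encoder.
context_embedding = [1.0, 0.0]
ordered = [[3.0, 0.0], [2.0, 0.0], [1.0, 0.0]]  # scores decrease with distance
reversed_order = list(reversed(ordered))        # scores increase with distance

loss_ordered = pairwise_ranking_loss(context_embedding, ordered)
loss_reversed = pairwise_ranking_loss(context_embedding, reversed_order)
```

With these toy vectors the correctly ordered clips incur no loss, while the reversed ordering is penalized on every pair. Because the loss needs only the temporal order of clips within a video, no manual labels are required.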


