Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

01/02/2020
by Dezhao Luo, et al.

We propose a novel self-supervised method, referred to as Video Cloze Procedure (VCP), to learn rich spatio-temporal representations. VCP first generates "blanks" by withholding video clips and then creates "options" by applying spatio-temporal operations to the withheld clips. Finally, it fills the blanks with "options" and learns representations by predicting the categories of the operations applied to the clips. VCP can act as either a proxy task or a target task in self-supervised learning. As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning. As a target task, it can assess learned representation models in a uniform and interpretable manner. With VCP, we train spatio-temporal representation models (3D-CNNs) and apply them to action recognition and video retrieval tasks. Experiments on commonly used benchmarks show that the trained models outperform state-of-the-art self-supervised models by significant margins.
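To make the blank/option/label loop concrete, here is a minimal data-side sketch in NumPy. It is an illustration under stated assumptions, not the authors' implementation: the operation set (`identity`, three spatial rotations, and a random temporal shuffle) and the helper names `apply_operation` and `make_cloze_sample` are hypothetical, since the abstract does not list the exact operations VCP uses. Frames are assumed square so that rotated clips keep their shape, and the 3D-CNN that would classify the operation is omitted.

```python
import numpy as np

# HYPOTHETICAL operation set for illustration only; the abstract does not
# specify which spatio-temporal operations VCP actually applies.
OPERATIONS = ["identity", "rotate_90", "rotate_180", "rotate_270", "temporal_shuffle"]


def apply_operation(clip: np.ndarray, op: str, rng: np.random.Generator) -> np.ndarray:
    """Apply one operation to a clip shaped (T, H, W, C); assumes H == W."""
    if op == "identity":
        return clip.copy()
    if op.startswith("rotate_"):
        k = int(op.split("_")[1]) // 90
        # Rotate every frame in the spatial (H, W) plane by k * 90 degrees.
        return np.rot90(clip, k=k, axes=(1, 2)).copy()
    if op == "temporal_shuffle":
        # Randomly reorder frames along the time axis (may occasionally be identity).
        order = rng.permutation(clip.shape[0])
        return clip[order]
    raise ValueError(f"unknown operation: {op}")


def make_cloze_sample(video: np.ndarray, clip_len: int, rng: np.random.Generator):
    """Withhold one clip (the 'blank'), transform it into an 'option',
    fill the blank with it, and return the operation index as the label."""
    num_clips = video.shape[0] // clip_len
    clips = [video[i * clip_len:(i + 1) * clip_len] for i in range(num_clips)]
    blank_idx = int(rng.integers(num_clips))       # which clip is withheld
    label = int(rng.integers(len(OPERATIONS)))     # which operation is applied
    option = apply_operation(clips[blank_idx], OPERATIONS[label], rng)
    clips[blank_idx] = option                      # fill the blank with the option
    return clips, blank_idx, label
```

In a training loop, the filled clip sequence would be fed to a 3D-CNN trained with a cross-entropy loss to predict `label`; choosing a richer or different operation set changes what the network is pushed to encode, which is the flexibility the abstract attributes to using VCP as a proxy task.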

research
09/10/2019

Video Representation Learning by Dense Predictive Coding

The objective of this paper is self-supervised learning of spatio-tempor...
research
08/06/2020

Exploring Relations in Untrimmed Videos for Self-Supervised Learning

Existing video self-supervised learning methods mainly rely on trimmed v...
research
11/23/2020

Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning

We present a novel way for self-supervised video representation learning...
research
06/20/2020

Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning

In self-supervised spatio-temporal representation learning, the temporal...
research
03/05/2020

Self-Supervised Spatio-Temporal Representation Learning Using Variable Playback Speed Prediction

We propose a self-supervised learning method by predicting the variable ...
research
04/10/2022

SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition

Learning an egocentric action recognition model from video data is chall...
research
12/14/2020

Aggregative Self-Supervised Feature Learning

Self-supervised learning (SSL) is an efficient approach that addresses t...