Exploring Relations in Untrimmed Videos for Self-Supervised Learning

08/06/2020
by Dezhao Luo, et al.

Existing video self-supervised learning methods mainly rely on trimmed videos for model training. However, trimmed datasets are manually annotated from untrimmed videos, so in this sense these methods are not truly self-supervised. In this paper, we propose a novel self-supervised method, referred to as Exploring Relations in Untrimmed Videos (ERUV), which can be applied directly to untrimmed (truly unlabeled) videos to learn spatio-temporal features. ERUV first generates single-shot videos by shot change detection. A designed sampling strategy then models relations between video clips, and the resulting relation categories serve as our self-supervision signals. Finally, the network learns representations by predicting the category of relation between video clips. ERUV thus learns to compare the differences and similarities of videos, which is also an essential procedure in action- and video-related tasks. We validate our learned models on action recognition and video retrieval tasks with three kinds of 3D CNNs. Experimental results show that ERUV learns richer representations and outperforms state-of-the-art self-supervised methods by significant margins.
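The pipeline summarized above (split untrimmed videos into shots, sample clip pairs under a relation, use the relation category as a free label) can be illustrated with a minimal sketch. It is not the authors' implementation. The histogram-difference cut detector is a simple stand-in for whatever shot change detector ERUV actually uses, and the three relation labels (`SAME_SHOT`, `SAME_VIDEO`, `DIFFERENT_VIDEO`), the function names, the 16-bin histogram, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import random
from typing import List, Tuple

Frame = List[int]            # hypothetical: flat list of 8-bit grayscale pixels
Shot = Tuple[int, int, int]  # (video_id, start_frame, end_frame), end exclusive
Clip = Tuple[int, int]       # (video_id, start_frame) of a fixed-length clip

# Hypothetical relation labels; ERUV's actual relation categories may differ.
SAME_SHOT, SAME_VIDEO, DIFFERENT_VIDEO = 0, 1, 2


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized intensity histogram of a single frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // 256] += 1
    return [c / len(frame) for c in counts]


def detect_shots(frames: List[Frame], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Split a video into single-shot segments [start, end): a cut is declared
    wherever the L1 distance between consecutive frame histograms exceeds threshold."""
    shots, start = [], 0
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        if sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            shots.append((start, i))
            start = i
        prev = cur
    shots.append((start, len(frames)))
    return shots


def sample_clip(shot: Shot, clip_len: int, rng: random.Random) -> Clip:
    """Pick a random clip of clip_len frames lying entirely inside the shot."""
    video_id, start, end = shot
    return video_id, rng.randint(start, end - clip_len)


def sample_pair(shots: List[Shot], clip_len: int,
                rng: random.Random) -> Tuple[Clip, Clip, int]:
    """Draw a relation label, then two clips satisfying it. The label is the
    self-supervision target a network would be trained to predict."""
    usable = [s for s in shots if s[2] - s[1] >= clip_len]
    relation = rng.choice([SAME_SHOT, SAME_VIDEO, DIFFERENT_VIDEO])
    if relation == SAME_SHOT:
        a = b = rng.choice(usable)
    elif relation == SAME_VIDEO:
        # Only videos with at least two usable shots can supply this relation.
        a = rng.choice([s for s in usable if sum(t[0] == s[0] for t in usable) > 1])
        b = rng.choice([t for t in usable if t[0] == a[0] and t != a])
    else:
        a = rng.choice(usable)
        b = rng.choice([t for t in usable if t[0] != a[0]])
    return sample_clip(a, clip_len, rng), sample_clip(b, clip_len, rng), relation


if __name__ == "__main__":
    dark, bright, grey = [20] * 100, [220] * 100, [128] * 100
    video0 = [dark] * 10 + [bright] * 10  # one hard cut at frame 10
    video1 = [grey] * 20                  # a single shot
    print(detect_shots(video0))           # [(0, 10), (10, 20)]
    shots = [(0, s, e) for s, e in detect_shots(video0)] + \
            [(1, s, e) for s, e in detect_shots(video1)]
    rng = random.Random(0)
    for _ in range(3):
        print(sample_pair(shots, clip_len=4, rng=rng))
```

In a full setup, the sampled start frames would be decoded into clips and fed to a 3D CNN trained to classify `relation`; that training loop is omitted here. The sketch assumes the shot list contains at least one video with two usable shots and at least two distinct videos, otherwise `rng.choice` receives an empty list.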

research
01/02/2020

Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

We propose a novel self-supervised method, referred to as Video Cloze Pr...
research
07/08/2021

Video 3D Sampling for Self-supervised Representation Learning

Most of the existing video self-supervised methods mainly leverage tempo...
research
12/13/2019

End-to-End Learning of Visual Representations from Uncurated Instructional Videos

Annotating videos is cumbersome, expensive and not scalable. Yet, many s...
research
11/28/2018

Self-supervised Spatiotemporal Feature Learning by Video Geometric Transformations

To alleviate the expensive cost of data collection and annotation, many ...
research
06/25/2022

SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos

Self-supervised methods have significantly closed the gap with end-to-en...
research
06/13/2020

DTG-Net: Differentiated Teachers Guided Self-Supervised Video Action Recognition

State-of-the-art video action recognition models with complex network ar...
research
08/05/2020

Self-supervised learning using consistency regularization of spatio-temporal data augmentation for action recognition

Self-supervised learning has shown great potentials in improving the dee...