Self-Supervised Learning of Video-Induced Visual Invariances

12/05/2019
by Michael Tschannen, et al.

We propose a general framework for self-supervised learning of transferable visual representations based on video-induced visual invariances (VIVI). We consider the implicit hierarchy present in videos and make use of (i) frame-level invariances (e.g. stability to color and contrast perturbations), (ii) shot/clip-level invariances (e.g. robustness to changes in object orientation and lighting conditions), and (iii) video-level invariances (semantic relationships of scenes across shots/clips) to define a holistic self-supervised loss. Training models using different variants of the proposed framework on videos from the YouTube-8M data set, we obtain state-of-the-art self-supervised transfer learning results on the 19 diverse downstream tasks of the Visual Task Adaptation Benchmark (VTAB), using only 1000 labels per task. We then show how to co-train our models jointly with labeled images. This co-training outperforms an ImageNet-pretrained ResNet-50 by 0.8 points while using 10x fewer labeled images, and outperforms the previous best supervised model by 3.7 points when using the full ImageNet data set.
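To make the idea of a loss built from a frame/shot/video hierarchy concrete, the following is a minimal sketch and not the loss used in the paper. It assumes embeddings are already computed and stored in an array of shape [videos, shots, frames, views, dim], where "views" are augmented copies of the same frame. The function names, the squared-distance penalty, and the weights w_frame, w_shot, and w_video are all hypothetical choices made for illustration. A pure consistency penalty like this one would collapse to a constant embedding if trained on its own, so it only shows how the three levels could be combined into a single weighted sum.

```python
import numpy as np


def spread(x, axis):
    """Mean squared distance of embeddings to their mean along `axis`.

    The last axis of `x` is the embedding dimension.
    """
    center = x.mean(axis=axis, keepdims=True)
    return float(((x - center) ** 2).sum(axis=-1).mean())


def hierarchical_invariance_loss(emb, w_frame=1.0, w_shot=1.0, w_video=1.0):
    """Weighted sum of three illustrative invariance penalties.

    emb: array of shape [videos, shots, frames, views, dim].
    All three terms are stand-ins, not the paper's losses.
    """
    # Frame level: augmented views of the same frame should agree.
    frame_term = spread(emb, axis=3)

    # Shot level: frames within the same shot should agree.
    frame_emb = emb.mean(axis=3)       # [videos, shots, frames, dim]
    shot_term = spread(frame_emb, axis=2)

    # Video level: pooled shot embeddings within a video should agree.
    shot_emb = frame_emb.mean(axis=2)  # [videos, shots, dim]
    video_term = spread(shot_emb, axis=1)

    return w_frame * frame_term + w_shot * shot_term + w_video * video_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 2 videos, 3 shots each, 4 frames per shot, 2 views per frame, 8-dim embeddings.
    emb = rng.normal(size=(2, 3, 4, 2, 8))
    print(hierarchical_invariance_loss(emb))
```

Setting one of the weights to zero drops the corresponding level, which mirrors the abstract's mention of training with different variants of the framework.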


