Transitive Invariance for Self-supervised Visual Representation Learning

08/09/2017
by Xiaolong Wang et al.

Learning visual representations with self-supervised learning has become popular in computer vision. The idea is to design auxiliary tasks where labels are free to obtain. Most of these tasks end up providing data to learn specific kinds of invariance useful for recognition. In this paper, we propose to exploit different self-supervised approaches to learn representations invariant to (i) inter-instance variations (two objects in the same class should have similar features) and (ii) intra-instance variations (viewpoint, pose, deformations, illumination, etc.). Instead of combining the two approaches with multi-task learning, we propose to organize and reason over the data with its multiple variations. Specifically, we generate a graph with millions of objects mined from hundreds of thousands of videos. The objects are connected by two types of edges corresponding to the two types of invariance: "different instances but a similar viewpoint and category" and "different viewpoints of the same instance". By applying simple transitivity on the graph with these edges, we can obtain pairs of images exhibiting richer visual invariance. We use this data to train a Triplet-Siamese network with VGG16 as the base architecture and apply the learned representations to different recognition tasks. For object detection, we achieve 63.2% mAP with Fast R-CNN (compared to 67.3% with ImageNet pre-training). On the challenging COCO dataset, our method is surprisingly close (23.5%) to its ImageNet-supervised counterpart (24.4%) using the Faster R-CNN framework. We also show that our network can perform significantly better than the ImageNet network in the surface normal estimation task.
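The transitivity step described above can be sketched in a few lines. The following is an illustrative toy example (not the authors' code): nodes stand for mined object patches, "inter" edges link different instances with a similar viewpoint and category, and "intra" edges link different viewpoints of the same instance. Composing one edge of each type yields new positive pairs that exhibit both kinds of invariance at once.

```python
def transitive_pairs(inter_edges, intra_edges):
    """Return pairs (a, b') linked by an inter edge followed by an intra edge.

    inter_edges: pairs of different instances with similar viewpoint/category.
    intra_edges: pairs of different viewpoints of the same instance.
    """
    # Build a symmetric adjacency map for the intra edges.
    intra = {}
    for u, v in intra_edges:
        intra.setdefault(u, set()).add(v)
        intra.setdefault(v, set()).add(u)

    pairs = set()
    for a, b in inter_edges:
        # Edges are undirected, so walk the inter edge in both directions.
        for start, mid in ((a, b), (b, a)):
            for end in intra.get(mid, ()):
                if end != start:
                    # start and end are different instances seen from
                    # different viewpoints: a richer positive pair.
                    pairs.add((start, end))
    return pairs

# Toy example: A and B are different car instances seen from a similar
# viewpoint; B_prime is the same instance as B seen from another viewpoint.
inter = [("A", "B")]
intra = [("B", "B_prime")]
print(sorted(transitive_pairs(inter, intra)))  # → [('A', 'B_prime')]
```

Pairs produced this way (together with the original edges) supply the positives for the Triplet-Siamese training described in the abstract; negatives are drawn from unrelated nodes of the graph.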


