Unsupervised Learning of Visual Representations using Videos

05/04/2015
by   Xiaolong Wang, et al.
0

Is strong supervision necessary for learning a good visual representation? Do we really need millions of semantically-labeled images to train a Convolutional Neural Network (CNN)? In this paper, we present a simple yet surprisingly powerful approach for unsupervised learning of CNN. Specifically, we use hundreds of thousands of unlabeled videos from the web to learn visual representations. Our key idea is that visual tracking provides the supervision. That is, two patches connected by a track should have similar visual representation in deep feature space since they probably belong to the same object or object part. We design a Siamese-triplet network with a ranking loss function to train this CNN representation. Without using a single image from ImageNet, just using 100K unlabeled videos and the VOC 2012 dataset, we train an ensemble of unsupervised networks that achieves 52 regression). This performance comes tantalizingly close to its ImageNet-supervised counterpart, an ensemble which achieves a mAP of 54.4 also show that our unsupervised network can perform competitively in other tasks such as surface-normal estimation.

READ FULL TEXT

page 1

page 3

page 4

page 5

page 6

page 7

page 8

research
06/01/2018

A Classification approach towards Unsupervised Learning of Visual Representations

In this paper, we present a technique for unsupervised learning of visua...
research
12/01/2016

Object-Centric Representation Learning from Unlabeled Videos

Supervised (pre-)training currently yields state-of-the-art performance ...
research
08/09/2017

Transitive Invariance for Self-supervised Visual Representation Learning

Learning visual representations with self-supervised learning has become...
research
03/30/2016

Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles

In this paper we study the problem of image representation learning with...
research
06/07/2022

Online Deep Clustering with Video Track Consistency

Several unsupervised and self-supervised approaches have been developed ...
research
04/18/2017

Unsupervised Learning by Predicting Noise

Convolutional neural networks provide visual features that perform remar...
research
03/18/2020

Watching the World Go By: Representation Learning from Unlabeled Videos

Recent single image unsupervised representation learning techniques show...

Please sign up or login with your details

Forgot password? Click here to reset