Shuffle and Learn: Unsupervised Learning using Temporal Order Verification

03/28/2016
by Ishan Misra, et al.

In this paper, we present an approach for learning a visual representation from the raw spatiotemporal signals in videos. Our representation is learned without supervision from semantic labels. We formulate our method as an unsupervised sequential verification task, i.e., we determine whether a sequence of frames from a video is in the correct temporal order. With this simple task and no semantic labels, we learn a powerful visual representation using a Convolutional Neural Network (CNN). The representation contains information complementary to that learned from supervised image datasets like ImageNet. Qualitative results show that our method captures information that is temporally varying, such as human pose. When used as pre-training for action recognition, our method gives significant gains over learning without external data on benchmark datasets like UCF101 and HMDB51. To demonstrate its sensitivity to human pose, we show results for pose estimation on the FLIC and MPII datasets that are competitive with, or better than, approaches that use significantly more supervision. Our method can be combined with supervised representations to provide an additional boost in accuracy.
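The sequential verification task described above can be illustrated with a minimal sketch of how training examples might be generated: frame indices are drawn from a video and labeled by whether they appear in temporal order. The function name `sample_order_verification_tuple`, the default tuple length of three, the 50/50 positive/negative split, and the uniform sampling of frames are all assumptions made for illustration; the abstract does not specify these details, and this is not the authors' implementation.

```python
import random


def sample_order_verification_tuple(num_frames, tuple_len=3, rng=None):
    """Return (frame_indices, label) for one hypothetical training example.

    label is 1 if the indices are in increasing temporal order, else 0.
    Tuple length and sampling scheme are illustrative assumptions.
    """
    rng = rng or random.Random()
    if tuple_len < 2 or num_frames < tuple_len:
        raise ValueError("need tuple_len >= 2 and num_frames >= tuple_len")

    # Draw distinct frame indices and put them in temporal order.
    ordered = sorted(rng.sample(range(num_frames), tuple_len))

    # Assumption: half of the examples keep the correct order (positives).
    if rng.random() < 0.5:
        return ordered, 1

    # Otherwise shuffle until the order actually differs (negatives).
    shuffled = ordered[:]
    while shuffled == ordered:
        rng.shuffle(shuffled)
    return shuffled, 0


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_order_verification_tuple(30, rng=rng))
```

In a full pipeline, the frames at these indices would be fed to a CNN trained as a binary classifier on the label, so that the only supervisory signal is the temporal order of the video itself.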

research
08/03/2017

Unsupervised Representation Learning by Sorting Sequences

We present an unsupervised representation learning approach using videos...
research
08/03/2020

SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning

A steady momentum of innovations and breakthroughs has convincingly push...
research
07/30/2018

Markerless Visual Robot Programming by Demonstration

In this paper we present an approach for learning to imitate human behav...
research
01/06/2020

Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations

The goal of many computer vision systems is to transform image pixels in...
research
08/07/2017

Self-supervised Learning of Pose Embeddings from Spatiotemporal Relations in Videos

Human pose analysis is presently dominated by deep convolutional network...
research
09/23/2016

Real-time Human Pose Estimation from Video with Convolutional Neural Networks

In this paper, we present a method for real-time multi-person human pose...
research
12/01/2016

Object-Centric Representation Learning from Unlabeled Videos

Supervised (pre-)training currently yields state-of-the-art performance ...
