The Curious Robot: Learning Visual Representations via Physical Interactions

by Lerrel Pinto, et al.

What is the right supervisory signal to train visual representations? Current approaches in computer vision use category labels from datasets such as ImageNet to train ConvNets. However, in the case of biological agents, visual representation learning does not require millions of semantic labels. We argue that biological agents use physical interactions with the world to learn visual representations, unlike current vision systems, which use only passive observations (images and videos downloaded from the web). For example, babies push objects, poke them, put them in their mouths, and throw them to learn representations. Toward this goal, we build one of the first systems on a Baxter platform that pushes, pokes, grasps, and observes objects in a tabletop environment. It uses four different types of physical interactions to collect more than 130K datapoints, with each datapoint providing supervision to a shared ConvNet architecture, allowing us to learn visual representations. We show the quality of the learned representations by observing neuron activations and performing nearest-neighbor retrieval on the learned representation. Quantitatively, we evaluate our learned ConvNet on image classification tasks and show improvements compared to learning without external data. Finally, on the task of instance retrieval, our network outperforms the ImageNet network on recall@1 by 3
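The instance-retrieval evaluation mentioned above reduces to a nearest-neighbor lookup in the learned feature space: a query image's feature vector is matched against a gallery, and recall@1 counts how often the closest gallery item is the same object instance. A minimal sketch of that metric, assuming features are compared by cosine similarity (the abstract does not specify the similarity measure, so that choice is an assumption here):

```python
import numpy as np

def recall_at_1(query_feats, gallery_feats, query_labels, gallery_labels):
    """Recall@1 for instance retrieval: the fraction of queries whose
    single nearest gallery neighbor (by cosine similarity) shares the
    query's instance label."""
    # L2-normalize so the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T                 # (num_queries, num_gallery) similarity matrix
    nearest = sims.argmax(axis=1)  # index of the best gallery match per query
    hits = query_labels == gallery_labels[nearest]
    return hits.mean()

# Toy example: hand-made 2-D "features" for two object instances.
gallery = np.array([[1.0, 0.0], [0.0, 1.0]])
g_labels = np.array([0, 1])
queries = np.array([[0.9, 0.1], [0.1, 0.9]])
q_labels = np.array([0, 1])
print(recall_at_1(queries, gallery, q_labels, g_labels))  # → 1.0
```

The same routine can compare two networks by swapping in features extracted from each: whichever yields the higher recall@1 on a held-out gallery retrieves instances more reliably.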

