What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions

10/16/2020
by Lucas Taylor, et al.

Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long-standing quest in computer vision. Most representation learning approaches rely solely on visual data such as images or videos. In this paper, we explore a novel approach: we use human interaction and attention cues to investigate whether we can learn better representations than visual-only ones. For this study, we collect a dataset of human interactions that captures people's body part movements and gaze in their daily lives. Our experiments show that our self-supervised representation, which encodes interaction and attention cues, outperforms the visual-only state-of-the-art method MoCo (He et al., 2020) on a variety of target tasks: scene classification (semantic), action recognition (temporal), depth estimation (geometric), dynamics prediction (physics), and walkable surface estimation (affordance).

research
11/03/2020

Learning Representations from Audio-Visual Spatial Alignment

We introduce a novel self-supervised pretext task for learning represent...
research
03/28/2018

Who Let The Dogs Out? Modeling Dog Behavior From Visual Data

We introduce the task of directly modeling a visually intelligent agent....
research
05/04/2020

VisualEchoes: Spatial Image Representation Learning through Echolocation

Several animal species (e.g., bats, dolphins, and whales) and even visua...
research
04/05/2016

The Curious Robot: Learning Visual Representations via Physical Interactions

What is the right supervisory signal to train visual representations? Cu...
research
08/13/2020

Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning

When watching videos, the occurrence of a visual event is often accompan...
research
09/24/2022

Self-supervised Learning for Unintentional Action Prediction

Distinguishing if an action is performed as intended or if an intended a...
research
06/13/2022

Learning Task-Independent Game State Representations from Unlabeled Images

Self-supervised learning (SSL) techniques have been widely used to learn...
