VisualEchoes: Spatial Image Representation Learning through Echolocation

05/04/2020
by   Ruohan Gao, et al.

Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans have the remarkable ability to perform echolocation: a biological sonar used to perceive spatial layout and locate objects in the world. We explore the spatial cues contained in echoes and how they can benefit vision tasks that require spatial reasoning. First, we capture echo responses in photo-realistic 3D indoor scene environments. Then, we propose a novel interaction-based representation learning framework that learns useful visual features via echolocation. We show that the learned image features are useful for multiple downstream vision tasks requiring spatial reasoning: monocular depth estimation, surface normal estimation, and visual navigation. Our work opens a new path for representation learning for embodied agents, where supervision comes from interacting with the physical world. Our experiments demonstrate that the image features learned from echoes are comparable to, or even outperform, heavily supervised pre-training methods for multiple fundamental spatial tasks.


research
07/23/2023

Learning Navigational Visual Representations with Semantic Map Supervision

Being able to perceive the semantics and the spatial structure of the en...
research
10/16/2020

What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions

Learning effective representations of visual data that generalize to a v...
research
05/18/2021

Pathdreamer: A World Model for Indoor Navigation

People navigating in unfamiliar buildings take advantage of myriad visua...
research
05/03/2021

Curious Representation Learning for Embodied Intelligence

Self-supervised representation learning has achieved remarkable success ...
research
03/16/2022

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

In computer vision, pre-training models based on large-scale supervised l...
research
03/13/2019

Neural Scene Decomposition for Multi-Person Motion Capture

Learning general image representations has proven key to the success of ...
research
11/25/2020

Unsupervised Object Keypoint Learning using Local Spatial Predictability

We propose PermaKey, a novel approach to representation learning based o...
