Structure from Silence: Learning Scene Structure from Ambient Sound

11/10/2021
by Ziyang Chen, et al.

From whirling ceiling fans to ticking clocks, the sounds that we hear subtly vary as we move through a scene. We ask whether these ambient sounds convey information about 3D scene structure and, if so, whether they provide a useful learning signal for multimodal models. To study this, we collect a dataset of paired audio and RGB-D recordings from a variety of quiet indoor scenes. We then train models that estimate the distance to nearby walls, given only audio as input. We also use these recordings to learn multimodal representations through self-supervision, by training a network to associate images with their corresponding sounds. These results suggest that ambient sound conveys a surprising amount of information about scene structure, and that it is a useful signal for learning multimodal features.

research
05/18/2020

Cross-Task Transfer for Multimodal Aerial Scene Recognition

Aerial scene recognition is a fundamental task in remote sensing and has...
research
05/11/2020

Foreground-Background Ambient Sound Scene Separation

Ambient sound scenes typically comprise multiple short events occurring ...
research
06/05/2022

Geometrically-Motivated Primary-Ambient Decomposition With Center-Channel Extraction

A geometrically-motivated method for primary-ambient decomposition is pr...
research
12/20/2017

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning

The sound of crashing waves, the roar of fast-moving cars -- sound conve...
research
08/25/2016

Ambient Sound Provides Supervision for Visual Learning

The sound of crashing waves, the roar of fast-moving cars -- sound conve...
research
05/10/2022

Learning Visual Styles from Audio-Visual Associations

From the patter of rain to the crunch of snow, the sounds we hear often ...
research
07/14/2022

Egocentric Scene Understanding via Multimodal Spatial Rectifier

In this paper, we study a problem of egocentric scene understanding, i.e...
