Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation

03/20/2023
by Ziyang Chen, et al.

The images and sounds that we perceive undergo subtle but geometrically consistent changes as we rotate our heads. In this paper, we use these cues to solve a problem we call Sound Localization from Motion (SLfM): jointly estimating camera rotation and localizing sound sources. We learn to solve these tasks solely through self-supervision. A visual model predicts camera rotation from a pair of images, while an audio model predicts the direction of sound sources from binaural sounds. We train these models to generate predictions that agree with one another. At test time, the models can be deployed independently. To obtain a feature representation that is well suited to this challenging problem, we also propose a method for learning an audio-visual representation through cross-view binauralization: estimating binaural sound from one view, given images and sound from another. Our model accurately estimates rotations in both real and synthetic scenes, and localizes sound sources with accuracy competitive with state-of-the-art self-supervised approaches. Project site: https://ificl.github.io/SLfM/
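The abstract does not spell out the training objective. The sketch below is only an illustration of how the "predictions that agree with one another" idea could be expressed. It assumes a single static sound source, a camera that rotates only about the vertical axis (yaw), and azimuths in radians. It also assumes a sign convention in which turning the camera by +yaw lowers the source's camera-relative azimuth by yaw. The function names and this exact loss are hypothetical and are not the authors' formulation.

```python
import math


def wrap_angle(theta: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


def agreement_loss(az_view1: float, az_view2: float, yaw_rotation: float) -> float:
    """Hypothetical consistency loss between audio and visual predictions.

    az_view1, az_view2: source azimuths predicted by the audio model from the
        binaural sound recorded in each view (radians, camera-relative).
    yaw_rotation: camera rotation from view 1 to view 2 predicted by the
        visual model (radians).

    Under the assumed convention, az_view1 - az_view2 should equal
    yaw_rotation. The residual is wrapped so that angles near +/-pi
    are compared correctly, and the squared residual is returned.
    """
    residual = wrap_angle((az_view1 - az_view2) - yaw_rotation)
    return residual ** 2


if __name__ == "__main__":
    # Source at 30 degrees; the camera turns 20 degrees, so the source
    # should now appear at 10 degrees. Consistent predictions give ~0 loss.
    print(agreement_loss(math.radians(30), math.radians(10), math.radians(20)))
    # Same audio predictions, but the visual model predicts no rotation:
    # the 20-degree disagreement is penalized.
    print(agreement_loss(math.radians(30), math.radians(10), 0.0))
```

Wrapping the residual matters because azimuth is periodic. With a source at 170 degrees and a camera turn of -20 degrees, the second view sees the source at -170 degrees. The raw difference is 340 degrees, but after subtracting the rotation and wrapping, the residual is zero.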


