Few-Shot Audio-Visual Learning of Environment Acoustics

06/08/2022
by   Sagnik Majumder, et al.
8

Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener, with implications for various applications in AR, VR, and robotics. Whereas traditional methods to estimate RIRs assume dense geometry and/or sound measurements throughout the environment, we explore how to infer RIRs based on a sparse set of images and echoes observed in the space. Towards that goal, we introduce a transformer-based method that uses self-attention to build a rich acoustic context, then predicts RIRs of arbitrary query source-receiver locations through cross-attention. Additionally, we design a novel training objective that improves the match in the acoustic signature between the RIR predictions and the targets. In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs, outperforming state-of-the-art methods and–in a major departure from traditional methods–generalizing to novel environments in a few-shot manner. Project: http://vision.cs.utexas.edu/projects/fs_rir.

READ FULL TEXT

page 2

page 4

page 9

page 15

research
02/14/2022

Visual Acoustic Matching

We introduce the visual acoustic matching task, in which an audio clip i...
research
06/14/2021

Learning Audio-Visual Dereverberation

Reverberation from audio reflecting off surfaces and objects in the envi...
research
07/27/2023

Self-Supervised Visual Acoustic Matching

Acoustic matching aims to re-synthesize an audio clip to sound as if it ...
research
01/20/2023

Novel-View Acoustic Synthesis

We introduce the novel-view acoustic synthesis (NVAS) task: given the si...
research
02/02/2022

Active Audio-Visual Separation of Dynamic Sound Sources

We explore active audio-visual separation for dynamic sound sources, whe...
research
06/16/2022

SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning

We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based a...
research
03/30/2019

Static Visual Spatial Priors for DoA Estimation

As we interact with the world, for example when we communicate with our ...

Please sign up or login with your details

Forgot password? Click here to reset