Active Audio-Visual Separation of Dynamic Sound Sources

02/02/2022
by Sagnik Majumder, et al.

We explore active audio-visual separation for dynamic sound sources, where an embodied agent moves intelligently in a 3D environment to continuously isolate the time-varying audio stream emitted by an object of interest. The agent hears a mixed stream of multiple time-varying audio sources (e.g., multiple people conversing and a band playing music at a noisy party). Given a limited time budget, it must extract the target sound using egocentric audio-visual observations. We propose a reinforcement learning agent equipped with a novel transformer memory. The agent learns motion policies that control its camera and microphone to recover the dynamic target audio, and it improves its own estimates for past timesteps via self-attention. Using highly realistic SoundSpaces acoustic simulations in real-world scanned Matterport3D environments, we show that our model learns efficient behavior for continuous separation of a time-varying audio target. Project: https://vision.cs.utexas.edu/projects/active-av-dynamic-separation/.
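To make the idea of refining past estimates with self-attention concrete, here is a minimal NumPy sketch. It is not the authors' model: the function name `refine_with_self_attention`, the feature sizes, the random projection matrices, and the residual update are all assumptions chosen for illustration. It only shows how a memory holding one feature vector per timestep can be updated so that each entry, including earlier ones, draws on every other entry.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def refine_with_self_attention(memory, w_q, w_k, w_v):
    """Refine per-timestep features with single-head self-attention.

    memory: (T, D) array, one hypothetical separation feature per timestep.
    w_q, w_k, w_v: (D, D) projection matrices (random here, learned in practice).
    Returns the refined (T, D) features and the (T, T) attention weights.
    """
    q = memory @ w_q
    k = memory @ w_k
    v = memory @ w_v
    # Scaled dot-product scores between every pair of timesteps.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # No causal mask: an early timestep may attend to later observations,
    # which is what lets estimates for past timesteps be revised.
    weights = softmax(scores, axis=-1)
    # Residual update (an assumption of this sketch).
    return memory + weights @ v, weights


rng = np.random.default_rng(0)
T, D = 6, 8  # illustrative sizes only
memory = rng.normal(size=(T, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))
refined, weights = refine_with_self_attention(memory, w_q, w_k, w_v)
```

Each row of `weights` sums to one, so every refined feature is the original feature plus a weighted average of projected features from all timesteps in the memory.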


