Dynamical Audio-Visual Navigation: Catching Unheard Moving Sound Sources in Unmapped 3D Environments

01/12/2022
by Abdelrahman Younes, et al.

Recent work on audio-visual navigation targets a single static sound in noise-free audio environments and struggles to generalize to unheard sounds. We introduce a novel dynamic audio-visual navigation benchmark in which an embodied AI agent must catch a moving sound source in an unmapped environment in the presence of distractors and noisy sounds. We propose an end-to-end reinforcement learning approach built on a multi-modal architecture that fuses spatial audio-visual information from a binaural audio signal and spatial occupancy maps, encoding the features needed to learn a robust navigation policy for these more complex task settings. We demonstrate that our approach outperforms the current state of the art, with better generalization to unheard sounds and better robustness to noisy scenarios, on two challenging datasets of 3D-scanned real-world environments, Replica and Matterport3D, for both the static and dynamic audio-visual navigation benchmarks. Our novel benchmark will be made available at http://dav-nav.cs.uni-freiburg.de.
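
To make the idea of fusing a binaural signal with an occupancy map concrete, here is a minimal toy sketch. It is not the authors' architecture, which is learned end to end. The hand-made audio cue (interaural level difference), the plain concatenation, and the names `interaural_level_difference` and `fuse_features` are all assumptions chosen for illustration only.

```python
import math


def interaural_level_difference(left, right, eps=1e-12):
    """Level difference in dB between the two ear channels.

    Positive values mean the sound is louder on the left.
    Assumption: a hand-crafted cue standing in for a learned audio encoder.
    """
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    return 10.0 * math.log10((energy_left + eps) / (energy_right + eps))


def fuse_features(left, right, occupancy):
    """Concatenate the audio cue with a flattened occupancy grid.

    Assumption: simple concatenation standing in for a learned fusion layer.
    """
    audio_features = [interaural_level_difference(left, right)]
    map_features = [float(cell) for row in occupancy for cell in row]
    return audio_features + map_features


# Toy inputs: the left channel has twice the amplitude of the right one,
# and the 2x2 occupancy grid marks one occupied cell.
left = [0.2, -0.2, 0.2, -0.2]
right = [0.1, -0.1, 0.1, -0.1]
occupancy = [[0, 1], [0, 0]]

fused = fuse_features(left, right, occupancy)
print(len(fused), fused[0] > 0)
```

In a learned system, a vector like `fused` would be the input to a navigation policy. Here it only shows how an audio cue and a spatial map can end up in one feature vector.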

research
11/29/2021

Catch Me If You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments with Moving Sounds

Audio-visual navigation combines sight and hearing to navigate to a soun...
research
12/24/2019

Audio-Visual Embodied Navigation

Moving around in the world is naturally a multisensory experience, but t...
research
08/21/2020

Learning to Set Waypoints for Audio-Visual Navigation

In audio-visual navigation, an agent intelligently travels through a com...
research
02/22/2022

Sound Adversarial Audio-Visual Navigation

Audio-visual navigation task requires an agent to find a sound source in...
research
10/04/2022

Pay Self-Attention to Audio-Visual Navigation

Audio-visual embodied navigation, as a hot research topic, aims training...
research
03/30/2019

Static Visual Spatial Priors for DoA Estimation

As we interact with the world, for example when we communicate with our ...
research
04/21/2023

Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation

Visual-audio navigation (VAN) is attracting more and more attention from...