Catch Me If You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments with Moving Sounds

11/29/2021
by   Abdelrahman Younes, et al.
0

Audio-visual navigation combines sight and hearing to navigate to a sound-emitting source in an unmapped environment. While recent approaches have demonstrated the benefits of audio input to detect and find the goal, they focus on clean and static sound sources and struggle to generalize to unheard sounds. In this work, we propose the novel dynamic audio-visual navigation benchmark which requires to catch a moving sound source in an environment with noisy and distracting sounds. We introduce a reinforcement learning approach that learns a robust navigation policy for these complex settings. To achieve this, we propose an architecture that fuses audio-visual information in the spatial feature space to learn correlations of geometric information inherent in both local maps and audio signals. We demonstrate that our approach consistently outperforms the current state-of-the-art by a large margin across all tasks of moving sounds, unheard sounds, and noisy environments, on two challenging 3D scanned real-world environments, namely Matterport3D and Replica. The benchmark is available at http://dav-nav.cs.uni-freiburg.de.

READ FULL TEXT

page 1

page 8

page 12

research
01/12/2022

Dynamical Audio-Visual Navigation: Catching Unheard Moving Sound Sources in Unmapped 3D Environments

Recent work on audio-visual navigation targets a single static sound in ...
research
08/01/2023

Multi-goal Audio-visual Navigation using Sound Direction Map

Over the past few years, there has been a great deal of research on navi...
research
02/22/2022

Sound Adversarial Audio-Visual Navigation

Audio-visual navigation task requires an agent to find a sound source in...
research
06/01/2022

Towards Generalisable Audio Representations for Audio-Visual Navigation

In audio-visual navigation (AVN), an intelligent agent needs to navigate...
research
03/30/2019

Static Visual Spatial Priors for DoA Estimation

As we interact with the world, for example when we communicate with our ...
research
10/04/2022

Pay Self-Attention to Audio-Visual Navigation

Audio-visual embodied navigation, as a hot research topic, aims training...
research
02/02/2022

Active Audio-Visual Separation of Dynamic Sound Sources

We explore active audio-visual separation for dynamic sound sources, whe...

Please sign up or login with your details

Forgot password? Click here to reset