Learning to Set Waypoints for Audio-Visual Navigation

08/21/2020
by Changan Chen, et al.

In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room). Existing models learn to act at a fixed granularity of agent motion and rely on simple recurrent aggregations of the audio observations. We introduce a reinforcement learning approach to audio-visual navigation with two key novel elements: 1) waypoints that are dynamically set and learned end-to-end within the navigation policy, and 2) an acoustic memory that provides a structured, spatially grounded record of what the agent has heard as it moves. Both new ideas capitalize on the synergy of audio and visual data for revealing the geometry of an unmapped space. We demonstrate our approach on two challenging datasets of real-world 3D scenes, Replica and Matterport3D. Our model improves the state of the art by a substantial margin, and our experiments reveal that learning the links between sights, sounds, and space is essential for audio-visual navigation.

research
01/12/2022

Dynamical Audio-Visual Navigation: Catching Unheard Moving Sound Sources in Unmapped 3D Environments

Recent work on audio-visual navigation targets a single static sound in ...
research
12/25/2019

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

A crucial aspect of mobile intelligent agents is their ability to integr...
research
08/20/2023

Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation

Audio-visual navigation is an audio-targeted wayfinding task where a rob...
research
12/21/2020

Semantic Audio-Visual Navigation

Recent work on audio-visual navigation assumes a constantly-sounding tar...
research
04/21/2023

Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation

Visual-audio navigation (VAN) is attracting more and more attention from...
research
02/22/2022

Sound Adversarial Audio-Visual Navigation

Audio-visual navigation task requires an agent to find a sound source in...
research
06/06/2023

Active Sparse Conversations for Improved Audio-Visual Embodied Navigation

Efficient navigation towards an audio-goal necessitates an embodied agen...
