Towards Generalisable Audio Representations for Audio-Visual Navigation

06/01/2022
by Shunqi Mao, et al.

In audio-visual navigation (AVN), an intelligent agent must navigate to a continuously sounding object in a complex 3D environment using its audio and visual perception. Existing methods try to improve navigation performance with precisely designed path planning or intricate task settings, but none improves model generalisation to unheard sounds while leaving the task settings unchanged. We therefore propose a contrastive learning-based method that tackles this challenge by regularising the audio encoder, so that sound-agnostic, goal-driven latent representations can be learnt from audio signals of different classes. In addition, we consider two data augmentation strategies to enrich the training sounds. We demonstrate that our designs can be easily added to existing AVN frameworks to obtain an immediate performance gain (reported figures: 13.4 and 12.2). Project page: https://AV-GeN.github.io/.
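The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of one common formulation (a supervised, InfoNCE-style loss), not the authors' implementation. It assumes that audio embeddings sharing the same goal are treated as positives regardless of sound class, and that all other embeddings in the batch are negatives. The function name `goal_contrastive_loss`, the `goal_ids` labels, and the `temperature` value are hypothetical choices made for illustration.

```python
import numpy as np


def goal_contrastive_loss(embeddings, goal_ids, temperature=0.1):
    """Supervised InfoNCE-style loss over a batch of audio embeddings.

    Assumption (not from the source): two embeddings are a positive pair
    when they share the same goal id, whatever sound produced them.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalise rows
    goal_ids = np.asarray(goal_ids)
    n = z.shape[0]

    # Cosine similarities scaled by temperature; exclude self-similarity.
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, z @ z.T / temperature)

    # Numerically stable log-softmax over every other sample in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm

    # Positives: same goal id, different sample.
    pos_mask = (goal_ids[:, None] == goal_ids[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0
    if not valid.any():
        return 0.0  # no anchor has a positive in this batch

    # Mean log-probability of the positives, per anchor that has any.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    loss_per_anchor = -pos_log_prob[valid] / pos_counts[valid]
    return float(loss_per_anchor.mean())


if __name__ == "__main__":
    # Four toy embeddings: the first two point one way, the last two another.
    batch = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]]
    print(goal_contrastive_loss(batch, [0, 0, 1, 1]))  # goals match geometry
    print(goal_contrastive_loss(batch, [0, 1, 0, 1]))  # goals cut across it
```

In a training loop, a term of this kind would typically be added to the navigation objective with a weighting coefficient, so that the audio encoder is pushed to group embeddings by goal rather than by sound class. How the paper weights or samples these pairs is not stated in the abstract.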

research
04/21/2023

Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation

Visual-audio navigation (VAN) is attracting more and more attention from...
research
11/29/2021

Catch Me If You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments with Moving Sounds

Audio-visual navigation combines sight and hearing to navigate to a soun...
research
08/20/2023

Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation

Audio-visual navigation is an audio-targeted wayfinding task where a rob...
research
08/01/2023

Multi-goal Audio-visual Navigation using Sound Direction Map

Over the past few years, there has been a great deal of research on navi...
research
05/23/2023

Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification

Respiratory sound contains crucial information for the early diagnosis o...
research
02/22/2022

Sound Adversarial Audio-Visual Navigation

Audio-visual navigation task requires an agent to find a sound source in...
research
06/06/2023

Active Sparse Conversations for Improved Audio-Visual Embodied Navigation

Efficient navigation towards an audio-goal necessitates an embodied agen...
