Static Visual Spatial Priors for DoA Estimation

03/30/2019
by   Pawel Swietojanski, et al.

As we interact with the world, for example when we communicate with our colleagues in a large open space or meeting room, we continuously analyse the surrounding environment and, in particular, localise and recognise acoustic events. While we largely take such abilities for granted, they represent a challenging problem for current robots and smart voice assistants, which can easily be fooled by a high degree of sound interference in acoustically complex environments. Preventing such failures using audio data alone is challenging, if not impossible, since the algorithms need to take into account wider context and often understand the scene on a semantic level. In this paper, we propose what is, to our knowledge, the first multi-modal approach to direction of arrival (DoA) estimation of sound, which uses a static visual spatial prior to provide auxiliary information about the environment and suppress some of the false DoA detections. We validate our approach on a newly collected real-world dataset and show that it consistently improves over classic DoA baselines.


