End-to-end Binaural Sound Localisation from the Raw Waveform

04/03/2019
by Paolo Vecchiotti, et al.

A novel end-to-end binaural sound localisation approach is proposed that estimates the azimuth of a sound source directly from the raw waveform. Instead of relying on hand-crafted features commonly used for binaural sound localisation, such as the interaural time and level differences, our end-to-end approach uses a convolutional neural network (CNN) to extract features from the waveform that are suitable for localisation. Two systems are proposed, which differ in their initial frequency analysis stage. The first system is auditory-inspired and uses a gammatone filtering layer, while the second is fully data-driven and exploits a trainable convolutional layer to perform frequency analysis. In both systems, a set of dedicated convolutional kernels is then employed to search for specific localisation cues, coupled with a localisation stage that uses fully connected layers. Localisation experiments using binaural simulation in both anechoic and reverberant environments show that the proposed systems outperform a state-of-the-art deep neural network system. Furthermore, our investigation of the frequency analysis stage in the second system suggests that the CNN is able to exploit different frequency bands for localisation according to the characteristics of the reverberant environment.
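The frequency analysis stage described above amounts to convolving each ear's waveform with a bank of kernels: fixed gammatone impulse responses in the first system, trainable kernels in the second. The following is a minimal pure-Python sketch of that stage, not the authors' implementation. The parameters are our assumptions rather than the paper's settings: a 4th-order gammatone with bandwidth factor 1.019, the Glasberg & Moore ERB formula, centre frequencies spaced on the ERB-rate scale, 16 kHz sampling, unit-energy kernels, and Gaussian initialisation for the trainable variant. All helper names (`gammatone_kernel`, `frequency_analysis`, `band_ild_db`, and so on) are hypothetical. The per-band ILD at the end is only a sanity check: it is one of the hand-crafted cues the CNN is meant to learn for itself, not part of either proposed system.

```python
import math
import random


def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore); assumed choice."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def erb_space(f_low, f_high, n_bands):
    """Centre frequencies equally spaced on the ERB-rate scale (n_bands >= 2)."""
    to_rate = lambda f: 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)
    to_hz = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    lo, hi = to_rate(f_low), to_rate(f_high)
    return [to_hz(lo + (hi - lo) * i / (n_bands - 1)) for i in range(n_bands)]


def gammatone_kernel(fc, fs, length, order=4, b=1.019):
    """Sampled gammatone impulse response, normalised to unit energy.

    Order 4 and b = 1.019 are assumed values, not taken from the paper.
    """
    h = []
    for i in range(length):
        t = i / fs
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * erb(fc) * t)
        h.append(envelope * math.cos(2.0 * math.pi * fc * t))
    norm = math.sqrt(sum(v * v for v in h))
    return [v / norm for v in h]


def random_kernels(n_bands, length, seed=0):
    """Gaussian-initialised kernels standing in for the trainable layer (assumed init)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(length)
    return [[rng.gauss(0.0, scale) for _ in range(length)] for _ in range(n_bands)]


def causal_conv(x, h):
    """y[n] = sum_k h[k] * x[n - k], truncated to len(x) output samples."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(len(h), n + 1)):
            acc += h[k] * x[n - k]
        y.append(acc)
    return y


def frequency_analysis(waveform, kernels):
    """Filter one ear's waveform with every kernel: returns a bands x time map."""
    return [causal_conv(waveform, h) for h in kernels]


def band_ild_db(left_bands, right_bands):
    """Per-band ILD, 10*log10(E_left / E_right); positive means left is louder."""
    ild = []
    for yl, yr in zip(left_bands, right_bands):
        e_l = sum(v * v for v in yl)
        e_r = sum(v * v for v in yr)
        ild.append(10.0 * math.log10(e_l / e_r))
    return ild


# Usage: a noise source that is twice as loud (in amplitude) at the left ear.
fs = 16000
kernels = [gammatone_kernel(fc, fs, length=160) for fc in erb_space(200.0, 4000.0, 4)]
rng = random.Random(1)
source = [rng.gauss(0.0, 1.0) for _ in range(400)]
left = [2.0 * s for s in source]
right = source
ild = band_ild_db(frequency_analysis(left, kernels), frequency_analysis(right, kernels))
print([round(v, 2) for v in ild])  # -> [6.02, 6.02, 6.02, 6.02]
```

Because the filtering is linear, doubling one ear's amplitude multiplies its energy in every band by 4, so each band reports the same 6.02 dB ILD. In the systems described above, the band-by-time outputs for both ears would instead be passed to the dedicated localisation kernels and fully connected layers, and in the data-driven system the kernels returned by `random_kernels` would be updated by training rather than kept fixed.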
