BAST: Binaural Audio Spectrogram Transformer for Binaural Sound Localization

07/08/2022
by Sheng Kuang, et al.

Accurate sound localization in reverberant environments is essential for human auditory perception. Recently, Convolutional Neural Networks (CNNs) have been used to model the binaural human auditory pathway. However, CNNs struggle to capture global acoustic features. To address this issue, we propose a novel end-to-end Binaural Audio Spectrogram Transformer (BAST) model that predicts sound azimuth in both anechoic and reverberant environments. We explore two implementations, BAST-SP and BAST-NSP, which correspond to the BAST model with shared and non-shared parameters, respectively. Our model with subtraction interaural integration and a hybrid loss achieves an angular distance of 1.29 degrees and a Mean Squared Error of 1e-3 across all azimuths, significantly surpassing the CNN-based model. An exploratory analysis of BAST's performance on the left and right hemifields and in anechoic and reverberant environments demonstrates its generalization ability as well as the feasibility of binaural Transformers for sound localization. Furthermore, we analyze the attention maps to give additional insight into how the localization process works in a natural reverberant environment.
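The abstract names two evaluation metrics (angular distance and Mean Squared Error), a hybrid loss, and subtraction interaural integration with shared or non-shared parameters. The Python sketch below illustrates these ideas under assumptions that are not taken from the paper: azimuth is regressed as a 2-D unit vector (cos θ, sin θ), the hybrid loss is a hypothetical weighted sum of MSE and angular distance with a made-up weight `alpha`, and the Transformer encoder is replaced by a toy linear projection (`encode`). All function names are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def azimuth_to_vector(deg):
    """Map azimuths in degrees to 2-D unit vectors (cos, sin).
    Assumption: the azimuth is regressed as a point on the unit circle."""
    rad = np.deg2rad(np.asarray(deg, dtype=float))
    return np.stack([np.cos(rad), np.sin(rad)], axis=-1)


def angular_distance_deg(pred, target):
    """Mean angle in degrees between predicted and target direction vectors."""
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    target = target / np.linalg.norm(target, axis=-1, keepdims=True)
    # Clip guards against dot products drifting just outside [-1, 1].
    cos = np.clip(np.sum(pred * target, axis=-1), -1.0, 1.0)
    return float(np.mean(np.degrees(np.arccos(cos))))


def mse(pred, target):
    """Mean squared error over all vector coordinates."""
    return float(np.mean((pred - target) ** 2))


def hybrid_loss(pred, target, alpha=0.5):
    """Hypothetical hybrid loss: alpha * MSE + (1 - alpha) * angular distance (radians)."""
    ad_rad = np.deg2rad(angular_distance_deg(pred, target))
    return alpha * mse(pred, target) + (1.0 - alpha) * ad_rad


def encode(spec, weight):
    """Toy stand-in for a Transformer encoder: one linear projection per time frame."""
    return spec @ weight


def integrate_subtraction(left_spec, right_spec, w_left, w_right=None):
    """Subtraction interaural integration of the two ear embeddings.
    w_right=None reuses w_left (shared parameters, in the spirit of BAST-SP);
    a separate matrix mimics non-shared parameters (in the spirit of BAST-NSP)."""
    w_right = w_left if w_right is None else w_right
    return encode(left_spec, w_left) - encode(right_spec, w_right)


if __name__ == "__main__":
    target = azimuth_to_vector([0, 90, 180])
    pred = azimuth_to_vector([1, 89, 182])
    print(round(angular_distance_deg(pred, target), 3))  # 1.333
    print(mse(pred, target), hybrid_loss(pred, target))
```

Here the per-sample errors are 1°, 1° and 2°, so the mean angular distance is 4/3 ≈ 1.333°. With `w_right=None` both ears share one projection, so identical left and right inputs integrate to a zero vector; one intuition for subtraction is that only interaural differences survive the integration.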
