Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization

02/16/2022
by Bing Yang, et al.

Direct-path relative transfer function (DP-RTF) refers to the ratio between the direct-path acoustic transfer functions of two microphone channels. Although DP-RTF fully encodes the spatial cues of a sound source and serves as a reliable localization feature, it is often estimated erroneously in the presence of noise and reverberation. This paper proposes to learn DP-RTF with deep neural networks for robust binaural sound source localization. A DP-RTF learning network is designed to regress the binaural sensor signals to a real-valued representation of DP-RTF. It consists of a branched convolutional neural network module that separately extracts the inter-channel magnitude and phase patterns, and a convolutional recurrent neural network module for joint feature learning. To better exploit the speech spectra for DP-RTF estimation, a monaural speech enhancement network is used to recover the direct-path spectrograms from the noisy ones. The enhanced spectrograms are stacked onto the noisy spectrograms to form the input of the DP-RTF learning network. A single DP-RTF learning network is trained on many different binaural arrays so that DP-RTF learning generalizes across arrays. This avoids time-consuming training data collection and network retraining for each new array, which is valuable in practical applications. Experimental results on both simulated and real-world data show the effectiveness of the proposed method for direction of arrival (DOA) estimation in noisy and reverberant environments, as well as good generalization to unseen binaural arrays.
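
The abstract defines DP-RTF as a ratio of direct-path transfer functions and uses a real-valued version of it for DOA estimation. The Python sketch below illustrates only those definitions, under loudly stated assumptions: a free-field two-microphone model whose direct paths are pure delays (the helper free_field_atfs and its 0.18 m spacing are hypothetical), a log-magnitude plus cosine/sine-of-phase encoding that may differ from the paper's actual representation, and a nearest-template search over candidate directions. It does not implement the paper's networks; the "observed" DP-RTF here is synthetic rather than predicted from binaural signals.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s


def free_field_atfs(theta_deg, freqs, mic_distance=0.18):
    """Hypothetical direct-path ATFs of two microphones in free field.

    Assumption: each direct path is a pure delay and the left microphone is
    the time origin. Real binaural (head-related) ATFs also carry level
    differences and spectral shaping from the head.
    """
    tdoa = mic_distance * np.sin(np.deg2rad(theta_deg)) / SPEED_OF_SOUND
    h_left = np.ones_like(freqs, dtype=complex)
    h_right = np.exp(-2j * np.pi * freqs * tdoa)
    return h_left, h_right


def dp_rtf(h_ref, h_other):
    """DP-RTF per frequency: ratio of the two direct-path ATFs.

    The reference channel goes in the denominator (an assumed convention).
    """
    return h_other / h_ref


def real_valued(rtf):
    """One possible real-valued encoding (assumed, not necessarily the paper's):
    log-magnitude ratio, then cosine and sine of the inter-channel phase."""
    phase = np.angle(rtf)
    return np.concatenate([np.log(np.abs(rtf)), np.cos(phase), np.sin(phase)])


def estimate_doa(observed, candidate_degs, freqs):
    """Return the candidate direction whose template features are closest
    (Euclidean distance) to the observed real-valued DP-RTF."""
    templates = np.stack(
        [real_valued(dp_rtf(*free_field_atfs(a, freqs))) for a in candidate_degs]
    )
    distances = np.linalg.norm(templates - observed, axis=1)
    return candidate_degs[int(np.argmin(distances))]


# Usage: a synthetic source at 30 degrees, with small additive feature noise
# standing in for estimation error.
freqs = np.fft.rfftfreq(512, d=1 / 16000)  # 257 bins from 0 to 8 kHz
candidates = np.arange(-90, 91, 5)
true_deg = 30
features = real_valued(dp_rtf(*free_field_atfs(true_deg, freqs)))
rng = np.random.default_rng(0)
noisy_features = features + 0.05 * rng.standard_normal(features.shape)
print(estimate_doa(noisy_features, candidates, freqs))  # 30
```

Because the pure-delay model has no level difference, the log-magnitude part of the features is zero here; with measured binaural ATFs it would carry the inter-channel level cue as well.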
