Insights into Deep Non-linear Filters for Improved Multi-channel Speech Enhancement

06/27/2022
by Kristina Tesch, et al.

The key advantage of using multiple microphones for speech enhancement is that spatial filtering can be used to complement the tempo-spectral processing. In a traditional setting, linear spatial filtering (beamforming) and single-channel post-filtering are commonly performed separately. In contrast, there is a trend towards employing deep neural networks (DNNs) to learn a joint spatial and tempo-spectral non-linear filter, which means that the restrictions of a linear processing model and of separate processing of spatial and tempo-spectral information can potentially be overcome. However, the internal mechanisms that lead to good performance of such data-driven filters for multi-channel speech enhancement are not well understood. Therefore, in this work, we analyse the properties of a non-linear spatial filter realized by a DNN as well as its interdependency with temporal and spectral processing by carefully controlling the information sources (spatial, spectral, and temporal) available to the network. We confirm the superiority of a non-linear spatial processing model, which outperforms an oracle linear spatial filter by 0.24 POLQA score in a challenging speaker extraction scenario with a low number of microphones. Our analyses reveal that spectral information in particular should be processed jointly with spatial information, as this increases the spatial selectivity of the filter. Our systematic evaluation then leads to a simple network architecture that outperforms state-of-the-art network architectures by 0.22 POLQA score on a speaker extraction task and by 0.32 POLQA score on the CHiME3 data.
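To make the "traditional setting" mentioned in the abstract concrete, the following is a minimal NumPy sketch of a linear spatial filter followed by a separate single-channel post-filter, for one frequency bin. It is an illustration under stated assumptions, not the authors' method: the MVDR beamformer is used here as one common choice of linear spatial filter, and the function names, the random steering vector, and the noise covariance are all hypothetical.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """MVDR weights w = Phi^{-1} d / (d^H Phi^{-1} d) for one frequency bin."""
    phi_inv_d = np.linalg.solve(noise_cov, steering)
    return phi_inv_d / (steering.conj() @ phi_inv_d)

def beamform_then_postfilter(x, weights, gain):
    """Linear spatial filter (w^H x), then a separate single-channel gain."""
    beamformed = weights.conj() @ x  # x: (channels, frames) -> (frames,)
    return gain * beamformed

# Toy data (all values are assumptions made for illustration only).
rng = np.random.default_rng(0)
n_mics, n_frames = 3, 5
steering = np.exp(-2j * np.pi * rng.random(n_mics))  # hypothetical steering vector
a = rng.standard_normal((n_mics, n_mics)) + 1j * rng.standard_normal((n_mics, n_mics))
noise_cov = a @ a.conj().T + n_mics * np.eye(n_mics)  # Hermitian, positive definite
target = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
x = steering[:, None] * target[None, :]  # noise-free multi-channel observation

w = mvdr_weights(noise_cov, steering)
y = beamform_then_postfilter(x, w, gain=1.0)
```

In this sketch the spatial step is a fixed linear combination of the microphone channels and the post-filter only sees the single beamformed channel. A learned joint non-linear filter, as discussed in the abstract, would instead map the multi-channel input to the output directly, without this separation.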

research
06/22/2022

On the Role of Spatial, Spectral, and Temporal Processing for DNN-based Non-linear Multi-channel Speech Enhancement

Employing deep neural networks (DNNs) to directly learn filters for mult...
research
04/22/2021

Nonlinear Spatial Filtering in Multichannel Speech Enhancement

The majority of multichannel speech enhancement algorithms are two-step ...
research
03/14/2023

Localizing Spatial Information in Neural Spatiospectral Filters

Beamforming for multichannel speech enhancement relies on the estimation...
research
10/27/2022

Exploiting spatial information with the informed complex-valued spatial autoencoder for target speaker extraction

In conventional multichannel audio signal enhancement, spatial and spect...
research
11/04/2022

Spatially Selective Deep Non-linear Filters for Speaker Extraction

In a scenario with multiple persons talking simultaneously, the spatial ...
research
10/01/2021

Leveraging Low-Distortion Target Estimates for Improved Speech Enhancement

A promising approach for multi-microphone speech separation involves two...
research
06/10/2021

DNN-Based Topology Optimisation: Spatial Invariance and Neural Tangent Kernel

We study the SIMP method with a density field generated by a fully-conne...
