Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning

03/09/2020
by   Rongzhi Gu, et al.
0

Hand-crafted spatial features (e.g., inter-channel phase difference, IPD) play a fundamental role in recent deep learning based multi-channel speech separation (MCSS) methods. However, these manually designed spatial features are hard to incorporate into the end-to-end optimized MCSS framework. In this work, we propose an integrated architecture for learning spatial features directly from the multi-channel speech waveforms within an end-to-end speech separation framework. In this architecture, time-domain filters spanning signal channels are trained to perform adaptive spatial filtering. These filters are implemented by a 2d convolution (conv2d) layer and their parameters are optimized using a speech separation objective function in a purely data-driven fashion. Furthermore, inspired by the IPD formulation, we design a conv2d kernel to compute the inter-channel convolution differences (ICDs), which are expected to provide the spatial cues that help to distinguish the directional sources. Evaluation results on simulated multi-channel reverberant WSJ0 2-mix dataset demonstrate that our proposed ICD based MCSS model improves the overall signal-to-distortion ratio by 10.4% over the IPD based MCSS model.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/15/2019

End-to-End Multi-Channel Speech Separation

The end-to-end approach for single-channel speech separation has been st...
research
05/03/2022

Improving Dual-Microphone Speech Enhancement by Learning Cross-Channel Features with Multi-Head Attention

Hand-crafted spatial features, such as inter-channel intensity differenc...
research
01/02/2020

Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation

Target speech separation refers to extracting the target speaker's speec...
research
10/25/2019

A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet

In this work, we investigate if the learned encoder of the end-to-end co...
research
06/29/2021

Towards a generalized monaural and binaural auditory model for psychoacoustics and speech intelligibility

Auditory perception involves cues in the monaural auditory pathways as w...
research
12/07/2022

MIMO-DBnet: Multi-channel Input and Multiple Outputs DOA-aware Beamforming Network for Speech Separation

Recently, many deep learning based beamformers have been proposed for mu...
research
12/17/2019

A Unified Framework for Speech Separation

Speech separation refers to extracting each individual speech source in ...

Please sign up or login with your details

Forgot password? Click here to reset